January 22, 2016, 1 PM EST

Attendees

Steering Group Members

Paul Albert, Jon Corson-Rikert (star), Melissa Haendel, Kristi Holmes, Dean B. Krafft, Robert H. McDonald, Eric Meeks, Bart Ragon, Julia Trimmer, Alex Viggio

(star)= note taker

Ex officio

Graham Triggs, Mike Conlon, Jonathan Markow, Debra Hanken Kurtz

Regrets

 Andi Ogier

 

Dial-In Number: (712) 775-7035; access code 989199 (local country dial-in codes available)

Agenda

 
Item                 | Time   | Facilitator | Notes
1. Updates           | 5 min  | All         | See below
2. Review agenda     | 2 min  | All         | Revise, reorder if needed
3. Thoughts from Jon | 50 min | Jon         |
7. Future topics     | 3 min  | All         | Webinars, VIVO Days, training

Notes

  1. Updates
    1. More about WebEx:
      1. Mike Conlon can set up any meeting – there is one account in Mike's name.  Mike has set up recurring calls for the Interest Groups and for this group. The May and November Leadership calls can be held by WebEx.
      2. Meeting participants are expected to join by computer.  There is a non-toll-free dial-in number in the US.
      3. All meetings can be recorded.  Recordings can be converted to MPEG4.  Recordings will be saved outside of WebEx at the discretion of the meeting owner.
    2. Griffith University has chosen to end its VIVO membership.  They will continue to use the software to build their Research Hub.  They have chosen to become a DuraSpace Silver member and redirect their funds to DSpace.
  2. Review agenda
  3. Thoughts from Jon – based on a range of topics proposed by Mike – please read through and think about what you'd most like to discuss 
    1. Adoption in the community.  How is it going?  What might help?

      1. Recipes for populating a basic VIVO with people, positions, and organizations using Karma
      2. Ready-made ETL of data from common sources — PubMed, ORCID, SHARE — to delay some of the complexities of working with institutional data and have something to demonstrate to stakeholders early on
      3. Separate support for lightweight, early exploration using the above from committed adoption, where sites need to develop update as well as ingest strategies
        1. Have a task force to help people at that stage to make a plan
        2. Focus more on the harvester and long-term sustainable ingest/update/validation models 
    2. Community participation in development.  What might help?

      1. Recognize and encourage/support even nascent contributors
      2. Promote ways to contribute for non-Java programmers – requirements, testing, ingest and visualization, data sharing
      3. Find a critical mass for a 3-4 day sprint, online or in person
    3. Future of the software:

      1. VIVO/Vitro architecture and MVC

      2. Triple stores and their future role – Consider Neo4j and other alternatives?

      3. Visualizations, reports, APIs and other outputs

    4. Connecting VIVOs to each other.  How might this be done? Several use cases:

      1. The UNAVCO/NCAR use case of a VIVO all about datasets that wants to consume other profile information from their researchers' home VIVOs
      2. The dual campus use case – two independent, locally managed VIVOs exposing some or all of each other's data
      3. The UCSF-inspired use case of adding links to co-authors at their home institution
    5. Additional driving use cases for Vitro and VIVO — what should the focus be regarding use cases?

      1. Some feel the VIVO project should focus on its core use case of a university profile system (Curt Cole's view)
      2. Some feel VIVO and Vitro should cultivate a wide range of use cases as a way to grow the community – international agriculture, climate change, research dataset discovery, linked data for libraries, and small, lightweight tools using Vitro for tasks such as managing and hosting controlled vocabularies
      3. Does it have to be either/or?
    6. Ontologies

      1. OpenRIF and how to work together

      2. Modularization of the ontology.  Benefits, on-going work?
      3. Ontology extraction — generating a subset of VIVO-ISF optimized for the VIVO application.  This has been done manually once, but a reproducible process is needed
      4. Expanded use of “third party” ontologies — PROV, SKOS, BIBO, ERO — and their relationship to the VIVO ontology. When, why?
      5. Ontology changes requiring upgrades to site triple stores – likely to come sooner vs. later.  When, why, how?

    7. Data

      1. Use of “third party” linked data — GRID, CrossRef, DBpedia, others —

        1. Distributing data with VIVO, as we have for geographic entities?

        2. “asking” for data through web calls in real time – what happens when sources are offline?

        3. Should VIVO provide services to buffer external sources?

    8. What would be best for search across VIVOs?
      1. How do we support, extend, and productionize what Dave Eichmann has done with CTSAsearch?
      2. Should we build "search beyond this VIVO" into the app?
    9. Hosting VIVOs.  Concerns, opportunities, likely costs
  4. General discussion
    1. Additional driving use cases for VIVO. Some have argued that the VIVO project should focus on its core use case before it goes out and serves additional use cases. Another perspective is to use VIVO for a range of use cases as we do at Cornell Ithaca: UNAVCO, Linked Data for Libraries. Being flexible allows us to attract additional funding. The challenge is that if we're all over the map, no one knows who we are. Maybe it's not a strictly either/or thing.
    2. I think of VIVO as being the back end with a number of different front ends, as opposed to trying to be a single-purpose system.
    3. The membership tends to focus on VIVO as a single product. When VIVO is adopted, it is a commitment of major campus resources. I also like the broader technology application, but it's a little harder to drive membership.
    4. On the representing scholarship side, we were and are the head of the scholarly ecosystem. Acquiring quality data that can be integrated turns out to be a lot of heavy lifting. The community has come to understand VIVO is an enterprise application. The expectation that the data is going to be very good is standard. If you're in it for 4-5 years, and you have appropriate resources and so on... you can produce a quality enterprise presence.
    5. Sandy Payette started this week. We're realizing that we have to keep a better handle on data quality. The absence of good data really undermines the credibility of the resource. Better to have no data than incorrect data. Getting data from Activity Insights, for example, made VIVO a good search engine, but we kept getting negative feedback. We're focusing more on VIVO being an enterprise system. We're not going to add an organization until we know it should have a public presence. The authority control for this will be the library. Our tradition was to be a little looser, but that's going away.... It would help if we could have some well-defined recipes for importing publications. Help people build a lightweight VIVO first - along the lines of what Justin Littman has done at GW. There are two phases for VIVO: the initial demonstration and then the move towards production. One year to 18 months in, you begin the work required to convert VIVO into an enterprise system.
    6. The value in data quality is a chicken-and-egg problem. It's hard to deliver value without data quality, and vice versa. One of the things we use to get over the hump is outreach to end users. We do author disambiguation, but because we have active engagement, they go in every week. Get the researchers passionate about that.
    7. Why is this advantageous?
    8. They know how much people are finding out about them through their page. They know people are looking them up on Google. That's not true for everyone. There has been a lot of grassroots work where we hand-curated profiles for VIPs.... Another thing that helped is that a lot of departments get their data from our system. They pull their publications from us. These admins are motivated to keep things current.  We've gotten the institution invested in making sure there's quality data.
    9. I detest having to populate my profile in 100 places. I refuse to populate my ORCID profile because it doesn't connect with anything. If people can feel VIVO is their primary profile....
    10. Agreed. VIVO is supposed to be your institutional profile. That should be the source of data about you. The data from VIVO makes its way into other systems. I'm interested in SciENcv.
    11. That should be done by the summer.
    12. There are 37 different UCSF websites that pull RDF data from our system.
    13. I think that institutional data is important as a source. If you want this to be the system, there needs to be a huge data store. Someone should be charged with taking care of it.... If your institution doesn't make a commitment, it's too much to take care of.
    14. There is a concept of something other than an institutionally maintained VIVO. Dave E. is not in a position to support non-VIVO profiles. Couldn't there be some overarching service for creating VIVO profiles for people?
    15. If we take VIAF, ISNI, and ORCID IDs for any researcher that doesn't have a home VIVO and make it possible to pull in data from any profile, would that have appeal?
    16. That makes sense to me.
    17. I think you can't overstate the importance of the ease of getting data out of VIVO. One of the reasons we have so many departments is that Anirvan created a custom API, which was non-standard but much easier to use. I'm against what he did, but I can't deny the success of that.
    18. Could we get the Research Connection company to harvest from all our VIVOs and then build integrated services for other VIVOs?... I'm not sure DuraSpace wants to do this by themselves. RC has a notion of harvesting data from universities. We want to do this as a much more ambitious service, more in line with what Dave is doing. Should we explore this model?
    19. How about Community of Science? They are already harvesting data from a number of institutional websites.
    20. They were very interested at the beginning, but that has kind of fallen away.
    21. They were the first vendor to start a VIVO based on their internal data. They did an entire Wash. U. VIVO.
    22. ORCID doesn't currently allow you to do batch requests for ORCID IDs. None of those details matter to the general public. It's very positive, but in a couple of years there's a question regarding long-term sustainability.... What if we started to frame VIVO as the infrastructure required for national reporting?... We need to convert this into grant proposals.... I'm a bit confused about what we're supposed to be doing.
    23. We don't have interaction with the federal government, but we're about to. The SciENcv work is the most important development since the start of the grant.... Research Connection needs data to accomplish their goals. All I did was give them the URL with the RDF triples, and they had U. of Florida on their website, all perfectly and beautifully displayed.
    24. I don't think we want to build any kind of VIVO that doesn't have VIAF, ISNI, and ORCID identifiers. Harvest from PubMed and all these sources, and then pull in data from all the VIVOs. Maybe Research Connection would be interested in this, maybe they wouldn't. The challenge is that not all institutions are ready to do it; it's still early going. There's a tension between what needs to be accomplished at the local institution (i.e. for reporting) vs. what meets the needs of a networked ecosystem. This seems like a fundable idea.
    25. There are three big, potentially fundable ideas that could improve the ecosystem:
    26. Identifier alignment: find matches between identifiers at scale. NLM might be interested.
    27. Semantic alignment: start aligning the world's semantics (OpenRIF, SciENcv). You can know what's happening at certain touchpoints.
    28. Profile provider: allow people to create more primary data about themselves, which they can get and use in some other system.
    29. One way to move this forward: what if we had a public VIVO where anyone could create a profile? We could do some cool stuff with the data. If we could pull this off, it could be impactful.
    30. The idea is that someone should be able to go to a website and click a button that says "make me a VIVO profile."
    31. You would have to have an identifier (ORCID or VIAF), and your records get pulled in (see the sketch after these notes).
    32. This is a lot of work.
    33. Yes, but this opportunity for innovation won't come again. We could present on April 17 at Force16.
    34. Research Connection would be very interested in gathering these data. They don't have partners to speak of. They might be a good partner. They haven't monetized their work.
    35. Thanks for your contributions, Jon.
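
Appendix to note 31: a minimal, hypothetical sketch of pulling a researcher's records by identifier, assuming the ORCID public API at pub.orcid.org (v3.0) and its JSON record layout. The endpoint, field names, and example iD are assumptions to be checked against the ORCID documentation, not an agreed design.

import json
import urllib.request

ORCID_ID = "0000-0002-1825-0097"  # sample iD used in ORCID's own documentation

def fetch_public_record(orcid_id):
    # Ask the ORCID public API for the researcher's public record as JSON.
    url = "https://pub.orcid.org/v3.0/%s/record" % orcid_id
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

record = fetch_public_record(ORCID_ID)

# Field names below assume the v3.0 JSON layout; verify before relying on them.
name = record.get("person", {}).get("name", {}) or {}
given = (name.get("given-names") or {}).get("value", "")
family = (name.get("family-name") or {}).get("value", "")
work_groups = (record.get("activities-summary", {})
                     .get("works", {})
                     .get("group", []))

print("Researcher:", given, family)
print("Public works found:", len(work_groups))

A record like this could seed a hosted VIVO profile, with PubMed or VIAF data layered in afterwards, along the lines of notes 24 and 26-28.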

Action Items