November 6, 2015, 1 PM EST

Attendees

Steering Group Members

(star)= note taker

Ex officio

 debra hanken kurtz, Graham Triggs, Mike Conlon

Regrets

  Kristi Holmes, Bart Ragon (3 of 3), Jonathan Markow, Melissa Haendel, Paul Albert, Robert H. McDonald

Dial-In Number:  

DIAL-IN: 641-715-3650, Participant code: 117433#

Agenda

 
| # | Item | Time | Facilitator | Notes |
|---|------|------|-------------|-------|
| 1 | Updates | 5 min | All | See below. |
| 2 | Review agenda | 2 min | All | Revise, reorder if needed |
| 3 | Update re Recent Meetings | 10 min | Dean | |
| 4 | Update re Recent Meetings | 10 min | Eric | ORCID in SF |
| 5 | How does VIVO get bigger | 30 min | Mike, All | How does VIVO end up with more sites in production. |
| 6 | Future topics | 3 min | All | Harvard Profiles (Nov 20); webinar series; training program |

Notes

  1. Updates
    1. SHARE VIVO project nearing completion end of November.  SHARE can now harvest data from a VIVO (given a  local username and password for the API).  VIVO will soon be able to harvest data from SHARE.  Uses VIVO SPARQL API.  Joint webinar planned for spring.
    2. 1.8.1 to be released Tuesday, November 10.  See VIVO v1.8.1 Release Notes
    3. Benchmarking discussion at last week's Implementation and Development call.  Notes/thoughts available here:  https://goo.gl/b0N8C2
    4. Tentative date for launch of new web site is Tuesday, November 17.
    5. Mike out November 13
    6. Site registry complete and in place at http://duraspace.org/registry/vivo. Kristi Searle of DuraSpace handles requests to update info. As requests are received, Kristi asks the site to fill out a short survey with all the questions and then enters the data for them. This week several sites have been added and several others have improved their information using this process.
    7. Discussion Friday with Andrew Woods (Fedora), Rick Johnson (Notre Dame), Violeta Ilik, and Alex Viggio re Fedora/VIVO/Hydra
    8. Apps and Tools work group pages improved – archived old notes, improved presentation of current material. https://goo.gl/bgnGT9
    9. 1.9 features discussion with the Outreach and Engagement Working Group November 10.
  2. Review Agenda
  3. Update re Recent Meetings – Dean
    1. Linked Data for (metadata) Production partners' meeting – a group of libraries working on cataloging practices and workflows to produce linked data; related to but distinct from the second-generation Linked Data for Libraries
      1. Proposals include using both Vitro and eagle-i for editors, and at least at Cornell we're looking at using VIVO as a local authority
      2. Hoping for funding in April
    2. Poster on LD4L at the International Semantic Web Conference
      1. Very computer-science and theoretical
      2. Very European attendance
      3. Most interesting application was harvesting and relating data from escort ads to track human trafficking
        1. Using Elasticsearch rather than a triplestore to pull information together for fast lookup
    3. Ivies Plus Library IT heads meeting
      1. Cornell has decided to go with Kuali OLE as our integrated library system, in large part because it allows us to innovate in using linked data, following a path UC Davis has demonstrated
      2. What is the landscape for states in which there are large statewide catalogs – e.g., California
        1. UC Davis has the system to handle circulation and local inventory management – the search part gets separated; have gone that path at Cornell; California has an integrated library search
        2. The University of Florida runs a statewide library catalog for the whole system
        3. Indiana has a statewide library service as well
    4. And at the Digital Library Federation in Vancouver – again speaking on LD4L, with considerable interest in linked data
    5. Good news announcement about Sandy Payette joining the Cornell Library team and taking the lead on a lot of our VIVO work
  4. Update re Recent Meetings – Eric
    1. Attended the ORCID meeting in San Francisco earlier this week – saw Simeon Warner
    2. Went to find out their position (competitive or collaborative) on research networking systems
      1. Heard "ORCID profile" as often as, if not more often than, "ORCID iD" – even while they said they had no interest in being a profile system
      2. The explanation they have is that they are very much in the business of holding information that is owned by the individual researcher, vs. the institutional provenance of information carried in research networking systems. Eric was asked to be on a panel along with a representative from Stanford CAP
        1. Researchers can carry it with them, and may not be affiliated with any institution at any given time
        2. Have a UI for data entry and storage only to enable researchers to enter and maintain data
        3. Don't want to get involved in the semantics of the data – don't want to get caught up in issues of representation of data
        4. Claim they understand there is a lot of prior art in public-facing research networking systems
      3. They also recognize the issue of the individual assertion vs. institutional provenance, and that it means there will be different information maintained in each
        1. If there's bad data in an ORCID profile, it's the responsibility of the researcher – a lower risk than the institutional approach
      4. Seems very complementary vs. competitive
      5. Boston University has set up their system with two-way synchronization with ORCID – an example of ORCID being the plumbing behind a public-facing system on the University website
    3. SHARE is content to receive a whole corpus of information from an institution, while ORCID is really not interested in having an institution push data to it as an authoritative source
    4. At this point ORCID is actively discouraging institutions from pushing large amounts of data in for multiple people, because they find the information goes stale and researchers forget they have an ORCID iD and often create another one
      1. A philosophical position on ORCID's part to say this information belongs to the researcher
      2. That position is quite limiting – it will be difficult to get certain kinds of data, since there is no incentive for the investigator to provide it
      3. Isn't it also that they primarily want the information needed to disambiguate somebody?
      4. Not what they explicitly talked about, but overall less emphasis on the value of the iD for disambiguation than on the researcher ownership
      5. The publishers are asking for ORCID iDs and beginning to send metadata around the world that embeds ORCID iDs
        1. That's another incentive for researchers – to see the ORCID iD appearing in many places
        2. It should be automated so the researcher only needs to be minimally aware of their ORCID iD
    5. Did anything come of the codefest? Had to leave before it so not sure – looked smallish
    6. How is the ORCID Board composed now? It was convened and founded, and its core technology provided, by publishers
      1. Publishers are still definitely involved – saw PLOS ONE, Symplectic, Thomson Reuters, Elsevier, etc. there
  5. How does VIVO get bigger?
    1. How does VIVO get more production sites, going beyond pilot projects
    2. We want to understand why some people decide not to adopt at all and why some projects never finish and launch
    3. We have 25-28 sites that are in production, many of which have been for a long time, and a number in various stages of implementation, some of which are on hold or have been abandoned (e.g., we recently learned Stony Brook has discontinued their VIVO project)
    4. What can be done or should be done so that VIVO can become something universities can actually do
    5. People ask us at UCSF about differences between Profiles and VIVO
      1. People are intimidated by the amount of effort it takes to get data into the system to the point it makes a good-looking, viable system
      2. So any efforts to streamline would help
      3. Profiles is easier since it has 3 tables, and goes and gets publications, but the downside is that it's very limiting
      4. UCLA will put VIVO at the tail end because they need the flexibility to go beyond biomedicine
    6. A lot of VIVO sites are partnering with Symplectic because it provides an enterprise quality data management solution that facilitates production of the VIVO as a public website
      1. But there's an edge to that, namely the attraction that Symplectic can be considered a secure system – administrators think less about giving data away, which VIVO implies
      2. Met here with college-level people who do faculty reporting systems, and the conversation changed dramatically once it became about Symplectic + VIVO
        1. Provides a way to handle private data and report out on it
    7. Often heard that Profiles has easier data acquisition (from PubMed) as well as more visualization options and the passive networking tools
      1. If both are demoed, people tend to lean toward Profiles, especially for medical schools
      2. Moving to JavaScript and D3 for visualizations will make it easier for the community to join in and help
    8. Profiles has been more concerned about the display vs. the underlying data
      1. It would help not to require all the data to be stored as triples all the time – e.g., leveraging cached data in relational tables
      2. RPI has been working with Elasticsearch as an alternative to Solr as an index
        1. Has been advised to look at ELK (Elasticsearch, Logstash, Kibana), which builds a dashboard utility for analytics on top of Elasticsearch
    9. There are ways to make the product better
      1. If we could provide a three-table means of ingest, that would help – something we write, or via Karma models
        1. and we can still offer the ultimate flexibility that places like UCLA want
      2. the visualization opportunities are also attractive
      3. But there's a larger strategic question of whether the universities understand their enterprise space and understand where VIVO fits
        1. vs. Academic Analytics, research administration, CRIS, etc.
    10. Did people come to Symplectic saying what their needs were
      1. Profile is seen as a nice web presence for their researchers, and is pretty much a dead end for the data
      2. Administrative functions have operational systems – payroll, library catalogs, etc. – and VIVO is not an operational system
        1. Many administrators struggle to understand the value of a system that doesn't do transactions such as managing grants or course registration
        2. Or they want a two-for-one deal – "really, what you're doing is collecting research outputs, and we're the research system and we have the outputs"
          1. when they don't really have the outputs or realize they need more than what they have
        3. They don't want another system – competes for budget
        4. Would rather add functionality to existing systems
      3. Are people asking for the data for analytics, or are they more interested in the pages
      4. There's also the aspect of the VIVO ontology – they are finally getting around to having the CAP system support VIVO linked open data
      5. Scripps is doing work to re-use their VIVO data for researchers to put a full list of publications in a .gov namespace
    11. Do we want to push VIVO at the research group or project level vs. the whole university
      1. There are a number of sub-institutional VIVOs that are successful
      2. And virtual organizations – Deep Carbon Observatory
    12. Are there working examples of multi-site operations that are passing RDF as a part of their normal business
      1. If we pull in the linked data fragments group we might find some examples
        1. Going beyond the ecoverse approach with lightweight discovery
      2. Cornell is setting up a test server to pull data on publications from Cornell into an NCAR or UNAVCO VIVO, where they have information about a researcher's project activities and published datasets
        1. We are close to being able to do things like that technically, but would be helpful to have more of a model and not have to invent it
        2. If you could pull in GRID data about the basic facts of an institution, or journals, and maintain it so every university doesn't have to maintain it
        3. People could understand that – the idea of distributed information
        4. This was the approach of the American Psychological Association VIVO – they would be the source of information on articles they have published but would look to university VIVOs as the source of information on employment and teaching; funding agencies would own and publish information on grants and link to both publisher and institutional VIVOs
  6. Future topics

Action Items