Attendees
Jonathan Markow
Andrew Woods
Jonathan Corson-Rikert
Brian Caruso
Brian Lowe
Jim Blake
Agenda
- Understand how we move forward
- Important for new sponsorship
- Options
- Are we interested in hosting?
- Are we interested in development?
Discussion
- LinkedIndexBuilder
- Crawls data over VIVO instances
DuraSpace
- We see cool opportunities for VIVOSearch
- Division of labor
- Existing VIVOSearch team does development on the VIVO side
- DuraSpace will host and do system administration
- Longer term
- Interested in greater inter-operation
- View collaboration as team effort
VIVOSearch technology
- Built software to create a Solr index
- Built webapp over the index
- Harvesting was never brought to production state
- Initial plans for Hadoop
- They did not have the Hadoop experience
- Went with Scala
- Have some regrets with Scala implementation, especially the Actor framework
- When resuming work, would go in the direction of Hadoop
- Development options
- Try to continue with the Scala-Actor code (not recommended)
- Or, take the useful parts (Java code) and move towards Hadoop
Approach
- Take a high-level, broad view
- Goal: have a VIVOSearch app for range of institutions
- Come up with comprehensive list of questions
- What is the optimal platform?
- What should the production app do?
- How many institutions?
- What do we need to run it?
- What do we need to support it?
- Once we start to answer questions
- We can start to come up with tasks
System parts
- Drupal website
- Solr index
- Solr is currently populated with Scala-Actor code
- Components interact via HTTP
- Drupal frontend is a bit unknown
- Others (Nick Cappadona, Miles Worthington) worked on it; not on the project but still available to consult
- Was considered beta
- Probably in Drupal 6
- Cannot stretch the UI to gracefully handle 200 schools
- Does follow responsive design principles, but predates ready-made libraries such as Bootstrap/Sass
- Earlier conversations around using a js-solr library
- Do not recall the name of the library
- Question: is current state in production state or beta?
- Mostly beta, except Solr index
- No reason to stick with Drupal, except that it exists
- Indexing backend is considered a dead end
- Indexing approach will be similar to that used for a Cornell Library project building a new catalog search interface
- Takes MARC records from the catalog and runs them through RDF
- Should we be considering Nutch?
- We need to consider how much load we put on institution VIVOs in indexing
- Should we do complete harvests every time, or develop an incremental harvesting capability?
- It is not trivial to identify what has changed, since one VIVO "page" brings in data from sometimes hundreds of related entities, including other people, publications, events, etc.
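One way to picture the incremental-harvesting question above is to keep a fingerprint of the RDF last indexed for each page and re-index only when it changes. This is a minimal sketch, not the VIVOSearch approach: the function names are hypothetical, and it assumes the RDF for a page (including the related entities it pulls in) has already been fetched and serialized one statement per line.

```python
import hashlib


def fingerprint(rdf_text: str) -> str:
    """Hash the RDF for one page, ignoring blank lines and statement order."""
    lines = sorted(line.strip() for line in rdf_text.splitlines() if line.strip())
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def changed_uris(previous: dict, current_rdf: dict) -> list:
    """Return the URIs whose RDF is new or differs from the stored fingerprint.

    previous:    URI -> fingerprint saved at the last harvest
    current_rdf: URI -> RDF text fetched in this harvest
    """
    changed = []
    for uri, rdf in current_rdf.items():
        if previous.get(uri) != fingerprint(rdf):
            changed.append(uri)
    return sorted(changed)
```

Note that this only avoids re-indexing unchanged pages. Each page still has to be fetched to compute its fingerprint, so it does not by itself reduce the load on institutional VIVOs.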
Scenarios
- Replicate VIVOSearch.org in DuraSpace infrastructure
- Some institutions would like to pick up app and run locally, or might run their own Amazon instances, or might want to take advantage of DuraSpace services for hosting a private index and search landing site
- Key selling point: agnostic to the software that produces the RDF
- Currently proven to work with VIVO and Harvard Profiles
- Elsevier SciVal also claims compatibility – Northwestern University will be the test for that
- Any site can also participate simply by putting VIVO-compatible RDF in a web-accessible directory
- Would like to validate user RDF in the future
- Likely an on-boarding process – once we have indexed a site once, it's likely to go smoothly thereafter
Demo
- Performed brief demo
- UI has gone through fair amount of rigor
Pilot group
- CTSAs (the ~60 NIH-funded Clinical and Translational Science Awards)
- They are committed by majority vote of the principal investigators to doing researcher networking and search using the VIVO ontology
- They (or a subset ready to act soon) could serve as a good pilot group
- Existing partners may be willing to participate without extensive preconditions or delay
- VIVO
- UF, Cornell, Weill Cornell, Colorado, Duke, Brown, Stony Brook, Indiana, Scripps, USDA, APA, and likely several others
- Harvard Profiles
- Harvard, UCSF, and likely a couple others (Wake Forest?)
- SciVal Experts
- Northwestern, Oregon Health & Science University
- Loki
- Digital Vita
- Others
- CTSAs will want a process to specify requirements/needs
- Could turn into a bureaucratic/somewhat political process
- It may be more practical working with institutions with which we have existing relationships
Key to success: no client effort
- How much work needs to happen on the client's side?
- Indexing is via linked-data requests to VIVO sites
- There is no client work required
- Valid RDF?
- VIVO sites do not always provide valid RDF
- Potentially some parts of the graph can be ignored
- Client may or may not need to clean up their RDF
- Goal: require no work on client side
- Some schools do not have VIVO, but want to create "RDF export tool"
- 1 or 2 schools fall in this category
- Toronto
- Running "locally"
- May mean running in AWS for a specific institution
- May mean running on local servers
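Indexing via linked-data requests, as noted above, generally means dereferencing each profile URI with an Accept header that asks for RDF rather than HTML. A minimal sketch, assuming the site honors content negotiation for `application/rdf+xml`; the function names and the example URI are hypothetical and not taken from the VIVOSearch code:

```python
import urllib.request

RDF_XML = "application/rdf+xml"


def build_rdf_request(uri: str) -> urllib.request.Request:
    """Build a GET request that asks the server for RDF/XML instead of HTML."""
    return urllib.request.Request(uri, headers={"Accept": RDF_XML})


def fetch_rdf(uri: str, timeout: float = 30.0) -> str:
    """Fetch the RDF for one URI (performs a network call)."""
    with urllib.request.urlopen(build_rdf_request(uri), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example (hypothetical URI); only builds the request, no network access.
request = build_rdf_request("http://vivo.example.edu/individual/n1234")
```

Because the harvester only issues ordinary HTTP requests, nothing has to be installed on the client side, which is the point of the "no client effort" goal.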
Business proposition
- Bringing in more institutions to increase researcher visibility
- Make app/search internationally available
Scala
- Only one developer who knows it
- Actor library is no fun
- Code that does RDF to Solr document is reused Java code
- Scala was supposed to help with multi-threaded processing
- Not a lot of cooperation between processes
- Errors tend to be cryptic
- Existing code was developed quickly, not for quality
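The RDF-to-Solr-document step mentioned above can be pictured as flattening the triples about one individual into a document of named fields. This is an illustrative sketch only: the existing code is Java, and the field names and predicate mapping here are assumptions, not the actual VIVOSearch schema.

```python
from collections import defaultdict

# Hypothetical mapping from RDF predicates to Solr field names.
FIELD_MAP = {
    "http://www.w3.org/2000/01/rdf-schema#label": "name",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": "type",
}


def to_solr_doc(subject: str, triples: list) -> dict:
    """Collect mapped predicate values for one subject into a flat document.

    triples: iterable of (subject, predicate, object) string tuples
    """
    fields = defaultdict(list)
    for s, p, o in triples:
        if s != subject:
            continue  # statements about other entities are skipped here
        field = FIELD_MAP.get(p)
        if field is not None:
            fields[field].append(o)
    doc = {"id": subject}
    doc.update(dict(fields))
    return doc
```

Each individual can be converted independently of the others, which is consistent with the observation that little cooperation between processes is needed.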
Next Steps
- Several moving pieces
- Document questions that need to be answered
- Document issues
- Document activities
- Together, this will give a clearer idea of the scope of the project
- Need to also determine how to turn it into a viable "business"
- Some discussions are already happening in the wiki
- Suggestion to put documents in wiki
- Leverage JIRA?
- Create wiki space for VIVOSearch
- Use VIVO crowd permissions
- Grant admin rights to: j2blake?
- Next call: week after next
- Need to get a rough notion of timeframe and institution cost
- Would like to have something at VIVO conference (Aug)
- 12-1pm Tues Jan 29th
- Jon to send out WebEx invite