...
- Built software to create a Solr index
- Built webapp over the index
- Harvesting was never brought to production state
- Initial plans for Hadoop
- They did not have the Hadoop experience
- Went with Scala
- Have some regrets with Scala implementation, especially the Actor framework
- When resuming work, if they could do it again, would go in the direction of Hadoop
- Development options
- Try to continue with the Scala-Actor code (not recommended)
- Or, take the useful parts (Java code) and move towards Hadoop
Approach
...
System parts
- Drupal website
- Solr index
- Solr is currently populated with Scala-Actor code
- Components interact via HTTP
- Drupal frontend is a bit of an unknown
- Others (Nick Cappadona, Miles Worthington) worked on it; they are no longer on the project but still available to consult
- Was considered beta
- Probably in Drupal 6
- Cannot stretch the UI to gracefully handle 200 schools
- Does follow responsive design principles, but predates ready-made libraries such as Bootstrap/Sass
- Earlier conversations around using a js-solr library
- Do not recall the name of the library
- Question: is current state in production state or beta?
- Mostly beta, except Solr index
- No reason to stick with Drupal, except that it exists
- Indexing backend is considered a dead end
- Indexing approach will be similar to that used for a Cornell Library RDF cataloging project building a new catalog search interface
- Takes MARC records from the catalog and runs them through RDF
- Should we be considering Nutch?
- We need to consider how much load we put on institution VIVOs in indexing
- Should we do complete harvests every time, or develop an incremental harvesting capability?
- It is not trivial to identify what has changed, since one VIVO "page" brings in data from sometimes hundreds of related entities, including other people, publications, events, etc.
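One way to approach the incremental-harvesting question above is to store a digest of each entity's serialized RDF and re-index only the entities whose digest has changed since the last harvest. The sketch below is purely illustrative; the class name, the in-memory maps, and the sample URIs are assumptions, not part of the existing codebase, and a real implementation would also have to account for changes in related entities that a page pulls in.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: detect changed records between harvests by
// comparing SHA-256 digests of each entity's serialized RDF.
public class IncrementalHarvestSketch {

    // Hex-encoded SHA-256 digest of a serialized RDF document.
    static String digest(String rdf) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] hash = md.digest(rdf.getBytes(StandardCharsets.UTF_8));
        StringBuilder sb = new StringBuilder();
        for (byte b : hash) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    // Returns URI -> new digest for every entity whose RDF differs from
    // the digest recorded at the previous harvest (new entities included).
    static Map<String, String> changed(Map<String, String> previousDigests,
                                       Map<String, String> currentRdf) throws Exception {
        Map<String, String> toReindex = new HashMap<>();
        for (Map.Entry<String, String> e : currentRdf.entrySet()) {
            String d = digest(e.getValue());
            if (!d.equals(previousDigests.get(e.getKey()))) {
                toReindex.put(e.getKey(), d);
            }
        }
        return toReindex;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> prev = new HashMap<>();
        prev.put("http://vivo.example.edu/individual/n123", digest("old rdf"));

        Map<String, String> current = new HashMap<>();
        current.put("http://vivo.example.edu/individual/n123", "new rdf");   // modified
        current.put("http://vivo.example.edu/individual/n456", "fresh rdf"); // new

        System.out.println(changed(prev, current).size()); // prints 2
    }
}
```

This only detects changes at the level of each harvested document; catching the case where a related entity changes (and so invalidates pages that embed its data) would additionally require tracking the dependency links between entities.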
Scenarios
- Replicate VIVOSearch.org in DuraSpace infrastructure
- Some institutions would like to pick up the app and run it locally, or might run their own Amazon instances, or might want to take advantage of DuraSpace services for hosting a private index and search landing site
- Key selling point: agnostic to the software that produces the RDF
- Currently proven to work with VIVO and Harvard Profiles
- Elsevier Scival also claims compatibility – Northwestern University will be the test for that
- Any site can participate simply by putting VIVO-compatible RDF in a web-accessible directory
- Would like to validate user RDF in the future
- likely an on-boarding process – once we have indexed a site once, it's likely to go smoothly thereafter
Demo
- Performed brief demo
- UI has gone through a fair amount of rigor
Pilot group
- CTSAs (the ~60 NIH-funded Clinical and Translational Science Awards)
- They are committed by majority vote of the principal investigators to doing researcher networking and search using the VIVO ontology
- They (or a subset ready to act soon) could serve as a good pilot group
- Existing partners may be willing to participate without extensive preconditions or delay
- VIVO
- Cornell, Weill Cornell, Colorado, Duke, Brown, Stony Brook, Indiana, Scripps, USDA, APA, and likely several others
- Harvard Profiles
- Harvard, UCSF, and likely a couple others (Wake Forest?)
- Scival Experts
- Northwestern, Oregon Health & Science University
- Loki
- Digital Vita
- Others
- CTSAs will want a process to specify requirements/needs
- Could turn into a bureaucratic/somewhat political log-jam
- It may be more practical to work with institutions with which we have existing relationships
...