...

  1. Built software to create a Solr index
  2. Built webapp over the index
  3. Harvesting was never brought to production state
  4. Initial plans for Hadoop
    • They did not have the Hadoop experience
  5. Went with Scala
    • Have some regrets with Scala implementation, especially the Actor framework
    • If doing it again, or when resuming work, would go in the direction of Hadoop
  6. Development options
    • Try to continue with the Scala-Actor code (not recommended)
    • Or, take the useful parts (Java code) and move towards Hadoop

Approach

...

System parts

  1. Drupal website
  2. Solr index
    • Vanilla Tomcat with Solr
  3. Solr is currently populated with Scala-Actor code
  4. Components interact via HTTP
  5. Drupal frontend is a bit unknown
    • Others (Nick Cappadona, Miles Worthington) worked on it; not on the project but still available to consult
    • Was considered beta
    • Probably in Drupal 6
    • Cannot stretch the UI to gracefully handle 200 schools
    • Does follow responsive design principles, but predates ready-made libraries such as Bootstrap/Sass
  6. Earlier conversations around using a js-solr library
    • Do not recall the name of the library
  7. Question: is the current system in production state or beta?
    • Mostly beta, except Solr index
    • No reason to stick with Drupal, except that it exists
    • Indexing backend is considered a dead end
  8. Indexing approach will be similar to that used for a Cornell Library project building a new catalog search interface
    • Takes MARC records from the catalog and runs them through RDF
  9. Should we be considering Nutch?
    • We need to consider how much load we put on institution VIVOs in indexing
  10. Should we do complete harvests every time or develop an incremental harvesting capability?
    • It is not trivial to identify what has changed, since one VIVO "page" brings in data from sometimes hundreds of related entities, including other people, publications, events, etc.
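
One possible way to approach the incremental-harvest question above is to fingerprint everything that feeds a single "page" (the profile plus its related entities) and compare against the fingerprint stored at the previous harvest. The sketch below is only an illustration under assumptions: the function names, the triple representation, and the idea of storing one digest per page are hypothetical and not part of the existing Scala or Java code. It also still requires fetching the data, so it would reduce re-indexing work in Solr rather than load on the institution VIVOs.

```python
import hashlib


def fingerprint(triples):
    """Order-independent SHA-256 digest over a collection of (s, p, o) triples."""
    digest = hashlib.sha256()
    # Deduplicate and sort so the digest does not depend on retrieval order.
    for line in sorted("\t".join(t) for t in set(triples)):
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def changed_pages(current, previous):
    """Return page URIs whose aggregated triples differ from the last harvest.

    current:  dict of page URI -> iterable of triples (page plus related entities)
    previous: dict of page URI -> fingerprint stored at the last harvest (assumed)
    """
    return sorted(
        uri for uri, triples in current.items()
        if previous.get(uri) != fingerprint(triples)
    )
```

Because the fingerprint covers related entities as well, a change to a linked publication or event marks the person's page as changed even when the person's own triples are untouched.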

Scenarios

  1. Replicate VIVOSearch.org in DuraSpace infrastructure
  2. Some institutions would like to pick up the app and run it locally, or might run their own Amazon instances, or might want to take advantage of DuraSpace services for hosting a private index and search landing site
  3. Key selling point: agnostic to the software that produces the RDF
    1. Currently proven to work with VIVO and Harvard Profiles
    2. Elsevier Scival also claims compatibility – Northwestern University will be the test for that
    3. Others can participate simply by putting VIVO-compatible RDF in a web-accessible directory
  4. Would like to validate user RDF in the future
    1. Likely an on-boarding process – once we have indexed a site once, it's likely to go smoothly thereafter
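
As a rough illustration of what an on-boarding check on submitted RDF might look like, the sketch below scans N-Triples text and reports people that have no label. Everything here is an assumption made for illustration: the rule itself (every foaf:Person needs an rdfs:label), the function name, and the simplified line-based parsing, which ignores blank nodes and other N-Triples features. A real validation step would use a proper RDF parser and whatever rules the project agrees on.

```python
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"

# Matches simple N-Triples lines of the form: <subject> <predicate> object .
TRIPLE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+(.+?)\s*\.\s*$")


def people_missing_labels(ntriples_text):
    """Return sorted URIs typed foaf:Person that have no rdfs:label (hypothetical rule)."""
    people, labelled = set(), set()
    for line in ntriples_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE.match(line)
        if match is None:
            continue  # not handled by this simplified parser (e.g. blank nodes)
        subject, predicate, obj = match.groups()
        if predicate == RDF_TYPE and obj == "<" + FOAF_PERSON + ">":
            people.add(subject)
        elif predicate == RDFS_LABEL:
            labelled.add(subject)
    return sorted(people - labelled)
```

A report like this could be run the first time a site is indexed, which fits the expectation that later harvests go smoothly once the initial issues are resolved.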

Demo

  1. Performed brief demo
  2. UI has gone through a fair amount of rigor

Pilot group

  1. CTSAs (the ~60 NIH-funded Clinical and Translational Science Awards)
    • They are committed by majority vote of the principal investigators to doing researcher networking and search using the VIVO ontology
    • They (or a subset ready to act soon) could serve as a good pilot group
  2. Existing partners may be willing to participate without extensive preconditions or delay
    • VIVO
      • UF, Cornell, Weill Cornell, Colorado, Duke, Brown, Stony Brook, Indiana, Scripps, USDA, APA, and likely several others
    • Harvard Profiles
      • Harvard, UCSF, and likely a couple others (Wake Forest?)
    • Scival Experts
      • Northwestern, Oregon Health & Science University
    • Loki
      • Iowa
    • Digital Vita
      • Pittsburgh
    • Others
      • Toronto, UCLA
  3. CTSAs will want a process to specify requirements/needs
    • Could turn into a bureaucratic/somewhat political process
  4. It may be more practical to work with institutions with which we have existing relationships

...