Goals

Ability to search across multiple VIVO installations. (Please refine this)

Search
  1. What features are desired for the search?
  2. What type of search? 
  3. What is the goal of the search?
    1. Full text?  - yes
    2. "semantic"?  - future
    3. faceted?  - yes
    4. Complex queries? - future
    5. For people? - yes
    6. For publications, organizations, etc? - yes, but needs further refinement
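
To make the full-text and faceted items above a little more concrete, here is a minimal sketch of how a search request against a Solr index might be assembled. The parameters `q`, `fq`, `rows`, `wt`, `facet` and `facet.field` are standard Solr query parameters; the function name and the field names `type` and `site` are assumptions made for illustration and do not come from an existing schema.

```python
from urllib.parse import urlencode


def build_search_query(text, facet_fields=(), filters=None, rows=10):
    """Return a URL-encoded Solr query string for a full-text, faceted search.

    The field names passed in are hypothetical; a real index would use
    whatever fields its schema defines.
    """
    params = [("q", text), ("rows", str(rows)), ("wt", "json")]

    # Ask Solr to return value counts for each facet field.
    if facet_fields:
        params.append(("facet", "true"))
        params.extend(("facet.field", field) for field in facet_fields)

    # Each selected facet value becomes a filter query on that field.
    for field, value in (filters or {}).items():
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        params.append(("fq", f'{field}:"{escaped}"'))

    return urlencode(params)


if __name__ == "__main__":
    # Search for "smith", facet on type and site, restrict to one type.
    print(build_search_query("smith", ["type", "site"], {"type": "Person"}))
```

The resulting string would be appended to the select handler URL of whichever Solr core holds the index.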

Approaches

Build an index that supports the desired types of search, provide a web site that lets users query that index, and keep the index up-to-date.

Approach to building the index
  1. For each institution
    1. Get a list of all URIs of interest for that institution
  2. For each URI
    1. Get the linked data RDF for the URI
    2. Build a document using that data
    3. Add the document to the Solr index
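
The steps above can be sketched roughly as follows. This is only an outline under stated assumptions: the functions `list_uris`, `fetch_rdf` and `add_to_index` are hypothetical callables supplied by the caller (they stand in for the site's URI listing, the linked data request and the Solr update), RDF is represented as plain (subject, predicate, object) tuples, and the document fields `id`, `site`, `name`, `type` and `text` are invented for illustration.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def build_document(uri: str, triples: Iterable[Triple], site: str) -> Dict[str, object]:
    """Flatten the triples about one URI into a flat, Solr-style document."""
    doc: Dict[str, object] = {"id": uri, "site": site, "name": [], "type": [], "text": []}
    for subject, predicate, obj in triples:
        if subject != uri:
            continue  # ignore statements about other resources
        if predicate == RDF_TYPE:
            doc["type"].append(obj)
        else:
            if predicate == RDFS_LABEL:
                doc["name"].append(obj)
            doc["text"].append(obj)  # all other values feed the full-text field
    return doc


def build_index(
    institutions: Iterable[str],
    list_uris: Callable[[str], Iterable[str]],
    fetch_rdf: Callable[[str], Iterable[Triple]],
    add_to_index: Callable[[Dict[str, object]], None],
) -> int:
    """Run the build steps and return the number of documents indexed."""
    # Step 1: collect the URIs of interest for each institution.
    work: List[Tuple[str, str]] = []
    for site in institutions:
        work.extend((site, uri) for uri in list_uris(site))

    # Step 2: fetch the RDF, build a document, add it to the index.
    for site, uri in work:
        add_to_index(build_document(uri, fetch_rdf(uri), site))
    return len(work)
```

Passing the three operations in as callables keeps the loop independent of how a given VIVO site exposes its URIs and of the Solr client used, so each piece can be swapped or tested separately.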

Notes

Approach to keeping the index up-to-date
  1. For each institution
    1. Get a list of URIs that have been modified
  2. For each URI
    1. Calculate what individuals are affected by this modification
    2. Add to update list
  3. For each URI in update list
    1. Get the linked data RDF for the URI
    2. Build a document using that data
    3. Add the document to the Solr index
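
A rough sketch of the update steps above, again under assumptions: `list_modified`, `affected_by` and `reindex` are hypothetical callables supplied by the caller, and the sketch assumes that a modified URI is itself always reindexed along with the individuals it affects. How the affected individuals are actually calculated is left open here, as it is in the outline.

```python
from typing import Callable, Iterable, List


def plan_updates(
    modified_uris: Iterable[str],
    affected_by: Callable[[str], Iterable[str]],
) -> List[str]:
    """Expand modified URIs into a de-duplicated, ordered update list."""
    update_list: List[str] = []
    seen = set()
    for uri in modified_uris:
        # Assumption: the modified URI is reindexed together with
        # every individual whose document depends on it.
        for target in [uri, *affected_by(uri)]:
            if target not in seen:
                seen.add(target)
                update_list.append(target)
    return update_list


def update_index(
    institutions: Iterable[str],
    list_modified: Callable[[str], Iterable[str]],
    affected_by: Callable[[str], Iterable[str]],
    reindex: Callable[[str], None],
) -> int:
    """Run the update steps and return the number of URIs reindexed."""
    # Step 1: gather the modified URIs from every institution.
    modified: List[str] = []
    for site in institutions:
        modified.extend(list_modified(site))

    # Step 2: work out which individuals each modification touches.
    update_list = plan_updates(modified, affected_by)

    # Step 3: fetch, rebuild and re-add each document (done by `reindex`).
    for uri in update_list:
        reindex(uri)
    return len(update_list)
```

For example, if a person's document includes the titles of their publications, then `affected_by` for a modified publication would return its authors, so that their documents are rebuilt as well.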

Notes

Alternative Approaches

 TODO: what other approaches are there?

Technology Choices
  1. There are some parts of the technology stack that are suggested by the goal of indexing data from VIVO. 
  2. In general we would go with Solr for the search index because we have experience with it, because of its documentation, because of its distributed features and because it is mature.
  3. As of 2012 vivosearch.org uses Drupal and solrsearch javascript libraries. 
  4. In order to scale the process out we were planning to use Hadoop to manage parallel tasks.

Notes

Technology Alternatives

  1. We could use index software other than Solr.
  2. What are the alternatives to Hadoop?
  3. Serving the web site could be done with just about any system that allows interaction with Solr.

Index Updates
  1. Once an index has been created, how would it be updated?
    1. Rebuild the whole index?
    2. Get a list of modified individuals from each site and only reindex them?