Goals

Ability to search across multiple VIVO installations. (Please refine this)

What features are desired for the search? What type of search? What is the goal of the search?

Full text? "semantic"? faceted? other? Complex queries? For people? For publications? (Please refine this)

High level description of approach

Build an index that supports the desired types of search, and provide a web site that helps users query that index. Keep the index up to date.

High level approach to building the index

For each institution:

get a list of all URIs of interest for that institution

...

add the document to the Solr index
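The outline above can be read as a pair of nested loops. The following is only a minimal sketch under stated assumptions: the function and parameter names (`build_index`, `list_uris`, `build_document`, `add_to_index`) are hypothetical, and the steps elided in the outline are represented by a caller-supplied `build_document` function rather than filled in.

```python
from typing import Callable, Dict, Iterable

Document = Dict[str, str]


def build_index(
    institutions: Iterable[str],
    list_uris: Callable[[str], Iterable[str]],
    build_document: Callable[[str], Document],
    add_to_index: Callable[[Document], None],
) -> int:
    """Index every URI of interest for every institution; return the count.

    All three callables are hypothetical hooks supplied by the caller:
    list_uris(institution) yields the URIs of interest for one site,
    build_document(uri) stands in for the steps elided in the outline,
    add_to_index(document) sends one document to the index.
    """
    indexed = 0
    for institution in institutions:
        for uri in list_uris(institution):
            document = build_document(uri)
            add_to_index(document)
            indexed += 1
    return indexed
```

Passing the three steps in as functions keeps the loop independent of how URIs are discovered and how documents reach Solr, so each step can be decided (and tested) separately.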

High level approach to keeping the index up-to-date.

For each institution:

get a list of URIs that have been modified

...

add the document to the Solr index
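One possible shape for the update pass is sketched below. It assumes, without confirmation from the sites themselves, that each institution can report which URIs changed after a given time; `list_modified_uris`, `last_indexed` and the other names are hypothetical, and the elided steps are again left to a caller-supplied `build_document`.

```python
from typing import Callable, Dict, Iterable

Document = Dict[str, str]


def update_index(
    institutions: Iterable[str],
    last_indexed: Dict[str, float],
    list_modified_uris: Callable[[str, float], Iterable[str]],
    build_document: Callable[[str], Document],
    add_to_index: Callable[[Document], None],
    now: float,
) -> int:
    """Reindex only the URIs modified since each institution's last pass.

    last_indexed maps an institution to the timestamp of its previous pass
    and is updated in place. Returns the number of documents reindexed.
    """
    reindexed = 0
    for institution in institutions:
        # An institution never seen before is treated as modified since time 0.
        since = last_indexed.get(institution, 0.0)
        for uri in list_modified_uris(institution, since):
            add_to_index(build_document(uri))
            reindexed += 1
        last_indexed[institution] = now
    return reindexed
```

When the Solr schema defines a unique key, adding a document whose key already exists replaces the old copy, so the same "add" step can serve both the initial build and later updates.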

Approach alternatives

TODO: what other approaches are there?

Technology Choices

There are some parts of the technology stack that are suggested by the goal of indexing data from VIVO. Using HTTP requests for RDF to gather data from the sites is the most direct approach. Most other options for gathering data from the VIVO sites would need additional coding.
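As an illustration of gathering data over HTTP, the sketch below asks a site for the RDF description of one individual by sending an `Accept` header. It assumes the site honors content negotiation for `application/rdf+xml`; the helper names and any URI passed in are hypothetical.

```python
import urllib.request

RDF_XML = "application/rdf+xml"


def build_rdf_request(uri: str, accept: str = RDF_XML) -> urllib.request.Request:
    """Build a GET request that asks for an RDF serialization of the URI."""
    return urllib.request.Request(uri, headers={"Accept": accept})


def fetch_rdf(uri: str, timeout: float = 30.0) -> bytes:
    """Fetch the raw RDF bytes for one URI (performs a network call)."""
    with urllib.request.urlopen(build_rdf_request(uri), timeout=timeout) as response:
        return response.read()
```

The returned bytes would still need to be parsed by an RDF library before a search document can be built from them; that step is not shown here.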

...

To scale the process out, we were planning to use Hadoop to manage parallel tasks. Many approaches to indexing linked data from VIVO sites are embarrassingly parallel: each URI can be fetched and processed independently of the others.
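To show why the work parallelizes so easily, here is a small single-machine sketch that uses a thread pool as a stand-in for Hadoop. It is not the Hadoop design itself; `process_in_parallel` and `process_uri` are hypothetical names, and `process_uri` is whatever per-URI work the indexer performs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def process_in_parallel(
    uris: Iterable[str],
    process_uri: Callable[[str], T],
    max_workers: int = 8,
) -> List[T]:
    """Apply process_uri to every URI concurrently.

    No task depends on another, so the URIs can be handed to workers in any
    order; Executor.map still returns the results in input order.
    """
    uri_list = list(uris)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_uri, uri_list))
```

Because the tasks share no state, the same per-URI function could later be moved to a cluster framework without changing its logic.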

Technology Alternatives

We could use index software other than Solr. What would that be? A database server with full-text capabilities? What other options are there? Are there NoSQL options with full-text search?

...

Serving the web site could be done with just about any system that allows interaction with Solr. The solrsearch JavaScript libraries would allow any system that serves HTML and JavaScript to serve the site. The options are expansive: httpd, WordPress, Movable Type, Drupal, ColdFusion. If the solrsearch JavaScript can provide almost all of the interactivity on the client side, it may be desirable to keep the server side as simple as possible. It may even be possible to use static HTML and .js files served by any ordinary web server.
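Whatever serves the pages, the front end ultimately sends Solr a query over HTTP. The sketch below builds such a query URL for Solr's standard `/select` handler using the `q`, `rows`, `wt`, `facet` and `facet.field` parameters. It is written in Python only for illustration and does not show the solrsearch library; the core URL and field names are assumptions.

```python
from typing import Sequence
from urllib.parse import urlencode


def build_select_url(
    solr_core_url: str,
    text: str,
    facet_fields: Sequence[str] = (),
    rows: int = 10,
) -> str:
    """Build a Solr /select URL for a text query with optional facets."""
    params = [("q", text), ("rows", str(rows)), ("wt", "json")]
    if facet_fields:
        params.append(("facet", "true"))
        # One facet.field parameter per field to facet on.
        params.extend(("facet.field", field) for field in facet_fields)
    return solr_core_url.rstrip("/") + "/select?" + urlencode(params)
```

A static page could issue the same request from the browser, which is what makes the "plain web server" option plausible.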

Once an index is created, how would it be updated?

Rebuild the whole index?

Get a list of modified individuals from each site and only reindex them?

...

