
VIVO and Solr are two distinct web applications that act as one. Solr gives users the ability to search the VIVO data. VIVO also uses Solr for some of its internal data retrieval.

...

The relationship between VIVO and Solr

What is Solr?

...

  • maintains its own index
  • exists as a web application
  • accepts requests to
    • search
    • add, update, or delete records
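As a rough illustration of the kinds of requests listed above, the sketch below builds a search URL and the JSON bodies for an add and a delete. It is not how VIVO itself issues requests. The base URL and the function names are assumptions made for this example, and the handler paths on a particular installation may differ.

```python
import json
from urllib.parse import urlencode

# Hypothetical base URL; each site configures its own host, port, and context path.
SOLR_BASE = "http://localhost:8080/vivosolr"


def search_url(base, text, rows=10):
    """Build the URL for a search request against Solr's /select handler."""
    params = {"q": text, "rows": rows, "wt": "json"}
    return base.rstrip("/") + "/select?" + urlencode(params)


def add_message(document):
    """Build the JSON body that asks Solr to add (or replace) one record."""
    return json.dumps({"add": {"doc": document}})


def delete_message(record_id):
    """Build the JSON body that asks Solr to delete one record by its unique key."""
    return json.dumps({"delete": {"id": record_id}})
```

The add and delete bodies would be sent to Solr's update handler in an HTTP POST. The fields that belong in a document depend on the Solr schema in use.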

...

  • Example Solr runs in Jetty

...

Solr is an open-source, enterprise-level search platform, available from Apache. It is based on the popular Lucene search engine. VIVO uses a standard instance of Solr, without modification. You can learn more about Solr at the Apache Solr home page.

Solr maintains its own index of data, which reflects the contents of the VIVO triple-store. As the data in VIVO changes, the Solr index must change as well. In most cases this happens automatically, but not always. See the section below called "How is the index kept up to date?" for more information.

Solr is a self-contained web application, separate from VIVO. At most VIVO sites, Solr and VIVO run on the same machine, in the same instance of Tomcat. This is not the only possible configuration, however, and it is possible to put Solr in a different servlet container or even on a different computer from VIVO.

In a typical VIVO installation, Solr is hidden behind VIVO, and the users cannot access it directly. In general, they don't know that Solr exists as an application.

...

How does VIVO use Solr?

VIVO uses the Solr search engine in two ways:

  • as a service to the end user,
  • as a tool within the structure of the application.

Solr for the end user

Like many web sites, VIVO includes a search box on every page. The person using VIVO can type a search term, and see the results. This search is conducted by Solr, and the results are formatted and displayed by VIVO.

...

Solr allows for a "faceted" search, and VIVO displays the facets on the right side of the results page. These allow the user to filter the search results, showing only entries for people, or for organizations, etc.

Solr within VIVO

VIVO is based around an RDF triple-store, which holds all of its data. However, there are some tasks that a search engine can do much more quickly than a triple-store. Some of the fields in the Solr search index were put there specifically to help with these tasks.

For example, the browse area on the home page shows how many individuals VIVO holds for each class group.

VIVO could produce this data by issuing a SPARQL query against its data model. However, this would take several seconds for a large site, and we do not want the user to wait that long to see the home page. To avoid this delay, the class group of each individual is stored in the Solr record for that individual. Solr can count these fields very quickly, so VIVO issues a Solr query against the index, and displays the results on the home page.

Record counts on VIVO's index pages are obtained using the same type of Solr query.
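The counting query described above can be sketched as a Solr facet request that returns no records, only the counts. This is a minimal sketch and not VIVO's actual code: the field name `classgroup` is an assumption, so check the Solr schema for the real one, and the function names are invented for the example.

```python
from urllib.parse import urlencode

# Assumed name of the index field that stores each individual's class group.
CLASSGROUP_FIELD = "classgroup"


def classgroup_count_params(field=CLASSGROUP_FIELD):
    """Query string that matches every record, returns none, and facets on one field."""
    return urlencode({
        "q": "*:*",          # match all records
        "rows": 0,           # we want counts only, not the records themselves
        "facet": "true",
        "facet.field": field,
        "wt": "json",
    })


def parse_facet_counts(response, field=CLASSGROUP_FIELD):
    """Turn Solr's flat [value, count, value, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))
```

`parse_facet_counts` expects the response already decoded from JSON, in Solr's default flat-list layout for facet fields.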

How is the index kept up to date?

Note

In progress

 

  • When an individual is added, edited, or deleted, Solr is given the new information and updates the index.
  • Sometimes the index must be rebuilt.
    • Most commonly, this is needed after an ingest, since some of the ingest mechanisms bypass the usual VIVO framework.
      • It would be too slow to update the Solr index on each new statement from the ingest.
      • Work is in progress on a search-aware ingest method, which Harvester or other tools could use.
    • A rebuild is done on the side. When it is complete, the rebuilt index replaces the previous one and Solr switches to it.
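The difference between an incremental update and a rebuild "on the side" can be illustrated with a toy in-memory index. This is a conceptual sketch only: `ToyIndex` and its methods are hypothetical and do not correspond to any Solr or VIVO API.

```python
class ToyIndex:
    """In-memory stand-in for a search index (not Solr's API)."""

    def __init__(self):
        self.live = {}  # the records that searches currently see

    def update(self, uri, record):
        """Incremental change: add or replace the record for one individual."""
        self.live[uri] = record

    def delete(self, uri):
        """Incremental change: drop the record for one individual, if present."""
        self.live.pop(uri, None)

    def rebuild(self, all_individuals):
        """Build a complete new index on the side, then switch to it."""
        fresh = {}
        for uri, record in all_individuals:
            fresh[uri] = record
        # Searches keep using the old index until this single assignment.
        self.live = fresh
```

Because the switch happens only after the new index is complete, a search never sees a partly built index. Records in the old index that are missing from the rebuilt one disappear at the switch.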


How is Solr created and configured?

Note

In progress

  • The Solr home directory
    • What is in it?
    • How does Solr find it?
  • How is it built?
    • build script - Tomcat or otherwise.

How does VIVO contact Solr?

Note

In progress

  • VIVO must be told how to contact Solr
    • Authorization tests, now obsolete
  • VIVO may start before Solr does, and usually does.
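Because VIVO may start before Solr does, a client that needs Solr may have to retry for a while before deciding that it is unavailable. The sketch below shows one way to do that. It is not VIVO's own startup logic: the function names, the number of attempts, and the delay are all assumptions chosen for illustration.

```python
import time
import urllib.error
import urllib.request


def solr_responds(url, timeout=5):
    """Return True if an HTTP GET of the given URL succeeds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_for_solr(url, attempts=10, delay=3.0, probe=solr_responds, sleep=time.sleep):
    """Probe the URL up to `attempts` times, pausing `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

The `probe` and `sleep` parameters exist only so that the retry logic can be exercised without a running Solr. The URL to probe is whatever address the site has configured for Solr.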

Signs of a possible Solr-related problem

  • Smoke tests
    • Immediate failures
    • Separate thread, since Solr may start after VIVO
      • Go to the status page. Do you see a successful completion?
        • If not, wait (how long)?
  • No content
    • Confirm that this is a Solr problem by navigating through VIVO and finding content.
  • Others?

Is Solr working properly?

  • Check the admin console.
  • See the fields
  • See the contents?
  • Look in the Solr log

Does it help to rebuild?

  • Really clean
    • Solr home directory
    • Tomcat/webapps, Tomcat/work, Tomcat/conf
    • ant clean deploy (or ant all)

Is the communication working?

  • Check the VIVO log?
  • Check the deploy.properties (both in the log and in the file)
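When checking deploy.properties, it can help to extract the Solr setting and compare it with what the log reports. The sketch below reads simple `key = value` lines. The property name `vitro.local.solr.url` is an assumption here, so confirm it against your own file. The parser ignores continuation lines and the other less common features of the Java properties format.

```python
# Assumed name of the property that tells VIVO where Solr is.
SOLR_URL_PROPERTY = "vitro.local.solr.url"


def read_properties(text):
    """Parse simple 'key = value' lines, skipping blank lines and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def solr_url_from(text, name=SOLR_URL_PROPERTY):
    """Return the configured Solr URL, or None if the property is absent."""
    return read_properties(text).get(name)
```

A result of `None` means the property is missing or misspelled in the file.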

...


...