...

The indexer can have any number of workers configured to process events.  The main indexer process retrieves the object RDF from the repository, and that content can be reused by multiple workers.  If you want to process the events in several ways (triplestore, Solr, archive to disk, update a remote repository, etc.), this limits the number of times the metadata has to be retrieved from the repository to once per object update.
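The dispatch pattern is simple: fetch once, then fan out to every worker.  The sketch below only illustrates that idea; the class and method names are hypothetical and do not reflect the actual IndexerGroup implementation.

No Format
  // Hypothetical sketch of the fan-out described above; illustrative only,
  // not the actual IndexerGroup code.
  import java.util.Set;

  public class FanOutSketch {

      interface Worker {
          // e.g. a SPARQL updater, a file serializer, a Solr indexer, ...
          void process(String objectUri, String rdf);
      }

      private final Set<Worker> workers;

      public FanOutSketch(final Set<Worker> workers) {
          this.workers = workers;
      }

      public void onEvent(final String objectUri) {
          // The object RDF is retrieved from the repository once per event...
          final String rdf = fetchRdf(objectUri);
          // ...and the same content is handed to every configured worker.
          for (final Worker worker : workers) {
              worker.process(objectUri, rdf);
          }
      }

      private String fetchRdf(final String objectUri) {
          // Placeholder for an HTTP GET against the repository REST API.
          return "<" + objectUri + "> <http://purl.org/dc/terms/title> \"example\" .";
      }
  }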

...

The indexer is configured using Spring.  Here is a sample configuration fragment showing two workers and the framework for listening to events and connecting them with the workers:

No Format
  <!-- Worker #1: Copy object RDF to a Fuseki triplestore using SPARQL Update -->
  <bean id="sparqlUpdate" class="org.fcrepo.indexer.SparqlIndexer">
    <!-- base URL for triplestore subjects, PID will be appended -->
    <property name="prefix" value="http://localhost:${test.port:8080}/rest/objects/"/>
    <property name="queryBase" value="http://localhost:3030/test/query"/>
    <property name="updateBase" value="http://localhost:3030/test/update"/>
    <property name="formUpdates">
      <value type="java.lang.Boolean">false</value>
    </property>
  </bean>

  <!-- Worker #2: Save object RDF to timestamped files on disk -->
  <bean id="fileSerializer" class="org.fcrepo.indexer.FileSerializer">
    <property name="path" value="./target/test-classes/fileSerializer/"/>
  </bean>

  <!-- Main indexer class that processes events, gets RDF from the repository and calls the workers -->
  <bean id="indexerGroup" class="org.fcrepo.indexer.IndexerGroup">
    <property name="repositoryURL" value="http://localhost:${test.port:8080}/rest/objects/" />
    <property name="indexers">
      <set>
        <ref bean="fileSerializer"/>
        <ref bean="sparqlUpdate"/>
      </set>
    </property>
  </bean>

  <!-- ActiveMQ topic to listen for events -->
  <bean id="destination" class="org.apache.activemq.command.ActiveMQTopic">
    <constructor-arg value="fedora" />
  </bean>

  <!-- Message listener container to connect the JMS topic to the indexer -->
  <bean id="jmsContainer" class="org.springframework.jms.listener.DefaultMessageListenerContainer">
    <property name="connectionFactory" ref="connectionFactory"/>
    <property name="destination" ref="destination"/>
    <property name="messageListener" ref="indexerGroup" />
    <property name="sessionTransacted" value="true"/>
  </bean>

Extending the Indexer

To implement a new kind of indexer:

  1. Implement the indexing functionality using the org.fcrepo.indexer.Indexer interface, which consists of only two methods (one to handle new/updated records, and another to handle deleted records).  Any configuration required should be done using Java bean setter methods.  A sketch follows this list.
  2. Update the Spring configuration to add a bean referencing the new class and providing the configuration properties it needs.
  3. Add the bean to the list of workers invoked by the indexer.
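
As a sketch under those assumptions, a minimal worker might look like the following.  The update/remove method names follow the description in step 1 (one method for new/updated records, one for deleted records), but the exact signatures are assumptions here; check the org.fcrepo.indexer.Indexer interface in your checkout for the authoritative definition.

No Format
  package org.example.indexer;

  import org.fcrepo.indexer.Indexer;

  // Hypothetical worker that just logs events; the method signatures below are
  // assumed from the description above and may differ from the actual interface.
  public class LoggingIndexer implements Indexer {

      private String label;

      // Configuration is injected through a Java bean setter (step 1).
      public void setLabel(final String label) {
          this.label = label;
      }

      public void update(final String identifier, final String content) {
          // Called for new or updated records: here we simply log the RDF content.
          System.out.println("[" + label + "] updated " + identifier + ":\n" + content);
      }

      public void remove(final String identifier) {
          // Called for deleted records.
          System.out.println("[" + label + "] removed " + identifier);
      }
  }

The corresponding Spring bean would be declared alongside the existing workers and added to the indexers set of the indexerGroup bean shown above.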

...

The easiest way to get hands-on experience with the indexer and see updates synced with an external triplestore is to use the fuseki branch of the kitchen sink project.  This branch provides a Fedora4 repository with the indexer pre-configured to sync to a Fuseki triplestore.  To set this up, first download and run the Fuseki triplestore.  Then build and run the pre-configured Fedora4:

$ git clone https://github.com/futures/fcrepo-kitchen-sink.git
$ cd fcrepo-kitchen-sink
$ git checkout fuseki
$ MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=1024m" mvn install
$ MAVEN_OPTS="-Xmx512m" mvn jetty:run

Using the default settings, Fedora4 will be running at http://localhost:8080/rest/ – you can create, update and delete objects and datastreams using your browser.  Each event will trigger the indexer and be synced to Fuseki, which you can access at http://localhost:3030/.