The Fedora Repository makes it possible to design custom event-driven application workflows. For instance, a common task involves sending content to an external search engine or triplestore. Other repositories may wish to generate derivative content such as creating a set of smaller images from a high-resolution master.
Because Fedora publishes modification events on a JMS topic using a local ActiveMQ broker, one can write custom listener applications to handle these various workflows. By default, the repository's JMS broker supports both the OpenWire and STOMP protocols, which means that it is possible to write client listeners (consumers) in a wide variety of languages, including PHP, Python, Ruby and Java, among others.
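Because STOMP is a plain-text protocol, a listener does not strictly need a client library. The following is a minimal JDK-only sketch, not part of Fedora or Camel. It assumes the broker's STOMP connector is reachable on localhost:61613 (ActiveMQ's usual default) and that the "fedora" topic is addressed with ActiveMQ's /topic/<name> STOMP destination convention. It sends a CONNECT frame, waits for CONNECTED, subscribes, and prints each incoming frame.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/**
 * Minimal STOMP 1.1 listener sketch (JDK only). Assumed: host/port of the
 * broker's STOMP connector and the /topic/fedora destination name.
 */
final class StompSketch {

    /** Builds a STOMP frame: command line, header lines, blank line, body, NUL terminator. */
    static String frame(String command, String[][] headers, String body) {
        StringBuilder sb = new StringBuilder(command).append('\n');
        for (String[] h : headers) {
            sb.append(h[0]).append(':').append(h[1]).append('\n');
        }
        return sb.append('\n').append(body).append('\0').toString();
    }

    static String connectFrame(String host) {
        return frame("CONNECT", new String[][] {{"accept-version", "1.1"}, {"host", host}}, "");
    }

    static String subscribeFrame(String destination) {
        return frame("SUBSCRIBE", new String[][] {{"id", "0"}, {"destination", destination}, {"ack", "auto"}}, "");
    }

    /** Reads one frame up to its NUL terminator; returns null at end of stream. */
    static String readFrame(InputStream in) throws IOException {
        StringBuilder frame = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            if (b == 0) {
                return frame.toString();
            }
            // Skip EOLs that may precede a frame; bytes are treated as Latin-1 (fine for ASCII headers).
            if (frame.length() == 0 && (b == '\n' || b == '\r')) {
                continue;
            }
            frame.append((char) b);
        }
        return null;
    }

    /** Connects, subscribes to /topic/fedora and prints frames until the broker closes the socket. */
    static void listen(String host, int port) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            InputStream in = socket.getInputStream();
            out.write(connectFrame(host).getBytes(StandardCharsets.UTF_8));
            out.flush();
            String reply = readFrame(in);
            if (reply == null || !reply.startsWith("CONNECTED")) {
                throw new IOException("broker did not accept the connection: " + reply);
            }
            out.write(subscribeFrame("/topic/fedora").getBytes(StandardCharsets.UTF_8));
            out.flush();
            for (String f = readFrame(in); f != null; f = readFrame(in)) {
                // Fedora's event properties are expected to appear as STOMP headers.
                System.out.println(f);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        listen("localhost", 61613);
    }
}
```

The same two frames could be produced from PHP, Python or Ruby, which is what makes STOMP attractive for non-JVM listeners.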
For simple message-consuming applications, writing a special-purpose listener may be an excellent choice. However, once a repository begins to use more complex message-based workflows, or when there are multiple listener applications to manage, many repositories turn to systems such as Apache Camel to simplify the handling of these messages.
Camel uses "components" to integrate various services through a terse, domain-specific language that can be expressed in Java, XML, Scala or Groovy. One such component is designed to work specifically with a Fedora 4 repository. This makes it possible to model Solr indexing in only a few lines of code like so:
This same logic can also be expressed using the Spring XML extensions:
Or, in Scala:
Please note that the hostnames used for Fedora and Solr here are entirely arbitrary. It is quite likely that these systems will be deployed on separate hosts and that the Camel routes will be deployed on yet another host. Camel makes it easy to distribute applications and replicate data asynchronously across multiple hosts.
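The final hop of a Solr indexing route is an HTTP POST of an update command to a Solr core. As a JDK-only sketch (not the fcrepo component's implementation), the class below builds Solr's JSON add and delete commands and an HttpRequest carrying them. The core URL is a parameter rather than a hard-coded host, in keeping with the point above; the field names and the use of commit=true are illustrative assumptions.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: builds Solr JSON update commands. The Solr core URL is supplied by the caller. */
final class SolrUpdateSketch {

    /** Quotes a string for JSON, escaping quotes, backslashes and control characters. */
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** {"add":{"doc":{...}}}; fields are sorted so the payload is deterministic. */
    static String addCommand(String id, Map<String, String> fields) {
        Map<String, String> doc = new TreeMap<>(fields);
        doc.put("id", id);
        StringBuilder sb = new StringBuilder("{\"add\":{\"doc\":{");
        boolean first = true;
        for (Map.Entry<String, String> e : doc.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append(jsonString(e.getKey())).append(':').append(jsonString(e.getValue()));
            first = false;
        }
        return sb.append("}}}").toString();
    }

    /** {"delete":{"id":"..."}} removes a document by its unique key. */
    static String deleteCommand(String id) {
        return "{\"delete\":{\"id\":" + jsonString(id) + "}}";
    }

    /** POSTs the command to the core's /update handler as JSON. */
    static HttpRequest updateRequest(String solrCoreUrl, String command) {
        return HttpRequest.newBuilder(URI.create(solrCoreUrl + "/update?commit=true"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(command))
                .build();
    }
}
```

Sending the request would use `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` from `java.net.http`.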
By default, Fedora publishes events to a topic on a local broker. This topic is named "fedora". Each message will contain an empty body and up to five different header values. The header names are namespaced.
The event-type and properties headers are comma-delimited lists of events and properties, respectively. The event types follow the JCR 2.0 specification and include values such as NODE_ADDED, NODE_REMOVED, PROPERTY_ADDED, PROPERTY_REMOVED and PROPERTY_CHANGED. The properties header lists the RDF properties that changed with that event. NODE_REMOVED events contain no properties. The fcrepo component for Camel is configured to recognize these headers and act appropriately.
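The fcrepo component handles these headers for Camel routes, but a hand-written listener has to parse them itself. The Java sketch below assumes the header names carry an org.fcrepo.jms. prefix (org.fcrepo.jms.identifier, org.fcrepo.jms.eventType, org.fcrepo.jms.properties); verify the exact names against messages from your own repository. It splits the comma-delimited lists and flags removals, which carry no properties.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Sketch of reading Fedora event headers. The header names are assumptions. */
final class FedoraEventHeaders {
    static final String IDENTIFIER = "org.fcrepo.jms.identifier";
    static final String EVENT_TYPE = "org.fcrepo.jms.eventType";
    static final String PROPERTIES = "org.fcrepo.jms.properties";

    /** Splits a comma-delimited value, trimming whitespace; a missing header yields an empty list. */
    static List<String> splitList(String value) {
        if (value == null || value.isBlank()) {
            return Collections.emptyList();
        }
        List<String> items = new ArrayList<>();
        for (String part : value.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                items.add(trimmed);
            }
        }
        return items;
    }

    static List<String> eventTypes(Map<String, String> headers) {
        return splitList(headers.get(EVENT_TYPE));
    }

    static List<String> changedProperties(Map<String, String> headers) {
        return splitList(headers.get(PROPERTIES));
    }

    /** endsWith tolerates event-type values that carry a namespace prefix. */
    static boolean isRemoval(Map<String, String> headers) {
        return eventTypes(headers).stream().anyMatch(t -> t.endsWith("NODE_REMOVED"));
    }
}
```

Checking for a removal first is convenient because a removal has no properties to inspect.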
If an fcr:transform program has been installed as mytransform, you can generate a JSON representation of an object and send it to a low-latency, highly available document store such as Riak. A route for this determines whether an object has been removed or simply added/updated, and then sends the message to a load balancer sitting in front of the Riak HTTP endpoint.
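The decision at the heart of that route can be sketched in plain Java. This is an illustration under stated assumptions, not the original route: the load-balancer URL and bucket name are hypothetical, keys are the URL-encoded repository path, and the paths follow Riak's HTTP key/value API (/buckets/<bucket>/keys/<key>). Removed objects map to a DELETE; added or updated objects map to a PUT whose JSON body would be fetched from the resource's fcr:transform/mytransform endpoint.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Sketch of the Riak routing decision. Base URLs, bucket and transform name are hypothetical. */
final class RiakRoutingSketch {

    /** Removed objects are deleted from Riak; added or updated objects are (re)written. */
    static String method(boolean removed) {
        return removed ? "DELETE" : "PUT";
    }

    /** Riak key URL; the repository path is URL-encoded into a single key segment. */
    static URI target(String riakBase, String bucket, String fedoraPath) {
        // URLEncoder uses form encoding (a space becomes '+'); paths with spaces would need percent-encoding.
        String key = URLEncoder.encode(fedoraPath, StandardCharsets.UTF_8);
        return URI.create(riakBase + "/buckets/" + bucket + "/keys/" + key);
    }

    /** Source of the JSON body for a PUT: the resource's fcr:transform endpoint. */
    static URI transformUri(String fedoraBase, String fedoraPath, String transformName) {
        return URI.create(fedoraBase + fedoraPath + "/fcr:transform/" + transformName);
    }
}
```

A PUT to Riak would also carry a Content-Type: application/json header so the stored value is recognized as JSON.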
Some additional processing must be done to transform an application/n-triples response into a valid application/sparql-update payload before sending to Fuseki or Sesame. The fcrepo component contains some processors in org.fcrepo.camel.processor to handle this case.
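The idea behind such a conversion can be sketched without the fcrepo processors. The method below is an illustrative assumption, not the code in org.fcrepo.camel.processor. It wraps the N-Triples statements in a SPARQL 1.1 INSERT DATA block and precedes it with a DELETE WHERE that clears the resource's previous triples, so an update replaces old data rather than accumulating it. Note that this DELETE WHERE only matches triples whose subject is the resource itself.

```java
import java.util.stream.Collectors;

/** Sketch: converts an application/n-triples body into an application/sparql-update payload. */
final class SparqlUpdateSketch {

    static String toSparqlUpdate(String subjectUri, String nTriples) {
        // Keep statement lines only; blank lines and '#' comments are dropped.
        String triples = nTriples.lines()
                .map(String::trim)
                .filter(line -> !line.isEmpty() && !line.startsWith("#"))
                .collect(Collectors.joining("\n  "));
        // N-Triples statements (each ending in " .") are valid inside an INSERT DATA block.
        return "DELETE WHERE { <" + subjectUri + "> ?p ?o };\n"
                + "INSERT DATA {\n  " + triples + "\n}";
    }
}
```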
The default configuration works well for locally deployed listeners, but it can be problematic in a distributed context. For instance, if a message is published to the topic while the listener is restarting, that message may be missed. Furthermore, a network hiccup between Fedora's local broker and the remote listener can also result in lost messages. In this case, a queue may be better suited.
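The difference can be illustrated with a toy in-memory model in plain Java. It involves no JMS or ActiveMQ classes and only shows the delivery semantics: the topic delivers a message only to subscribers connected at publish time, while the queue stores it until a consumer drains it. As a result, the event published during the simulated restart ("e2") reaches the queue consumer but not the topic subscriber.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of topic vs. queue delivery; not a JMS implementation. */
final class TopicVsQueue {

    /** Delivers only to inboxes subscribed at publish time. */
    static final class ToyTopic {
        private final List<List<String>> connected = new ArrayList<>();
        void subscribe(List<String> inbox) { connected.add(inbox); }
        void unsubscribe(List<String> inbox) { connected.removeIf(i -> i == inbox); } // identity, not equals
        void publish(String msg) { for (List<String> inbox : connected) inbox.add(msg); }
    }

    /** Stores messages until a consumer drains them. */
    static final class ToyQueue {
        private final Deque<String> stored = new ArrayDeque<>();
        void publish(String msg) { stored.addLast(msg); }
        List<String> drain() {
            List<String> out = new ArrayList<>(stored);
            stored.clear();
            return out;
        }
    }

    /** Publishes e1, e2, e3; the listener is "restarting" while e2 is published. */
    static List<List<String>> simulate() {
        ToyTopic topic = new ToyTopic();
        ToyQueue queue = new ToyQueue();
        List<String> topicInbox = new ArrayList<>();
        List<String> queueReceived = new ArrayList<>();

        topic.subscribe(topicInbox);
        topic.publish("e1"); queue.publish("e1");
        queueReceived.addAll(queue.drain());

        topic.unsubscribe(topicInbox);       // listener goes down
        topic.publish("e2"); queue.publish("e2");
        topic.subscribe(topicInbox);         // listener comes back
        queueReceived.addAll(queue.drain()); // the queue still holds e2

        topic.publish("e3"); queue.publish("e3");
        queueReceived.addAll(queue.drain());
        return List.of(topicInbox, queueReceived);
    }
}
```

Running simulate() yields [e1, e3] for the topic subscriber and [e1, e2, e3] for the queue consumer.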
ActiveMQ brokers support a wide variety of protocols. If Fedora's internal broker is bridged to an external broker, please remember to enable the proper protocols on the remote broker. This can be done like so:
Each transportConnector supports many additional options that can be added to this configuration.
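As an illustration of what that configuration looks like, the Java sketch below emits a transportConnectors section for a remote broker's activemq.xml. It assumes ActiveMQ's customary ports (61616 for OpenWire, whose URI scheme is tcp, and 61613 for STOMP) and a bind address chosen by the caller. Additional options are appended as URI query parameters, with & escaped as &amp; so the result stays valid XML.

```java
/**
 * Sketch: generates the transportConnectors element of a remote broker's
 * activemq.xml. Port numbers are customary ActiveMQ defaults (an assumption).
 */
final class TransportConnectorsSketch {

    /** One connector: the name is only a label; the URI scheme selects the protocol. */
    static String connector(String name, String scheme, String host, int port, String options) {
        String uri = scheme + "://" + host + ":" + port
                + (options.isEmpty() ? "" : "?" + options.replace("&", "&amp;")); // '&' must be escaped in XML
        return "<transportConnector name=\"" + name + "\" uri=\"" + uri + "\"/>";
    }

    /** OpenWire (tcp://) and STOMP connectors, the two protocols Fedora's broker supports by default. */
    static String section(String host) {
        return String.join("\n",
                "<transportConnectors>",
                "  " + connector("openwire", "tcp", host, 61616, ""),
                "  " + connector("stomp", "stomp", host, 61613, ""),
                "</transportConnectors>");
    }
}
```

For example, section("0.0.0.0") produces one connector bound to tcp://0.0.0.0:61616 and one bound to stomp://0.0.0.0:61613.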
Camel routes can be deployed in any JVM container. In order to deploy to Jetty or Tomcat, the route must be built as a WAR file. This command will get you started:
After the project has been built (mvn install), you will find the WAR file in ./target. That file can simply be copied to the webapps directory of your Jetty/Tomcat server.
Another popular deployment option is Karaf, a lightweight OSGi-based JVM container. Karaf supports hot code swapping, which lets you update your routes while keeping them running. It also allows you to deploy XML-based routes (Spring or Blueprint) by simply copying the files into a
Karaf can be set up by:
- downloading Karaf from an apache.org mirror
- running ./bin/karaf to enter the shell
- installing the required services
- setting up a service wrapper (so that Karaf is always running)
- following the directions provided by this command
Now, routes can be deployed (and re-deployed) by simply copying JAR files or XML documents to