Note: Work in progress

Under construction, please come back soon!

Fedora Content Change Messaging is JMS-based messaging for the Fedora Commons repository. It sends messages notifying consumers that content items within a Fedora repository have been modified, identifying by URI both the item that has changed and the location of the modified content.

Important Note: This is pre-alpha code, published to encourage discussion and to work towards consensus on a generic messaging and Fedora implementation pattern suitable for all kinds of potential consumers of Fedora messages. See in particular the list of known issues.

Rationale

Fedora includes JMS-based messaging capabilities. Messages are dispatched on every Fedora API-M method invocation, and contain details of the method name and parameters. Messages are therefore closely coupled to Fedora's API-M.

There exists a class of potential consumers of Fedora messages with responsibilities for indexing or otherwise manipulating and persisting the content within Fedora. These clients in general are interested in knowing when content within Fedora changes, and what those changes are.

These consumers currently have to "decode" the API-M messages received via JMS to identify what the message means to them in terms of modifications to the content within Fedora.

Hence Fedora Content Change Messaging. The messages are deliberately agnostic to Fedora's API. Instead they use a generic message format (encapsulated within AtomPub) to represent additions, deletions and modifications to content within Fedora, identifying the item using an HTTP URI and providing a "callback" HTTP URI from which the new or modified content can be accessed.

Installing and using

The source code is hosted on GitHub. See the README file for installation and usage instructions.

Consumers of messages can use net.acuityunlimited.fedora.messaging.AtomContentChangeMessage to deserialise the AtomPub message to a net.acuityunlimited.gen.messaging.ContentObjectModification.

Code overview

This is a module that utilises Fedora's decorator pattern, in the same way as the current JMS messaging module. It can either be used alongside the existing messaging plug-in, or as an alternative to it.

API-M methods are handed off to an implementation of org.fcrepo.server.management.Management which constructs the appropriate message, and then passes the method down the decorator chain.
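
The hand-off described above can be sketched roughly as follows. This is not the module's code: the real org.fcrepo.server.management.Management interface has many more methods, and MiniManagement, CoreManagement, MessagingDecorator and the Consumer<String> publisher are hypothetical stand-ins used only to show the shape of the decorator chain. Whether the message is published before or after delegation is also an assumption here.

```java
import java.util.function.Consumer;

// Hypothetical, heavily reduced stand-in for org.fcrepo.server.management.Management.
interface MiniManagement {
    String setDatastreamState(String pid, String dsId, String newState);
}

// Hypothetical innermost implementation that would do the real work.
class CoreManagement implements MiniManagement {
    @Override
    public String setDatastreamState(String pid, String dsId, String newState) {
        return pid + "/" + dsId + " -> " + newState;
    }
}

// Decorator: constructs a message for the call, then passes the call down the chain.
class MessagingDecorator implements MiniManagement {
    private final MiniManagement next;
    private final Consumer<String> publisher; // stands in for a JMS producer

    MessagingDecorator(MiniManagement next, Consumer<String> publisher) {
        this.next = next;
        this.publisher = publisher;
    }

    @Override
    public String setDatastreamState(String pid, String dsId, String newState) {
        String message = "setDatastreamState " + pid + " " + dsId + " " + newState;
        String result = next.setDatastreamState(pid, dsId, newState);
        // Assumption of this sketch: publish only once the delegate has returned normally.
        publisher.accept(message);
        return result;
    }
}

class DecoratorSketch {
    public static void main(String[] args) {
        MiniManagement chain = new MessagingDecorator(new CoreManagement(), System.out::println);
        System.out.println(chain.setDatastreamState("demo:SmileyBeerGlass", "DC", "I"));
    }
}
```

Because each decorator only knows the next link in the chain, a module of this kind can sit alongside another messaging decorator or replace it without either being aware of the other.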

(In the future other architectural patterns might be considered, eg there may be more appropriate places to hook in notification with High-Level Storage.)

The Messages

A message represents a concurrent set of changes to a Fedora digital object. Changes may be either modifications of content (create, update, delete) or modifications of state.

In essence it encapsulates one or more datastream updates and/or state changes for a single object, corresponding to a single API-M method invocation.

For example, for an ingest operation, it represents a set of Add operations for each datastream created on ingest; for a modifyDatastream operation it represents an Update for the datastream if the API-M method parameters include updated content.

Messages are dispatched based on modifications to the latest versions of datastreams; it is assumed that consumers are only interested in the current state. So for example if the latest version of a datastream is purged, this is represented as a modification, with the new content being the now-latest version (the version preceding the one being purged).
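
The purge rule above could be expressed along these lines. This is a sketch under assumptions, not the module's logic: PurgeSketch and its members are hypothetical, version identifiers are plain strings ordered oldest first, purging a non-latest version is assumed to produce no change, and purging the only version is assumed to be a Delete.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class PurgeSketch {
    enum ChangeType { UPDATE, DELETE }

    // Hypothetical result: the change type plus the version that now holds the current content.
    static final class Change {
        final ChangeType type;
        final String currentVersion; // null for DELETE

        Change(ChangeType type, String currentVersion) {
            this.type = type;
            this.currentVersion = currentVersion;
        }
    }

    // Decides what, if anything, a consumer interested only in current state should be told.
    static Optional<Change> changeForPurge(List<String> versionsOldestFirst, String purged) {
        int last = versionsOldestFirst.size() - 1;
        if (last < 0 || !versionsOldestFirst.get(last).equals(purged)) {
            return Optional.empty(); // the current content is unaffected (assumption)
        }
        if (last == 0) {
            return Optional.of(new Change(ChangeType.DELETE, null)); // nothing left (assumption)
        }
        // The preceding version becomes the latest, so report it as updated content.
        return Optional.of(new Change(ChangeType.UPDATE, versionsOldestFirst.get(last - 1)));
    }

    public static void main(String[] args) {
        List<String> versions = Arrays.asList("DC.0", "DC.1", "DC.2");
        Change change = changeForPurge(versions, "DC.2").get();
        System.out.println(change.type + " " + change.currentVersion);
    }
}
```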

Data model

A ContentObjectModification is used to encapsulate a set of ContentItemModifications.

A ContentObjectModification represents:

  • The User making the change
  • The date/time of the change
  • The URI identifier of the Fedora digital object (the REST API endpoint of getObjectProfile)
  • The URI identifier of the parent Fedora digital object (not implemented - see Issues)
  • The type(s) of the Fedora digital object (not implemented - see Issues)
  • A textual description of the modification (used for information/debugging only)
  • A set of ContentItemModifications

Each ContentItemModification represents:

  • The URI identifier of the datastream (the REST API endpoint of getDatastreamDissemination)
  • A URL for the modified content location (the datastream version) - present for Add and Update operations, not present for Delete operations and state changes
  • The internet media type (MIME type) of the content - present for Add and Update operations, not present for Delete operations and state changes
  • The type of modification (Add, Update, Delete) - not present for state-only changes
  • The state change of the datastream, specifying the previous and new states [1]
    • For Add operations the previous state is not specified

[1] So the consumer of the messages doesn't have to track the state of individual items but can, for instance, remove an item from an index if the state has changed from Active to Deleted or Inactive, but can ignore a state change from Inactive to Deleted (but see Issues on modelling state).

The above are represented as:

  • net.acuityunlimited.gen.messaging.ContentObjectModification
  • net.acuityunlimited.gen.messaging.ContentItemModification

ContentObjectModification implements Iterable<ContentItemModification>
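
To make the shape of this model concrete, here is a minimal sketch of two classes carrying the fields listed above. The class names ItemModificationSketch and ObjectModificationSketch, the field names and the use of plain strings for types and states are all assumptions for illustration; the real classes in net.acuityunlimited.gen.messaging may look quite different. The unimplemented parent URI and object type fields are left out.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Date;
import java.util.Iterator;
import java.util.List;

// Hypothetical shape of one datastream-level change.
class ItemModificationSketch {
    final URI itemUri;         // identifier of the datastream
    final URI contentLocation; // null for Delete operations and state changes
    final String mimeType;     // null for Delete operations and state changes
    final String changeType;   // "ADD", "UPDATE" or "DELETE"; null for state-only changes
    final String stateBefore;  // null for Add operations or when the state is unchanged
    final String stateAfter;   // null when the state is unchanged

    ItemModificationSketch(URI itemUri, URI contentLocation, String mimeType,
                           String changeType, String stateBefore, String stateAfter) {
        this.itemUri = itemUri;
        this.contentLocation = contentLocation;
        this.mimeType = mimeType;
        this.changeType = changeType;
        this.stateBefore = stateBefore;
        this.stateAfter = stateAfter;
    }
}

// Hypothetical shape of the object-level wrapper; iterating yields the item changes.
class ObjectModificationSketch implements Iterable<ItemModificationSketch> {
    final String user;
    final Date timestamp;
    final URI objectUri;
    final String description;
    private final List<ItemModificationSketch> items = new ArrayList<>();

    ObjectModificationSketch(String user, Date timestamp, URI objectUri, String description) {
        this.user = user;
        this.timestamp = timestamp;
        this.objectUri = objectUri;
        this.description = description;
    }

    void add(ItemModificationSketch item) {
        items.add(item);
    }

    @Override
    public Iterator<ItemModificationSketch> iterator() {
        // Read-only view so consumers cannot alter the set of changes while iterating.
        return Collections.unmodifiableList(items).iterator();
    }
}
```

Implementing Iterable lets a consumer process every datastream change in a message with a plain for-each loop.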

AtomPub serialisation

Content changes using the above POJOs are serialised to AtomPub as follows:

An AtomPub Feed for the ContentObjectModification with:

  • a Category for the server version
  • a Category for the message format
  • an Author for the user
  • a Title for the textual description
  • an Updated item for the date/time of modification
  • a Link with a relationship REL_VIA for the object URI
  • a Link with a relationship REL_RELATED for the parent object URI
  • Category elements for the object type(s)

An AtomPub Entry for each ContentItemModification:

  • an Author for the user (same as Feed)
  • a Title (same as Feed)
  • an Updated entry (same as Feed)
  • a Content item with the URL of the content location and the MIME type (for Add and Update)
  • a Link with a relationship REL_VIA for the item's URI
  • a Summary item with a textual description (mandatory, but not used for anything)
  • Category items for the before and after states (for state changes)
  • a Category item for the type of modification (not present for state-only changes)
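
The mapping above can be illustrated with a small sketch that builds a feed with a single Update entry using only the JDK's DOM and transformer APIs. It is not the module's serialiser, and the library the module actually uses is not assumed here. AtomWriteSketch and its methods are hypothetical, and the server version and message format categories, the Summary and the state categories are omitted to keep it short.

```java
import java.io.StringWriter;
import java.util.UUID;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class AtomWriteSketch {
    static final String ATOM = "http://www.w3.org/2005/Atom";

    // Appends a child element in the Atom namespace and returns it.
    static Element child(Element parent, String name) {
        Element element = parent.getOwnerDocument().createElementNS(ATOM, name);
        parent.appendChild(element);
        return element;
    }

    // Adds the elements shared by the feed and its entries: author, id, title, updated, via link.
    static void common(Element target, String user, String title, String updated, String viaUri) {
        child(child(target, "author"), "name").setTextContent(user);
        child(target, "id").setTextContent("urn:uuid:" + UUID.randomUUID());
        child(target, "title").setTextContent(title);
        child(target, "updated").setTextContent(updated);
        Element via = child(target, "link");
        via.setAttribute("rel", "via");
        via.setAttribute("href", viaUri);
    }

    // Builds a feed holding one entry for a single datastream Update.
    static Document updateFeed(String user, String title, String updated, String objectUri,
                               String itemUri, String contentUrl, String mimeType) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();
            Element feed = doc.createElementNS(ATOM, "feed");
            doc.appendChild(feed);
            common(feed, user, title, updated, objectUri);

            Element entry = child(feed, "entry");
            common(entry, user, title, updated, itemUri);
            Element content = child(entry, "content");
            content.setAttribute("type", mimeType);
            content.setAttribute("src", contentUrl);
            Element type = child(entry, "category");
            type.setAttribute("scheme", "changes:ITEM:CHANGE:TYPE");
            type.setAttribute("term", "UPDATE");
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("Could not build feed", e);
        }
    }

    // Writes the DOM out as an XML string.
    static String toXml(Document doc) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialise feed", e);
        }
    }
}
```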

Example messages

Ingest:

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:changes="http://www.acuityunlimited.net/fedora/messaging#">
  <category term="3.5-SNAPSHOT" scheme="info:fedora/fedora-system:def/view#version"></category>
  <category term="http://www.acuityunlimited.net/fedora/messaging#content-change-message-0.1" scheme="http://www.fedora.info/definitions/1/0/types/formatURI"></category>
  <author>
    <name>fedoraAdmin</name>
  </author>
  <title type="text">ingest</title>
  <updated>2011-08-03T08:14:25.943Z</updated>
  <link rel="via" href="http://localhost:8080/fedora/objects/demo:SmileyBeerGlass"></link>
  <id>urn:uuid:c4bf3a5c-69e7-4a1a-941b-dfd328bb4a48</id>
  <entry>
    <author>
      <name>fedoraAdmin</name>
    </author>
    <id>urn:uuid:296f93e1-7f22-430a-99f6-256cf13d1fa3</id>
    <title type="text">ingest</title>
    <updated>2011-08-03T08:14:25.943Z</updated>
    <content type="text/xml" src="http://localhost:8080/fedora/objects/demo:SmileyBeerGlass/datastreams/DC/content?asOfDateTime=2008-07-02T05:09:42.921Z"></content>
    <link rel="via" href="http://localhost:8080/fedora/objects/demo:SmileyBeerGlass/datastreams/DC/content"></link>
    <summary type="text">Datastream modification/state change, see content src for new content</summary>
    <category scheme="changes:CONTENT:ITEM:STATEAFTER" term="ACTIVE">Fedora datastream state after modification</category>
    <category scheme="changes:ITEM:CHANGE:TYPE" term="ADD">Fedora datastream modification type</category>
  </entry>
  <entry>
    <author>
      <name>fedoraAdmin</name>
    </author>
    <id>urn:uuid:aabed436-70b7-4e21-aec7-bde3c70d8ba7</id>
    <title type="text">ingest</title>
    <updated>2011-08-03T08:14:25.943Z</updated>
    <content type="application/rdf+xml" src="http://localhost:8080/fedora/objects/demo:SmileyBeerGlass/datastreams/RELS-EXT/content?asOfDateTime=2008-07-02T05:09:42.921Z"></content>
    <link rel="via" href="http://localhost:8080/fedora/objects/demo:SmileyBeerGlass/datastreams/RELS-EXT/content"></link>
    <summary type="text">Datastream modification/state change, see content src for new content</summary>
    <category scheme="changes:CONTENT:ITEM:STATEAFTER" term="ACTIVE">Fedora datastream state after modification</category>
    <category scheme="changes:ITEM:CHANGE:TYPE" term="ADD">Fedora datastream modification type</category>
  </entry>
  <entry>
    <author>
      <name>fedoraAdmin</name>
    </author>
    <id>urn:uuid:3931f264-2a58-4f20-b177-39fb2a4e49ce</id>
    <title type="text">ingest</title>
    <updated>2011-08-03T08:14:25.943Z</updated>
    <content type="image/jpeg" src="http://localhost:8080/fedora/objects/demo:SmileyBeerGlass/datastreams/FULL_SIZE/content?asOfDateTime=2008-07-02T05:09:42.921Z"></content>
    <link rel="via" href="http://localhost:8080/fedora/objects/demo:SmileyBeerGlass/datastreams/FULL_SIZE/content"></link>
    <summary type="text">Datastream modification/state change, see content src for new content</summary>
    <category scheme="changes:CONTENT:ITEM:STATEAFTER" term="ACTIVE">Fedora datastream state after modification</category>
    <category scheme="changes:ITEM:CHANGE:TYPE" term="ADD">Fedora datastream modification type</category>
  </entry>
  <entry>
    <author>
      <name>fedoraAdmin</name>
    </author>
    <id>urn:uuid:70f98231-301f-4f5e-a1d3-14d57f2a7322</id>
    <title type="text">ingest</title>
    <updated>2011-08-03T08:14:25.943Z</updated>
    <content type="image/jpeg" src="http://localhost:8080/fedora/objects/demo:SmileyBeerGlass/datastreams/MEDIUM_SIZE/content?asOfDateTime=2008-07-02T05:09:42.921Z"></content>
    <link rel="via" href="http://localhost:8080/fedora/objects/demo:SmileyBeerGlass/datastreams/MEDIUM_SIZE/content"></link>
    <summary type="text">Datastream modification/state change, see content src for new content</summary>
    <category scheme="changes:CONTENT:ITEM:STATEAFTER" term="ACTIVE">Fedora datastream state after modification</category>
    <category scheme="changes:ITEM:CHANGE:TYPE" term="ADD">Fedora datastream modification type</category>
  </entry>
</feed>

Datastream content modification:

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:changes="http://www.acuityunlimited.net/fedora/messaging#">
  <category term="3.5-SNAPSHOT" scheme="info:fedora/fedora-system:def/view#version"></category>
  <category term="http://www.acuityunlimited.net/fedora/messaging#content-change-message-0.1" scheme="http://www.fedora.info/definitions/1/0/types/formatURI"></category>
  <author>
    <name>fedoraAdmin</name>
  </author>
  <title type="text">modifyDatastreamByValue</title>
  <updated>2011-08-03T08:18:48.092Z</updated>
  <link rel="via" href="http://localhost:8080/fedora/objects/demo:SmileyBeerGlass"></link>
  <id>urn:uuid:babb08b1-46dd-407c-859d-f2f355634e3a</id>
  <entry>
    <author>
      <name>fedoraAdmin</name>
    </author>
    <id>urn:uuid:0bcf71cb-620b-453a-9ffa-443f414595b7</id>
    <title type="text">modifyDatastreamByValue</title>
    <updated>2011-08-03T08:18:48.092Z</updated>
    <content type="text/xml" src="http://localhost:8080/fedora/objects/demo:SmileyBeerGlass/datastreams/DC/content?asOfDateTime=2011-08-03T08:18:48.092Z"></content>
    <link rel="via" href="http://localhost:8080/fedora/objects/demo:SmileyBeerGlass/datastreams/DC/content"></link>
    <summary type="text">Datastream modification/state change, see content src for new content</summary>
    <category scheme="changes:ITEM:CHANGE:TYPE" term="UPDATE">Fedora datastream modification type</category>
  </entry>
</feed>

Datastream state change:

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:changes="http://www.acuityunlimited.net/fedora/messaging#">
  <category term="3.5-SNAPSHOT" scheme="info:fedora/fedora-system:def/view#version"></category>
  <category term="http://www.acuityunlimited.net/fedora/messaging#content-change-message-0.1" scheme="http://www.fedora.info/definitions/1/0/types/formatURI"></category>
  <author>
    <name>fedoraAdmin</name>
  </author>
  <title type="text">setDatastreamState</title>
  <updated>2011-08-03T08:20:30.460Z</updated>
  <link rel="via" href="http://localhost:8080/fedora/objects/demo:SmileyBeerGlass"></link>
  <id>urn:uuid:be983aa5-43fd-4330-b8bb-127bac00c3cb</id>
  <entry>
    <author>
      <name>fedoraAdmin</name>
    </author>
    <id>urn:uuid:841b1023-52c7-4322-b10a-daa2037bb3e7</id>
    <title type="text">setDatastreamState</title>
    <updated>2011-08-03T08:20:30.460Z</updated>
    <link rel="via" href="http://localhost:8080/fedora/objects/demo:SmileyBeerGlass/datastreams/DC/content"></link>
    <summary type="text">Datastream modification/state change, see content src for new content</summary>
    <category scheme="changes:CONTENT:ITEM:STATEBEFORE" term="ACTIVE">Fedora datastream state before modification</category>
    <category scheme="changes:CONTENT:ITEM:STATEAFTER" term="INACTIVE">Fedora datastream state after modification</category>
  </entry>
</feed>
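
A consumer maintaining an index might act on the before and after state categories of a message like the one above, following note [1] in the data model section. The sketch below is one possible policy, not part of the module: IndexPolicySketch is hypothetical, it assumes only Active items are indexed, and it assumes the category terms are ACTIVE, INACTIVE and DELETED (only the first two appear in the examples).

```java
class IndexPolicySketch {
    enum State { ACTIVE, INACTIVE, DELETED }

    enum Action { ADD_TO_INDEX, REMOVE_FROM_INDEX, IGNORE }

    // Decides what to do from the two states alone, so no per-item state needs tracking.
    // 'before' may be null, since Add operations do not specify a previous state.
    static Action actionFor(State before, State after) {
        boolean wasIndexed = before == State.ACTIVE;
        boolean shouldBeIndexed = after == State.ACTIVE;
        if (wasIndexed && !shouldBeIndexed) {
            return Action.REMOVE_FROM_INDEX;
        }
        if (!wasIndexed && shouldBeIndexed) {
            return Action.ADD_TO_INDEX; // assumption: becoming Active means indexing
        }
        return Action.IGNORE; // for example Inactive to Deleted
    }

    public static void main(String[] args) {
        // The category terms from the message map directly onto the enum names.
        State before = State.valueOf("ACTIVE");
        State after = State.valueOf("INACTIVE");
        System.out.println(actionFor(before, after));
    }
}
```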

Object Purge:

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:changes="http://www.acuityunlimited.net/fedora/messaging#">
  <category term="3.5-SNAPSHOT" scheme="info:fedora/fedora-system:def/view#version"></category>
  <category term="http://www.acuityunlimited.net/fedora/messaging#content-change-message-0.1" scheme="http://www.fedora.info/definitions/1/0/types/formatURI"></category>
  <author>
    <name>fedoraAdmin</name>
  </author>
  <title type="text">purgeObject</title>
  <updated>2011-08-03T08:21:32.169Z</updated>
  <link rel="via" href="http://localhost:8080/fedora/objects/demo:SmileyBeerGlass"></link>
  <id>urn:uuid:9733fc70-67e0-49ef-aa2b-e8d9c95a5db7</id>
  <entry>
    <author>
      <name>fedoraAdmin</name>
    </author>
    <id>urn:uuid:2d769c36-baba-47be-8716-524a58542063</id>
    <title type="text">purgeObject</title>
    <updated>2011-08-03T08:21:32.169Z</updated>
    <link rel="via" href="http://localhost:8080/fedora/objects/demo:SmileyBeerGlass/datastreams/DC/content"></link>
    <summary type="text">Datastream modification/state change, see content src for new content</summary>
    <category scheme="changes:ITEM:CHANGE:TYPE" term="DELETE">Fedora datastream modification type</category>
  </entry>
  <entry>
    <author>
      <name>fedoraAdmin</name>
    </author>
    <id>urn:uuid:4164cd78-1fa3-4292-bf7c-47f62bb672b7</id>
    <title type="text">purgeObject</title>
    <updated>2011-08-03T08:21:32.169Z</updated>
    <link rel="via" href="http://localhost:8080/fedora/objects/demo:SmileyBeerGlass/datastreams/RELS-EXT/content"></link>
    <summary type="text">Datastream modification/state change, see content src for new content</summary>
    <category scheme="changes:ITEM:CHANGE:TYPE" term="DELETE">Fedora datastream modification type</category>
  </entry>
  <entry>
    <author>
      <name>fedoraAdmin</name>
    </author>
    <id>urn:uuid:796c0612-171f-46ad-bcb7-e2a9a6166ae7</id>
    <title type="text">purgeObject</title>
    <updated>2011-08-03T08:21:32.169Z</updated>
    <link rel="via" href="http://localhost:8080/fedora/objects/demo:SmileyBeerGlass/datastreams/MEDIUM_SIZE/content"></link>
    <summary type="text">Datastream modification/state change, see content src for new content</summary>
    <category scheme="changes:ITEM:CHANGE:TYPE" term="DELETE">Fedora datastream modification type</category>
  </entry>
  <entry>
    <author>
      <name>fedoraAdmin</name>
    </author>
    <id>urn:uuid:64571aac-58cc-47d2-a9b6-3bf0ac870101</id>
    <title type="text">purgeObject</title>
    <updated>2011-08-03T08:21:32.169Z</updated>
    <link rel="via" href="http://localhost:8080/fedora/objects/demo:SmileyBeerGlass/datastreams/FULL_SIZE/content"></link>
    <summary type="text">Datastream modification/state change, see content src for new content</summary>
    <category scheme="changes:ITEM:CHANGE:TYPE" term="DELETE">Fedora datastream modification type</category>
  </entry>
</feed>
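
For consumers that cannot use the module's own deserialisation classes, messages like the examples above can also be read with the JDK's XML parser. The following is a minimal sketch only: AtomReadSketch, its summarise method and the STATE_ONLY label for entries without a change type are hypothetical names, and the output format is invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class AtomReadSketch {
    static final String ATOM = "http://www.w3.org/2005/Atom";
    static final String CHANGE_TYPE = "changes:ITEM:CHANGE:TYPE";

    // Returns one line per entry: change type, item URI and, if present, content location.
    static List<String> summarise(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse message", e);
        }
        List<String> lines = new ArrayList<>();
        NodeList entries = doc.getElementsByTagNameNS(ATOM, "entry");
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            String type = "STATE_ONLY"; // no change-type category means a state-only change
            NodeList categories = entry.getElementsByTagNameNS(ATOM, "category");
            for (int j = 0; j < categories.getLength(); j++) {
                Element category = (Element) categories.item(j);
                if (CHANGE_TYPE.equals(category.getAttribute("scheme"))) {
                    type = category.getAttribute("term");
                }
            }
            String item = attribute(entry, "link", "href");
            String src = attribute(entry, "content", "src");
            lines.add(type + " " + item + (src.isEmpty() ? "" : " " + src));
        }
        return lines;
    }

    // Reads an attribute from the first matching Atom child element, or "" if there is none.
    static String attribute(Element parent, String name, String attribute) {
        NodeList nodes = parent.getElementsByTagNameNS(ATOM, name);
        return nodes.getLength() == 0 ? "" : ((Element) nodes.item(0)).getAttribute(attribute);
    }
}
```

Applied to the Object Purge example, a reader of this kind would yield four DELETE lines, one per datastream, each without a content location.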

Issues

The initial implementation is based on the principle of sending messages that are not tightly coupled to Fedora's API, but that represent changes to content within Fedora in a more generic way. It is scoped only to provide notifications of datastream state and content changes.

Packaging

  • Build separate jars for server and common classes - to enable message consumers to use the deserialisation classes without including the server integration components

Content models, object types

  • A placeholder has been included to transmit the "type" of object, as the consumer may wish to make decisions based on this (eg indexing only certain categories of objects)
    • How to identify the "type" - eg URIs corresponding to the subscribed content models?
    • Or should this be something configurable, eg ability to send dc:type?
    • Should this be something configurable via the CMA?
    • Should we send the type in the message, or provide a callback for the typing information?

Message selecting/filtering

  • No JMS message properties are currently set, so no selecting/filtering is possible
    • Potentially add a new method to the Fedora messaging module to provide a set of JMS message properties; or extend the FedoraMessage base class to include these and provide an implementation for the Content change message (and modify the existing API-M message in line with this)
  • No message property appears to be set for the overall type/format of the message (FedoraMessage includes getFormat(), but this is currently unused)
    • Set a JMS message property to identify the overall type of message (so that the appropriate deserialiser can be used, eg for API-M messages vs content change messages)
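
As a starting point for that discussion, the sketch below shows what a set of properties and a matching selector might look like. Nothing here exists in the module: JmsPropertiesSketch and the property names messageFormat, objectUri and methodName are hypothetical. The properties are kept in a plain Map so the example runs without a JMS provider; a producer could copy them onto a message with javax.jms.Message.setStringProperty, and a consumer could pass the selector when creating its MessageConsumer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class JmsPropertiesSketch {
    // Format URI taken from the category in the example messages.
    static final String FORMAT =
            "http://www.acuityunlimited.net/fedora/messaging#content-change-message-0.1";

    // Hypothetical property set describing one content change message.
    static Map<String, String> propertiesFor(String objectUri, String methodName) {
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("messageFormat", FORMAT);
        properties.put("objectUri", objectUri);
        properties.put("methodName", methodName);
        return properties;
    }

    // JMS selector expression matching only content change messages.
    static String formatSelector() {
        return "messageFormat = '" + FORMAT + "'";
    }

    public static void main(String[] args) {
        System.out.println(propertiesFor(
                "http://localhost:8080/fedora/objects/demo:SmileyBeerGlass", "ingest"));
        System.out.println(formatSelector());
    }
}
```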

Data model

  • Does not currently include object and datastream properties
    • Could we have a generic representation of these that isn't coupled to Fedora's representation?
    • Do we model this as a content change (so with its own content item identifier - which doesn't currently correspond to anything in the Fedora REST API) or have a separate notion of properties?
    • Do we provide a callback URL for a property change/set of property changes, or send the properties in the message?
      • The REST API doesn't have generic "properties" endpoints - but these could be added
      • There are endpoints eg for datastream profile and object profile, but these are Fedora-specific; should we treat these as content items in their own right (though I believe these don't cover every object/datastream property)?
  • State changes
    • Currently only datastream state changes are sent.
    • The "meaning" of object and datastream states is implementation-dependent; Fedora doesn't define it
    • Do we combine object and datastream states to give an overall state for the content item? (eg, object=deleted and datastream=active -> content item deleted; object=active and datastream=deleted -> content item deleted)
    • Or should this be configurable in the server plug-in?
    • Or treat the states as generic object/datastream properties, and let the consumer decide? (and remove state changes from the model)
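
One way to read the "combine" option above is that the more restrictive of the two states wins. The sketch below illustrates that single policy and nothing more: EffectiveStateSketch is hypothetical, and ranking Inactive between Active and Deleted is an assumption, since Fedora doesn't define what the states mean.

```java
class EffectiveStateSketch {
    // Declared from least to most restrictive so that ordinal() can be compared.
    enum State { ACTIVE, INACTIVE, DELETED }

    // The content item takes whichever of the two states is more restrictive.
    static State effective(State object, State datastream) {
        return object.ordinal() >= datastream.ordinal() ? object : datastream;
    }

    public static void main(String[] args) {
        System.out.println(effective(State.DELETED, State.ACTIVE));
        System.out.println(effective(State.ACTIVE, State.DELETED));
    }
}
```

Both examples from the list (object deleted with datastream active, and object active with datastream deleted) come out as a deleted content item under this policy.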

Content Model Architecture integration

  • What's the potential for this; eg using the CMA to specify what messages get sent?
  • Compound and atomistic models
    • Currently a placeholder is included for the "parent" object. Notionally this would be the "main" object in an atomistic model, so for instance indexing could identify the "main" object when indexing datastreams in "child" objects
    • This implies some knowledge of content models by the consumer. Should the CMA instead be used to "hide" this and provide an abstraction of objects for consumers that isn't coupled to the base Fedora object model?

Relationships and the resource index

  • Fedora constructs additional relationships (outside of RELS-EXT etc), currently defined in code, eg relationships between objects and datastreams
  • How to notify a consumer of these? Do we need to?
    • eg a consumer that wants to maintain a "semantic index" of both Fedora's current RI and additional RDF datastreams

Initial population of indexes etc

  • For a repository that is already populated, how do we transmit the already-existing content to the consumer (and how do we rebuild indices from scratch); ie some kind of "rebuilder"

Identifiers for objects, items, and location of content

  • HTTP REST API endpoints are used as object and datastream identifiers - should we instead use the info:fedora/ URI scheme? Or ...
  • Content location identifiers are REST API endpoints for datastream versions - should this be configurable, eg if Fedora is not directly exposed but there is a local application API that provides alternative endpoints for serving content?
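
If info:fedora/ URIs were preferred, a consumer or the module itself could derive them from the REST endpoints used in the current messages. The sketch below assumes the URL layout seen in the examples (.../objects/{pid} and .../objects/{pid}/datastreams/{dsid}/content); FedoraUriSketch and toInfoUri are hypothetical names.

```java
import java.net.URI;

class FedoraUriSketch {
    private static final String OBJECTS = "/objects/";

    // Maps a REST API object or datastream URL to the corresponding info:fedora/ URI.
    static String toInfoUri(String restUrl) {
        String path = URI.create(restUrl).getPath();
        int start = path.indexOf(OBJECTS);
        if (start < 0) {
            throw new IllegalArgumentException("Not a Fedora object URL: " + restUrl);
        }
        // parts is [pid] for an object, or [pid, "datastreams", dsid, ...] for a datastream.
        String[] parts = path.substring(start + OBJECTS.length()).split("/");
        String pid = parts[0];
        if (parts.length >= 3 && "datastreams".equals(parts[1])) {
            return "info:fedora/" + pid + "/" + parts[2];
        }
        return "info:fedora/" + pid;
    }

    public static void main(String[] args) {
        String base = "http://localhost:8080/fedora/objects/demo:SmileyBeerGlass";
        System.out.println(toInfoUri(base));
        System.out.println(toInfoUri(base + "/datastreams/DC/content"));
    }
}
```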

Integration patterns - Fedora receiving messages

  • Potentially a message consumer could be implemented to receive content change messages; these could be translated into Fedora API methods. This could be useful in workflow, eg a service that listens for image datastream changes and creates thumbnails based on these.
  • Could a synchronous pattern be implemented (and would it be useful), eg a Fedora API operation that doesn't finish until the message consumer has also finished?

Miscellaneous

  • Fedora's fcfg specifies a "messageTypes" parameter in the datastore configuration. This currently must be either or both of apimUpdate and apimAccess, and cannot be blank or omitted. The implication is that a datastore configured for content change messages will also receive the corresponding API-M messages. See FCREPO-958.