  • Title: Objects can be associated with a PREMIS event service
  • Primary Actor:
  • Scope:
  • Level:
  • Story: An object can be associated with a listener for PREMIS events. This listener should receive event messages rather than whole documents, but be able to export a log of the event history. Attempting to emulate this in Fedora 3.x requires constant replacement of an XML document and causes collision problems when integrated into a system of parallel, distributed repository processes.

Implementation Proposal:

The Fedora 4 repository relies heavily on metadata to describe and manage its digital content. The metadata Fedora uses internally to manage content overlaps substantially with the PREMIS standard, which defines the metadata needed to describe preserved digital objects. In fact, Fedora already relies on PREMIS properties such as premis:hasContentLocation and premis:hasSize to describe content nodes.

...

"This OWL ontology allows one to provide a Linked Data-friendly, PREMIS-endorsed serialization of the PREMIS Data Dictionary version 2.2. This can be leveraged to have a Linked Data-friendly data management function for a preservation repository, allowing for SPARQL querying. It integrates PREMIS information with other Linked Data compliant datasets, especially format registries, which are now referenced from the PREMIS ontology (for instance, the Unified Digital Format Registry [4] and PRONOM [5]). Thus information can be more easily interconnected, especially between different repository databases. The OWL design of PREMIS should NOT be considered as a replacement for the XML Schema: the two of them should rather be considered complementary. Work to align the PREMIS ontology with the PROV ontology [6] is being considered."

The first case, adding previous or external events to an object, could be implemented via a REST service, either by leveraging the current REST services for adding properties or by creating a new REST endpoint specifically for events. In the second case, where processing is triggered by repository events, the Google eventing machinery that subscribes to repository events could be used, either synchronously, perhaps through the auditing module, or asynchronously, by subscribing to a message queue through an extension of the fcrepo-jms-indexer-pluggable module. In some cases the eventing model may not capture, in sufficient detail, all the repository actions that one would wish to preserve; it may then be necessary to couple a method explicitly with the Fedora action, or to define and generate a new set of events for these actions.

As for retrieval, in any of these cases the event descriptions could be retrieved directly by creating a REST endpoint that returns the events of a node, by using SPARQL queries against an external triplestore set up in conjunction with the core repository, or by using SQL against a relational database populated by a "SQL indexer group" created as an extension within fcrepo-jms-indexer-pluggable.

Events could be described by any subject/predicate/object combination, but the PREMIS ontology already provides much of the linked data necessary. Here is an example use of the PREMIS vocabulary:

1) A Fedora resource with an RDF type indicating that it is an object that maintains event information, similar to the current implementation where an RDF type is used to indicate that a resource is indexable or describable by Dublin Core. Unfortunately, no single PREMIS URI lends itself fully to this description. One that may work is #hasEvent:

<object> <rdf:type> <http://id.loc.gov/ontologies/premis.html#hasEvent> .
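
As a rough illustration of how such triples might be generated, the following Python sketch emits N-Triples lines for an object and one associated event. Only the #hasEvent URI and the eventType URL pattern come from this page. The helper names, the example repository URLs, the hasEventType and hasEventDateTime property names, and the use of hasEvent as a predicate linking the object to an event resource are assumptions made for illustration; none of this is an existing Fedora or PREMIS API.

```python
from datetime import datetime, timezone

# Base URI copied from the triple above. Property names other than
# hasEvent are hypothetical and used only for illustration.
PREMIS = "http://id.loc.gov/ontologies/premis.html#"
# URL pattern of the PREMIS event type vocabulary listed under Resources.
EVENT_TYPE = "http://id.loc.gov/vocabulary/preservation/eventType/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(value):
    """Quote a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def triple(subject, predicate, obj):
    """Join three already-formatted terms into one N-Triples line."""
    return " ".join([subject, predicate, obj, "."])


def event_triples(object_uri, event_uri, type_code, when):
    """Return N-Triples lines describing one event on one object.

    type_code is a short event type code such as "rep" or "mes".
    when is a timezone-aware datetime.
    """
    obj = uri(object_uri)
    event = uri(event_uri)
    return [
        # Marker type from the proposal: the object maintains events.
        triple(obj, uri(RDF_TYPE), uri(PREMIS + "hasEvent")),
        # Assumption: the same URI doubles as the linking predicate.
        triple(obj, uri(PREMIS + "hasEvent"), event),
        # Assumption: hypothetical property names for type and time.
        triple(event, uri(PREMIS + "hasEventType"),
               uri(EVENT_TYPE + type_code + ".html")),
        triple(event, uri(PREMIS + "hasEventDateTime"),
               literal(when.isoformat())),
    ]


if __name__ == "__main__":
    # Hypothetical repository URLs, for demonstration only.
    lines = event_triples(
        "http://localhost:8080/rest/obj1",
        "http://localhost:8080/rest/obj1/events/1",
        "rep",
        datetime(2013, 9, 1, 12, 0, tzinfo=timezone.utc),
    )
    print("\n".join(lines))
```

Because each event is a small set of triples of its own, a new event can be appended without rewriting the triples that describe earlier events, which is the property the user story asks for.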

...

Complications:

1) The JCR events (NODE_ADDED, PROPERTY_CHANGED, etc.) don't necessarily map 1:1 to all the PREMIS or other event types that one may wish to track. So the Fedora implementation of some PREMIS events may not be able to rely on the existing event machinery. In these exceptional cases, either the PREMIS event generation will have to be coupled in the code to the point where the action actually takes place, or a new set of Fedora events will have to be generated for the actions that aren't explicitly mapped to JCR events or whose JCR events lack the necessary information.

2) The PREMIS event types cover some but not all of the events one may wish to track. For example, a copied file may be expressed by http://id.loc.gov/vocabulary/preservation/eventType/rep.html and a checksum may be expressed by http://id.loc.gov/vocabulary/preservation/eventType/mes.html, but there is no eventType for versioning. Again, the PREMIS vocabulary may have to be extended to accommodate all of the desired events.

3) If the number of PREMIS events becomes unwieldy, it might make sense to add a PREMIS container object as a layer between the object and its events for better encapsulation.

Resources:
http://www.loc.gov/standards/premis/ontology-announcement.html
https://wiki.duraspace.org/display/FF/Fedora+Repository+Home
https://wiki.duraspace.org/display/FF/Properties+CRUD
https://wiki.duraspace.org/display/FF/RESTful+HTTP+API
https://github.com/futures/fcrepo4/blob/master/fcrepo-audit/src/main/java/org/fcrepo/audit/LogbackAuditor.java
https://github.com/futures/fcrepo4/blob/master/fcrepo-jms/src/main/java/org/fcrepo/jms/observer/JMSTopicPublisher.java
http://id.loc.gov/vocabulary/preservation/eventType.html