  • Title: Objects can be associated with a PREMIS event service
  • Primary Actor:
  • Scope:
  • Level:
  • Story: An object can be associated with a listener for PREMIS events. This listener should receive event messages rather than whole documents, but be able to export a log of the event history. Attempting to emulate this in Fedora 3.x requires constant replacement of an XML document, and has collision problems when integrated into a system of parallel, distributed repository processes.

Implementation Proposal:

The Fedora 4 repository relies heavily on metadata to describe and manage its digital content. There is a large overlap between the data Fedora uses internally to manage content and the PREMIS standard, which defines the metadata needed to describe preserved digital objects. In fact, Fedora already relies on PREMIS properties such as premis:hasContentLocation and premis:hasSize to describe content nodes.

...

"This OWL ontology allows one to provide a Linked Data-friendly, PREMIS-endorsed serialization of the PREMIS Data Dictionary version 2.2. This can be leveraged to have a Linked Data-friendly data management function for a preservation repository, allowing for SPARQL querying. It integrates PREMIS information with other Linked Data compliant datasets, especially format registries, which are now referenced from the PREMIS ontology (for instance, the Unified Digital Format Registry [4] and PRONOM [5]). Thus information can be more easily interconnected, especially between different repository databases. The OWL design of PREMIS should NOT be considered as a replacement for the XML Schema: the two of them should rather be considered complementary. Work to align the PREMIS ontology with the PROV ontology [6] is being considered."

...

<object> <rdf:type> <http://id.loc.gov/ontologies/premis.html#hasEvent> .

However, it seems that this URI's primary usage should be as a predicate joining objects to their events:
<object> <#hasEvent> <object/event1> .
<object> <#hasEvent> <object/event2> .

It might be possible for #hasEvent to be used both as the object of an rdf:type predicate and as the predicate pointing to an object's actual events. Otherwise, a Fedora ontology predicate could be created for the object of the rdf:type.

2) The PREMIS vocabulary can also be used within the event child objects by creating the child nodes and using the #hasEvent predicate to point to them. Inside the event nodes, PREMIS predicates could be added, at a minimum:
http://id.loc.gov/ontologies/premis.html#hasEventDateTime
http://id.loc.gov/ontologies/premis.html#hasAgent
http://id.loc.gov/ontologies/premis.html#hasEventType
http://id.loc.gov/ontologies/premis.html#hasEventOutcomeInformation

The #hasEventDateTime and #hasEventType predicates should be self-explanatory. The #hasAgent would be the repository, or an external source of the events, such as a metadata generator or image transformer. The #hasEventOutcomeInformation could run the gamut from a simple literal, to XML, to details expressed in sub-graph form under a #hasEventOutcomeDetail predicate.

3) A REST endpoint would then be needed for the CRUD of events on an object: something that takes a payload of triples to create the event properties and subnodes with PUT/POST/PATCH calls, and returns the events and properties for GET calls. These event properties could also be ingested into a triplestore, allowing selective retrieval via a SPARQL query, or into a relational database and retrieved via a SQL query.
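To make the shape of such a payload concrete, here is a minimal Python sketch that serializes one event as N-Triples using the predicates listed above. The function name `event_triples`, the `example.org` URIs, and the choice of N-Triples as the payload format are assumptions made for illustration; nothing here is an existing Fedora API.

```python
from datetime import datetime, timezone

# Predicate namespace as written in this proposal.
PREMIS = "http://id.loc.gov/ontologies/premis.html#"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def event_triples(object_uri, event_uri, event_type_uri, agent_uri, when, outcome):
    """Return N-Triples lines that attach one event node to an object.

    Hypothetical helper: the outcome is stored as a simple literal,
    the simplest of the options discussed for #hasEventOutcomeInformation.
    """
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Escape the characters that would break an N-Triples string literal.
    escaped = (
        outcome.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return [
        f"<{object_uri}> <{PREMIS}hasEvent> <{event_uri}> .",
        f'<{event_uri}> <{PREMIS}hasEventDateTime> "{stamp}"^^<{XSD_DATETIME}> .',
        f"<{event_uri}> <{PREMIS}hasAgent> <{agent_uri}> .",
        f"<{event_uri}> <{PREMIS}hasEventType> <{event_type_uri}> .",
        f'<{event_uri}> <{PREMIS}hasEventOutcomeInformation> "{escaped}" .',
    ]


if __name__ == "__main__":
    # All example.org URIs below are placeholders.
    payload = event_triples(
        "http://example.org/object",
        "http://example.org/object/event1",
        "http://id.loc.gov/vocabulary/preservation/eventType/mes.html",
        "http://example.org/agents/repository",
        datetime.now(timezone.utc),
        "checksum calculated",
    )
    print("\n".join(payload))
```

The resulting lines could serve as the body of a POST that creates the event, and the same triples could be loaded into a triplestore for SPARQL retrieval.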

Complications:

1) The JCR events (NODE_ADDED, PROPERTY_CHANGED, etc.) don't necessarily map 1:1 to all the PREMIS or other event types that one may wish to track, so the Fedora implementation of some PREMIS events may not be able to rely on the existing event machinery. In these exceptional cases, either the PREMIS event generation will have to be coupled in the code to where the event actually takes place, or a new set of Fedora events will have to be generated for the events that aren't explicitly mapped to JCR events or that lack the necessary information.
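The mapping gap can be illustrated with a small Python sketch. The table below is hypothetical: the pairing of JCR event names with PREMIS event types, and the `cre` and `del` codes themselves, are assumptions that should be checked against the eventType vocabulary listed under Resources.

```python
EVENT_TYPE_BASE = "http://id.loc.gov/vocabulary/preservation/eventType/"

# Hypothetical mapping. Only JCR events with an obvious PREMIS counterpart
# are listed; the "cre" and "del" codes are assumed, not confirmed here.
JCR_TO_PREMIS = {
    "NODE_ADDED": EVENT_TYPE_BASE + "cre",
    "NODE_REMOVED": EVENT_TYPE_BASE + "del",
}


def premis_event_type(jcr_event_name):
    """Return the mapped PREMIS event type URI, or None when unmapped."""
    return JCR_TO_PREMIS.get(jcr_event_name)


def partition_events(jcr_event_names):
    """Split JCR event names into (mapped, unmapped) lists.

    Mapped entries are (name, premis_uri) pairs. Unmapped names are the
    cases that would need dedicated Fedora events or in-code coupling.
    """
    mapped, unmapped = [], []
    for name in jcr_event_names:
        uri = premis_event_type(name)
        if uri is None:
            unmapped.append(name)
        else:
            mapped.append((name, uri))
    return mapped, unmapped


if __name__ == "__main__":
    mapped, unmapped = partition_events(
        ["NODE_ADDED", "PROPERTY_CHANGED", "NODE_REMOVED"]
    )
    print("mapped:", mapped)
    print("needs custom handling:", unmapped)
```

In this sketch, PROPERTY_CHANGED falls into the unmapped list, which is the kind of event the paragraph above says would need separate treatment.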

2) The PREMIS event types cover some but not all of the events one may wish to track. For example, a copied file may be expressed by http://id.loc.gov/vocabulary/preservation/eventType/rep.html and a checksum may be expressed by http://id.loc.gov/vocabulary/preservation/eventType/mes.html, but there is no eventType for versioning. Again, the PREMIS vocabulary may have to be extended to accommodate all of the desired events.

3) If the number of PREMIS events becomes unwieldy, it might make sense to add a PREMIS container object as a layer between the object and its events for better encapsulation.

Resources:
http://www.loc.gov/standards/premis/ontology-announcement.html
https://wiki.duraspace.org/display/FF/Fedora+Repository+Home
https://wiki.duraspace.org/display/FF/Properties+CRUD
https://wiki.duraspace.org/display/FF/RESTful+HTTP+API
https://github.com/futures/fcrepo4/blob/master/fcrepo-audit/src/main/java/org/fcrepo/audit/LogbackAuditor.java
https://github.com/futures/fcrepo4/blob/master/fcrepo-jms/src/main/java/org/fcrepo/jms/observer/JMSTopicPublisher.java
http://id.loc.gov/vocabulary/preservation/eventType.html