...
Since preservation often includes copying and/or transferring custody of
Items to another
DSpace repository or even another type of archive, the history data
has to be meaningful outside of the context of DSpace.
An RDF framework for History Records
The following sections describe a "framework" (since it is not formal
enough to be a schema) for the content of history records.
Namespaces
Namespaces in the History schema
prefix | Namespace URI | Description |
---|---|---|
 | http://www.w3.org/1999/02/22-rdf-syntax-ns# | RDF |
 | http://www.w3.org/2000/01/rdf-schema# | RDF Schema |
 | http://metadata.net/harmony# | ABC Harmony |
 | http://purl.org/dc/elements/1.1/ | Dublin Core (unqualified) |
 | http://www.dspace.org/history# | DSpace History |
 | http://www.dspace.org/objectModel# | DSpace Object Model |
URIs of DSpace Objects
To write RDF statements about DSpace Objects, we need URIs for them.
These URIs have to meet the following requirements:
...
URL.
This is the most distinctive identifying feature of an EPerson; although it
is not archival, it is at least globally unique.
DSpace History RDF "schema"
These are the classes and properties used to describe History events.
Action classes
The following are subclasses of
...
- - superclass for the other actions.
- - add a new member to the subject object.
- - remove a member from the subject object.
- - create a new subject.
- - destroy the subject.
- - modify content of the subject.
- history:ModifyMetadata - modify metadata describing the subject.
DSpace Object classes
An object in the data model is typed by one of the following
classes, with names matching the corresponding DSpace constants.
...
Properties of an Action
These properties have the domain
...
- - range is a dsh:DSpaceObject, the "subject" of the event.
- - range is a dsh:DSpaceObject, the "subject" of the event.
- - range is a dsh:DSpaceObject, the "subject" of the event.
- - range is a literal ISO 8601 timestamp
- history:inArchive - range is a
- - range is a dsh:DSpaceObject, the "object" of the event.
- abc:hasParticipant - range is
- history:usesTool - range is literal, ExtraLogInfo from the event.
- - range is literal, "event.getDetail()" (if available).
- history:transactionID - range is literal, "event.getTransactionID()" (if available).
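To make the shape of an Action record more concrete, here is a minimal sketch that serializes one event as N3 using only property names that appear in the list above. Several things in it are assumptions rather than part of the prototype: the pairing of the `abc` and `history` prefixes with the namespace URIs, the choice to treat the `abc:hasParticipant` value as a resource, the function names, and all `urn:example:` and `mailto:` URIs.

```python
# Assumed prefix-to-URI pairings; the namespace table lists the URIs,
# but the pairing with these prefixes is a guess made for illustration.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "abc": "http://metadata.net/harmony#",
    "history": "http://www.dspace.org/history#",
}


def n3_literal(value):
    """Quote a single-line string as an N3 literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def action_to_n3(action_uri, action_class, participant_uri, tool, transaction_id=None):
    """Hypothetical helper: render one Action and its properties as N3 text."""
    lines = [f"@prefix {prefix}: <{uri}> ." for prefix, uri in PREFIXES.items()]
    lines.append("")
    props = [
        ("rdf:type", action_class),
        # Treating the participant as a resource is an assumption.
        ("abc:hasParticipant", f"<{participant_uri}>"),
        ("history:usesTool", n3_literal(tool)),
    ]
    if transaction_id is not None:  # recorded only "if available"
        props.append(("history:transactionID", n3_literal(transaction_id)))
    body = " ;\n".join(f"    {pred} {obj}" for pred, obj in props)
    lines.append(f"<{action_uri}>\n{body} .")
    return "\n".join(lines)


record = action_to_n3(
    "urn:example:event:1",
    "history:ModifyMetadata",
    "mailto:editor@example.org",
    "example-tool",
    transaction_id="tx-42",
)
print(record)
```

The optional transaction ID mirrors the "(if available)" wording: the statement is simply left out when the event carries no value.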
Properties of a DSpace Object
The following properties have the domain
...
- - range is a literal, the object's title or proper name.
- - range is a literal, the object's type or purpose. This only gets used on Bitstreams.
History Implementation
The prototype does essentially three things:
- Record history of all relevant data model changes.
- Fetch history statements covering the history of a given object.
- Fetch history records in answer to a free-form query.
Transformation of Event into History Statements
When the History event consumer sees an event, it might apply a
transformation before translating it to RDF:
...
- of abc:Manifestation (or for an EPerson).
- of or whatever the type of the object is.
- with the name or title of the object, if available.
- with the name of the owning Bundle when the object is a Bitstream. This can be helpful to preservationists since it indicates the purpose of the bitstream.
Lacunae: Events that Cannot Be Recorded
Because of the inherent conflict between the low-level architecture
of the Event System
and the requirement that History records identify all data model
objects by their persistent identifiers, some events simply
cannot be translated from the data in the event stream into History
records. The event stream identifies data model objects by "ephemeral" database keys (for speed, and because not all objects have persistent
identifiers like Handles), so the event consumer has to look up extras such as the persistent identifier and
any attributes of the Subject and Object of the event.
However, if any of those objects is deleted in the transaction that generates the event, it is too late to look up
the persistent identifier (which is why it is packaged in the "detail"
field of some events). Here are the specific situations in which
an event cannot be recorded in the History:
...
Although this leaves holes (lacunae) in the history record of an archive,
there is still enough information recorded to tell a preservationist
the fate of any objects missing from the archive. The Delete
events are recorded accurately for all archival objects. Since
all Bitstreams of archival significance are owned by Items (and
the Add events showing that are traceable in the History record),
their fate can be inferred from a Delete record for their Item. Some
Remove events are lost, but they can also be inferred from a
Delete event. The record is a bit messy and incomplete, but it
is still quite usable.
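The lookup problem behind these lacunae can be illustrated with a small sketch. It is not the prototype's code: the `Event` class, its field names, and the `live_objects` mapping are hypothetical stand-ins for an event that carries only an ephemeral database key, plus an optional "detail" field that may hold the persistent identifier.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Event:
    """Hypothetical event: identifies its subject by database key only."""
    subject_db_id: int
    detail: Optional[str] = None  # may carry the persistent identifier


def resolve_identifier(event: Event, live_objects: Dict[int, str]) -> Optional[str]:
    """Return the subject's persistent identifier, or None for a lacuna.

    live_objects maps database keys of objects that still exist to their
    persistent identifiers. A deleted object is absent from it, so the
    only remaining source is whatever was packaged in the detail field.
    """
    persistent_id = live_objects.get(event.subject_db_id)
    if persistent_id is not None:
        return persistent_id
    return event.detail  # None means the event cannot be recorded


live = {7: "hdl:123456789/7"}
still_exists = resolve_identifier(Event(7), live)
deleted_with_detail = resolve_identifier(Event(8, detail="hdl:123456789/8"), live)
deleted_without_detail = resolve_identifier(Event(9), live)
```

Only the third case produces a hole in the record: the object is gone and the event carried nothing to identify it by.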
Retrieving History of an Object
NOTE: The new
...
abc:hasParticipant.
Examples
- Item History report in N3
- Item with life cycle ending in delete, in N3
- Dead Link: Same Item with life cycle ending in delete, in RDF/XML
Installing and Operating
To install the prototype implementation, download the source
and follow instructions to install it:
Download and Install
Preparation
- Start with DSpace 1.5 source checkout (ca. January 5, 2006)
- Apply the EventSystemPrototype patch as directed on that page.
- Apply the AipPrototype as directed on that page.
Downloads
- Changes to dspace.cfg file
- JAR files to add
- Java Source files to add
Installation
Working in your DSpace installation directory:
- Shut down your servlet container.
- Apply the source change to the DSpace configuration with patch:
    patch -l config/dspace.cfg < history-dspace.cfg.diff
  (or manually apply the changes to your configuration file).
- Make sure the changes are propagated to the configuration file in your run-time directory.
- Unpack the history-new-libs.zip file with .
- Unpack the history-new-source.zip file with .
- Rebuild all sources with
    ant clean install_code build_wars
Configuration
The History system requires the following configuration keys:
- Ignore History metadata in non-AIP METS packages:
    mets.default.ingest.crosswalk.DSpaceHistory = NULLSTREAM
- Streaming dissemination crosswalk, to be added to the plugins configured for StreamDisseminationCrosswalk:
    org.dspace.history.HistoryStreamDisseminationCrosswalk = HISTORY
- Streaming ingestion crosswalk, to be added to the plugins configured for StreamIngestionCrosswalk:
    org.dspace.history.HistoryStreamIngestionCrosswalk = HISTORY
- Add an event consumer named "history":
    event.consumer.browse.class = org.dspace.browse.BrowseConsumer
    event.consumer.browse.filters = Item+Create|Modify|Modify_Metadata:Collection+Add|Remove
- Add the history consumer to the default dispatcher:
    event.dispatcher.default.consumers = history:sync ...
- To disseminate history records in AIPs, add:
    aip.disseminate.digiprovMD = DSpaceHistory:HISTORY
- To ingest history from AIPs, add:
    mets.dspaceAIP.ingest.crosswalk.DSpaceHistory = HISTORY
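After editing dspace.cfg it can be handy to confirm that the simple key/value settings listed above are actually present. The following is only a rough sketch of such a check: the function names are made up, the parser ignores line continuations and other properties-file subtleties, and it assumes the consumer list is separated by commas or whitespace.

```python
import re

# Keys taken from the configuration list above.
REQUIRED_KEYS = [
    "mets.default.ingest.crosswalk.DSpaceHistory",
    "event.dispatcher.default.consumers",
    "aip.disseminate.digiprovMD",
    "mets.dspaceAIP.ingest.crosswalk.DSpaceHistory",
]


def parse_properties(text):
    """Very small key = value parser; skips blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_history_settings(text):
    """Return a list describing which History settings are absent."""
    props = parse_properties(text)
    missing = [key for key in REQUIRED_KEYS if key not in props]
    consumers = props.get("event.dispatcher.default.consumers")
    if consumers is not None:
        # Assumed list format: names separated by commas or whitespace,
        # each optionally followed by ":mode" as in "history:sync".
        names = [c.split(":")[0] for c in re.split(r"[,\s]+", consumers) if c]
        if "history" not in names:
            missing.append("history consumer in event.dispatcher.default.consumers")
    return missing


sample = """
# excerpt of a hypothetical dspace.cfg
mets.default.ingest.crosswalk.DSpaceHistory = NULLSTREAM
event.dispatcher.default.consumers = history:sync, search
aip.disseminate.digiprovMD = DSpaceHistory:HISTORY
"""
print(missing_history_settings(sample))
```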
Operation
Before starting a DSpace application or the servlet container for
the first time, you may wish to move or clean out the contents
of the
...
You can export the RDF in RDF/XML format by specifying "xml" after the "-f" switch instead of "n3". Use the "-h" switch to see other options.
Future Work
Since this is a prototype, there are some things left undone:
- Backup strategy. _NOTE: This has been solved; see the -D and -R options of the HistoryRepository command-line application._ The History RDF data is stored in a "native" triple-store, which is an OpenRDF application-defined format. If it were ever corrupted, some or all of the history data would be lost. But don't worry about that just because this is based on an "alpha" release of OpenRDF 2.0...
- It is not really good enough to save just the RDF triples (as N3 or RDF/XML); OpenRDF actually records them as "quads", adding an extra resource called the "context". DSpace History uses that context to bind each triple to the URI of a DSpace Object, which makes it very efficient to retrieve all the History records about a particular object.
- You could just export the RDF in the triplestore with the "-x" option. It should be possible to sort out the mapping of records to objects again without the "context", but it would be a lot of extra work and there is no code to do it yet. It's much better to simply save the state of the quads.
- See the -Q option of org.dspace.history.HistoryRepository; with a little tuning (notably dealing with data types and literals) this export could be used to restore the triplestore, although you'd have to write an ingester too.
- Experiment with making the History consumer asynchronous.
- Export History data to a SIMILE timeline
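The point made above about "quads" can be illustrated with a toy in-memory store. This is not the OpenRDF API, and the class and method names are invented for the sketch. It only shows why keeping the context makes per-object retrieval a single lookup, and why an export that keeps the context can rebuild that mapping while a plain triple export cannot.

```python
from collections import defaultdict


class ContextStore:
    """Toy quad store: triples grouped by the context (object URI) they describe."""

    def __init__(self):
        self._by_context = defaultdict(list)

    def add(self, subject, predicate, obj, context):
        self._by_context[context].append((subject, predicate, obj))

    def history_of(self, context):
        """All statements bound to one object: a single dictionary lookup."""
        return list(self._by_context.get(context, []))

    def export_triples(self):
        """Triple-only export; the binding of statements to objects is lost."""
        return [t for triples in self._by_context.values() for t in triples]

    def export_quads(self):
        """Quad export; keeps the context so the store can be rebuilt."""
        return [
            (s, p, o, context)
            for context, triples in self._by_context.items()
            for (s, p, o) in triples
        ]

    @classmethod
    def from_quads(cls, quads):
        store = cls()
        for s, p, o, context in quads:
            store.add(s, p, o, context)
        return store


store = ContextStore()
store.add("urn:example:event:1", "rdf:type", "history:ModifyMetadata", "urn:example:item:A")
store.add("urn:example:event:2", "rdf:type", "history:ModifyMetadata", "urn:example:item:A")
store.add("urn:example:event:3", "rdf:type", "history:ModifyMetadata", "urn:example:item:B")

restored = ContextStore.from_quads(store.export_quads())
```

After the round trip, `restored.history_of("urn:example:item:A")` returns the same two statements as the original store, which is the property a quad-based backup preserves.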
See Also