History System Prototype for DSpace 1.5

...

Since preservation often includes copying and/or transferring custody of
Items to another DSpace repository, or even to another type of archive,
the history data has to be meaningful outside the context of DSpace.

An RDF framework for History Records

The following sections describe a "framework" (since it is not formal
enough to be a schema) for the content of history records.

Namespaces

Namespaces in the History schema

prefix    Namespace URI                                  Description
rdf:      http://www.w3.org/1999/02/22-rdf-syntax-ns#    RDF
rdfs:     http://www.w3.org/2000/01/rdf-schema#          RDF Schema
abc:      http://metadata.net/harmony#                   ABC Harmony
dc:       http://purl.org/dc/elements/1.1/               Dublin Core (unqualified)
history:  http://www.dspace.org/history#                 DSpace History
dso:      http://www.dspace.org/objectModel#             DSpace Object Model
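The prefixes in the table are ordinary RDF namespace abbreviations, so a name such as history:Add stands for the full URI http://www.dspace.org/history#Add. The following is a minimal sketch of that expansion in plain Java; the class and method names are hypothetical and are not taken from the prototype.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: expands prefixed names such as "history:Add"
// into full URIs, using the namespace table of the History framework.
final class HistoryNamespaces {
    private static final Map<String, String> PREFIXES = new LinkedHashMap<>();

    static {
        PREFIXES.put("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
        PREFIXES.put("rdfs", "http://www.w3.org/2000/01/rdf-schema#");
        PREFIXES.put("abc", "http://metadata.net/harmony#");
        PREFIXES.put("dc", "http://purl.org/dc/elements/1.1/");
        PREFIXES.put("history", "http://www.dspace.org/history#");
        PREFIXES.put("dso", "http://www.dspace.org/objectModel#");
    }

    private HistoryNamespaces() {
    }

    /** Expands "prefix:local" to a full URI; rejects unknown prefixes. */
    static String expand(String qname) {
        int colon = qname.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Not a prefixed name: " + qname);
        }
        String ns = PREFIXES.get(qname.substring(0, colon));
        if (ns == null) {
            throw new IllegalArgumentException("Unknown prefix in: " + qname);
        }
        return ns + qname.substring(colon + 1);
    }
}
```

Rejecting unknown prefixes, rather than passing them through, catches names that use a prefix not declared in the table.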

URIs of DSpace Objects

To write RDF statements about DSpace Objects, we need URIs for them.
These URIs have to meet the following requirements:

...

URL.
This is the most distinctive identifying feature of an EPerson, and although it
is not archival, it is at least globally unique.

DSpace History RDF "schema"

These are the classes and properties used to describe History events.

Action classes

The following are subclasses of

...

  • history:Action - superclass for the other actions.
  • history:Add - add a new member to the subject object.
  • history:Remove - remove a member from the subject object.
  • history:Create - create a new subject.
  • history:Delete - destroy the subject.
  • history:Modify - modify content of the subject.
  • history:ModifyMetadata - modify metadata describing the subject.
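One way to picture these classes in code is an enum with one constant per action. The sketch below is hypothetical: the enum, its methods, and the assumed one-to-one mapping from DSpace event-type names (ADD, REMOVE, CREATE, DELETE, MODIFY, MODIFY_METADATA) to action subclasses are illustrations, not the prototype's implementation.

```java
import java.util.Locale;

// Hypothetical: one constant per History action class.
enum HistoryAction {
    ACTION("Action"), // superclass only; never produced for a concrete event
    ADD("Add"),
    REMOVE("Remove"),
    CREATE("Create"),
    DELETE("Delete"),
    MODIFY("Modify"),
    MODIFY_METADATA("ModifyMetadata");

    static final String HISTORY_NS = "http://www.dspace.org/history#";

    private final String localName;

    HistoryAction(String localName) {
        this.localName = localName;
    }

    /** Full URI of the RDF class, e.g. http://www.dspace.org/history#Add. */
    String uri() {
        return HISTORY_NS + localName;
    }

    /** Assumed mapping from an event-type name such as "MODIFY_METADATA". */
    static HistoryAction forEventType(String eventType) {
        HistoryAction action = valueOf(eventType.toUpperCase(Locale.ROOT));
        if (action == ACTION) {
            throw new IllegalArgumentException("Not a concrete event type: " + eventType);
        }
        return action;
    }
}
```

For example, HistoryAction.forEventType("MODIFY_METADATA").uri() gives http://www.dspace.org/history#ModifyMetadata.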

DSpace Object classes

An object in the data model is typed by one of the following
classes, with names matching the corresponding DSpace constants.

...

abc:Agent
  • dso:EPerson

Properties of an Action

These properties have the domain

...

  • abc:creates - range is a dso:DSpaceObject, the "subject" of the event.
  • abc:destroys - range is a dso:DSpaceObject, the "subject" of the event.
  • abc:hasPatient - range is a dso:DSpaceObject, the "subject" of the event.
  • abc:atTime - range is a literal ISO 8601 timestamp.
  • history:inArchive - range is a dso:Site.
  • abc:involves - range is a dso:DSpaceObject, the "object" of the event.
  • abc:hasParticipant - range is a dso:EPerson.
  • history:usesTool - range is a literal, the ExtraLogInfo from the event.
  • history:detail - range is a literal, event.getDetail() (if available).
  • history:transactionID - range is a literal, event.getTransactionID() (if available).
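To make the property list concrete, here is a sketch of the statements one Add event might produce, written as plain Java strings rather than against the OpenRDF API. It assumes the event node URI and the URIs of the subject, object, Site, and EPerson have already been resolved, and that abc:hasPatient carries the subject of an Add (abc:creates and abc:destroys presumably fill that role for Create and Delete). All class and method names are hypothetical.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical triple type; not an OpenRDF class.
final class Statement {
    final String subject;
    final String predicate;
    final String object;
    final boolean literal; // true when object is a literal value, not a URI

    Statement(String subject, String predicate, String object, boolean literal) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
        this.literal = literal;
    }

    /** N-Triples-style line (valid N3); quotes inside literals are not escaped. */
    String toN3() {
        String obj = literal ? "\"" + object + "\"" : "<" + object + ">";
        return "<" + subject + "> <" + predicate + "> " + obj + " .";
    }
}

final class EventStatements {
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String ABC = "http://metadata.net/harmony#";
    static final String HISTORY = "http://www.dspace.org/history#";

    /** Statements for one Add event; epersonUri may be null (no participant). */
    static List<Statement> forAdd(String eventUri, String subjectUri, String objectUri,
                                  String siteUri, String epersonUri, Instant when) {
        List<Statement> out = new ArrayList<>();
        out.add(new Statement(eventUri, RDF_TYPE, HISTORY + "Add", false));
        out.add(new Statement(eventUri, ABC + "hasPatient", subjectUri, false)); // container
        out.add(new Statement(eventUri, ABC + "involves", objectUri, false));    // new member
        out.add(new Statement(eventUri, ABC + "atTime", when.toString(), true)); // ISO 8601, UTC
        out.add(new Statement(eventUri, HISTORY + "inArchive", siteUri, false));
        if (epersonUri != null) {
            out.add(new Statement(eventUri, ABC + "hasParticipant", epersonUri, false));
        }
        return out;
    }
}
```

Instant.toString() produces an ISO 8601 UTC timestamp such as 1970-01-01T00:00:00Z, which fits the literal range of abc:atTime.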

Properties of a DSpace Object

The following properties have the domain

...

  • dc:title - range is a literal, the object's title or proper name.
  • dc:type - range is a literal, the object's type or purpose. This is only used on Bitstreams.

History Implementation

The prototype does essentially three things:

  1. Record history of all relevant data model changes.
  2. Fetch history statements covering the history of a given object.
  3. Fetch history records in answer to a free-form query.

Transformation of Event into History Statements

When the History event consumer sees an event, it might apply a
transformation before translating it to RDF:

...

  • rdf:type of abc:Manifestation (or abc:Agent for an EPerson).
  • rdf:type of dso:Item, or whatever the type of the object is.
  • dc:title with the name or title of the object, if available.
  • dc:type with the name of the owning Bundle when the object is a Bitstream. This can help preservationists, since it indicates the purpose of the bitstream.

Lacunae: Events that Cannot Be Recorded

Because of the inherent conflict between the low-level architecture of the
Event System and the requirement that History records identify all data
model objects by their persistent identifiers, some events simply cannot
be translated from the data in the event stream into History records.
The event stream identifies data model objects by "ephemeral" database
keys (for speed, and because not all objects have persistent identifiers
such as Handles), so the event consumer has to look up additional
information: the persistent identifier, and any attributes of the Subject
and Object of the event. However, if either of those objects is deleted in
the same transaction that generates the event, it is too late to look up
its persistent identifier (which is why some events carry it in their
"detail" field). Here are the specific situations in which an event
cannot be recorded in the History:

...

Although this leaves holes (lacunae) in the history record of an archive,
there is still enough information recorded to tell a preservationist
the fate of any objects missing from the archive. The Delete
events are recorded accurately for all archival objects. Since
all Bitstreams of archival significance are owned by Items (and
the Add events showing that are traceable in the History record),
their fate can be inferred from a Delete record for their Item. Some
Remove events are lost, but they can also be inferred from a
Delete event. The record is a bit messy and incomplete, but it
is still quite usable.
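The inference described above amounts to following recorded Add records downward from each explicitly deleted Item. A minimal sketch, assuming the Add records have already been collected into a container-to-members map restricted to Item-to-Bundle and Bundle-to-Bitstream links; it ignores Remove records, so it over-approximates when a member was removed before its Item was deleted. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: infer the fate of objects whose own Delete was not recorded.
final class FateInference {
    /**
     * @param adds    Add records as container URI -> member URIs
     *                (assumed to hold only Item->Bundle and Bundle->Bitstream links)
     * @param deleted URIs that have an explicit Delete record
     * @return the deleted URIs plus everything reachable from them via Add records
     */
    static Set<String> impliedDeletions(Map<String, List<String>> adds, Set<String> deleted) {
        Set<String> result = new LinkedHashSet<>(deleted);
        Deque<String> work = new ArrayDeque<>(deleted);
        while (!work.isEmpty()) {
            String container = work.pop();
            for (String member : adds.getOrDefault(container, Collections.emptyList())) {
                if (result.add(member)) { // visit each member only once
                    work.push(member);
                }
            }
        }
        return result;
    }
}
```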

Retrieving History of an Object

NOTE: The new RDFRepository

...

abc:hasParticipant.

Examples

  1. Item History report in N3
  2. Item with life cycle ending in delete, in N3
  3. Dead Link:  Same Item with life cycle ending in delete, in RDF/XML

Installing and Operating

To install the prototype implementation, download the source
and follow instructions to install it:

Download and Install

Preparation

  1. Start with DSpace 1.5 source checkout (ca. January 5, 2006)
  2. Apply the EventSystemPrototype patch as directed on that page.
  3. Apply the AipPrototype patch as directed on that page.

Downloads

  1. Changes to dspace.cfg file
  2. JAR files to add
  3. Java Source files to add

Installation

Working in your DSpace installation directory:

  1. Shut down your servlet container.
  2. Apply the source change to the DSpace configuration with patch:
       patch -l config/dspace.cfg < history-dspace.cfg.diff
     (or apply the changes to your configuration file manually.)
  3. Make sure the changes are propagated to the configuration file in your run-time directory.
  4. Unpack the history-new-libs.zip file with unzip.
  5. Unpack the history-new-source.zip file with unzip.
  6. Rebuild all sources with:
       ant clean install_code build_wars

Configuration

The History system requires the following configuration keys:

  • Ignore History metadata in non-AIP METS packages:
      mets.default.ingest.crosswalk.DSpaceHistory = NULLSTREAM
  • Streaming dissemination crosswalk, to be added to the plugins configured for StreamDisseminationCrosswalk:
      org.dspace.history.HistoryStreamDisseminationCrosswalk = HISTORY
  • Streaming ingestion crosswalk, to be added to the plugins configured for StreamIngestionCrosswalk:
      org.dspace.history.HistoryStreamIngestionCrosswalk = HISTORY
  • Add an event consumer named "history":
      event.consumer.browse.class = org.dspace.browse.BrowseConsumer
      event.consumer.browse.filters = Item+Create|Modify|Modify_Metadata:Collection+Add|Remove
  • Add the history consumer to the default dispatcher:
      event.dispatcher.default.consumers = history:sync ...
  • To disseminate history records in AIPs, add:
      aip.disseminate.digiprovMD = DSpaceHistory:HISTORY
  • To ingest history from AIPs, add:
      mets.dspaceAIP.ingest.crosswalk.DSpaceHistory = HISTORY
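The filters value in the consumer entry above appears to use a compact syntax: clauses separated by ':', each clause being a set of object types and a set of event types joined by '+', with alternatives separated by '|'. Assuming that reading, a consumer would accept an event when some clause names both its object type and its event type. The parser below is a hypothetical sketch of that interpretation, not DSpace's own filter code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical parser for filter strings like
// "Item+Create|Modify|Modify_Metadata:Collection+Add|Remove".
final class ConsumerFilter {
    private static final class Clause {
        final Set<String> objectTypes = new HashSet<>();
        final Set<String> eventTypes = new HashSet<>();
    }

    private final List<Clause> clauses = new ArrayList<>();

    ConsumerFilter(String spec) {
        for (String part : spec.split(":")) {          // one clause per ':'
            String[] halves = part.split("\\+");       // objects + events
            if (halves.length != 2) {
                throw new IllegalArgumentException("Bad filter clause: " + part);
            }
            Clause c = new Clause();
            for (String o : halves[0].split("\\|")) {
                c.objectTypes.add(o.trim().toUpperCase(Locale.ROOT));
            }
            for (String e : halves[1].split("\\|")) {
                c.eventTypes.add(e.trim().toUpperCase(Locale.ROOT));
            }
            clauses.add(c);
        }
    }

    /** True if some clause lists both the object type and the event type. */
    boolean accepts(String objectType, String eventType) {
        String o = objectType.toUpperCase(Locale.ROOT);
        String e = eventType.toUpperCase(Locale.ROOT);
        for (Clause c : clauses) {
            if (c.objectTypes.contains(o) && c.eventTypes.contains(e)) {
                return true;
            }
        }
        return false;
    }
}
```

Under this reading, the example filter passes Item Create/Modify/Modify_Metadata events and Collection Add/Remove events, but not, say, an Add on an Item.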

Operation

Before starting a DSpace application or the servlet container for
the first time, you may wish to move or clean out the contents
of the

...

You can export the RDF in RDF/XML format by specifying "xml" after the "-f" switch instead of "n3". Use the "-h" switch to see other options.

Future Work

Since this is a prototype, there are some things left undone:

  1. Backup strategy. NOTE: This has been solved; see the -D and -R options of the HistoryRepository command-line application.
    The History RDF data is stored in a "native" triple-store, which is an OpenRDF application-defined format. If it were ever corrupted, some or all of the history data would be lost. But don't worry about that just because this is based on an "alpha" release of OpenRDF 2.0...
    1. It is not really good enough to save just the RDF triples (as N3 or RDF/XML); OpenRDF actually records them as "quads", adding an extra resource called the "context". DSpace History uses that context to bind each triple to the URI of a DSpace Object, which makes it very efficient to retrieve all the History records about a particular object.
    2. You could just export the RDF in the triplestore with the "-x" option. Although it should be possible to sort out the mapping of records to objects again without the "context", it would be a lot of extra work, and there is no code to do it yet. It is much better to simply save the state of the quads.
    3. See the -Q option of org.dspace.history.HistoryRepository; with a little tuning (notably dealing with data types and literals), this export could be used to restore the triplestore, although you would have to write an ingester too.
  2. Experiment with making the History consumer asynchronous.
  3. Export History data to a SIMILE timeline.
See Also