Title: Moving DSpace into the age of the Semantic Web

Student: Peter Coetzee

Mentor: Mark Robert Diggory

Co-Mentors: James Rutherford, Scott Phillips, ???

Abstract

The current architecture of DSpace uses the DSpace Intermediate Metadata (DIM) format to store information about the items in its archive. DIM is presently designed as an internal format, hidden from outside view. This proposal aims to translate this private metadata format into an accepted metadata publishing standard, and to integrate it into the web of data in accordance with Linked Data best practices (http://linkeddata.org/). By creating dereferenceable URIs (served as RDF/XML, N3, HTML etc.) for everything in the metadata store, it becomes trivial for a user to browse the data store by related concepts ("This Author", "This Subject" and the like) simply by clicking on the metadata item in the Manakin display.

Furthermore, a semantic web client may use these dereferenceable URIs as the starting point for information discovery, and may formulate arbitrarily complex queries against the repository's SPARQL endpoint. This work could feed directly into the plans DSpace has outlined for a federated query system over disparate data sources (LDAP, JDBC, DNS, XCat/OCLC/Barton, Google Scholar etc.).

This work would begin to move DSpace firmly into the age of the Semantic Web, providing compelling utility for users, and helping to bootstrap the web of data through the wealth of existing DSpace instances.

Broad Goals

There are four primary and heavily inter-related components to this proposal:

Goal Breakdown

DIM Conversion

Embedded RDF

Metadata Dereferencing

SPARQL Endpoint

Open Issues

RDF Serialisation

Integration With Existing DSpace Code Base

Reworked Metadata System

A new Metadata system is to be used (see org.dspace.metadata). It will represent an RDF graph as triples (or 'MetadataItems') of {URIResource, Predicate, Value}, where "Value" can be a URIResource or a LiteralValue. DSpaceObject is transformed into an interface, with the existing abstract implementation factored into DSpaceObjectCore. The DSpaceObject interface extends URIResource. The whole graph will be handled by the MetadataManager service, which helps to keep the DSpace model as anemic as possible. A user will be able to pass a DSpaceObject to the Manager and retrieve the set of MetadataItems pertaining to that object. More specific searches will be possible by implementing the Selector interface. The onus of authorisation is on the MetadataManager, which must ensure that the current user has permission to act on the metadata in the requested fashion.
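The triple model described above could be sketched roughly as follows. The type names URIResource, Predicate, LiteralValue, MetadataItem, Selector and MetadataManager come from the text; every method signature, the SimpleResource helper and the in-memory manager are assumptions made for illustration, not the actual org.dspace.metadata API. Authorisation checks are omitted for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Anything that may appear in the object position of a triple.
interface Value {
    String asString();
}

// A node identified by a URI; per the text, DSpaceObject would extend this.
interface URIResource extends Value {
    String getURI();

    @Override
    default String asString() {
        return getURI();
    }
}

// Hypothetical minimal URIResource, standing in for a real DSpaceObject.
final class SimpleResource implements URIResource {
    private final String uri;
    SimpleResource(String uri) { this.uri = uri; }
    @Override public String getURI() { return uri; }
}

// A plain literal such as a title or a date.
final class LiteralValue implements Value {
    private final String text;
    LiteralValue(String text) { this.text = text; }
    @Override public String asString() { return text; }
}

// The property linking subject and value, identified by its URI.
final class Predicate {
    private final String uri;
    Predicate(String uri) { this.uri = uri; }
    String getURI() { return uri; }
}

// One triple: {URIResource, Predicate, Value}.
final class MetadataItem {
    final URIResource subject;
    final Predicate predicate;
    final Value value;

    MetadataItem(URIResource subject, Predicate predicate, Value value) {
        this.subject = subject;
        this.predicate = predicate;
        this.value = value;
    }
}

// Assumed shape of the Selector mentioned in the text.
interface Selector {
    boolean matches(MetadataItem item);
}

// In-memory stand-in for the MetadataManager service (no authorisation).
final class InMemoryMetadataManager {
    private final List<MetadataItem> graph = new ArrayList<>();

    void add(MetadataItem item) {
        graph.add(item);
    }

    // All triples whose subject is the given resource.
    List<MetadataItem> getMetadata(URIResource subject) {
        return select(item -> item.subject.getURI().equals(subject.getURI()));
    }

    // More specific searches via a caller-supplied Selector.
    List<MetadataItem> select(Selector selector) {
        List<MetadataItem> result = new ArrayList<>();
        for (MetadataItem item : graph) {
            if (selector.matches(item)) {
                result.add(item);
            }
        }
        return result;
    }
}
```

Keeping the graph inside the manager rather than on the objects themselves is what lets the model classes stay anemic: a caller holding any URIResource can ask the manager for its triples, or pass a Selector such as `item -> item.value instanceof LiteralValue` to narrow the result.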

Content Integration

A StackableDAO is used to connect D2RQ to the triple store, so that each modification to a DSpaceObject is replicated in the triple store. The mapping is thus still controlled by the single D2RQ mapping file, without the performance penalties associated with exposing the D2RQ model publicly.
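The stacking idea can be illustrated with a plain decorator: a DAO that forwards each write to the DAO beneath it and then asks the triple store to refresh the affected resource. This is only a sketch under stated assumptions. ItemDAO, InMemoryItemDAO, TripleStoreMirror, RecordingMirror and MirroringItemDAO are hypothetical names, the real DSpace DAO interfaces are considerably richer, and the D2RQ export step is reduced here to a callback.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, heavily reduced DAO contract; the real DSpace DAOs differ.
interface ItemDAO {
    void save(String uri, String title);
    void delete(String uri);
}

// Stand-in for the relational store that D2RQ would map.
final class InMemoryItemDAO implements ItemDAO {
    final Map<String, String> rows = new HashMap<>();

    @Override public void save(String uri, String title) { rows.put(uri, title); }
    @Override public void delete(String uri) { rows.remove(uri); }
}

// Hypothetical hook: re-export one resource through the mapping into the
// triple store, or drop its triples when the resource is deleted.
interface TripleStoreMirror {
    void refresh(String uri);
    void remove(String uri);
}

// Records calls instead of talking to a real triple store.
final class RecordingMirror implements TripleStoreMirror {
    final List<String> log = new ArrayList<>();

    @Override public void refresh(String uri) { log.add("refresh " + uri); }
    @Override public void remove(String uri) { log.add("remove " + uri); }
}

// The stacked layer: delegate to the child DAO, then replicate the change.
final class MirroringItemDAO implements ItemDAO {
    private final ItemDAO child;
    private final TripleStoreMirror mirror;

    MirroringItemDAO(ItemDAO child, TripleStoreMirror mirror) {
        this.child = child;
        this.mirror = mirror;
    }

    @Override
    public void save(String uri, String title) {
        child.save(uri, title); // relational write happens first
        mirror.refresh(uri);    // then the triple store is brought up to date
    }

    @Override
    public void delete(String uri) {
        child.delete(uri);
        mirror.remove(uri);
    }
}
```

Because the mirroring layer only sees completed writes, queries never have to go through the mapping at read time, which is where the performance benefit described above would come from.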

Data Publishing

Data publishing is handled by a set of Servlets, which will respond to SPARQL queries or conduct Linked Data content negotiation. They can simply be plugged into whichever UI is desired (or deployed as a separate webapp, if the instance is only to serve RDF).
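As an illustration of the content negotiation step, the sketch below picks a representation from an HTTP Accept header. It is kept free of the Servlet API so that it runs on its own; a servlet would call it with the value of request.getHeader("Accept"). The class name LinkedDataNegotiator, the list of supported types and the choice of text/n3 as the N3 media type are assumptions. The matching is deliberately simplified: it honours q-values and wildcards, but not the rule that a more specific media range takes precedence over a wildcard.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not part of the DSpace code base.
final class LinkedDataNegotiator {
    // Server preference order: the first entry wins ties and is the default.
    static final List<String> SUPPORTED =
            Arrays.asList("text/html", "application/rdf+xml", "text/n3");

    // Returns the chosen media type, or null if nothing acceptable is
    // available (a servlet would then answer 406 Not Acceptable).
    static String negotiate(String acceptHeader) {
        if (acceptHeader == null || acceptHeader.trim().isEmpty()) {
            return SUPPORTED.get(0);
        }
        String best = null;
        double bestQ = 0.0;
        for (String range : acceptHeader.split(",")) {
            String[] parts = range.trim().split(";");
            String type = parts[0].trim().toLowerCase();
            double q = 1.0; // default quality when no q parameter is given
            for (int i = 1; i < parts.length; i++) {
                String param = parts[i].trim();
                if (param.startsWith("q=")) {
                    try {
                        q = Double.parseDouble(param.substring(2));
                    } catch (NumberFormatException e) {
                        q = 0.0; // treat a malformed q-value as unacceptable
                    }
                }
            }
            for (String candidate : SUPPORTED) {
                if (matches(type, candidate) && q > bestQ) {
                    best = candidate;
                    bestQ = q;
                }
            }
        }
        return best;
    }

    // True if the media range (possibly a wildcard) covers the given type.
    static boolean matches(String range, String type) {
        if (range.equals("*/*")) {
            return true;
        }
        if (range.endsWith("/*")) {
            return type.startsWith(range.substring(0, range.length() - 1));
        }
        return range.equals(type);
    }
}
```

With this in place, a browser sending text/html would be given the HTML page, while a semantic web client sending application/rdf+xml would be served the RDF/XML description of the same resource.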

DESCRIBE queries are handled using a custom DESCRIBE handler. This again uses the MetadataManager to get its data and handle authorisation for the resource requested. The query body is not changed in this case.

Embedded RDF will have to be implemented separately for each UI that requires it. An implementation exists for the JSPUI, which takes the form of a new tag, dspace:metadata. Example usages:

<%
    Community community = (Community) request.getAttribute("community");
%>
<!-- Display metadata about a single DSpaceObject -->
<dspace:metadata resource="<%= community %>" />


<!-- Display metadata about a set of DSpaceObjects -->
<dspace:metadata resource="<%= community.getCollections() %>" />

This tag will print out a block of RDFa metadata into the page, by default hidden from view. If you wish for this block of metadata to be displayed, add the attribute display="true" to the tag.

Remaining Tasks

Orthogonal Issues

Cleaning up Qualified Dublin Core with DCTERMS/RDF

References and Other Research

References

http://www.tomjewett.com/dbdesign/dbdesign.php?page=intro.html Tom Jewett provides an excellent beginner's overview of relational modeling in this tutorial. I've found that the patterns described there manifest themselves throughout the DSpace database. --Mark Diggory 18:32, 26 May 2008 (EDT)

http://hcs.science.uva.nl/usr/Schreiber/docs/owl-uml/owl-uml.html Some alignment details between UML and OWL Lite.

Other Research

http://simile.mit.edu/reports/stores/ and http://esw.w3.org/topic/RdfStoreBenchmarking seem well executed, if potentially out of date. Is there any value in repeating this work with DSpace-specific instance data? – Peter Coetzee

D2RQ

http://www.w3.org/2007/03/RdfRDB/papers/d2rq-positionpaper/

OpenLink Virtuoso

http://virtuoso.openlinksw.com/wiki/main/Main/VOSSQLRDF