
DSpace 2.0 Storage Service Implementations Based on Semantic Content Repository - Yigang Zhou


Develop DSpace storage service implementations based on semantic content repositories (TripleStore). - Yigang Zhou


Abstract


On the one hand, DSpace 2.0 has a generalized storage service API that allows a DSpace 2.0 repository to use many different systems to store digital repository data. On the other hand, semantic content repositories (triplestores) such as Mulgara, Sesame and Tupelo are available for semantic data storage, and they are suitable for storing DSpace blobs as well as DSpace metadata represented as RDF triples. In this project, I will develop DSpace storage service implementations based on semantic content repositories. Finally, I will cooperate with Andrius Blažinskas, who is working on another GSoC 2010 project that back-ports the DSpace 2.0 storage interfaces to DSpace 1.x, to make the triplestore storage service ready to use with DSpace 1.x.


Project Title:

DSpace 2.0 Storage Service Implementations Based on Semantic Content Repository

Student:

Yigang Zhou, Wuhan University, P.R. China

Mentors:

Mark Diggory

Contacting author:

egang DOT zhou AT gmail DOT com

SCM Location for Project:

http://scm.dspace.org/svn/repo/sandbox/gsoc/2010/triplestore/


Architecture


The design principles of the architecture should be:

  1. Through different configuration settings, the triplestore StorageService should be compatible with many kinds of semantic data stores (e.g. Sesame, Jena, etc.).
  2. New semantic data stores (e.g. Mulgara) can be plugged into the architecture with little effort and without modifying the API code.
  3. The triplestore StorageService/BinaryStorageService should accommodate both StorageEntity/StorageProperty, for entity metadata in the form of RDF triples, and StorageBinary, for blobs (binary/textual data).
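To make principle 3 concrete, the following shows one possible way entity metadata could be flattened into RDF triples and read back, while blob content stays out of the triple set entirely. It is a minimal sketch, not the DSpace 2.0 API: the `Entity` and `Triple` records, the `EntityTripleMapper` class and the `http://example.org/...` namespaces are all hypothetical, and each property is assumed to be single-valued.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EntityTripleDemo {
    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("title", "Example Item");
        props.put("creator", "Jane Doe");
        Entity item = new Entity("item-1", props);

        List<Triple> triples = EntityTripleMapper.toTriples(item);
        triples.forEach(System.out::println);

        // Round trip: rebuilding the entity from its triples gives an equal entity.
        System.out.println(EntityTripleMapper.fromTriples("item-1", triples).equals(item)); // true
    }
}

// Hypothetical RDF statement; not the Tupelo, Jena or Sesame class of any similar name.
record Triple(String subject, String predicate, String object) {}

// Hypothetical, simplified stand-in for a StorageEntity plus its StorageProperty values.
// Binary content (StorageBinary) is deliberately absent: it belongs in a blob store.
record Entity(String id, Map<String, String> properties) {}

final class EntityTripleMapper {
    // Assumed namespaces; a real deployment would choose its own vocabulary.
    static final String ENTITY_NS = "http://example.org/dspace/entity/";
    static final String PROPERTY_NS = "http://example.org/dspace/property/";

    private EntityTripleMapper() {}

    // Each property becomes one triple: <entity URI> <property URI> "value".
    static List<Triple> toTriples(Entity e) {
        String subject = ENTITY_NS + e.id();
        List<Triple> triples = new ArrayList<>();
        for (Map.Entry<String, String> p : e.properties().entrySet()) {
            triples.add(new Triple(subject, PROPERTY_NS + p.getKey(), p.getValue()));
        }
        return triples;
    }

    // Inverse mapping: keep only triples about this entity that use the property namespace.
    static Entity fromTriples(String id, List<Triple> triples) {
        String subject = ENTITY_NS + id;
        Map<String, String> props = new LinkedHashMap<>();
        for (Triple t : triples) {
            if (t.subject().equals(subject) && t.predicate().startsWith(PROPERTY_NS)) {
                props.put(t.predicate().substring(PROPERTY_NS.length()), t.object());
            }
        }
        return new Entity(id, props);
    }
}
```

A production mapping would also need multi-valued properties and typed literals. The `Map<String, String>` keeps the sketch short.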

This architecture is quite similar to that of JackrabbitStorageService, which sits in front of various PersistenceManagers for different databases.

As shown in Figure 1, a TupeloStorageService holds a reference to a Context object (i.e. a triplestore instance in Tupelo). This Context is actually a UnionContext, which combines a sub-Context A for RDF triples (e.g. SesameContext or MulgaraContext) and a sub-Context B for blobs (e.g. FileContext). All StorageService operations are dispatched to Context A, while BinaryStorageService operations are delivered to Context B.

The choice of Context A and Context B is flexible, with no restriction on how they are combined. For example, we can use SesameContext, Sesame2Context or PersisenceJenaContext for Context A, with HashFileContext or DatabaseContext as Context B. Spring configuration can also be used to inject Context A and Context B into the UnionContext. Currently there is no Mulgara implementation of the Tupelo Context, but I can develop a new MulgaraContext in this GSoC project. The new Context will not affect the source code of TupeloStorageService at all; it can be plugged into the architecture through Spring configuration alone.
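The dispatching role of the UnionContext can be sketched in plain Java. This illustration works under stated assumptions and is not Tupelo's Context API: `TripleStore`, `BlobStore`, the in-memory implementations and `UnionStore` are hypothetical names. Constructor injection stands in for the Spring wiring described above, so a new back end (such as the proposed MulgaraContext) would only need to implement one of the two interfaces.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UnionDispatchDemo {
    public static void main(String[] args) {
        // Wired by hand here; in the proposed design Spring would inject both sub-stores.
        UnionStore store = new UnionStore(new InMemoryTripleStore(), new InMemoryBlobStore());
        store.addTriple("urn:item:1", "dc:title", "Example Item");
        store.putBlob("urn:bitstream:1", "hello".getBytes(StandardCharsets.UTF_8));

        System.out.println(store.objects("urn:item:1", "dc:title")); // [Example Item]
        System.out.println(new String(store.getBlob("urn:bitstream:1"), StandardCharsets.UTF_8)); // hello
    }
}

// Hypothetical "Context A": holds RDF triples (could be backed by Sesame, Jena, Mulgara...).
interface TripleStore {
    void add(String subject, String predicate, String object);
    List<String> objects(String subject, String predicate);
}

// Hypothetical "Context B": holds blobs (could be backed by files or a database).
interface BlobStore {
    void put(String id, byte[] data);
    byte[] get(String id);
}

final class InMemoryTripleStore implements TripleStore {
    // subject -> predicate -> objects
    private final Map<String, Map<String, List<String>>> index = new HashMap<>();

    public void add(String subject, String predicate, String object) {
        index.computeIfAbsent(subject, s -> new HashMap<>())
             .computeIfAbsent(predicate, p -> new ArrayList<>())
             .add(object);
    }

    public List<String> objects(String subject, String predicate) {
        // Return an unmodifiable copy so callers cannot alter the index.
        return List.copyOf(index.getOrDefault(subject, Map.of()).getOrDefault(predicate, List.of()));
    }
}

final class InMemoryBlobStore implements BlobStore {
    private final Map<String, byte[]> blobs = new HashMap<>();

    // Defensive copies so callers cannot mutate stored content.
    public void put(String id, byte[] data) { blobs.put(id, data.clone()); }

    public byte[] get(String id) {
        byte[] data = blobs.get(id);
        return data == null ? null : data.clone();
    }
}

// Plays the UnionContext role: metadata calls go to Context A, binary calls to Context B.
final class UnionStore {
    private final TripleStore triples; // Context A
    private final BlobStore blobs;     // Context B

    UnionStore(TripleStore triples, BlobStore blobs) {
        this.triples = triples;
        this.blobs = blobs;
    }

    void addTriple(String s, String p, String o) { triples.add(s, p, o); }
    List<String> objects(String s, String p) { return triples.objects(s, p); }
    void putBlob(String id, byte[] data) { blobs.put(id, data); }
    byte[] getBlob(String id) { return blobs.get(id); }
}
```

Because `UnionStore` depends only on the two interfaces, swapping the in-memory classes for persistent ones changes the wiring but not the dispatching code. The UnionContext design aims for the same property.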



Figure 1. UML Diagram


Discussions


TripleStore Ultimate API


Many triplestore implementations are "battling" to be the ultimate API that one implements against, with the others configured as underlying storage. The state of the art of mainstream Java-based triplestores can be summarised as follows:

  1. Tupelo defines its own RDF API and uses Jena and Sesame as underlying storage in the form of Contexts.
  2. AllegroGraph defines its own RDF API and provides Jena and Sesame wrapper classes so that users can access AllegroGraph through the Jena and Sesame APIs.
  3. Mulgara uses JRDF's RDF API and provides a bridge to the Jena API.
  4. Jena and Sesame each define their own standalone RDF APIs.
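The pattern these projects share is the classic adapter pattern: client code is written against one front API, and another store is plugged in underneath. The following illustrates it abstractly. `RdfApi`, `ThirdPartyGraph`, `Statement` and `GraphAdapter` are hypothetical names and do not reproduce the actual Tupelo, Jena, Sesame or JRDF APIs.

```java
import java.util.HashSet;
import java.util.Set;

public class AdapterDemo {
    public static void main(String[] args) {
        // Client code depends only on RdfApi; the underlying store can be swapped.
        RdfApi api = new GraphAdapter(new ThirdPartyGraph());
        api.add("urn:item:1", "dc:title", "Example Item");
        System.out.println(api.contains("urn:item:1", "dc:title", "Example Item")); // true
        System.out.println(api.contains("urn:item:1", "dc:title", "Other"));        // false
    }
}

// Hypothetical front API that applications program against (the role Tupelo's API plays).
interface RdfApi {
    void add(String subject, String predicate, String object);
    boolean contains(String subject, String predicate, String object);
}

// Hypothetical statement type belonging to the underlying store's own API.
record Statement(String s, String p, String o) {}

// Hypothetical underlying store with a differently shaped API
// (the role Jena or Sesame plays beneath Tupelo). Not a real library class.
final class ThirdPartyGraph {
    private final Set<Statement> statements = new HashSet<>();
    void insert(Statement st) { statements.add(st); }
    boolean has(Statement st) { return statements.contains(st); }
}

// Adapter: translates front-API calls into the underlying store's calls.
final class GraphAdapter implements RdfApi {
    private final ThirdPartyGraph graph;

    GraphAdapter(ThirdPartyGraph graph) { this.graph = graph; }

    public void add(String subject, String predicate, String object) {
        graph.insert(new Statement(subject, predicate, object));
    }

    public boolean contains(String subject, String predicate, String object) {
        return graph.has(new Statement(subject, predicate, object));
    }
}
```

AllegroGraph's wrappers reverse the direction: AllegroGraph sits underneath, with the Jena or Sesame API in front. The structure stays the same, an adapter that implements one API by delegating to another.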