
DSpace 2.0 Storage Service Implementations Based on Semantic Content Repository - Yigang Zhou


Develop DSpace storage service implementations based on semantic content repositories (triplestores). - Yigang Zhou


Abstract


On the one hand, DSpace 2.0 has a generalized storage service API that allows a DSpace 2.0 repository to use many possible systems to store digital repository data. On the other hand, semantic content repositories (triplestores) such as Mulgara, Sesame and Tupelo are available for semantic data storage; they are suitable for storing DSpace metadata in the form of RDF triples, along with blobs. In this project, I will develop DSpace storage service implementations based on semantic content repositories. Finally, I will cooperate with Andrius Blažinskas, who is working on another GSoC 2010 project back-porting the DSpace 2.0 storage interfaces to 1.x, to make the triplestore storage service ready to use in DSpace 1.x.
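To make "metadata in the form of RDF triples" concrete, here is a minimal Java sketch of how one entity's properties might be flattened into subject-predicate-object statements. The class `EntityTriples`, the `Triple` record and the `example.org` namespaces are hypothetical names chosen for illustration; they are not part of the DSpace 2.0 or Tupelo APIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: flatten one DSpace entity's properties into RDF-style
// (subject, predicate, object) triples. Names and namespaces are assumptions.
final class EntityTriples {

    /** One RDF statement: subject and predicate are URIs, the object is a literal here. */
    record Triple(String subject, String predicate, String object) {}

    // Assumed namespaces, used only by this example.
    static final String ENTITY_NS = "http://example.org/dspace/entity/";
    static final String PROPERTY_NS = "http://example.org/dspace/property/";

    /** Emits one triple per (property, value) pair; multi-valued properties yield several triples. */
    static List<Triple> toTriples(String entityId, Map<String, List<String>> properties) {
        String subject = ENTITY_NS + entityId;
        List<Triple> triples = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : properties.entrySet()) {
            String predicate = PROPERTY_NS + e.getKey();
            for (String value : e.getValue()) {
                triples.add(new Triple(subject, predicate, value));
            }
        }
        return triples;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the printed order stable.
        Map<String, List<String>> props = new LinkedHashMap<>();
        props.put("dc.title", List.of("A Sample Thesis"));
        props.put("dc.contributor.author", List.of("Zhou, Yigang", "Doe, Jane"));
        for (Triple t : toTriples("item-42", props)) {
            System.out.println(t);
        }
    }
}
```

The two author values become two separate triples sharing the same subject and predicate, which is how RDF represents multi-valued metadata fields.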


Project Title:

DSpace 2.0 Storage Service Implementations Based on Semantic Content Repository

Student:

Yigang Zhou, Wuhan University, P.R. China

Mentors:

Mark Diggory

Contacting author:

egang DOT zhou AT gmail DOT com

SCM Location for Project:

http://scm.dspace.org/svn/repo/sandbox/gsoc/2010/triplestore/


Architecture


The architecture should follow these design principles:

  1. The triplestore StorageService should be compatible with all kinds of semantic data stores (e.g. Sesame, Jena), selected through different configuration settings.
  2. Other new semantic data stores (e.g. Mulgara) can be plugged into the architecture with little effort and without modifying the API code.
  3. The triplestore StorageService/BinaryStorageService should accommodate both StorageEntity/StorageProperty, for entity metadata in the form of RDF triples, and StorageBinary, for blobs (binary/textual data).
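Principles 1 and 2 can be illustrated with a small plain-Java sketch in which the service depends only on an interface and the concrete store is chosen by a configuration key. In the proposed design this selection would be done through Spring configuration; the registry below is a simplified stand-in for that, and `TripleBackend`, `BackendRegistry` and the `triplestore.backend` key are hypothetical names, not existing DSpace or Tupelo classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.function.Supplier;

// Hypothetical sketch: backends are chosen by configuration (principle 1)
// and new ones are added by registration only (principle 2).
interface TripleBackend {
    String name();
}

final class SesameBackend implements TripleBackend {
    public String name() { return "sesame"; }
}

final class JenaBackend implements TripleBackend {
    public String name() { return "jena"; }
}

final class BackendRegistry {
    private final Map<String, Supplier<TripleBackend>> factories = new HashMap<>();

    /** New stores register a factory here; no service code changes are needed. */
    void register(String key, Supplier<TripleBackend> factory) {
        factories.put(key, factory);
    }

    /** Resolves the backend named by "triplestore.backend", defaulting to "sesame". */
    TripleBackend fromConfig(Properties config) {
        String key = config.getProperty("triplestore.backend", "sesame");
        Supplier<TripleBackend> factory = factories.get(key);
        if (factory == null) {
            throw new IllegalArgumentException("No triplestore backend registered for: " + key);
        }
        return factory.get();
    }
}

final class BackendDemo {
    public static void main(String[] args) {
        BackendRegistry registry = new BackendRegistry();
        registry.register("sesame", SesameBackend::new);
        registry.register("jena", JenaBackend::new);
        // A hypothetical Mulgara adapter, added by registration alone:
        // the outer lambda is the Supplier, the inner one implements name().
        registry.register("mulgara", () -> () -> "mulgara");

        Properties config = new Properties();
        config.setProperty("triplestore.backend", "mulgara");
        System.out.println(registry.fromConfig(config).name());
    }
}
```

Because callers only see `TripleBackend`, switching from Sesame to Jena (or to a future Mulgara adapter) is a configuration change rather than a code change, which is the property principle 2 asks for.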

This architecture is similar to JackrabbitStorageService, which sits in front of PersistenceManagers for different databases. As shown in Figure 1, a TupeloStorageService holds a reference to a Context object (i.e. a triplestore instance in Tupelo). This Context is actually a UnionContext, which combines a sub-Context A for RDF triples (e.g. SesameContext or MulgaraContext) and a sub-Context B for blobs (e.g. FileContext). All StorageService functions are dispatched to Context A, while BinaryStorageService functions are delivered to Context B.

The choice of Contexts A and B is flexible, and any combination is allowed. For example, we can use SesameContext, Sesame2Context or PersisenceJenaContext as Context A, with HashFileContext or DatabaseContext as Context B. Contexts A and B can also be injected into the UnionContext through Spring configuration.

Tupelo currently has no Mulgara implementation of Context, so I can develop a new MulgaraContext in this GSoC project. The new Context will not affect the source code of TupeloStorageService at all; it can be plugged into the architecture through Spring configuration alone.
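The dispatch in Figure 1 can be sketched in plain Java. The interfaces below only mimic the roles of Context A (triples) and Context B (blobs); they are hypothetical and are not Tupelo's real `Context` API. `UnionContextSketch` plays the UnionContext role, and `TripleStoreServiceSketch` stands in for TupeloStorageService. In-memory implementations are used so the sketch runs without a real triplestore.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the Figure 1 dispatch; not Tupelo's actual API.

/** Role of "Context A": stores RDF triples (Sesame, Jena, Mulgara, ...). */
interface TripleContext {
    void addTriple(String s, String p, String o);
    List<String> objects(String s, String p);
}

/** Role of "Context B": stores blobs (hash-file store, database, ...). */
interface BlobContext {
    void putBlob(String id, byte[] data);
    byte[] getBlob(String id);
}

/** In-memory triple store indexed by subject, then predicate. */
final class MemoryTripleContext implements TripleContext {
    private final Map<String, Map<String, List<String>>> index = new HashMap<>();
    public void addTriple(String s, String p, String o) {
        index.computeIfAbsent(s, k -> new HashMap<>())
             .computeIfAbsent(p, k -> new ArrayList<>())
             .add(o);
    }
    public List<String> objects(String s, String p) {
        // Return a copy so callers cannot mutate the store.
        return List.copyOf(index.getOrDefault(s, Map.of()).getOrDefault(p, List.of()));
    }
}

/** In-memory blob store; copies arrays on the way in and out. */
final class MemoryBlobContext implements BlobContext {
    private final Map<String, byte[]> blobs = new HashMap<>();
    public void putBlob(String id, byte[] data) { blobs.put(id, data.clone()); }
    public byte[] getBlob(String id) {
        byte[] data = blobs.get(id);
        return data == null ? null : data.clone();
    }
}

/** UnionContext role: one object holding both sub-contexts. */
final class UnionContextSketch {
    final TripleContext a;
    final BlobContext b;
    // Constructor injection, so Spring (or plain code) can wire any A/B pair.
    UnionContextSketch(TripleContext a, BlobContext b) { this.a = a; this.b = b; }
}

/** TupeloStorageService stand-in: property calls go to A, binary calls to B. */
final class TripleStoreServiceSketch {
    private final UnionContextSketch context;
    TripleStoreServiceSketch(UnionContextSketch context) { this.context = context; }

    // StorageService-style operations are dispatched to Context A.
    void setProperty(String entityId, String property, String value) {
        context.a.addTriple(entityId, property, value);
    }
    List<String> getProperty(String entityId, String property) {
        return context.a.objects(entityId, property);
    }

    // BinaryStorageService-style operations are dispatched to Context B.
    void putBinary(String binaryId, byte[] data) { context.b.putBlob(binaryId, data); }
    byte[] getBinary(String binaryId) { return context.b.getBlob(binaryId); }

    public static void main(String[] args) {
        TripleStoreServiceSketch service = new TripleStoreServiceSketch(
            new UnionContextSketch(new MemoryTripleContext(), new MemoryBlobContext()));
        service.setProperty("item-42", "dc.title", "A Sample Thesis");
        service.putBinary("bitstream-1", "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(service.getProperty("item-42", "dc.title"));
        System.out.println(new String(service.getBinary("bitstream-1"), StandardCharsets.UTF_8));
    }
}
```

Swapping `MemoryTripleContext` for another `TripleContext` implementation (for example, a future Mulgara-backed one) changes only the constructor arguments, never `TripleStoreServiceSketch`. This mirrors the claim that a new Context leaves TupeloStorageService untouched.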



Figure 1. UML Diagram
