<?xml version="1.0" encoding="utf-8"?>
<html>
Title: DSpace & Fedora integration

...

Two popular digital content repositories, DSpace and Fedora, are quite different in nature and have different data models, and each has its own advantages. Integrating them would allow wider possibilities for disseminating and managing digital content. When the repositories are used separately, digital content must be prepared and replicated for each of them. To avoid this replication, a specific driver must be implemented that allows one repository to access data through the other. A lot of work is clearly needed to fully achieve the desired result, so my proposal is to create a working storage driver prototype for DSpace that allows at least basic DSpace data to be stored, accessed and managed in a Fedora repository, taking into account its relationships and associated policy.

...

Figure 1 shows the relevant DSpace data model, which is a slightly extended version of the basic model (http://www.dspace.org/index.php?option=com_content&amp;task=view&amp;id=149). Several additional fields are added to reflect the database fields.
A possible mapping of the DSpace data model to Fedora's is shown in Figure 2. In the diagram, every Fedora object representing a DSpace entity has a RELS-EXT datastream. A fragment of its XML content is provided to show how relationships between objects will be implemented physically.

...

The DSpace Item entity has an associated qualified Dublin Core metadata XML file. Fedora provides a default datastream with the identifier DC for Dublin Core metadata in every object, so it can be used to hold these fields.


<!--
It should also be noted that the Item entity has two types of relation with the Collection entity. In Fedora, simple relations between objects are expressed in RELS-EXT using the isMemberOf relation type. However, custom relations can easily be introduced, so an additional relation, isIncludedBy, is added here to emphasize inclusion rather than ownership. I am not sure whether using a custom relation is a good idea, but it works.
-->
Relations between mapped DSpace entities in Fedora can be found by searching the resource index with iTQL queries. An example of such a query:

Code Block
select $object from <#ri>
where  $object <fedora-rels-ext:isMemberOf> 
   <info:fedora/demo:Collection~123.456-789>
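A query like the one above could be sent to the resource index over HTTP. The following is only a minimal sketch of how the request URL might be built. The endpoint address, the parameter values (`type=tuples`, `lang=itql`, `format=Sparql`) and the helper names `members_query` and `risearch_request_url` are assumptions for illustration and should be checked against the Fedora installation in use.

```python
from urllib.parse import urlencode

# Assumption: a Fedora resource index search endpoint on a local test server.
RISEARCH_URL = "http://localhost:8080/fedora/risearch"


def members_query(collection_pid, relation="isMemberOf"):
    """Return an iTQL query selecting objects related to the given collection."""
    return (
        "select $object from <#ri> "
        f"where $object <fedora-rels-ext:{relation}> <info:fedora/{collection_pid}>"
    )


def risearch_request_url(query, base_url=RISEARCH_URL):
    """Build a GET URL for a tuple query (parameter names are assumed)."""
    params = {"type": "tuples", "lang": "itql", "format": "Sparql", "query": query}
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    query = members_query("demo:Collection~123.456-789")
    print(risearch_request_url(query))
```

Passing `relation="isIncludedBy"` to `members_query` would produce the corresponding query for the custom relation.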

...

Code Block
<?xml version="1.0" encoding="UTF-8" ?> 
<sparql xmlns="http://www.w3.org/2001/sw/DataAccess/rf1/result">
<head>
  <variable name="object" /> 
</head>
<results>
  <result>
    <object uri="info:fedora/demo:Item~213.456-789" /> 
  </result>
  <result>
    <object uri="info:fedora/demo:Item~223.456-789" /> 
  </result>
</results>
</sparql>
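A result document of this shape can be reduced to a plain list of PIDs. The sketch below uses only the Python standard library. The function name `object_pids` is hypothetical, and the sample data simply repeats the response shown above.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the sample response above.
NS = {"r": "http://www.w3.org/2001/sw/DataAccess/rf1/result"}
URI_PREFIX = "info:fedora/"


def object_pids(sparql_xml):
    """Extract the PIDs bound to $object from a Sparql-format result document."""
    root = ET.fromstring(sparql_xml)
    pids = []
    for obj in root.findall("r:results/r:result/r:object", NS):
        uri = obj.get("uri", "")
        # Strip the "info:fedora/" prefix to get the bare PID.
        pids.append(uri[len(URI_PREFIX):] if uri.startswith(URI_PREFIX) else uri)
    return pids


SAMPLE = b"""<?xml version="1.0" encoding="UTF-8" ?>
<sparql xmlns="http://www.w3.org/2001/sw/DataAccess/rf1/result">
<head>
  <variable name="object" />
</head>
<results>
  <result>
    <object uri="info:fedora/demo:Item~213.456-789" />
  </result>
  <result>
    <object uri="info:fedora/demo:Item~223.456-789" />
  </result>
</results>
</sparql>"""

if __name__ == "__main__":
    print(object_pids(SAMPLE))
```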

...

Code Block
select $object from <#ri>
where  $object <fedora-rels-ext:isIncludedBy> 
   <info:fedora/demo:Collection~123.456-789>

...

Code Block
<rdf:RDF xmlns:dspace="http://www.dspace.org/elements/" 
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#
         ... ">
  <rdf:Description rdf:about="info:fedora/demo:Bundle~1">
    <dspace:bitstreamId>9</dspace:bitstreamId>
    ...
  </rdf:Description>
</rdf:RDF>

...

<!--
A Fedora PID must satisfy the pattern '([A-Za-z0-9]|-|\.)+:(([A-Za-z0-9])|-|\.|~|_|(%[0-9A-F]{2}))+',
so it does not allow some special characters such as the slash ("/"), which is used in DSpace handles. These characters must be escaped or replaced. Currently I replace "/" with "-".
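The replacement of "/" by "-" could look roughly like the sketch below. The regular expression is my reading of the Fedora PID pattern. The function name `handle_to_pid`, the `demo` namespace and the `Entity~handle` layout are assumptions modelled on the example PIDs used on this page, not the driver's actual code.

```python
import re

# Assumed Fedora PID grammar: namespace, ":", then the object identifier.
PID_PATTERN = re.compile(
    r"([A-Za-z0-9]|-|\.)+:(([A-Za-z0-9])|-|\.|~|_|(%[0-9A-F]{2}))+"
)


def handle_to_pid(handle, entity, namespace="demo"):
    """Map a DSpace handle such as '123.456/789' to a Fedora PID."""
    # "/" is not allowed in a PID, so it is replaced by "-".
    object_id = f"{entity}~{handle.replace('/', '-')}"
    pid = f"{namespace}:{object_id}"
    if not PID_PATTERN.fullmatch(pid):
        raise ValueError(f"not a valid Fedora PID: {pid!r}")
    return pid


if __name__ == "__main__":
    print(handle_to_pid("123.456/789", "Collection"))
```

Since the pattern also accepts percent-escapes, escaping "/" as "%2F" would be another option if the original handle has to be recoverable from the PID.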

...

Comments (RLR):

DSpace accesses bitstreams and metadata programmatically in very different ways. Bitstreams are treated as opaque simple objects (although a few additional properties, such as a checksum, are required). There is already some preliminary work on creating a clean abstraction of the underlying storage system (see http://wiki.dspace.org/index.php/PluggableStorage). I would recommend starting with this 'Bitstore' interface, since it will be incorporated into DSpace 1.6 and already supports several storage back-ends: filesystem, Storage Resource Broker, Amazon S3, and Sun's HoneyComb. The last two are essentially HTTP client calls, so they already resemble using the Fedora SOAP API.

...

The currently implemented driver is a combined implementation of the DAO (http://wiki.dspace.org/index.php/DAO+Prototype) and BitStore (http://wiki.dspace.org/index.php/PluggableStorage) interfaces. It can be used either as a DAO implementation or as a much simpler standalone BitStore implementation. FedoraDAOs uses FedoraBitStore directly, bypassing BitstreamStorageManager.


The driver allows Bitstreams to be stored and retrieved, while metadata is only stored in Fedora. Relations between Fedora objects are also preserved using RELS-EXT.

...

  • Policy mapping implementation (must a user management service be created to associate policies with users and groups?).

...