Comment: Migrated to Confluence 5.3

...

A DSpace Item entity has an associated qualified Dublin Core metadata XML file. Fedora provides a default datastream with the identifier DC for Dublin Core metadata in every object, so it can be used to hold these fields.

<!--
It should also be noted that the Item entity has two types of relation with the Collection entity. In Fedora, simple relations between objects are expressed in RELS-EXT using the isMemberOf relation type. However, custom relations can easily be introduced, so an additional relation, isIncludedBy, is added here to emphasize inclusion rather than ownership. I am not really sure whether using a custom relation is a good idea, but it works. -->

Relations between mapped DSpace entities in Fedora can be found by searching the resource index with iTQL queries. An example of such a query:

...

Code Block
<?xml version="1.0" encoding="UTF-8" ?>
<sparql xmlns="http://www.w3.org/2001/sw/DataAccess/rf1/result">
<head>
  <variable name="object" />
</head>
<results>
  <result>
    <object uri="info:fedora/demo:Item~213.456-789" />
  </result>
  <result>
    <object uri="info:fedora/demo:Item~223.456-789" />
  </result>
</results>
</sparql>
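A result document of this shape can be reduced to a plain list of PIDs. The following is a minimal sketch, assuming the result is already available as a string; the class name RiResultParser and the method parsePids are hypothetical and not part of Fedora or DSpace. It uses only the JDK's DOM parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: extracts PIDs from a resource index result document.
final class RiResultParser {
    private static final String FEDORA_URI_PREFIX = "info:fedora/";

    /** Returns the PIDs found in the uri attribute of every <object> element. */
    static List<String> parsePids(String sparqlXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(sparqlXml.getBytes(StandardCharsets.UTF_8)));
            // "*" matches <object> elements regardless of their namespace.
            NodeList nodes = doc.getElementsByTagNameNS("*", "object");
            List<String> pids = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                String uri = ((Element) nodes.item(i)).getAttribute("uri");
                if (uri.startsWith(FEDORA_URI_PREFIX)) {
                    // Strip the "info:fedora/" prefix to obtain the bare PID.
                    pids.add(uri.substring(FEDORA_URI_PREFIX.length()));
                }
            }
            return pids;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse result document", e);
        }
    }
}
```

For the result shown above, parsePids would yield the two Item PIDs without the info:fedora/ prefix.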

...

A query for included Items can be formed in the same way:

Code Block
select $object from <#ri>
where  $object <fedora-rels-ext:isIncludedBy>
   <info:fedora/demo:Collection~123.456-789>

-->
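Queries of this form differ only in the relation name and the target object, so they can be generated. The following is a minimal sketch; the class ItqlQueries and its method are hypothetical, and the check on the relation name is my own precaution rather than something Fedora requires.

```java
// Hypothetical helper: builds iTQL queries against the Fedora resource index.
final class ItqlQueries {

    /**
     * Builds a query selecting every object that points at targetPid
     * through the given RELS-EXT relation (for example isMemberOf or isIncludedBy).
     */
    static String objectsRelatedTo(String relation, String targetPid) {
        // Assumption: relation names consist of letters only.
        if (!relation.matches("[A-Za-z]+")) {
            throw new IllegalArgumentException("Unexpected relation name: " + relation);
        }
        return "select $object from <#ri>\n"
             + "where  $object <fedora-rels-ext:" + relation + ">\n"
             + "   <info:fedora/" + targetPid + ">";
    }
}
```

Calling objectsRelatedTo("isIncludedBy", "demo:Collection~123.456-789") produces the query text for included Items; passing "isMemberOf" gives the ownership variant.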

DSpace Bitstreams are trickier. Basically, they are mapped to Fedora datastreams. When ingested, every Bitstream is put into a separate temporary Fedora object. Later, when the Bitstream is associated with an entity (Bundle, etc.), it is transferred to that entity's object as a Fedora datastream. In the special case where a Bitstream is linked to several entities, it is moved to and kept in a separate dedicated object, with relations in RELS-EXT to the parent entities.
This separate Bitstream object scenario also covers the case where Fedora is used only to store Bitstreams and a small set of associated metadata, without preserving the full model structure (FedoraBitStore functionality only). In this case the object is not temporary but permanent. However, the idea of one datastream (Bitstream) per object still isn't that attractive...
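The placement rules above can be summarized as a small decision function. This is only a sketch of the rules as I read them; the class, enum and parameter names are hypothetical and do not come from the driver code.

```java
// Hypothetical summary of where a Bitstream lives in Fedora.
final class BitstreamPlacement {

    enum Location {
        TEMPORARY_OBJECT,          // just ingested, not yet linked to any entity
        PARENT_OBJECT_DATASTREAM,  // stored as a datastream of its single parent
        DEDICATED_OBJECT           // own permanent object, linked via RELS-EXT
    }

    /**
     * @param parentCount  number of entities (Bundles, etc.) the Bitstream is linked to
     * @param bitStoreOnly true when Fedora is used only as a bit store
     */
    static Location locate(int parentCount, boolean bitStoreOnly) {
        if (parentCount < 0) {
            throw new IllegalArgumentException("parentCount must not be negative");
        }
        if (bitStoreOnly) {
            return Location.DEDICATED_OBJECT;   // always a permanent object
        }
        if (parentCount == 0) {
            return Location.TEMPORARY_OBJECT;
        }
        if (parentCount == 1) {
            return Location.PARENT_OBJECT_DATASTREAM;
        }
        return Location.DEDICATED_OBJECT;       // linked to several entities
    }
}
```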

...

PIDs of Fedora objects that represent DSpace entities can be formed using the general pattern: <Fedora namespace ID>:<DSpace entity type>~<DSpace entity ID>. At the moment, the DSpace entity IDs are internal DSpace identifiers (obtained with the getID() method). Examples of IDs are provided in Table 1.<!--

A Fedora PID must match the pattern:

Code Block
'([A-Za-z0-9]|-|\.)+:(([A-Za-z0-9])|-|\.|~|_|(%[0-9A-F]{2}))+'

...

So it does not allow some special characters, such as the slash ("/") used in DSpace handles. These characters must be escaped or replaced. Currently I have replaced "/" with "-".

The Bundle identifier is formed by combining the parent Item handle and the DSpace Bundle ID (possibly taken from the database), separated by an underscore ("_"). -->
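The PID check and the slash replacement can be sketched as follows. The regular expression is my reading of the Fedora PID syntax (a namespace of letters, digits, "-" and ".", then ":", then an object ID that additionally allows "~", "_" and %XX escapes) and should be checked against the Fedora documentation; the class FedoraPids and its methods are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper for validating PIDs and adapting DSpace handles.
final class FedoraPids {

    // Assumed Fedora PID syntax: <namespace>:<object id>.
    private static final Pattern PID = Pattern.compile(
            "^([A-Za-z0-9]|-|\\.)+:(([A-Za-z0-9])|-|\\.|~|_|(%[0-9A-F]{2}))+$");

    /** Returns true if the whole string is a syntactically valid PID. */
    static boolean isValid(String pid) {
        return PID.matcher(pid).matches();
    }

    /** Replaces the slash of a DSpace handle with "-", as described above. */
    static String escapeHandle(String handle) {
        return handle.replace('/', '-');
    }
}
```

With this sketch, "demo:Item~123.456/789" is rejected, while "demo:Item~" + escapeHandle("123.456/789") passes the check. Note that replacing "/" with "-" is not reversible if a handle already contains "-".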

The ID of a Fedora datastream representing a Bitstream can be formed in a similar way, using the pattern Bitstream.<Bitstream ID>, since the symbol "~" is not allowed in datastream IDs.

<!-- It is also possible to use Bitstream~313.456-789_7_24 as the ID, but since the part 313.456-789_7 is already included in the Fedora object (Bundle) ID, there is no need to repeat it.
-->

Table 1: Identifiers

|| Fedora entity representing DSpace entity || ID pattern || ID example ||
| Fedora Object (Community) | <Fedora namespace ID>:Community~<Community ID> | demo:Community~1 |
| Fedora Object (Collection) | <Fedora namespace ID>:Collection~<Collection ID> | demo:Collection~1 |
| Fedora Object (Item) | <Fedora namespace ID>:Item~<Item ID> | demo:Item~1 |
| Fedora Object (Bundle) | <Fedora namespace ID>:Bundle~<Bundle ID> | demo:Bundle~1 |
| Fedora Datastream (Bitstream) | Bitstream.<Bitstream ID> | Bitstream.1 |
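The identifier patterns in Table 1 can be sketched as two small functions. The class FedoraIds and the enum are hypothetical names; the integer IDs stand for the internal DSpace identifiers returned by getID().

```java
// Hypothetical helper that forms the identifiers listed in Table 1.
final class FedoraIds {

    enum EntityType {
        COMMUNITY("Community"), COLLECTION("Collection"), ITEM("Item"), BUNDLE("Bundle");

        final String label;

        EntityType(String label) {
            this.label = label;
        }
    }

    /** Forms <Fedora namespace ID>:<DSpace entity type>~<DSpace entity ID>. */
    static String objectPid(String namespace, EntityType type, int dspaceId) {
        return namespace + ":" + type.label + "~" + dspaceId;
    }

    /** Forms Bitstream.<Bitstream ID>; "~" is avoided in datastream IDs. */
    static String bitstreamDatastreamId(int bitstreamId) {
        return "Bitstream." + bitstreamId;
    }
}
```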

...

I propose to create a driver prototype that will allow DSpace to use a Fedora repository as primary storage for bitstreams and metadata. The driver classes will have the same method interfaces as the classes of the current DSpace "org.dspace.storage" package and will be accessed in the same manner. The driver will communicate directly with the Fedora repository using its SOAP API (API-A and API-M). <!--

To prevent software defects, all written code will be tested using JUnit. I will also provide code documentation. -->

Comments (RLR):

The ways DSpace programmatically accesses bitstreams and metadata are very different. Bitstreams are treated as opaque simple objects (although a few additional properties, such as a checksum, are required). There is already some preliminary work on creating a clean abstraction over the underlying storage system (see http://wiki.dspace.org/index.php/PluggableStorage). I would recommend starting with this 'Bitstore' interface, since it will be incorporated into DSpace 1.6 and already supports several storage back-ends: filesystem, Storage Resource Broker, Amazon S3, and Sun's HoneyComb. The last two are essentially HTTP client calls, so they already resemble using the Fedora SOAP API.

But the metadata is another story - DSpace does very little to abstract away from direct JDBC/SQL calls into an RDBMS. I think here the question of a 'driver' is less obvious, and you might want to explore a few designs before committing a lot of work. For example: could the metadata be placed in a bitstream and stored through the other driver? This is not a functional mapping, but it would satisfy e.g. a replication scenario. Should you attempt a high-level metadata abstraction that bypasses current DSpace (but could be retrofitted into it)? Etc. I am just throwing out thoughts to elicit additional discussion here. <!--

After the initial analysis, the decision was made to start with an interface library (driver) that allows managing the basic DSpace model entities (Community, Collection, Item, etc.) in a Fedora repository. This library will be independent of DSpace itself.
-->

<...>

The currently implemented driver is actually a combined implementation of the DAO (http://wiki.dspace.org/index.php/DAO+Prototype) and BitStore (http://wiki.dspace.org/index.php/PluggableStorage) interfaces. It can be used either as a DAO implementation or as a much simpler standalone BitStore implementation. In fact, FedoraDAOs uses FedoraBitStore directly, bypassing BitstreamStorageManager.
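To illustrate what a standalone bit store has to provide (opaque bitstreams plus a few properties such as a checksum), here is a minimal in-memory sketch. The SimpleBitStore interface is hypothetical and is not the DSpace BitStore interface; the use of MD5 for the checksum is also an assumption. A Fedora-backed implementation would replace the map with API-A/API-M calls.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal contract; NOT the DSpace BitStore interface.
interface SimpleBitStore {
    /** Stores the content under the given ID and returns its hex MD5 checksum. */
    String put(String id, byte[] content);

    /** Returns a copy of the stored content. */
    byte[] get(String id);

    /** Removes the content if present. */
    void remove(String id);
}

// In-memory stand-in used only to show the contract.
final class InMemoryBitStore implements SimpleBitStore {
    private final Map<String, byte[]> objects = new HashMap<>();

    public String put(String id, byte[] content) {
        objects.put(id, content.clone());
        return md5Hex(content);
    }

    public byte[] get(String id) {
        byte[] content = objects.get(id);
        if (content == null) {
            throw new IllegalArgumentException("Unknown bitstream: " + id);
        }
        return content.clone();
    }

    public void remove(String id) {
        objects.remove(id);
    }

    /** Computes the MD5 digest of the data as lowercase hex. */
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```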

...