...
After discussion with community members, it was decided to abandon the GSOC2008 work on DSpace 1.x (DSpace & Fedora Integration) and to continue this work on DSpace 2.x. The data model in DSpace 2.x is different, so the mapping was redone. Likewise, the code was heavily reorganized to reflect these changes and to prepare it as a DSpace 2 module.
...
- In-depth analysis of the DSpace 2 data model and the possibilities of mapping it to the Fedora 3 model. (Done)
- DSpace & Fedora model mapping design: basic mapping. (Done, but the mapping will evolve)
- Mapping implementation. (Done, however some minor fixes are needed)
- StorageVersionable implementation for Fedora 3. (In progress, on the TODO list)
- Creation of tests. (Done, however some extensions are being created)
- Creation of documentation. (Done)
DSpace 2 data model
Figure 1: General DSpace 2 data model
...
Model mapping
...
Figure 2: Example DSpace 2 data model implementation (http://smartech.gatech.edu/dspace/bitstream/1853/28078/5/214-578-1-PB.pdf)
Model mapping
Figure 3: Proposed model mapping
Mapping notes:
- The entity type is identified using the general predicate http://www.w3.org/1999/02/22-rdf-syntax-ns#type. For now, the literal FedoraObjectDatastream is used to indicate mapping to a datastream.
- Binary (file) properties are left unmapped unless they are located in a FedoraObjectDatastream entity and are named http://purl.org/dspace/model#ContentFile. Only one such property is allowed per FedoraObjectDatastream entity.
- In the diagram, relations between objects are indicated using the info:fedora/fedora-system:def/relations-external#hasMember/isMemberOf predicates; other custom predicates are also possible and will be transferred literally if provided.
- The dependence of a datastream on a particular Fedora object must be indicated using the info:fedora/fedora-system:def/view#hasDatastream predicate. Such relations between FedoraObjectDatastream entities are not allowed.
- String properties provided without a namespace are assigned the default http://local/properties# namespace.
- Any property starting with http://purl.org/dc/elements will end up in the DC datastream.
- The datastream info:fedora/fedora-system:def/view#mimeType and the Format entity http://purl.org/dspace/model#mimetype are managed separately; however, they should be the same.
- The Fedora object label is indicated using info:fedora/fedora-system:def/model#label, and the datastream label (for now) using http://www.w3.org/2000/01/rdf-schema#label.
- Entity location, which is easy to spot in the DSpace 2 code but has no direct alternative in Fedora, will be put in RELS-EXT as a separate (newly "invented") metadata field, http://purl.org/dspace/model#EntityLocation.
...
DSpace 2 data model entities "marked" with the property http://www.w3.org/1999/02/22-rdf-syntax-ns#type = info:fedora/fedora-system:def/model#FedoraObject are mapped to Fedora objects. Entities having the property http://www.w3.org/1999/02/22-rdf-syntax-ns#type = FedoraObjectDatastream are indirectly mapped to Fedora object datastreams (the binary property has a direct datastream mapping). Entities with no #type property are mapped to Fedora objects by default. The dependence of a datastream on an object is indicated using the info:fedora/fedora-system:def/recovery#pid property.
All necessary administrative Fedora object and datastream properties are taken from the corresponding entity properties. If multiple properties with the same name exist and only one is needed, the first one is taken.
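The type rule described above can be sketched roughly as follows. This is only an illustration, not the storage-fedora code: `EntityTypeSketch`, `Kind` and `classify` are hypothetical names, an entity is modelled as a plain map of property names to value lists instead of the real DSpace 2 API, and treating any unrecognised type value as a Fedora object is an assumption.

```java
import java.util.List;
import java.util.Map;

// Illustrative sketch only: all names here are hypothetical, and an entity is
// modelled as a plain map of property name -> values.
final class EntityTypeSketch {
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String DATASTREAM_TYPE = "FedoraObjectDatastream";

    enum Kind { FEDORA_OBJECT, FEDORA_DATASTREAM }

    static Kind classify(Map<String, List<String>> properties) {
        List<String> types = properties.get(RDF_TYPE);
        if (types == null || types.isEmpty()) {
            // No #type property: mapped to a Fedora object by default.
            return Kind.FEDORA_OBJECT;
        }
        // If several values exist and only one is needed, the first one is taken.
        String type = types.get(0);
        // Assumption: any value other than FedoraObjectDatastream means "object".
        return DATASTREAM_TYPE.equals(type) ? Kind.FEDORA_DATASTREAM : Kind.FEDORA_OBJECT;
    }
}
```

An entity whose first type value is FedoraObjectDatastream is classified as a datastream; an entity with no type property at all falls back to a Fedora object.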
<!-- Format type entities having the http://www.w3.org/1999/02/22-rdf-syntax-ns#type property are mapped to Fedora objects. Their RELS-EXT is supplemented with the latter property for fast listing of supported formats (possibly in the DSpace UI, when the user needs to select a mimetype for a file). -->
Properties
Properties of DSpace 2 entities are mapped to Fedora RELS-EXT, RELS-INT and DC datastream entries, and to separate datastreams. If a property is named http://purl.org/dspace/model#ContentFile, is of binary type (Java class InputStream) and is located in a FedoraObjectDatastream entity, it is mapped directly to a datastream. Only one http://purl.org/dspace/model#ContentFile property is allowed per FedoraObjectDatastream entity. Any string property starting with http://purl.org/dc/elements or http://www.openarchives.org/OAI/2.0/oai_dc/ will end up in the DC datastream. Any other string property that is neither DC nor administrative (administrative properties start with info:fedora) will go into RELS-EXT for FedoraObject entities and into RELS-INT for FedoraObjectDatastream entities.
String properties can be freely defined by the user, who may not provide a namespace; in such cases the "local" namespace http://localhost/model# is applied.
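As a rough illustration of the routing rules in this section, the sketch below decides where a single property would end up. All class, enum and method names are hypothetical. Two assumptions are made: a name counts as "having a namespace" if it contains a colon, and every non-binary value is treated as a string. The one-ContentFile-per-entity restriction is not enforced here.

```java
import java.io.InputStream;

// Illustrative sketch only: names are hypothetical and not taken from storage-fedora.
final class PropertyRoutingSketch {
    enum Target { DATASTREAM_CONTENT, DC, ADMINISTRATIVE, RELS_EXT, RELS_INT, UNMAPPED }

    static final String CONTENT_FILE = "http://purl.org/dspace/model#ContentFile";
    static final String DC_ELEMENTS = "http://purl.org/dc/elements";
    static final String OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/";
    static final String FEDORA_PREFIX = "info:fedora";
    static final String LOCAL_NS = "http://localhost/model#";

    // Assumption: a property name without ':' has no namespace of its own.
    static String qualify(String name) {
        return name.contains(":") ? name : LOCAL_NS + name;
    }

    static Target route(String name, Object value, boolean inDatastreamEntity) {
        if (value instanceof InputStream) {
            // Binary properties map only as ContentFile inside a datastream entity.
            boolean isContent = inDatastreamEntity && CONTENT_FILE.equals(name);
            return isContent ? Target.DATASTREAM_CONTENT : Target.UNMAPPED;
        }
        String qualified = qualify(name);
        if (qualified.startsWith(DC_ELEMENTS) || qualified.startsWith(OAI_DC)) {
            return Target.DC;
        }
        if (qualified.startsWith(FEDORA_PREFIX)) {
            return Target.ADMINISTRATIVE;
        }
        // Everything else: RELS-INT for datastream entities, RELS-EXT for objects.
        return inDatastreamEntity ? Target.RELS_INT : Target.RELS_EXT;
    }
}
```

For example, a property named `color` on an object entity would be qualified as http://localhost/model#color and routed to RELS-EXT.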
...
There are many relation types already defined elsewhere, but in the storage-fedora module they can also be freely defined by the user. If no namespace is provided for a particular relation type, the local namespace http://localhost/model# is applied.
...
When designing a DSpace 2 model implementation, the designer (user) should also keep in mind that entity relations pointing from parent to child can be inefficient, since parent entities tend to have many child entities (consider the example of parent Library and child Book above). If a parent references all of its children, the parent Fedora object may end up with a large, rapidly changing and growing number of RELS-EXT entries. This problem does not arise with child-to-parent referencing.
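The difference between the two directions can be made concrete with a small sketch that counts RELS-EXT entries per object. The class and method names are hypothetical, and RELS-EXT is modelled simply as a list of "predicate object" strings per subject, which is a simplification of the real RDF datastream.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: RELS-EXT is modelled as a list of strings per object.
final class RelationDirectionSketch {
    static final String HAS_MEMBER = "info:fedora/fedora-system:def/relations-external#hasMember";
    static final String IS_MEMBER_OF = "info:fedora/fedora-system:def/relations-external#isMemberOf";

    // Parent references every child: all entries accumulate on the parent.
    static Map<String, List<String>> parentToChild(String parent, List<String> children) {
        Map<String, List<String>> relsExt = emptyRelsExt(parent, children);
        for (String child : children) {
            relsExt.get(parent).add(HAS_MEMBER + " " + child);
        }
        return relsExt;
    }

    // Each child references its parent: one entry per child, none on the parent.
    static Map<String, List<String>> childToParent(String parent, List<String> children) {
        Map<String, List<String>> relsExt = emptyRelsExt(parent, children);
        for (String child : children) {
            relsExt.get(child).add(IS_MEMBER_OF + " " + parent);
        }
        return relsExt;
    }

    private static Map<String, List<String>> emptyRelsExt(String parent, List<String> children) {
        Map<String, List<String>> relsExt = new LinkedHashMap<>();
        relsExt.put(parent, new ArrayList<>());
        for (String child : children) {
            relsExt.put(child, new ArrayList<>());
        }
        return relsExt;
    }
}
```

With a Library holding three Books, the parent-to-child variant leaves three entries on the Library object, and that number grows with every Book added; the child-to-parent variant leaves the Library untouched and gives each Book exactly one entry.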
<!-- In this DSpace2-Fedora3 model mapping, it is proposed that, unless defined separately by the user, Fedora objects (represented entities) will by default be related with a directional child-to-parent relation, regardless of the relation name. -->
Identifiers
It is very likely that organizations using Fedora will prefer to use their own custom Fedora object PIDs and DSIDs (datastream IDs), so the storage-fedora module allows this. The user must ensure the uniqueness of custom identifiers. A DSpace entity identifier must have the form info:fedora/PID for objects and info:fedora/PID/DSID for datastreams, so that it can be interpreted correctly by the storage-fedora module. An incorrect entity identifier (one incompatible with a Fedora resource URI) will result in an error. If a Fedora object or datastream identifier is not provided, one will be generated automatically.
The Fedora PID namespace used for automatic PID generation is configurable and predefined in the storage-fedora module configuration file.
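The identifier forms described above (info:fedora/PID and info:fedora/PID/DSID) could be interpreted along the following lines. This is a minimal sketch under stated assumptions: `FedoraIdSketch` and `parse` are hypothetical names, the PID `demo:1` and DSID `DS1` in the usage note are made-up examples, and the sketch checks only the overall shape of the identifier, not Fedora's PID or DSID syntax rules.

```java
// Illustrative sketch only: names are hypothetical, and only the overall
// "info:fedora/PID[/DSID]" shape is validated.
final class FedoraIdSketch {
    static final String PREFIX = "info:fedora/";

    final String pid;
    final String dsid; // null when the identifier refers to an object

    private FedoraIdSketch(String pid, String dsid) {
        this.pid = pid;
        this.dsid = dsid;
    }

    boolean isDatastream() {
        return dsid != null;
    }

    static FedoraIdSketch parse(String entityId) {
        if (entityId == null || !entityId.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not a Fedora resource URI: " + entityId);
        }
        // Keep trailing empty parts so "info:fedora/PID/" is rejected.
        String[] parts = entityId.substring(PREFIX.length()).split("/", -1);
        boolean shapeOk = (parts.length == 1 || parts.length == 2) && !parts[0].isEmpty()
                && (parts.length == 1 || !parts[1].isEmpty());
        if (!shapeOk) {
            throw new IllegalArgumentException("Expected info:fedora/PID or info:fedora/PID/DSID: " + entityId);
        }
        return new FedoraIdSketch(parts[0], parts.length == 2 ? parts[1] : null);
    }
}
```

Here `FedoraIdSketch.parse("info:fedora/demo:1/DS1")` yields the PID `demo:1` and the DSID `DS1`, while an identifier without the info:fedora/ prefix is rejected with an exception, mirroring the error case mentioned above.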
...
The storage-fedora module is implemented in a similar way to storage-jackrabbit. Currently the module implements org.dspace.providers.StorageProvider, org.dspace.services.mixins.StorageWriteable/StorageVersionable and org.dspace.kernel.mixins.ShutdownService.
The most recent code of storage-fedora will be available at http://scm.dspace.org/svn/repo/modules/storage-fedora/.
Comments
DSpace 2.0 Developer Recommendations
We propose using RELS-EXT to store the majority of DSpace Properties and Relations for a DSpace 2.0 Entity. The goal we hope to see attained is to have DSpace 2.0 act as a Management Tool on existing Fedora Repository content that may not have come from DSpace in the first place, this means
...
Consider that there are efforts to map Fedora to JCR, and we should consider these in the appropriate mappings to DSpace 2.0 / JCR and Fedora (I will try to add more detail on this shortly) --Mark Diggory 16:16, 12 July 2009 (EDT)
...
DSpace2 model and demo by Ben Bosman: http://smartech.gatech.edu/dspace/handle/1853/28078, http://presentations.dlpe.gatech.edu/or09/or09_052009_3/index.html
DSpace2 RDF: http://wiki.dspace.org/index.php/DSpace+2.0/Expressing_DSpace_Domain_Model_In_RDF
JCR for Fedora mappings: http://jcr-connect.at.northwestern.edu/en/JCR_for_Fedora_-_Discussion
Project code is available at: http://scm.dspace.org/svn/repo/modules/storage-fedora