Title: DSpace2 storage-fedora module implementation (Initially: Fedora DAO implementation for DSpace, beta release)

Student: Andrius Blažinskas

Mentor: Richard Rodgers

About

The DSpace2 storage-fedora module is a storage module that allows DSpace to store its data in a Fedora repository. The targeted versions are DSpace 2.x and Fedora 3.x (Fedora 3.2.1 was used during development). After discussion with community members, it was decided to abandon the GSOC2008 work on DSpace 1.x (DSpace & Fedora Integration) and continue this work on DSpace 2.x. The data model in DSpace 2.x is different, so the mapping part was redone. The code was likewise heavily reorganized to reflect these changes and to prepare it as a DSpace 2 module.

Development plan/progress

  • In-depth analysis of DSpace 2 data model and the possibilities of mapping it with Fedora 3 model. (Done)
  • DSpace & Fedora model mapping design: basic mapping. (Done, but mapping will evolve)
  • Mapping implementation (Done, however some minor fixes are needed).
    • StorageVersionable implementation for Fedora3 (on TODO list)
  • Creation of tests (Done, however some extensions are being created)
  • Creation of documentation (Done)

DSpace 2 data model

{center}
!General-DSpace2-data-model-1.jpg!

Figure 1: General DSpace 2 data model (http://smartech.gatech.edu/dspace/bitstream/1853/28078/5/214-578-1-PB.pdf)
{center}


{center}
!Example1-DSpace2-data-model-impl-1.jpg!

Figure 2: Example DSpace 2 data model implementation (http://smartech.gatech.edu/dspace/bitstream/1853/28078/5/214-578-1-PB.pdf)
{center}

Model mapping

{center}
!DSpace2-Fedora3-model-mapping-1.jpg!

Figure 3: Proposed model mapping
{center}


Mapping notes:

  • Entity type is identified using the general predicate ...; for now, the literal FedoraObjectDatastream is used to indicate mapping to a datastream.
  • Any binary (file) properties are unmapped unless they are located in a FedoraObjectDatastream entity and have the name ...; only one such property is allowed per FedoraObjectDatastream entity.
  • In the diagram, relations between objects are indicated using the info:fedora/fedora-system:def/relations-external#hasMember/isMemberOf predicates; however, other custom predicates are also possible and will be transferred literally if provided.
  • Datastream dependence on a particular Fedora object must be indicated using the info:fedora/fedora-system:def/view#hasDatastream predicate. Such relations between FedoraObjectDatastream entities are not allowed.
  • String properties provided without a namespace are assigned the default ... namespace.
  • Any property starting with ... will end up in the DC datastream.
  • Datastream info:fedora/fedora-system:def/view#mimeType and Format entity ... are managed separately; however, they should be the same.
  • The Fedora object label is indicated using info:fedora/fedora-system:def/model#label, and the datastream label (for now) - ...
  • Entity location is easy to spot in DSpace2 code but has no direct alternative in Fedora; it will be put in RELS-EXT as a separate ... (yet to be "invented") metadata field.

Other potentially useful Fedora predicates to be implemented:

  • info:fedora/fedora-system:def/view#lastModifiedDate - to retrieve object modification date
  • info:fedora/fedora-system:def/view#version - to retrieve datastream version, as versioning to be enabled
  • info:fedora/fedora-system:def/view#disseminates and #disseminationType - to define more advanced dissemination services?
  • info:fedora/fedora-system:def/model#ownerId - set/get owner
  • info:fedora/fedora-system:def/model#altIds - set/get alternate ids
  • info:fedora/fedora-system:def/model#digest and #digestType - set?/get digest
  • info:fedora/fedora-system:def/model#state - manage state (info:fedora/fedora-system:def/model#Active / #Inactive / #Deleted)
  • info:fedora/fedora-system:def/model#createdDate - to retrieve creation date
  • info:fedora/fedora-system:def/model#contentModel - defining more specific content model?
  • info:fedora/fedora-system:def/model#length - length?
  • Others?...

Entities

DSpace 2 data model entities "marked" with the property http://www.w3.org/1999/02/22-rdf-syntax-ns#type = info:fedora/fedora-system:def/model#FedoraObject are mapped to Fedora objects. Entities having the property http://www.w3.org/1999/02/22-rdf-syntax-ns#type = FedoraObjectDatastream are indirectly mapped to Fedora object datastreams (the binary property has a direct datastream mapping). Entities with no #type property are mapped to Fedora objects by default. Datastream dependence on an object is indicated using the info:fedora/fedora-system:def/recovery#pid property.

All necessary administrative Fedora object and datastream properties are taken from the corresponding entity properties. If multiple properties with the same name exist and only one is needed, the first one is taken.
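The entity rules above can be sketched in a few lines of Python. This is only an illustration, not the module's code (storage-fedora itself is a Java module): the function names and the representation of an entity as a list of (name, value) pairs are assumptions made for the example, and the handling of any other type value is not specified in the text.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FEDORA_OBJECT = "info:fedora/fedora-system:def/model#FedoraObject"
DATASTREAM_LITERAL = "FedoraObjectDatastream"


def first_value(properties, name):
    """Return the first value stored under `name`, or None if absent.

    Mirrors the rule that the first property wins when a name repeats.
    """
    for prop_name, value in properties:
        if prop_name == name:
            return value
    return None


def classify_entity(properties):
    """Decide whether an entity maps to a Fedora object or a datastream.

    `properties` is a hypothetical list of (name, value) pairs.
    """
    entity_type = first_value(properties, RDF_TYPE)
    if entity_type == DATASTREAM_LITERAL:
        return "datastream"
    # FedoraObject type, or no #type at all: Fedora object (the default).
    # Assumption: any other type value is also treated as an object here.
    return "object"


print(classify_entity([(RDF_TYPE, FEDORA_OBJECT)]))
print(classify_entity([(RDF_TYPE, DATASTREAM_LITERAL)]))
print(classify_entity([]))
```

Keeping properties as ordered pairs rather than a dictionary preserves repeated names, which is what makes the "first one is taken" rule meaningful.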

{HTMLcomment:hidden}
<!--
Datastream dependence to object is indicated using info:fedora/fedora-system:def/view#hasDatastream relation. Datastream entites must have exactly one file (binary type) property (datastream itself).

Format type entities having http://www.w3.org/1999/02/22-rdf-syntax-ns#type = http://purl.org/dspace/model#Format property are mapped to Fedora objects. Its RELS-EXT is supplemented with later property for fast supported formats listing (possibly in DSpace UI, when user needs to select mimetype for file).
-->
{HTMLcomment}


Properties

Properties of DSpace 2 entities are mapped to Fedora RELS-EXT, RELS-INT, DC datastream entries and separate datastreams. If a property has the name http://purl.org/dspace/model#ContentFile, is of binary type (the Java InputStream class) and is located in a FedoraObjectDatastream entity, then it directly becomes a datastream. Only one http://purl.org/dspace/model#ContentFile property is allowed per FedoraObjectDatastream entity. Any string property starting with http://purl.org/dc/elements or http://www.openarchives.org/OAI/2.0/oai_dc/ will end up in the DC datastream. Any other string property that is neither DC nor administrative (administrative properties start with info:fedora) will go into RELS-EXT for FedoraObject entities and into RELS-INT for FedoraObjectDatastream entities.

String properties can be freely defined by the user, who may not provide a namespace; in such cases the "local" namespace http://localhost/model# will be forced.
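As a rough illustration of the routing rules above, the following Python sketch decides where a single property would end up. It is not the module's implementation: the function names, the return labels, and the test for "has a namespace" (the name contains a colon) are assumptions made for the example. Binary properties other than ContentFile in a datastream entity are reported as unmapped, as stated in the mapping notes.

```python
CONTENT_FILE = "http://purl.org/dspace/model#ContentFile"
DC_PREFIXES = (
    "http://purl.org/dc/elements",
    "http://www.openarchives.org/OAI/2.0/oai_dc/",
)
ADMIN_PREFIX = "info:fedora"
LOCAL_NAMESPACE = "http://localhost/model#"


def qualify(name):
    """Force the local namespace onto a property name that has none.

    Assumption: a name counts as namespaced when it contains a colon.
    """
    if ":" in name:
        return name
    return LOCAL_NAMESPACE + name


def route_property(name, is_binary, in_datastream_entity):
    """Return a label for the Fedora location a property is mapped to."""
    name = qualify(name)
    if is_binary:
        if name == CONTENT_FILE and in_datastream_entity:
            return "datastream"
        return "unmapped"
    if name.startswith(DC_PREFIXES):
        return "DC"
    if name.startswith(ADMIN_PREFIX):
        return "administrative"
    return "RELS-INT" if in_datastream_entity else "RELS-EXT"


print(route_property("http://purl.org/dc/elements/1.1/title", False, False))
print(route_property("shelfMark", False, True))
```

The hypothetical property name `shelfMark` has no namespace, so it is first qualified as http://localhost/model#shelfMark and then routed like any other non-DC, non-administrative string property.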

Relations

Relations between DSpace2 FedoraObject entities are directly mapped to Fedora relations between objects, which in turn are put in the RELS-EXT datastream. Relations pointing from datastreams are defined in RELS-INT. In the diagram, the relation info:fedora/fedora-system:def/relations-external#hasDatastream has no direct mapping and currently does not participate in any way. With the current mapping, DSpace2 relations can generally result in any combination in Fedora: object-to-object and object-to-other-object-datastream (in RELS-EXT); datastream-to-datastream and datastream-to-object (in RELS-INT), etc. While relations between datastreams in different objects may not be very correct, it is left up to the user to choose the specifics of the resulting model implementation, including relation types. There are many relation types already defined elsewhere, but in the storage-fedora module they can also be freely defined by the user. If a namespace is not provided for a particular relation type, the local namespace http://localhost/model# will be forced.

Example of child objects' RELS-EXT content fragments:

{code}
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="info:fedora/dspace:Book~1">
    <locatedIn xmlns="http://localhost/model#" rdf:resource="info:fedora/dspace:Library~1"/>
  </rdf:Description>
</rdf:RDF>

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="info:fedora/dspace:Book~2">
    <locatedIn xmlns="http://localhost/model#" rdf:resource="info:fedora/dspace:Library~1"/>
  </rdf:Description>
</rdf:RDF>
{code}
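For illustration, a child-to-parent RELS-EXT fragment like the ones above could be generated with Python's standard xml.etree.ElementTree. This is a sketch only, not how storage-fedora builds its RDF; the function name `rels_ext` and its parameters are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
LOCAL_NS = "http://localhost/model#"


def rels_ext(subject_uri, predicate, target_uri, namespace=LOCAL_NS):
    """Build an RDF/XML string holding one subject-predicate-target relation."""
    ET.register_namespace("rdf", RDF_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": subject_uri}
    )
    # The relation itself: an empty element whose rdf:resource is the target.
    ET.SubElement(
        description,
        f"{{{namespace}}}{predicate}",
        {f"{{{RDF_NS}}}resource": target_uri},
    )
    return ET.tostring(root, encoding="unicode")


print(rels_ext("info:fedora/dspace:Book~1", "locatedIn",
               "info:fedora/dspace:Library~1"))
```

ElementTree declares the local namespace with a generated prefix instead of a default xmlns on the element, so the serialized text differs from the fragments above while describing the same triple.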

Example iTQL query for fast child selection (the Fedora resource index must be turned on):

{code}
select $subject from <#ri>
where  $subject <http://localhost/model#locatedIn>
   <info:fedora/dspace:Library~1>
{code}

Example CSV response to it:

{code}
"subject"
info:fedora/dspace:Book~1
info:fedora/dspace:Book~2
{code}
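The query and response above can be handled with plain string formatting and the standard csv module, as in the following sketch. It deliberately leaves out the HTTP call to the resource index; the function names are hypothetical and only the query text and CSV layout are taken from the examples above.

```python
import csv
import io


def children_query(predicate_uri, parent_uri):
    """Build an iTQL query selecting every subject that points at `parent_uri`."""
    return (
        "select $subject from <#ri>\n"
        f"where  $subject <{predicate_uri}>\n"
        f"   <{parent_uri}>"
    )


def parse_subjects(csv_text):
    """Return the subject URIs from a CSV response with a "subject" header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["subject"] for row in reader]


query = children_query(
    "http://localhost/model#locatedIn", "info:fedora/dspace:Library~1"
)
response = '"subject"\ninfo:fedora/dspace:Book~1\ninfo:fedora/dspace:Book~2\n'
print(query)
print(parse_subjects(response))
```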

When designing a DSpace2 model implementation, the designer (user) should also keep in mind that entity relations pointing from parent to child can be inefficient, since parent entities usually tend to have many child entities (consider the example of parent Library and child Book above). If a parent references all of its children, the parent Fedora object may end up with a large, rapidly changing and growing number of RELS-EXT entries. This problem does not arise with child-to-parent referencing.

{HTMLcomment:hidden}
<!--
There are some things to note, which user must keep in mind creating relations in DSpace2 model implementation. DSpace 2 model may have various relation types between entities, for example: "hasBook", "hasFile", "isResearcherAt", "scannedBy". In general, if parent entity has relation to child entity, then this relation can be called "hasChild" and from child perspective it may be "isChildOf". So basically child can have reference in its RELS-EXT to parent the same way parent may have reference in its RELS-EXT to child. Problematic is the second case, because parent entities usually tend to have a lot of child entities (consider the example of parent Library and child Book above), thus if it references all of its children, parent object will possibly have rapidly changing and growing number of RELS-EXT entries, which may be inefficient. This problem does not arise in child to parent referencing.

In this DSpace2-Fedora3 model mapping, it is proposed that if not defined separately by user, Fedora objects (represented entities) by default will be related with directional child-to-parent relation, despite relation name.
-->
{HTMLcomment}


Identifiers

Organizations using Fedora will likely prefer their own custom Fedora object PIDs and DSIDs (datastream IDs), so the storage-fedora module supports this. Users must ensure the uniqueness of custom identifiers themselves. A DSpace entity identifier must have the form info:fedora/PID for objects and info:fedora/PID/DSID for datastreams, so that the storage-fedora module can interpret it correctly. An incorrect entity identifier (one incompatible with the Fedora resource URI form) results in an error. If a Fedora object or datastream identifier is not provided, one is generated automatically.
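A minimal sketch of how the two identifier forms could be told apart, assuming only the info:fedora/PID and info:fedora/PID/DSID shapes described above. The function name is hypothetical, and the sketch does not validate PID or DSID syntax beyond checking that the parts are non-empty.

```python
PREFIX = "info:fedora/"


def parse_identifier(identifier):
    """Split an entity identifier into (pid, dsid); dsid is None for objects."""
    if not identifier.startswith(PREFIX):
        raise ValueError(f"not a Fedora resource URI: {identifier!r}")
    parts = identifier[len(PREFIX):].split("/")
    if len(parts) == 1 and parts[0]:
        return parts[0], None
    if len(parts) == 2 and all(parts):
        return parts[0], parts[1]
    raise ValueError(
        f"expected info:fedora/PID or info:fedora/PID/DSID: {identifier!r}"
    )


print(parse_identifier("info:fedora/dspace:Book~1"))
print(parse_identifier("info:fedora/dspace:Book~1/DS1"))
```

The datastream ID `DS1` is made up for the example; the PID is the one used in the RELS-EXT fragments earlier on this page.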

{HTMLcomment:hidden}
<!--
It is very likely, that organizations using Fedora, may prefer using their custom Fedora objects PIDs and DSIDs (datastream IDs), so it is proposed that in storage-fedora module Fedora objects (mapped DSpace2 entities) identifiers can be configurable by user. In this case, user himself must ensure uniqueness of custom identifiers. Also there will be a mechanism allowing generating default PIDs and DSIDs without user intervention.
-->
{HTMLcomment}

The Fedora PID namespace used for automatic PID generation is configurable and predefined in the storage-fedora module configuration file.

I am concerned about having PIDs contain any semantic meaning. Discussions to date concerned having PIDs always be opaque to the application; the best example to support this would be the usage of UUIDs or Fedora IDs out of the box. Please be cautious about the proposed usage above. Other properties (rdf:type or dc:type, for instance) would be more appropriate for determining the object type. --Mark Diggory 22:37, 12 July 2009 (EDT)

It was decided not to use identifiers of the form <namespace>:<Entity name>~<UUID> and <namespace>:<UUID>, so they were removed from the wiki. UUIDs are quite attractive, though, and may receive more attention in the future. --Andrius Blažinskas 00:46, 30 July 2009 (GMT+2)

Versioning

Datastream versioning is an important Fedora feature that DSpace 2 could take advantage of. Fedora can version all datastreams, so both binary files and RELS-EXT and RELS-INT (DSpace metadata and relations) can be versioned. The problem is that many scattered changes to one datastream produce many copies of it, because Fedora simply keeps every changed version. This can become a problem when datastreams are relatively large and change rapidly.
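To make the storage concern concrete, here is a small worked calculation in Python. The numbers are hypothetical and assume that every changed version is kept in full at the same size, which is a simplification of what is described above.

```python
def retained_bytes(version_sizes):
    """Total bytes kept when every version of a datastream is retained."""
    return sum(version_sizes)


# Hypothetical numbers: a 50 MB datastream edited 20 times at constant size.
MB = 1024 * 1024
sizes = [50 * MB] * (1 + 20)  # original plus 20 changed versions
total_mb = retained_bytes(sizes) // MB
print(total_mb)
```

Under these assumptions the repository holds 21 copies, or 1050 MB, for a single 50 MB file.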

Work on versioning for storage-fedora is currently in progress.

If RELS-EXT supports versioning, then the majority of encoded DSpace metadata and relationships would be versioned as a unit for each DSpace Object. --Mark Diggory 22:41, 12 July 2009 (EDT)

Implementation details

The storage-fedora module is implemented in a similar way to storage-jackrabbit. Currently the module implements org.dspace.providers.StorageProvider, org.dspace.services.mixins.StorageWriteable/StorageVersionable and org.dspace.kernel.mixins.ShutdownService.

The most recent code of storage-fedora will be available at http://scm.dspace.org/svn/repo/modules/storage-fedora/.

Comments

DSpace+2.0 Developer Recommendations

We propose using RELS-EXT to store the majority of DSpace Properties and Relations for a DSpace+2.0 Entity. The goal we hope to see attained is to have DSpace 2.0 act as a management tool on existing Fedora repository content that may not have come from DSpace in the first place. This means:

  1. No DSpace centric metadata formats stored in separate bitstreams
  2. Use of RELS-EXT for all relations in DSpace+2.0
  3. Use of dc metadata datastream for any Dublin Core Elements
  4. Use of RELS-EXT for any other metadata properties
  5. Use of RELS-INT to identify relationships that are data files

Consider that there are efforts to map Fedora to JCR, and we should consider these in the appropriate mappings to DSpace 2.0 / JCR and Fedora (I will try to add more detail on this shortly). -- Mark Diggory 16:16, 12 July 2009 (EDT)

Caution against the use of the following expressed namespace "http://purl.org/dspace2/model/relations/local": the relations already have their own appropriate namespaces (FoaF, ORE, DCMI, etc). The only place that a "dspace" specific namespace will probably be employed in DSpace+2.0 is to capture cases where the legacy DSpace data model cannot be mapped explicitly to an already existing ontology from one of the various communities. -- Mark Diggory 22:35, 12 July 2009 (EDT)

References

DSpace2 model and demo by Ben Bosman: http://smartech.gatech.edu/dspace/handle/1853/28078, http://presentations.dlpe.gatech.edu/or09/or09_052009_3/index.html

DSpace2 RDF: http://wiki.dspace.org/index.php/DSpace+2.0/Expressing_DSpace_Domain_Model_In_RDF

JCR for Fedora mappings: http://jcr-connect.at.northwestern.edu/en/JCR_for_Fedora_-_Discussion

Project code is available at: http://scm.dspace.org/svn/repo/modules/storage-fedora