Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Migration of unmigrated content due to installation of a new plugin

...

Title:

...

DSpace2

...

storage-fedora

...

module

...

implementation

...

(Initially:

...

Fedora

...

DAO

...

implementation

...

for

...

DSpace,

...

beta

...

release)

...

Student:

...

Andrius

...

Blažinskas

...

Mentor:

...

Richard

...

Rodgers

About

Project DSpace2 storage-fedora

...

module

...

implementation

...

is

...

a

...

storage

...

module

...

allowing

...

DSpace

...

store

...

its

...

data

...

to

...

Fedora

...

repository.

...

Targeted

...

versions

...

are

...

DSpace

...

2.x

...

and

...

Fedora

...

3.x

...

(during

...

development

...

Fedora

...

3.2.1

...

was

...

used).

...

After

...

discussion

...

with

...

community

...

members,

...

it

...

was

...

decided

...

to

...

abandon

...

GSOC2008

...

work

...

on

...

DSpace

...

1.x

...

(

...

DSpace

...

&

...

Fedora Integration) and continue this work on DSpace 2.x. The data model in DSpace 2.x is different so mapping part was remade. The same way code heavily reorganized to reflect changes and to prepare it as DSpace 2 module.

Development plan/progress

  • In-depth analysis of DSpace 2 data model and the possibilities of mapping it with Fedora 3 model. (Done)
  • DSpace & Fedora model mapping design: basic mapping. (Done, but mapping will evolve)
  • Mapping implementation (Done, however some minor fixes are needed).
    • StorageVersionable implementation for Fedora3 (on TODO list)
  • Creation of tests (Done, however some extensions are being created)
  • Creation of documentation (Done)

DSpace 2 data model

Center

Image Added

Figure 1: General DSpace 2 data model

Integration|http://wiki.dspace.org/confluence/display/DSPACE/Google+Summer+of+Code+2008+Fedora+Integration]) and continue this work on DSpace 2.x. The data model in DSpace 2.x is different so mapping part was remade. The same way code heavily reorganized to reflect changes and to prepare it as DSpace 2 module. h1. Development plan/progress * In-depth analysis of DSpace 2 data model and the possibilities of mapping it with Fedora 3 model. (Done) * DSpace & Fedora model mapping design: basic mapping. (Done, but mapping will evolve) * Mapping implementation (Done, however some minor fixes are needed). ** StorageVersionable implementation for Fedora3 (on TODO list) * Creation of tests (Done, however some extensions are being created) * Creation of documentation (Done) h1. DSpace 2 data model {center} !General-DSpace2-data-model-1.jpg! Figure 1: General DSpace 2 data model

(http://smartech.gatech.edu/dspace/bitstream/1853/28078/5/214-578-1-PB.pdf)

{center} {center} !Example1-DSpace2-data-model-impl-1.jpg! Figure 2: Example DSpace 2 data model implementation

Center

Image Added

Figure 2: Example DSpace 2 data model implementation (http://smartech.gatech.edu/dspace/bitstream/1853/28078/5/214-578-1-PB.pdf)

{center} h1. Model mapping {center} !DSpace2-Fedora3-model-mapping-1.jpg! Figure 3: Proposed model mapping {center} Mapping notes: * Entity type is identified using general predicate

Model mapping

Center

Image Added

Figure 3: Proposed model mapping

Mapping notes:

...

  • For

...

  • now,

...

  • literal

...

  • FedoraObjectDatastream

...

  • used

...

  • to

...

  • indicate

...

  • mapping

...

  • to

...

  • datastream.

...

  • Any

...

  • binary

...

  • (file)

...

  • properties

...

  • are

...

  • unmapped,

...

  • unless

...

  • they

...

  • are

...

  • located

...

  • in

...

  • FedoraObjectDatastream

...

  • entity

...

  • and

...

  • has

...

  • name

...

...

  • Only

...

  • one

...

  • such

...

  • property

...

  • allowed

...

  • per

...

  • FedoraObjectDatastream

...

  • entity.

...

  • In

...

  • diagram,

...

  • relations

...

  • between

...

  • objects

...

  • indicated

...

  • using

...

  • info:fedora/fedora-system:def/relations-external#hasMember/isMemberOf

...

  • predicates,

...

  • however

...

  • other

...

  • custom

...

  • predicates

...

  • also

...

  • possible

...

  • and

...

  • will

...

  • be

...

  • literally

...

  • transferred

...

  • if

...

  • provided.

...

  • Datastream

...

  • dependence

...

  • to

...

  • particular

...

  • Fedora

...

  • object

...

  • must

...

  • be

...

  • indicated

...

  • using

...

  • info:fedora/fedora-system:def/view#hasDatastream

...

  • predicate.

...

  • Such

...

  • relations

...

  • between

...

  • FedoraObjectDatastream

...

  • entities

...

  • are

...

  • not

...

  • allowed.

...

  • String

...

  • properties

...

  • provided

...

  • without

...

  • namespace

...

  • are

...

  • assigned

...

  • default

...

...

  • namespace.

...

  • Any

...

  • property

...

  • starting

...

  • with

...

...

  • will

...

  • end

...

  • up

...

  • in

...

  • DC

...

  • datastream.

...

  • Datastream

...

  • info:fedora/fedora-system:def/view#mimeType

...

  • and

...

  • Format

...

  • entity

...

...

  • are

...

  • managed

...

  • separately,

...

  • however

...

  • they

...

  • should

...

  • be

...

  • the

...

  • same.

...

  • Fedora

...

  • object

...

  • label

...

  • indicated

...

  • using

...

  • info:fedora/fedora-system:def/model#label

...

  • and

...

  • datastream

...

  • label

...

  • (for

...

  • now)

...

  • -

...

...

  • Easy

...

  • notable

...

  • in

...

  • DSpace2

...

  • code,

...

  • however

...

  • no

...

  • direct

...

  • alternative

...

  • in

...

  • Fedora

...

  • having

...

  • entity

...

  • location,

...

  • will

...

  • be

...

  • put

...

  • in

...

  • RELS-EXT

...

  • as

...

  • separate

...

...

  • (yet

...

  • "invented")

...

  • metadata

...

  • field.

...

Other

...

potentially

...

useful

...

Fedora

...

predicates

...

to

...

be

...

implemented:

...

  • info:fedora/fedora-system:def/view#lastModifiedDate

...

  • -

...

  • to

...

  • retrieve

...

  • object

...

  • modification

...

  • date

...

  • info:fedora/fedora-system:def/view#version

...

  • -

...

  • to

...

  • retrieve

...

  • datastream

...

  • version,

...

  • as

...

  • versioning

...

  • to

...

  • be

...

  • enabled

...

  • info:fedora/fedora-system:def/view#disseminates

...

  • and

...

  • #disseminationType

...

  • -

...

  • to

...

  • define

...

  • more

...

  • advanced

...

  • dissemination

...

  • services?

...

  • info:fedora/fedora-system:def/model#ownerId

...

  • -

...

  • set/get

...

  • owner

...

  • info:fedora/fedora-system:def/model#altIds

...

  • -

...

  • set/get

...

  • alternate

...

  • ids

...

  • info:fedora/fedora-system:def/model#digest

...

  • and

...

  • #digestType

...

  • -

...

  • set?/get

...

  • digest

...

  • info:fedora/fedora-system:def/model#state

...

  • -

...

  • manage

...

  • state

...

  • (info:fedora/fedora-system:def/model#Active

...

  • /

...

  • #Inactive

...

  • /

...

  • #Deleted)

...

  • info:fedora/fedora-system:def/model#createdDate

...

  • -

...

  • to

...

  • retrieve

...

  • creation

...

  • date

...

  • info:fedora/fedora-system:def/model#contentModel

...

  • -

...

  • defining

...

  • more

...

  • specific

...

  • content

...

  • model?

...

  • info:fedora/fedora-system:def/model#length

...

  • -

...

  • length?

...

  • Others?...

...

Entities

DSpace 2 data model entities "marked"

...

with

...

property

...

http://www.w3.org/1999/02/22-rdf-syntax-ns#type

...

=

...

info:fedora/fedora-system:def/model#FedoraObject

...

are

...

mapped

...

to

...

Fedora

...

objects.

...

Entities

...

having

...

property

...

http://www.w3.org/1999/02/22-rdf-syntax-ns#type

...

=

...

FedoraObjectDatastream

...

are

...

indirectly

...

mapped

...

(binary

...

property

...

has

...

direct

...

datastream

...

mapping)

...

to

...

Fedora

...

objects

...

datastreams.

...

Entities

...

having

...

no

...

#type

...

property,

...

by

...

default

...

are

...

mapped

...

to

...

Fedora

...

objects.

...

Datastream

...

dependence

...

to

...

object

...

is

...

indicated

...

using

...

info:fedora/fedora-system:def/recovery#pid

...

property.

...


All

...

necessary

...

administrative

...

Fedora

...

object

...

and

...

datastream

...

properties

...

are

...

taken

...

from

...

corresponding

...

entity

...

properties.

...

If

...

multiple

...

properties

...

with

...

same

...

name

...

exist

...

and

...

only

...

one

...

is

...

needed

...

-

...

first

...

one

...

is

...

taken.

...

HTML Comment
hiddentrue

<!--

...


Datastream

...

dependence

...

to

...

object

...

is

...

indicated

...

using

...

info:fedora/fedora-system:def/view#hasDatastream

...

relation.

...

Datastream

...

entites

...

must

...

have

...

exactly

...

one

...

file

...

(binary

...

type)

...

property

...

(datastream

...

itself).

...

Format

...

type

...

entities

...

having

...

http://www.w3.org/1999/02/22-rdf-syntax-ns#type

...

=

...

http://purl.org/dspace/model#Format

...

property

...

are

...

mapped

...

to

...

Fedora

...

objects.

...

Its

...

RELS-EXT

...

is

...

supplemented

...

with

...

later

...

property

...

for

...

fast

...

supported

...

formats

...

listing

...

(possibly

...

in

...

DSpace

...

UI,

...

when

...

user

...

needs

...

to

...

select

...

mimetype

...

for

...

file).

...


-->

...

Properties

Properties of DSpace 2 entities are mapped to Fedora RELS-EXT,

...

RELS-INT,

...

DC

...

datastream

...

entries

...

and

...

separate

...

datastreams.

...

If

...

property

...

has

...

name

...

http://purl.org/dspace/model#ContentFile,

...

is

...

binary

...

type

...

(InputStream

...

java

...

class)

...

and

...

is

...

located

...

in

...

FedoraObjectDatastream

...

entity,

...

then

...

it

...

will

...

directly

...

result

...

as

...

a

...

datastream.

...

Only

...

one

...

http://purl.org/dspace/model#ContentFile

...

property

...

is

...

allowed

...

per

...

FedoraObjectDatastream

...

entity.

...

Any

...

string

...

property

...

starting

...

with

...

http://purl.org/dc/elements

...

or

...

http://www.openarchives.org/OAI/2.0/oai_dc/

...

will

...

end

...

up

...

in

...

DC

...

datastream.

...

Any

...

other

...

non

...

DC

...

and

...

non

...

administrative

...

(administravite

...

starts

...

with

...

info:fedora)

...

string

...

property

...

will

...

go

...

into

...

RELS-EXT

...

for

...

FedoraObject

...

entities

...

and

...

RELS-INT

...

for

...

FedoraObjectDatastream

...

entities.

...


String

...

properties

...

can

...

be

...

freely

...

defined

...

by

...

user

...

which

...

may

...

not

...

provide

...

namespace,

...

so

...

in

...

such

...

cases

...

"local"

...

namespace

...

http://localhost/model#

...

will

...

be

...

forced.

...

Relations

Relations between DSpace2 FedoraObject entities are directly mapped to Fedora relations between objects, which in turn are put in RELS-EXT datastream. Relations pointing from datastreams are defined in RELS-INT. In diagram, relation info:fedora/fedora-system:def/relations-external#hasDatastream

...

has

...

no

...

direct

...

mapping

...

and

...

currently

...

does

...

not

...

participate

...

in

...

any

...

way.

...

Using

...

current

...

mapping,

...

DSpace2

...

relations

...

in

...

Fedora

...

generally

...

can

...

result

...

in

...

any

...

combination:

...

object-to-object,

...

object-to-other-object-datastream

...

(in

...

RELS-EXT);

...

datastream-to-datastream,

...

datastream-to-object

...

(in

...

RELS-INT),

...

etc.

...

While

...

relations

...

between

...

datastreams

...

in

...

different

...

objects

...

may

...

not

...

be

...

very

...

correct,

...

it

...

is

...

left

...

up

...

for

...

the

...

user

...

to

...

choose

...

the

...

resulting

...

model

...

implementation

...

specifics

...

including

...

relation

...

types.

...

Where

...

are

...

a

...

lot

...

of

...

relations

...

types

...

defined

...

out

...

there,

...

but

...

in

...

storage-fedora

...

module

...

they

...

can

...

also

...

be

...

freely

...

defined

...

by

...

user.

...

If

...

namespace

...

is

...

not

...

provided

...

for

...

particular

...

relation

...

type,

...

local

...

namespace

...

http://localhost/model#

...

will

...

be

...

forced.

...

Example

...

of

...

child

...

objects

...

RELS-EXT

...

content

...

fragments:

{
Code Block
}
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="info:fedora/dspace:Book~1">
    <locatedIn xmlns="http://localhost/model#" rdf:resource="info:fedora/dspace:Library~1"/>
 </rdf:Description>
</rdf:RDF>

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="info:fedora/dspace:Book~2">
    <locatedIn xmlns="http://localhost/model#" rdf:resource="info:fedora/dspace:Library~1"/>
  </rdf:Description>
</rdf:RDF>
{code}

Example

...

ITQL

...

query

...

for

...

fast

...

child

...

selection

...

(Fedora

...

resource

...

index

...

must

...

be

...

turned

...

on):

{
Code Block
}
select $subject from <#ri>
where  $subject <http://localhost/model#locatedIn>
   <info:fedora/dspace:Library~1>
{code}

Example

...

CSV

...

response

...

to

...

it:

{
Code Block
}
"subject"
info:fedora/dspace:Book~1
info:fedora/dspace:Book~2
{code}

When

...

designing

...

DSpace2

...

model

...

implementation,

...

designer

...

(user)

...

should

...

also

...

keep

...

in

...

mind,

...

that

...

entities

...

relations

...

pointing

...

from

...

parent

...

to

...

child

...

can

...

be

...

inefficient,

...

since

...

parent

...

entities

...

usually

...

tend

...

to

...

have

...

a

...

lot

...

of

...

child

...

entities

...

(consider

...

the

...

example

...

of

...

parent

...

Library

...

and

...

child

...

Book

...

above).

...

If

...

parent

...

references

...

all

...

of

...

its

...

children,

...

parent

...

Fedora

...

object

...

will

...

possibly

...

have

...

large

...

rapidly

...

changing

...

and

...

growing

...

number

...

of

...

RELS-EXT

...

entries.

...

This

...

problem

...

does

...

not

...

arise

...

in

...

child

...

to

...

parent

...

referencing.

...

HTML Comment
hiddentrue

<!--

...


There

...

are

...

some

...

things

...

to

...

note,

...

which

...

user

...

must

...

keep

...

in

...

mind

...

creating

...

relations

...

in

...

DSpace2

...

model

...

implementation.

...

DSpace

...

2

...

model

...

may

...

have

...

various

...

relation

...

types

...

between

...

entities,

...

for

...

example:

...

"hasBook",

...

"hasFile",

...

"isResearcherAt",

...

"scannedBy".

...

In

...

general,

...

if

...

parent

...

entity

...

has

...

relation

...

to

...

child

...

entity,

...

then

...

this

...

relation

...

can

...

be

...

called

...

"hasChild"

...

and

...

from

...

child

...

perspective

...

it

...

may

...

be

...

"isChildOf".

...

So

...

basically

...

child

...

can

...

have

...

reference

...

in

...

its

...

RELS-EXT

...

to

...

parent

...

the

...

same

...

way

...

parent

...

may

...

have

...

reference

...

in

...

its

...

RELS-EXT

...

to

...

child.

...

Problematic

...

is

...

the

...

second

...

case,

...

because

...

parent

...

entities

...

usually

...

tend

...

to

...

have

...

a

...

lot

...

of

...

child

...

entities

...

(consider

...

the

...

example

...

of

...

parent

...

Library

...

and

...

child

...

Book

...

above),

...

thus

...

if

...

it

...

references

...

all

...

of

...

its

...

children,

...

parent

...

object

...

will

...

possibly

...

have

...

rapidly

...

changing

...

and

...

growing

...

number

...

of

...

RELS-EXT

...

entries,

...

which

...

may

...

be

...

inefficient.

...

This

...

problem

...

does

...

not

...

arise

...

in

...

child

...

to

...

parent

...

referencing.

...

In

...

this

...

DSpace2-Fedora3

...

model

...

mapping,

...

it

...

is

...

proposed

...

that

...

if

...

not

...

defined

...

separately

...

by

...

user,

...

Fedora

...

objects

...

(represented

...

entities)

...

by

...

default

...

will

...

be

...

related

...

with

...

directional

...

child-to-parent

...

relation,

...

despite

...

relation

...

name.

...


-->

...

Identifiers

It is very likely, that organizations using Fedora, may prefer using their custom Fedora objects PIDs and DSIDs (datastream IDs), so implemented storage-fedora module does allow this functionality. User himself must ensure uniqueness of custom identifiers. DSpace entity identifier must have form of info:fedora/PID for objects and info:fedora/PID/DSID

...

for

...

datastreams,

...

so

...

that

...

it

...

can

...

be

...

interpreted

...

correctly

...

by

...

storage-fedora

...

module.

...

Incorrect

...

entity

...

identifier

...

(incompatible

...

with

...

Fedora

...

resource

...

URI)

...

will

...

result

...

in

...

error.

...

If

...

Fedora

...

object

...

or

...

datastream

...

identifier

...

in

...

not

...

provided

...

-

...

one

...

will

...

be

...

generated

...

automatically.

...

HTML Comment
hiddentrue

<!--

...


It

...

is

...

very

...

likely,

...

that

...

organizations

...

using

...

Fedora,

...

may

...

prefer

...

using

...

their

...

custom

...

Fedora

...

objects

...

PIDs

...

and

...

DSIDs

...

(datastream

...

IDs),

...

so

...

it

...

is

...

proposed

...

that

...

in

...

storage-fedora

...

module

...

Fedora

...

objects

...

(mapped

...

DSpace2

...

entities)

...

identifiers

...

can

...

be

...

configurable

...

by

...

user.

...

In

...

this

...

case,

...

user

...

himself

...

must

...

ensure

...

uniqueness

...

of

...

custom

...

identifiers.

...

Also

...

there

...

will

...

be

...

a

...

mechanism

...

allowing

...

generating

...

default

...

PIDs

...

and

...

DSIDs

...

without

...

user

...

intervention.

...


-->

...

Fedora PID namespace,

...

used

...

for

...

automatic

...

PID

...

generation,

...

is

...

configurable

...

and

...

predefined

...

in

...

storage-fedora

...

module

...

configuration

...

file.

...

Concerned about having pids contain any semantic meaning, discussions to date concerned having pids always be opaque to the application, the best example to support this would be the usage of uuids or fedora ids out of the box. please be cautious about the proposed usage above. Use of other properties will be more appropriate to determine the object type from (rdf:type or dc:type for instance). --Mark Diggory 22:37, 12 July 2009 (EDT) |

Identifiers having form <namespace>:<Entity name>~<UUID> and <namespace>:<UUID> were decided not to be used, thus removed from wiki. Though UUIDs are quite attractive and possibly will have more attention in future. --Andrius Blažinskas 00:46, 30 July 2009 (GMT+2) |

Versioning

Datastream versioning is important feature in Fedora what DSpace 2 could take advantage of. Fedora can version all datastreams, so basically both - binary files and RELS-EXT & RELS-INT (DSpace metadata and relations) can be versioned. The problem here is that a lot of time scattered changes in one datastream will result in lot of its copies, because Fedora simply keeps every changed version. This can be complicated when datastreams are relatively big and change rapidly.

Work on versioning for storage-fedora currently is in progress.

Where if REL-EXT supports versioning, then the majority of encoded DSpace metadata and relationships would be versioned as a unit for each DSpace Object. --Mark Diggory 22:41, 12 July 2009 (EDT) |

Implementation details

storage-fedora module is implemented in similar way storage-jackrabbit is. Currently module implements org.dspace.providers.StorageProvider,

...

org.dspace.services.mixins.

...

StorageWriteable/StorageVersionable

...

and

...

org.dspace.kernel.mixins.ShutdownService.

...


Most

...

recent

...

code

...

of

...

storage-fedora

...

will

...

be

...

available

...

at

...

http://scm.dspace.org/svn/repo/modules/storage-fedora/.

...

Comments

DSpace+2.0

...

Developer

...

Recommendations

...

We

...

propose

...

using

...

RELS-EXT

...

to

...

store

...

the

...

majority

...

of

...

DSpace

...

Properties

...

and

...

Relations

...

for

...

a

...

DSpace+2.0

...

Entity.

...

The

...

Goal

...

we

...

are

...

hope

...

to

...

see

...

attained

...

is

...

to

...

have

...

DSpace

...

2.0

...

act

...

as

...

a

...

Management

...

Toll

...

on

...

exisitng

...

Fedora

...

Repository

...

Content

...

that

...

may

...

have

...

not

...

come

...

from

...

DSpace

...

in

...

the

...

first

...

place,

...

this

...

means

...

  1. No

...

  1. DSpace

...

  1. centric

...

  1. metadata

...

  1. formats

...

  1. stored

...

  1. in

...

  1. separate

...

  1. bitstreams

...

  1. Use

...

  1. of

...

  1. RELS-EXT

...

  1. for

...

  1. all

...

  1. relations

...

  1. in

...

  1. DSpace+2.0

...

  1. Use

...

  1. of

...

  1. dc

...

  1. metadata

...

  1. datastream

...

  1. for

...

  1. any

...

  1. Dublic

...

  1. Core

...

  1. Elements

...

  1. Use

...

  1. of

...

  1. RELS-EXT

...

  1. for

...

  1. any

...

  1. other

...

  1. metadata

...

  1. properties

...

  1. Use

...

  1. of

...

  1. RELS-INT

...

  1. to

...

  1. identify

...

  1. relationships

...

  1. that

...

  1. are

...

  1. data

...

  1. files

...

Consider

...

that

...

there

...

are

...

efforts

...

to

...

map

...

Fedora

...

to

...

JCR

...

and

...

we

...

should

...

consider

...

these

...

in

...

the

...

approriate

...

mappings

...

to

...

DSPace

...

2.0

...

/

...

JCR

...

and

...

Fedora

...

(I

...

will

...

try

...

to

...

add

...

more

...

detail

...

on

...

this

...

shortly)

...

--

...

Mark

...

Diggory

...

16:16,

...

12

...

July

...

2009

...

(EDT)

...

''Caution

...

against

...

the

...

use

...

of

...

the

...

following

...

expressed

...

namespace

...

"http://purl.org/dspace2/model/relations/local"

...

the

...

relations

...

already

...

have

...

their

...

own

...

namespace

...

appropriate

...

(FoaF,

...

ORE,

...

DCMI,

...

etc).

...

The

...

only

...

place

...

that

...

a

...

"dspace"

...

specific

...

namespace

...

will

...

probably

...

be

...

employed

...

in

...

DSpace+2.0

...

is

...

to

...

capture

...

cases

...

where

...

legacy

...

DSpace

...

data

...

model

...

cannot

...

be

...

mapped

...

explicitly

...

to

...

an

...

already

...

existing

...

ontology

...

from

...

one

...

of

...

the

...

various

...

communities.

...

--

...

Mark

...

Diggory

...

22:35,

...

12

...

July

...

2009

...

(EDT)

...

References

DSpace2 model and demo by Ben Bosman: http://smartech.gatech.edu/dspace/handle/1853/28078,

...

http://presentations.dlpe.gatech.edu/or09/or09_052009_3/index.html

...

DSpace2

...

RDF:

...

http://wiki.dspace.org/index.php/DSpace+2.0/Expressing_DSpace_Domain_Model_In_RDF

...

JCR

...

for

...

Fedora

...

mappings:

...

http://jcr-connect.at.northwestern.edu/en/JCR_for_Fedora_-_Discussion

...

Project

...

code

...

is

...

available

...

at:

...

http://scm.dspace.org/svn/repo/modules/storage-fedora

...