Replication Task Suite

One current application of the curation system is a related set (suite) of tasks to assist in performing replication of DSpace content to other locations. The content is packaged in containers known as AIPs (OAIS speak: 'archival information packages'). You can read much more about how AIPs are constituted here: AIP Backup and Restore. As of DSpace 1.7, support for generating AIPs is included. This discussion also presupposes a little knowledge of the DSpace curation system, which is described here: CurationSystem. We will describe a concrete situation facing a repository data curator, and introduce each task as the need arises. We will also describe some of the technical configuration details needed to enable these tasks.

{info:title=Source Code is available}
The Replication Task Suite source code is available at: http://scm.dspace.org/svn/repo/modules/dspace-replicate/
In addition, there is an associated JIRA Issue at: https://jira.duraspace.org/browse/DS-876
{info}

{note:title=More Information}
More information on the Replication Task Suite & some early screenshots/diagrams were presented as part of the DuraCloud Workshop at Open Repositories 2011: http://www.slideshare.net/tdonohue/dspace-duracloud-integrations

The Replication Task Suite will be released as an optional "add-on" to [DSpace 1.8|DSpace Release 1.8.0 Notes] (It will be 1.8-compatible, and you can choose to install it on an existing DSpace 1.8.0 system)
{note}

Prerequisites

Must be installed on a DSpace 1.8.x System

{warning:title=Known Curation System bug in 1.8.0}
DSpace 1.8.0 contains a bug in the Curation System which causes a NullPointerException error to be returned when any curation task is run across the entire site (see [DS-1077|https://jira.duraspace.org/browse/DS-1077]). This bug directly affects the Replication Task Suite. Even when a replication task succeeds, it will still throw a NullPointerException. You can check the DSpace logs to tell whether the task actually succeeded or not. This bug will be resolved in DSpace 1.8.1.

*Because of the above bug, we recommend running the Replication Suite on DSpace 1.8.1 or above.*
{warning}

Because of enhancements to the Curation System in DSpace 1.8.0, the Replication Suite is only compatible with a DSpace 1.8.x System.

User Interface Compatibility Notes

As the Replication Suite is just a suite of Curation System tasks, it may be called (like any Curation Task) from the following locations:

  • From the Command Line
  • From the Admin UI (XMLUI Only)
  • From Approval Workflow
  • From custom Java code

For more information see the Curation System details on Task Invocation.

Installation

{note:title=WORK IN PROGRESS}
These instructions are still a work in progress.
{note}

Manual Installation

  1. Download the Replication Suite code
  2. Build/Compile the Replication Suite by running the following from the root directory:
     {code}
     mvn package
     {code}
  3. Copy the generated JAR files to your DSpace 1.8 installation.
    1. There are a total of 5 JARs that will need to be copied to your {{\[dspace\]/lib/}}:
      • {{\[dspace-replicate\]/target/dspace-replicate-\[version\].jar}} (The Replication Suite Plugin)
      • {{\[dspace-replicate\]/target/lib/common-\[version\].jar}} (DuraCloud common libraries - required for DuraCloud integration)
      • {{\[dspace-replicate\]/target/lib/commons-compress-\[version\].jar}} (Apache Commons Compress - prerequisite for Replication Suite plugin)
      • {{\[dspace-replicate\]/target/lib/storageprovider-\[version\].jar}} (DuraCloud storage provider libraries - required for DuraCloud integration)
      • {{\[dspace-replicate\]/target/lib/storeclient-\[version\].jar}} (DuraCloud store client libraries - required for DuraCloud integration)
    2. Also copy the above 5 JARs to your XMLUI web application's WEB-INF/lib directory (e.g. {{\[dspace\]/webapps/xmlui/WEB-INF/lib/}})
  4. Copy the Replication Suite's configuration files to your DSpace configuration directory:
    • *Replication Suite Configuration File:* Copy {{\[dspace-replicate\]/config/modules/replicate.cfg}} to your {{\[dspace\]/config/modules/}} directory
    • *METS-specific AIP Configuration Settings:* Copy {{\[dspace-replicate\]/config/modules/replicate-mets.cfg}} to your {{\[dspace\]/config/modules/}} directory
    • *DuraCloud Configuration File:* Copy {{\[dspace-replicate\]/config/modules/duracloud.cfg}} to your {{\[dspace\]/config/modules/}} directory
  5. Finally, follow the Configuration settings instructions below to configure the Replication Suite based on your usage needs.
    • There is a sample {{curate.cfg}} file provided in {{\[dspace-replicate\]/config/modules/curate.cfg}} which can be used as a reference. It is pre-configured to use the [DSpace AIP Format|DSDOC18:DSpace AIP Format] (METS-based packaging).

Maven-based Installation (Coming Soon)

Coming Soon.

Configuration

Configuration of the Replication Task Suite is based entirely on your local institution's backup, restore and preservation needs.

Before getting started, you may wish to determine the answers to the following questions:

  1. Does your institution want to back up using the default DSpace AIP format (METS packaging)? Or would you rather utilize the new BagIt packaged AIPs?
  2. Does your institution plan to use the Replication Suite to back up to a local/mounted drive? Or would you like to connect it to a DuraCloud account?
  3. Do you plan to use Checkm manifests for quick auditing?

Configuring usage of default DSpace AIPs (METS-based)

One of the first questions to ask yourself is which format you wish to use for your AIPs. There are two options: default DSpace AIPs (METS-based) or BagIt packaged AIPs.
By default the DSpace Replication Suite is configured to back up and restore using the default DSpace AIP Format (which uses METS packaging).

This section goes through the steps of configuring the Replication Suite to use METS-based AIPs.

  1. First, in your {{\[dspace\]/config/modules/curate.cfg}} you will want to enable & configure the METS-based replication tasks. (NOTE: there is a sample {{curate.cfg}} file provided in {{\[dspace-replicate\]/config/modules/curate.cfg}} which is pre-configured to use METS-based AIPs).
    • In the list of "Task Class implementations" (plugin.named.org.dspace.curate.CurationTask), add the following. REMEMBER to add a backslash (\) after each line (except the final line)!
      {code}
      plugin.named.org.dspace.curate.CurationTask = \
           ... (your existing tasks) , \
          org.dspace.ctask.replicate.EstimateAIPSize = estaipsize, \
          org.dspace.ctask.replicate.ReadOdometer = readodometer, \
          org.dspace.ctask.replicate.TransmitAIP = transmitaip, \
          org.dspace.ctask.replicate.VerifyAIP = verifyaip, \
          org.dspace.ctask.replicate.FetchAIP = fetchaip, \
          org.dspace.ctask.replicate.CompareWithAIP = auditaip, \
          org.dspace.ctask.replicate.RemoveAIP = removeaip, \
          org.dspace.ctask.replicate.METSRestoreFromAIP = restorefromaip, \
          org.dspace.ctask.replicate.METSRestoreFromAIP = replacewithaip, \
          org.dspace.ctask.replicate.METSRestoreFromAIP = restorekeepexisting, \
          org.dspace.ctask.replicate.METSRestoreFromAIP = restoresinglefromaip, \
          org.dspace.ctask.replicate.METSRestoreFromAIP = replacesinglewithaip
      {code}

Configuring usage of BagIt packaged AIPs

Replica Storage Settings

Where your AIPs will be stored

Configuring usage of Checkm manifest validation

Problem Statement & Usage Examples

We can suppose our data curator has identified a collection of items in her DSpace repository consisting of high-value, born-digital, and unique/irreplaceable (not held elsewhere) content. She prudently wishes to insure against catastrophic local loss of this content by keeping a copy or replica of this collection elsewhere. She'd prefer to replicate all her DSpace content, but realizes that storage costs over long periods have made her administration wary, so she decides to begin with this collection.

First Steps - Estimation

In order to budget for replication storage, she needs to know the 'size' of the collection. When she asks her sysadmin, he replies that it is easy to give her figures for the whole asset store, but since collections aren't stored separately, she would have to add up each item's bitstreams in the collection, a rather tedious process. Thus the first task: a reporting tool which operates on natural DSpace objects, rather than storage volumes.

To install this task, edit /dspace/config/modules/curate.cfg. (NB: all curation configuration is 'modular' in the sense that the configuration properties live outside of dspace.cfg, in named files. This means that if a given suite of tasks is unused, its configuration is never installed.) First, add the task to the list of curation tasks:

{code}
plugin.named.org.dspace.curate.CurationTask = \
    .... other curation tasks, \
    org.dspace.ctask.replicate.EstimateAIPSize = estaipsize
{code}

Next, in the same file, add this task to the list that appears in the administrative UI:

{code}
ui.tasknames = \
    .... other tasks, \
    estaipsize = Estimate Storage for AIPs
{code}

Of course, both the name of the task ('estaipsize') and the language for the UI are up to you. Now the curator can navigate to her collection, select the 'curate' tab, choose the entry from the dropdown list of tasks, and perform the task. On the page, the results will display:

ID: 123456789/1 (Amazing Images) estimated AIP size: 4 gigabytes

The estimates from this task are rather crude, in that they do not measure the actual AIPs, but just the bitstreams (so they ignore the metadata XML), but they should be fine for storage costing and allocation purposes.
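The kind of estimate described above can be illustrated with a minimal sketch. It is not the EstimateAIPSize implementation: the item handles, bitstream sizes, and function names below are hypothetical, and it assumes only what the text states, namely that the estimate is the sum of bitstream sizes with metadata ignored.

```python
from typing import Dict, List


def estimate_collection_size(items: Dict[str, List[int]]) -> int:
    """Sum the byte sizes of every bitstream of every item (metadata ignored)."""
    return sum(sum(sizes) for sizes in items.values())


def human_readable(num_bytes: int) -> str:
    """Render a byte count with a coarse unit, as a report might."""
    size = float(num_bytes)
    for unit in ("bytes", "kilobytes", "megabytes", "gigabytes"):
        if size < 1024:
            return f"{size:.0f} {unit}"
        size /= 1024
    return f"{size:.0f} terabytes"


# Hypothetical collection: item handle -> bitstream sizes in bytes.
amazing_images = {
    "123456789/2": [2 * 1024**3],
    "123456789/3": [1024**3, 1024**3],
}

total = estimate_collection_size(amazing_images)
print(f"ID: 123456789/1 (Amazing Images) estimated AIP size: {human_readable(total)}")
```

Because the sketch works on items rather than on storage volumes, the same function could be applied to a single item or to a whole community by changing the input mapping.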

Replicating

Having secured approval to replicate the 'Amazing Images' collection, our curator needs a task to generate the AIP representation of each item in the collection and transmit these archive files to the replication storage site (which may be service-backed, local, in the cloud, etc., as will be explored below). Adding this task is just like the previous step: edit the configuration properties in curate.cfg. (We won't repeat a description of this process each time, but note that you may always add a task yet elect not to display it in the administrative UI.) This task is 'org.dspace.ctask.replicate.TransmitAIP'.

Since we are now working with AIPs, we should examine how they are configured for the tasks. Most configuration specific to the replication task suite is found in /dspace/config/modules/replicate.cfg. There are two main properties to set (or accept default values):

{code}
# Package type. Permitted values: 'mets', 'bagit'
packer.pkgtype = mets
# Format of package compression. Permitted values: 'zip' or 'tgz'
# for 'mets' packages, only zip is supported
packer.archfmt = zip
{code}
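The constraint stated in the comments above (only 'zip' is supported for 'mets' packages) can be expressed as a small check. This is an illustrative sketch, not code from the Replication Task Suite; the function name and the lookup table are assumptions built only from the permitted values listed in the configuration comments.

```python
# Permitted archive formats per package type, taken from the comments
# in replicate.cfg shown above.
ALLOWED_FORMATS = {
    "mets": {"zip"},
    "bagit": {"zip", "tgz"},
}


def check_packer_settings(pkgtype: str, archfmt: str) -> None:
    """Raise ValueError if the package type / archive format pair is not permitted."""
    if pkgtype not in ALLOWED_FORMATS:
        raise ValueError(f"unknown packer.pkgtype: {pkgtype!r}")
    if archfmt not in ALLOWED_FORMATS[pkgtype]:
        raise ValueError(
            f"packer.archfmt {archfmt!r} is not supported for {pkgtype!r} packages"
        )


check_packer_settings("mets", "zip")   # the defaults pass
check_packer_settings("bagit", "tgz")  # BagIt may use either format
```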

The default values will create a METS-based AIP, compressed into a 'zip' archive. The other alternative supported by the replication task suite is Library of Congress 'BagIt' packaging, which may be compressed either into a 'zip' file or a 'tgz' ('gzipped tar'), a compression standard more common in Unix systems.

Our data curator may elect to perform this task in the admin GUI, or, if the collection is rather large, she may instead 'queue' the task for later execution by using the queueing facility available in the curation system.

We should note that the 'transmitAIP' task, like all other replication tasks, operates on whatever DSpace object it is given. Thus, if the object is a collection, the task creates (and transmits, of course) an AIP for the collection object itself (metadata and logo), as well as AIPs for each item in the collection. If the task is given an identifier for a single Item, then only one AIP will be created.

Verifying Replication

While the transmitAIP task will report on whether or not it was successful in generating and transmitting AIP(s) to the replication service, our data curator wants the ability (within DSpace, not by using the replication service tools or UIs) to check whenever she likes that the AIP(s) which were transmitted are still there. A simple task, 'org.dspace.ctask.replicate.VerifyAIP', can perform this function.

Ensuring Replica Integrity and Accuracy over time

The 'Amazing Images' collection is comparatively static, meaning that few new items are likely to be added, and most of the metadata in each item is not routinely changed. However, over longer periods of time, cataloging errors are discovered and corrected, and perhaps formats become obsolete and new bitstreams are added. If the curator is fastidious about each change, and performs the 'transmitAIP' task on each item that has changed, then in general the set of AIP replicas will always be 'in sync' with the repository. However, it is useful to have the means to ensure that the replicas agree with the repository without having to create and transmit entirely new ones. Thus the task 'org.dspace.ctask.replicate.CompareWithAIP', which can also be thought of as a simple audit task. When performed on an Item, the task does the following:

  • generates an AIP for the Item (but does not transmit it)
  • computes a checksum on the local AIP
  • requests from the replication storage service a checksum for the replica AIP
  • compares the 2 checksums

The task will thus fail only if the checksums differ, which can only happen if some part of the Item (metadata or bitstream) itself differs. If the version of the item that is believed to be authentic is the repository (local) one, then simply performing the 'transmitAIP' task on the item will restore synchrony.

For collections and communities, this task also does an 'extent' comparison, which means that it will determine whether the replica store has an AIP for every item known (locally) to be in the collection or community.
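The checksum comparison described in this section can be sketched in a few lines. This is a hedged illustration rather than the CompareWithAIP code: the function names are hypothetical, the choice of SHA-256 is an assumption (the source does not say which checksum algorithm the suite or the storage service uses), and the "replica checksum" is simulated locally instead of being requested from a storage service.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256") -> str:
    """Compute a hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replica_matches(local_aip: Path, replica_checksum: str) -> bool:
    """Compare the checksum of a freshly generated local AIP with the
    checksum reported for the replica AIP."""
    return file_checksum(local_aip) == replica_checksum.lower()


with tempfile.TemporaryDirectory() as workdir:
    # Stand-in for the AIP generated locally (but not transmitted).
    local_aip = Path(workdir) / "local-aip.zip"
    local_aip.write_bytes(b"example AIP bytes")

    # Stand-ins for checksums a storage service might report.
    unchanged = hashlib.sha256(b"example AIP bytes").hexdigest()
    changed = hashlib.sha256(b"item was edited after transmission").hexdigest()

    in_sync = replica_matches(local_aip, unchanged)
    stale = replica_matches(local_aip, changed)

print("replica in sync:", in_sync)
print("edited item still matches:", stale)
```

Comparing digests avoids downloading the replica, which matters when the storage provider charges for network transfer.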

Repairing Damage

The AIPs in the replica store represent an insurance policy, and when 'claims' against that policy are filed, they can cover 2 situations: either the repository object is completely missing, and we want to restore it, or it is damaged and we want to repair the damage with data from the replica store AIP. A pair of replication tasks perform these functions. 'org.dspace.ctask.replicate.RecoverFromAIP' will do the following:

  • fetch the replica store AIP for the identifier given to the task
  • decompress it and create a new DSpace object
  • install the object into the repository, including restoring its state (withdrawn, embargoed, etc.)

This task will fail if there is already an object in the repository bearing the identifier given. By contrast, the task 'org.dspace.ctask.replicate.ReplaceWithAIP' (the 'repair' task) expects an existing repository object, and will fail if it does not find one. This task simply 'overlays' the metadata and bitstreams of the AIP version onto the existing record.
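The difference between the two tasks lies in their preconditions, which the following toy model tries to make explicit. It is only a sketch under stated assumptions: the repository and the replica store are plain dictionaries, the function and exception names are hypothetical, and the extra check for a missing replica AIP is an addition not described in the text.

```python
class TaskFailure(Exception):
    """Raised when a task's precondition is not met."""


def recover_from_aip(repository: dict, replica_store: dict, handle: str) -> None:
    """Restore a missing object; fails if the identifier is already in use."""
    if handle in repository:
        raise TaskFailure(f"{handle} already exists in the repository")
    if handle not in replica_store:  # assumption: no replica means nothing to restore
        raise TaskFailure(f"no replica AIP found for {handle}")
    repository[handle] = dict(replica_store[handle])


def replace_with_aip(repository: dict, replica_store: dict, handle: str) -> None:
    """Repair an existing object by overlaying the AIP version onto it."""
    if handle not in repository:
        raise TaskFailure(f"{handle} does not exist in the repository")
    if handle not in replica_store:  # assumption, as above
        raise TaskFailure(f"no replica AIP found for {handle}")
    repository[handle].update(replica_store[handle])


replica_store = {"123456789/2": {"title": "Sunrise", "state": "withdrawn"}}
repository = {}

recover_from_aip(repository, replica_store, "123456789/2")   # object was missing
repository["123456789/2"]["title"] = "Sunrize"               # simulated damage
replace_with_aip(repository, replica_store, "123456789/2")   # overlay the replica
print(repository["123456789/2"])
```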

Cleanup

Ordinarily, a replication arrangement is long standing: the preservation function cannot be fulfilled unless the replicas (here, the AIPs) are always kept and available. However, some collections (or items within them) may be removed for a variety of reasons: legal challenge, de-accession, etc. When the repository no longer wants to hold the object locally, the replica AIP ceases to have value. The task 'org.dspace.ctask.replicate.RemoveAIP' will delete the replica store AIP for its identifier. As with other replication tasks, if the identifier points to a collection or community, all the AIPs of all the members will also be deleted.

Keeping Score

Many storage providers have cost structures that are more complex than simple functions of the total stored bytes: cloud providers in particular have costs associated with the use of the network to upload and download the stored object. An object that occupies 2 megabytes might cost far more over time than a 1 gigabyte object, if the former is downloaded 1000 times for every time the latter is. The replication system provides a very rudimentary task to help manage and track these factors: 'org.dspace.ctask.replicate.ReadOdometer'. This task simply displays the readings from the replication system that record cumulative use. The statistics are:

  • total number of objects (AIPs, typically) in the replica store
  • total size of all objects
  • total number of bytes downloaded from the store
  • total number of bytes uploaded to the store

These figures can be used as a means of checking and validating service charges from storage providers.
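To make the 2 megabyte versus 1 gigabyte comparison concrete, the sketch below keeps the four counters listed above for each object and replays the download pattern from the example. The class and method names are hypothetical and this is not how ReadOdometer is implemented; decimal units (1 MB = 10^6 bytes) are an assumption made for simple arithmetic.

```python
from dataclasses import dataclass

MB = 10**6  # decimal units, assumed for simplicity
GB = 10**9


@dataclass
class Odometer:
    """Cumulative usage counters for a replica store."""
    object_count: int = 0
    stored_bytes: int = 0
    uploaded_bytes: int = 0
    downloaded_bytes: int = 0

    def record_upload(self, size: int) -> None:
        self.object_count += 1
        self.stored_bytes += size
        self.uploaded_bytes += size

    def record_download(self, size: int) -> None:
        self.downloaded_bytes += size


small, large = Odometer(), Odometer()
small.record_upload(2 * MB)
large.record_upload(1 * GB)

# The small object is downloaded 1000 times for every download of the large one.
for _ in range(1000):
    small.record_download(2 * MB)
large.record_download(1 * GB)

print("small object:", small)
print("large object:", large)
```

Under these assumptions the small object occupies 500 times less storage yet accounts for twice the download traffic (2 GB against 1 GB), which is why a provider that bills for transfer can charge more for it over time.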

Automation

While the coordinated use of the tasks described above can provide the basis for a solid replication strategy and practice, there are several processes that could necessitate a fair amount of curatorial work. For example, in the discussion on ensuring integrity of AIPs over time, we remarked that vigilance was required by the curator to transmit new AIPs whenever Items change. It is possible to leverage existing facilities in DSpace to substantially reduce this effort through automation.

The replication code includes a so-called 'event consumer', that can 'listen for' any changes to objects in the repository. Event consumers are documented elsewhere, but all we need to do to activate this consumer is add it to the list of consumers (in dspace.cfg):


{code}
#### Event System Configuration ####

# default synchronous dispatcher (same behavior as traditional DSpace)
event.dispatcher.default.class = org.dspace.event.BasicDispatcher
event.dispatcher.default.consumers = search, browse, eperson, harvester, replicate
....
# consumer to manage content replication
event.consumer.replicate.class = org.dspace.ctask.replicate.ReplicateConsumer
event.consumer.replicate.filters = Community|Collection|Item+Install|Modify|Modify_Metadata|Delete
{code}

This configuration essentially means: listen for any new, modified or deleted Items, Collections and Communities. If you do not care about Community or Collection AIPs, just remove 'Community' or 'Collection' from the list. When the ReplicateConsumer gets a relevant event, it will act on it as follows:

If the event is an addition of a new DSpace object (actually for Items, an 'installation', i.e. when the item exits workflow), then a request for an AIP transmission is queued. The same occurs whenever an object has changed (so-called modify events). When an object is deleted, a 'catalog' of the deletion is transmitted to the replication service. The catalog just lists all the parts of the deletion: if an item, then just the handle of the item; if a collection, then all the item handles that were in it. This way, if the deletion was mistaken, the catalog can be used to recover all the contents.

This represents the default behavior of the consumer. You may configure it in /dspace/config/modules/replicate.cfg:

{code}
###  ReplicateConsumer settings ###
# ReplicateConsumer must be properly declared/configured in dspace.cfg
# All tasks defined will be queued, unless the '+p' suffix is appended, when
# they will be immediately performed. Exercise considerable caution when using
# +p, as lengthy tasks can adversely affect UI or other responsiveness.

# Replicate event consumer tasks upon install/add events.
# A comma separated list of valid task plugin names (with optional '+p' suffix)
consumer.tasks.add = transmitaip

# Replicate event consumer tasks upon modification events.
# A comma separated list of valid task plugin names (with optional '+p' suffix)
consumer.tasks.mod = transmitaip

# Replicate event consumer tasks upon a delete/remove events.
# A comma separated list of valid task plugin names (with optional '+p' suffix)
consumer.tasks.del = catalog+p

# Replicate event consumer queue name - where all queued tasks are placed
consumer.queue = replication
{code}
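The task-list syntax described in the comments above (a comma-separated list of task names, each optionally carrying a '+p' suffix for immediate execution) can be read with a few lines of code. This is an illustrative sketch with a hypothetical function name, not the parser used by ReplicateConsumer.

```python
from typing import List, Tuple


def parse_consumer_tasks(value: str) -> List[Tuple[str, bool]]:
    """Split a consumer.tasks.* value into (task name, perform immediately) pairs.

    A task without the '+p' suffix is queued; with it, the task is performed
    immediately, as described in the configuration comments.
    """
    tasks = []
    for entry in value.split(","):
        name = entry.strip()
        if not name:
            continue  # tolerate empty entries such as a trailing comma
        immediate = name.endswith("+p")
        if immediate:
            name = name[: -len("+p")]
        tasks.append((name, immediate))
    return tasks


for prop, value in [
    ("consumer.tasks.add", "transmitaip"),
    ("consumer.tasks.mod", "transmitaip"),
    ("consumer.tasks.del", "catalog+p"),
]:
    for name, immediate in parse_consumer_tasks(value):
        mode = "perform immediately" if immediate else "queue"
        print(f"{prop}: {name} -> {mode}")
```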

Using the event consumer, the curator can essentially operate replication in 'auto-pilot' after the first complete transmission of AIPs.


One important configuration to be aware of is this: by default, the consumer will process all events it receives, regardless of collection. But in our current case, we intend for only the 'Amazing Images' collection to be replicated. To effect this, we must create a file in the directory defined by the following property in /dspace/config/modules/replicate.cfg:

{code}
# Base directory for replication operations
base.dir = ${dspace.dir}/replicate
{code}

Create a simple text file called 'include' and put the handle of the collection for 'Amazing Images' in it. You can add as many collections (one per line) as you like. If you replicate all but a few collections, just name the file 'exclude' and list the collection handles you want to exclude.
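The include/exclude behavior can be modeled as a simple filter. The sketch below is an assumption-laden illustration, not the consumer's code: the function names are hypothetical, the file contents are passed in as text rather than read from the base directory, and the rule that an 'include' list takes precedence when both lists are present is a guess that the source does not confirm.

```python
from typing import Optional, Set


def read_handles(text: str) -> Set[str]:
    """Parse an 'include' or 'exclude' file body: one collection handle per line."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def should_replicate(
    collection_handle: str,
    include: Optional[Set[str]] = None,
    exclude: Optional[Set[str]] = None,
) -> bool:
    """Decide whether events for a collection should be processed."""
    if include is not None:  # assumed precedence: an include list wins
        return collection_handle in include
    if exclude is not None:
        return collection_handle not in exclude
    return True  # default: process every event received


# Hypothetical handle for the 'Amazing Images' collection.
include = read_handles("123456789/1\n")

print(should_replicate("123456789/1", include=include))
print(should_replicate("123456789/7", include=include))
```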

Replica Storage

For the replication of AIPs to be of any significant value, they must be stored in a safe, persistent, reliable, accessible, and available location. The replication tasks of transmitting, fetching, etc. all rely on the storage provider configured. This and related properties are found in replicate.cfg:

{code}
# Replica store implementation class
plugin.single.org.dspace.ctask.replicate.ObjectStore = \
    org.dspace.ctask.replicate.store.LocalObjectStore

# Location of local (e.g. local, mountable, sync) object store
# ignored for non-local stores (e.g. DuraCloud)
store.dir = ${dspace.dir}/repstore
{code}

The default configuration simply writes the AIPs to the local directory configured by the 'store.dir' property above. This is not intended to be a production-grade solution, since a failure in the DSpace asset store could likely also affect this storage. It is provided mostly as a way to begin to work with the replication tasks without worrying about finding a storage provider. For replicating in earnest, a service like DuraCloud is recommended, and what follows are instructions on how to configure a DuraCloud storage provider. Note that this service must be established and provisioned prior to use, and those details may be obtained from DuraSpace: http://duraspace.org/duracloud.php

Configuration