Note |
---|
This code is now available on the current DSpace SVN Trunk (http://scm.dspace.org/svn/repo/dspace/trunk/). It will be officially released as part of DSpace 1.7.0. |
Warning |
---|
For Developers: This code changes the current PackageIngester and PackageDisseminator interfaces. If you've written any local, custom Packagers at your institution, they will need to be refactored to utilize these updated interfaces. |
AIP Backup & Restore for DSpace 1.7
Background & Overview
Note |
---|
Additional background information is available in the OR10 presentation entitled "Improving DSpace Backups, Restores & Migrations" (http://www.slideshare.net/tdonohue/improving-dspace-backups-restores-migrations). |
This work comes out of a requirement for DSpace integration with DuraCloud (http://www.duracloud.org). One of these requirements is to be able to essentially "backup" local DSpace contents into the cloud (as a type of offsite backup), and "restore" those contents at a later time.
Essentially, we need a way to be able to export the entire hierarchy (i.e. bitstreams, metadata and relationships between Communities/Collections/Items) into a relatively standard format (e.g. METS or similar structured packaging format). This entire hierarchy should also be able to be re-imported into DSpace in the same format, to allow for "round-tripping" of that content (essentially a restore of that content in the same or different DSpace installation).
Perceived benefits to DSpace community:
- Would allow folks to more easily move entire Communities or Collections between DSpace instances.
- Would allow for a potentially more consistent backup of this hierarchy (e.g. to DuraCloud, or just to your own local backup system), rather than relying on synchronizing a backup of your DB (metadata/relationships) and assetstore (bitstreams).
- Would provide a way for people to more easily get their data out of DSpace (whatever the purpose may be).
- Would provide a relatively standard format for people to migrate entire hierarchies (Communities/Collections) into DSpace (from another system).
This is related to (and a partial subset of) MIT's ... . However, the original AIP prototype did not make it very easy to re-import the exported AIPs for Communities or Collections. So, this prototype extends the old AIP prototype's packagers/crosswalks to allow for a full export and import of an entire DSpace hierarchy, or just a set of Communities, Collections or Items.
How does this work help DSpace interact with DuraCloud?
This work is entirely about exporting DSpace content objects to a location on a local filesystem. So, this work isn't tied solely to DuraCloud, and could be used with any backup storage system to back up your DSpace contents.
In the initial DuraCloud work, the DuraCloud team is working on a way to "synchronize" DuraCloud with a local file folder. So, DuraCloud can be configured to "watch" a given folder and automatically replicate its contents into the cloud.
Therefore, moving content from DSpace to DuraCloud would currently be a two-step process:
- First, export AIPs describing that content from DSpace to a filesystem folder
- Second, enable DuraCloud to watch that same filesystem folder and replicate it into the cloud.
Similarly, moving content from DuraCloud back into DSpace would also be a two-step process:
- First, you'd tell DuraCloud to replicate the AIPs from the cloud to a folder on your file system
- Second, you'd ingest those AIPs back into DSpace
(These backup/restore processes may change as we go forward and investigate more use cases. This is just the initial plan.)
Makeup and Definition of AIPs
AIPs are Archival Information Packages.
- An AIP is a package describing one archival object.
- An archival object may be an Item, Collection, or Community. Bitstreams are included in an Item's AIP.
- Each AIP is logically self-contained and can be restored without the rest of the archive. (So you could restore a single Item, Collection or Community.)
- The AIP profile favors completeness and accuracy rather than presenting the semantics of an object in a standard format. It conforms to the quirks of DSpace's internal object model rather than attempting to produce a universally understandable representation of the object.
- An AIP can serve as a DIP (Dissemination Information Package) or SIP (Submission Information Package), especially when transferring custody of objects to another DSpace implementation.
- In contrast to SIP or DIP, the AIP should include all available DSpace structural and administrative metadata, and basic provenance information.
- Restoration of an archive from AIPs is not perfectly complete at this time; it is intended to recover from catastrophic loss of content and metadata, not restore the exact same archive as before. Currently, some information (e.g. access controls, people, groups) would be lost, as they are not stored in the AIPs.
AIP Structure / Format
Generally speaking, an AIP is a Zip file containing a METS manifest and all related content bitstreams.
For more specific details of AIP format / structure, along with examples, please see DSpaceAIPFormat
Where to get the Code
The latest code is available on DSpace Trunk (and will be released in DSpace 1.7.0)
Code Block |
---|
svn co http://scm.dspace.org/svn/repo/dspace/trunk/ |
What code has really changed?
The majority of the code changes are in two main areas:
- ... - Packager classes
  - PackageIngester interface - Now ingests 'java.io.File' objects instead of InputStreams (to better support recursive imports of Communities/Collections)
  - PackageDisseminator interface - Now exports 'java.io.File' objects instead of OutputStreams (to better support recursive exports of Communities/Collections)
  - DSpaceAIPDisseminator - Disseminates/Exports AIP(s)
  - DSpaceAIPIngester - Ingests exported AIP(s)
  - Changes were also made to refactor / enhance the AbstractMETSDisseminator, AbstractMETSIngester, and METSManifest classes
- org.dspace.content.crosswalk.*
  - AIPDIMCrosswalk - Crosswalks DIM metadata for AIPs
  - AIPTechMDCrosswalk - Crosswalks METS TechMD sections for AIPs
  - There were also changes to the MODSDisseminationCrosswalk and XSLTDisseminationCrosswalk to support creating "Site" AIPs
Note |
---|
For a full list of code changes (including patches) see: AipCoreAPIChanges |
Warning |
---|
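For Developers: Because of the changes to the PackageIngester and PackageDisseminator interfaces, if you've created any local Packagers at your institution, those will need to be refactored. |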
Running the Code
Exporting AIPs
Export Modes & Options
All AIP Exports are done by using the Dissemination Mode (-d option) of the packager command.
There are two types of AIP Dissemination you can perform:
- Single AIP (default, using -d option) - Exports just an AIP describing a single DSpace object. So, if you ran it in this default mode for a Collection, you'd just end up with a single Collection AIP (which would not include AIPs for all its child Items)
- Hierarchy of AIPs (using the -d --all or -d -a option) - Exports the requested AIP describing an object, plus the AIP for all child objects. Some examples follow:
  - For a Site - this would export all Communities, Collections & Items within the site into AIP files (in a provided directory)
  - For a Community - this would export that Community and all SubCommunities, Collections and Items into AIP files (in a provided directory)
  - For a Collection - this would export that Collection and all contained Items into AIP files (in a provided directory)
  - For an Item - this just exports the Item into an AIP as normal (as it already contains its Bitstreams/Bundles by default)
Exporting just a single AIP
To export in single AIP mode (default), use this 'packager' command template:
Code Block |
---|
/dspace/bin/dspace packager -d -t AIP -e <eperson> -i <handle> <file-path> |
For example:
Code Block |
---|
/dspace/bin/dspace packager -d -t AIP -e admin@myu.edu -i 4321/4567 aip4567.zip |
The above code will export the object of the given handle (4321/4567) into an AIP file named "aip4567.zip". This will not include any child objects for Communities or Collections.
Exporting AIP Hierarchy
To export an AIP hierarchy, use the -a (or --all) package parameter.
For example, use this 'packager' command template:
Code Block |
---|
/dspace/bin/dspace packager -d -a -t AIP -e <eperson> -i <handle> <file-path> |
For example:
Code Block |
---|
/dspace/bin/dspace packager -d -a -t AIP -e admin@myu.edu -i 4321/4567 aip4567.zip |
The above code will export the object of the given handle (4321/4567) into an AIP file named "aip4567.zip". In addition, it would export all children objects to the same directory as the "aip4567.zip" file. The child AIP files are all named using the following format:
- File Name Format: <Obj-Type>@<Handle-with-dashes>.zip
  - e.g. COMMUNITY@123456789-1.zip, COLLECTION@123456789-2.zip, ITEM@123456789-200.zip
  - This general file naming convention ensures that you can easily locate an object to restore by its name (assuming you know its Object Type and Handle).
- Alternatively, if an object doesn't have a Handle, it uses this File Name Format: <Obj-Type>@internal-id-<DSpace-ID>.zip (e.g. ITEM@internal-id-234.zip)
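As a rough illustration of the naming convention above, the following shell sketch derives the expected AIP file name from an object type and a Handle. The function name `aip_filename` is hypothetical (it is not part of DSpace), and the sketch assumes that "Handle-with-dashes" simply means the Handle with each "/" replaced by "-", as the examples suggest.

```shell
#!/bin/sh
# Hypothetical helper (not part of DSpace): derive the expected AIP file name
# for an object, following the <Obj-Type>@<Handle-with-dashes>.zip pattern.
# Assumption: "Handle-with-dashes" is the Handle with "/" replaced by "-".
aip_filename() {
    obj_type="$1"   # e.g. COMMUNITY, COLLECTION or ITEM
    handle="$2"     # e.g. 123456789/200
    printf '%s@%s.zip\n' "$obj_type" "$(printf '%s' "$handle" | tr '/' '-')"
}

# Example: look up the file name for an Item with Handle 123456789/200.
aip_filename ITEM 123456789/200
```

Run as-is, this prints ITEM@123456789-200.zip, which matches the Item example in the list above. A helper like this could be handy when picking a single object out of a large export directory to restore.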
Exporting Entire Site
To export an entire DSpace Site, pass the packager the Handle <site-handle-prefix>/0. For example, if your site prefix is "4321", you'd run a command similar to the following:
Code Block |
---|
/dspace/bin/dspace packager -d -a -t AIP -e admin@myu.edu -i 4321/0 sitewide-aip.zip |
Again, this would export the DSpace Site AIP into the file "sitewide-aip.zip", and export AIPs for all Communities, Collections and Items into the same directory as the Site AIP.
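Because a site-wide export writes every AIP into the same directory as the Site AIP, it may be convenient to give each export run its own folder. The sketch below only prints the packager command for such a run (a dry run); it does not execute it. The function name `site_export_command`, the `/backups/aips` location and the per-date folder layout are all assumptions made for illustration, and DSpace is assumed to be installed under /dspace as in the examples above.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of DSpace): print the packager command for a
# site-wide AIP export into a per-date directory.
# Assumptions: DSpace lives under /dspace, and the backup root is a directory
# of your choosing (for example, one that a backup tool watches).
site_export_command() {
    prefix="$1"       # your site's Handle prefix, e.g. 4321
    eperson="$2"      # e-mail address of the EPerson running the export
    backup_root="$3"  # directory that holds the dated export folders
    stamp="$4"        # date label, e.g. the output of: date +%Y-%m-%d
    printf '/dspace/bin/dspace packager -d -a -t AIP -e %s -i %s/0 %s/%s/sitewide-aip.zip\n' \
        "$eperson" "$prefix" "$backup_root" "$stamp"
}

# Dry run: show the command instead of executing it.
site_export_command 4321 admin@myu.edu /backups/aips 2010-09-01
```

Before running the printed command for real, you would likely need to create the dated directory yourself (for example with mkdir -p), since this page does not say whether the packager creates missing directories.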
Ingesting / Restoring AIPs
Ingestion Modes & Options
Ingestion of AIPs is a bit more complex than Dissemination, as there are several different "modes" available:
- Submit/Ingest Mode (-s option, default) - submit AIP(s) to DSpace in order to create new object(s) (i.e. AIP is treated like a SIP - Submission Information Package)
- Restore Mode (-r option) - restore pre-existing object(s) in DSpace based on AIP(s). This also attempts to restore all handles and relationships (parent/child objects). This is a specialized type of "submit", where the object is created with a known Handle and known relationships.
- Replace Mode (-r -f option) - replace existing object(s) in DSpace based on AIP(s). This also attempts to restore all handles and relationships (parent/child objects). This is a specialized type of "restore" where the contents of existing object(s) are replaced by the contents in the AIP(s). By default, if a normal "restore" finds the object already exists, it will back out (i.e. rollback all changes) and report which object already exists.
Again, like export, there are two types of AIP Ingestion you can perform (using any of the above modes):
- Single AIP (default) - Ingests just an AIP describing a single DSpace object. So, if you ran it in this default mode for a Collection AIP, you'd just create a DSpace Collection from the AIP (but not ingest any of its child objects)
- Hierarchy of AIPs (by including the --all or -a option after the mode) - Ingests the requested AIP describing an object, plus the AIP for all child objects. Some examples follow:
  - For a Site - this would ingest all Communities, Collections & Items based on the located AIP files
  - For a Community - this would ingest that Community and all SubCommunities, Collections and Items based on the located AIP files
  - For a Collection - this would ingest that Collection and all contained Items based on the located AIP files
  - For an Item - this just ingests the Item (including all Bitstreams & Bundles) based on the AIP file.
The difference between "Submit" and "Restore/Replace" modes
It's worth understanding the primary differences between a Submission (specified by -s parameter) and a Restore (specified by -r parameter).
- Submission Mode (-s) - creates a new object (AIP is treated like a SIP)
  - By default, a new Handle is always assigned
    - However, you can force it to use the handle specified in the AIP by specifying -o ignoreHandle=false as one of your parameters
  - By default, a new Parent object must be specified (using the -p parameter). This is the location where the new object will be created.
    - However, you can force it to use the parent object specified in the AIP by specifying -o ignoreParent=false as one of your parameters
  - By default, will respect a Collection's Workflow process when you submit an Item to a Collection
    - However, you can specifically skip any workflow approval processes by specifying the -w parameter.
  - Always adds a new Deposit License to Items
  - Always adds new DSpace System metadata to Items (includes new 'dc.date.accessioned', 'dc.date.available', 'dc.date.issued' and 'dc.description.provenance' entries)
- Restore / Replace Mode - restores an object (as if from a backup)
  - By default, the Handle specified in the AIP is restored
    - However, for restores, you can force a new handle to be generated by specifying -o ignoreHandle=true as one of your parameters. (NOTE: Doesn't work for replace mode, as the new object always retains the handle of the replaced object)
  - By default, the object is restored under the Parent specified in the AIP
    - However, for restores, you can force it to restore under a different parent object by using the -p parameter. (NOTE: Doesn't work for replace mode, as the new object always retains the parent of the replaced object)
  - Always skips any Collection workflow approval processes when restoring/replacing an Item in a Collection
  - Never adds a new Deposit License to Items (rather it restores the previous deposit license, as long as it is stored in the AIP)
  - Never adds new DSpace System metadata to Items (rather it just restores the metadata as specified in the AIP)
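To make the submission-mode overrides above more concrete, here is a minimal shell sketch that assembles a submit (-s) command and optionally appends the -o ignoreHandle=false and -w parameters described in the list. The function name `submit_command` and its yes/no arguments are hypothetical; only the packager flags themselves come from this page. The function prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of DSpace): build a submit-mode (-s) packager
# command, optionally adding the overrides described in the list above.
submit_command() {
    eperson="$1"        # e-mail address of the submitting EPerson
    parent="$2"         # Handle of the parent object (-p)
    file="$3"           # path to the AIP file
    keep_handle="$4"    # "yes" to keep the Handle stored in the AIP
    skip_workflow="$5"  # "yes" to skip Collection workflow approval
    cmd="/dspace/bin/dspace packager -s -t AIP -e $eperson -p $parent"
    if [ "$keep_handle" = "yes" ]; then
        # Do not ignore the Handle recorded in the AIP.
        cmd="$cmd -o ignoreHandle=false"
    fi
    if [ "$skip_workflow" = "yes" ]; then
        # Bypass any workflow approval steps.
        cmd="$cmd -w"
    fi
    printf '%s %s\n' "$cmd" "$file"
}

# Dry run: submit under parent 4321/12, keeping the AIP's Handle.
submit_command admin@myu.edu 4321/12 aip4567.zip yes no
```

Printing the command first is a deliberate choice in this sketch: it lets you review exactly which defaults are being overridden before anything is ingested.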
Submitting AIP(s) to create a new object
Submitting a Single AIP
Note |
---|
This option allows you to essentially use an AIP as a SIP (Submission Information Package). The default settings will create a new DSpace object (with a new handle and a new parent object, if specified) from your AIP. |
To ingest a single AIP and create a new DSpace object under a parent of your choice, specify the -p (or --parent) package parameter to the command. Also, note that you are running the packager in -s (submit) mode.
NOTE: This only ingests the single AIP specified. It does not ingest all children objects.
Code Block |
---|
/dspace/bin/dspace packager -s -t AIP -e <eperson> -p <parent-handle> <file-path> |
If you leave out the -p parameter, the AIP package ingester will attempt to install the AIP under the same parent it had before. As you are also specifying the -s (submit) parameter, the packager will assume you want a new Handle to be assigned (as you are effectively specifying that you are submitting a new object). If you want the object to retain the Handle specified in the AIP, you can specify the -o ignoreHandle=false option to force the packager to not ignore the Handle specified in the AIP.
Submitting an AIP Hierarchy
Note |
---|
This option allows you to essentially use a set of AIPs as SIPs (Submission Information Packages). The default settings will create a new DSpace object (with a new handle and a new parent object, if specified) from each AIP. |
To ingest an AIP hierarchy from a directory of AIPs, use the -a (or --all) package parameter.
For example, use this 'packager' command template:
Code Block |
---|
/dspace/bin/dspace packager -s -a -t AIP -e <eperson> -p <parent-handle> <file-path> |
For example:
Code Block |
---|
/dspace/bin/dspace packager -s -a -t AIP -e admin@myu.edu -p 4321/12 aip4567.zip |
The above command will ingest the package named "aip4567.zip" as a child of the specified Parent Object (handle="4321/12"). The resulting object is assigned a new Handle (since -s is specified). In addition, any child AIPs referenced by "aip4567.zip" are also recursively ingested (a new Handle is also assigned for each child AIP).
Another example - Ingesting a Top-Level Community (by using the Site Handle, <site-handle-prefix>/0):
Code Block |
---|
/dspace/bin/dspace packager -s -a -t AIP -e admin@myu.edu -p 4321/0 community-aip.zip |
The above command will ingest the package named "community-aip.zip" as a top-level community (i.e. the specified parent is "4321/0" which is a Site Handle). Again, the resulting object is assigned a new Handle. In addition, any child AIPs referenced by "community-aip.zip" are also recursively ingested (a new Handle is also assigned for each child AIP).
Restoring/Replacing
...
using
...
AIP(s)
...
Restoring
...
is
...
slightly
...
different
...
than
...
just
...
submitting
...
.
...
When
...
restoring,
...
we
...
make
...
every
...
attempt
...
to
...
restore
...
the
...
object
...
as
...
it
...
used
...
to
...
be
...
(including
...
its
...
handle,
...
parent
...
object,
...
etc.).
...
There
...
are
...
currently
...
three
...
restore
...
modes:
...
- Default Restore Mode (-r) = Attempt to restore object (and optionally children). Rollback all changes if any object is found to already exist.
- Restore, Keep Existing Mode (-r -k) = Attempt to restore object (and optionally children). If an object is found to already exist, skip over it (and all children objects), and continue to restore all other non-existing objects.
- Force Replace Mode (-r -f) = Restore an object (and optionally children) and overwrite any existing objects in DSpace. Therefore, if an object is found to already exist in DSpace, its contents are replaced by the contents of the AIP. WARNING: This mode is potentially dangerous as it will permanently destroy any object contents that do not currently exist in the AIP. You may want to perform a secondary backup, unless you are sure you know what you are doing!
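The three modes differ only in the flags passed to the packager. As a minimal sketch, assuming a hypothetical helper named restore_command (it is not part of DSpace) and the /dspace/bin/dspace launcher path used in the examples on this page, the mapping from mode to command line could look like this. The helper only prints the command, so nothing is restored until the printed line is run by hand.

```shell
# Hypothetical helper (not part of DSpace): prints the packager command for a
# given restore mode instead of running it, so it can be reviewed first.
# The launcher path is an assumption; adjust it to your installation.
DSPACE_BIN="/dspace/bin/dspace"

# Usage: restore_command <default|keep|force> <eperson> <file-path>
restore_command() (
    mode="$1"; eperson="$2"; aip="$3"
    case "$mode" in
        default) flags="-r -a" ;;     # roll back if any object already exists
        keep)    flags="-r -a -k" ;;  # skip objects that already exist
        force)   flags="-r -a -f" ;;  # overwrite existing objects (destructive)
        *) echo "unknown restore mode: $mode" >&2; exit 2 ;;
    esac
    echo "$DSPACE_BIN packager $flags -t AIP -e $eperson $aip"
)
```

For example, `restore_command keep admin@myu.edu aip4567.zip` prints `/dspace/bin/dspace packager -r -a -k -t AIP -e admin@myu.edu aip4567.zip`. Drop the `-a` flag from the helper if only a single object should be restored.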
Info |
---|
Restoring a Single AIP: All of the below examples show how to restore an entire hierarchy of objects (using the -a option). To restore a single object, you can use the same commands, but remove the -a option. |
Default Restore Mode
By default, the restore mode (-r option) will roll back all changes if any object is found to already exist. The user will be informed which object already exists within their DSpace installation.
Use this 'packager' command template:
Code Block |
---|
/dspace/bin/dspace packager -r -a -t AIP -e <eperson> <file-path> |
For example:
Code Block |
---|
/dspace/bin/dspace packager -r -a -t AIP -e admin@myu.edu aip4567.zip |
Notice that unlike -s option (for submission/ingesting), the -r option does not require the Parent Object (-p option) to be specified if it can be determined from the package itself.
In the above example, the package "aip4567.zip" is restored to the DSpace installation with the Handle provided within the package itself (and added as a child of the parent object specified within the package itself). In addition, any child AIPs referenced by "aip4567.zip" are also recursively ingested (the -a option specifies to also restore all child AIPs). They are also restored with the Handles & Parent Objects provided with their package. If any object is found to already exist, all changes are rolled back (i.e. nothing is restored to DSpace).
Restore, Keep Existing Mode
When the "Keep Existing" flag (-k option) is specified, the restore will attempt to skip over any objects found to already exist. It will report to the user that the object was found to exist (and was not modified or changed). It will then continue to restore all objects which do not already exist.
One special case to note: If a Collection or Community is found to already exist, its child objects are also skipped over. So, this mode will not auto-restore items to an existing Collection.
Use this 'packager' command template:
Code Block |
---|
/dspace/bin/dspace packager -r -a -k -t AIP -e <eperson> <file-path> |
For example:
Code Block |
---|
/dspace/bin/dspace packager -r -a -k -t AIP -e admin@myu.edu aip4567.zip |
In the above example, the package "aip4567.zip" is restored to the DSpace installation with the Handle provided within the package itself (and added as a child of the parent object specified within the package itself). In addition, any child AIPs referenced by "aip4567.zip" are also recursively restored (the -a option specifies to also restore all child AIPs). They are also restored with the Handles & Parent Objects provided with their package. If any object is found to already exist, it is skipped over (child objects are also skipped). All non-existing objects are restored.
Force Replace Mode
When the "Force Replace" flag (-f option) is specified, the restore will overwrite any objects found to already exist in DSpace. In other words, existing content is deleted and then replaced by the contents of the AIP(s).
Panel |
---|
WARNING: Because this mode actually destroys existing content in DSpace, it is potentially dangerous and may result in data loss! It is recommended to always perform a secondary full backup (assetstore files & database) before attempting to replace any existing object(s) in DSpace. |
Panel |
---|
SECOND WARNING: This doesn't 100% work yet for an entire Site! You've been warned!!! - Tim |
Use this 'packager' command template:
Code Block |
---|
/dspace/bin/dspace packager -r -a -f -t AIP -e <eperson> <file-path> |
For example:
Code Block |
---|
/dspace/bin/dspace packager -r -a -f -t AIP -e admin@myu.edu aip4567.zip |
In the above example, the package "aip4567.zip" is restored to the DSpace installation with the Handle provided within the package itself (and added as a child of the parent object specified within the package itself). In addition, any child AIPs referenced by "aip4567.zip" are also recursively ingested. They are also restored with the Handles & Parent Objects provided with their package. If any object is found to already exist, its contents are replaced by the contents of the appropriate AIP.
If any error occurs, the script attempts to rollback the entire replacement process.
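Because Force Replace Mode deletes existing content, a full backup of the assetstore files and the database is recommended before running it. The following is only a sketch of what that step could look like. The helper name backup_commands is hypothetical, and the database name, database user, and installation directory are assumptions (a PostgreSQL database and user both named "dspace", and an installation under /dspace); none of them come from this page. The helper prints the commands for review rather than running them.

```shell
# Hypothetical pre-replace backup sketch (not part of DSpace).
# Assumptions to adjust: PostgreSQL database and user both named "dspace",
# and the assetstore located at /dspace/assetstore.
DB_NAME="dspace"
DB_USER="dspace"
DSPACE_DIR="/dspace"

# Usage: backup_commands <destination-dir> <label>
# Prints the two backup commands for review instead of running them.
backup_commands() (
    dest="$1"; label="$2"
    # Dump the database to a plain SQL file
    echo "pg_dump -U $DB_USER -f $dest/dspace-db-$label.sql $DB_NAME"
    # Archive the assetstore directory
    echo "tar -czf $dest/assetstore-$label.tar.gz -C $DSPACE_DIR assetstore"
)
```

Calling `backup_commands /backups pre-replace` prints one pg_dump line and one tar line. Sites using a different database or a non-default assetstore location would need different commands.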
Restoring Entire Site
Details Coming Soon! In all likelihood it will take the same parameters as the "Exporting entire Site", except that you'll be running the packager in -r (restore) mode.
Configuration in 'dspace.cfg'
The following new configurations relate to AIPs:
AIP Metadata Dissemination Configurations
The following configurations allow you to specify what metadata is stored within each METS-based AIP. In 'dspace.cfg', the general format for each of these settings is:
aip.disseminate.<setting> = <mdType>:<DSpace-crosswalk-name> [, ...]
- <setting> is the setting name (see below for the full list of valid settings)
- <mdType> is optional. It allows you to specify the value of the @MDTYPE or @OTHERMDTYPE attribute in the corresponding METS element.
- <DSpace-crosswalk-name> is required. It specifies the name of the DSpace Crosswalk which should be used to generate this metadata.
- Zero or more <mdType>:<DSpace-crosswalk-name> pairs may be specified for each setting, separated by commas
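To make the value format concrete, here is a minimal sketch that splits a setting value into its individual entries. The function name parse_disseminate_value is hypothetical and is not part of DSpace. It prints one "mdType|crosswalk" pair per line and leaves the first field empty when the optional mdType is omitted. It does not try to reproduce whatever DSpace itself does in that case.

```shell
# Hypothetical helper (not part of DSpace): splits the value of an
# aip.disseminate.<setting> line into one "mdType|crosswalk" pair per line.
# When the optional <mdType> is omitted, the first field is left empty.
parse_disseminate_value() (
    printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r entry; do
        # Trim whitespace around each comma-separated entry
        entry=$(printf '%s\n' "$entry" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
        [ -n "$entry" ] || continue
        case "$entry" in
            *:*) echo "${entry%%:*}|${entry#*:}" ;;  # <mdType>:<crosswalk>
            *)   echo "|$entry" ;;                   # crosswalk name only
        esac
    done
)
```

Given the value `MODS, DIM`, the helper prints `|MODS` and `|DIM` on separate lines. Given `DSpaceDepositLicense:DSPACE_DEPLICENSE`, it prints `DSpaceDepositLicense|DSPACE_DEPLICENSE`.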
Warning |
---|
It is recommended to minimally use the default settings when generating AIPs. DSpace can only restore information that is included within an AIP. Therefore, if you choose to no longer include some information in an AIP, DSpace will no longer be able to restore that information from an AIP backup. |
The default settings in 'dspace.cfg' are:
- aip.disseminate.techMD - Lists the DSpace Crosswalks (by name) which should be called to populate the <techMD> section of the METS file within the AIP (Default: PREMIS)
  - The PREMIS Crosswalk generates PREMIS metadata for the object specified by the AIP
- aip.disseminate.sourceMD - Lists the DSpace Crosswalks (by name) which should be called to populate the <sourceMD> section of the METS file within the AIP (Default: AIP-TECHMD)
  - The AIP-TECHMD Crosswalk generates technical metadata (in DIM format) for the object specified by the AIP
- aip.disseminate.digiprovMD - Lists the DSpace Crosswalks (by name) which should be called to populate the <digiprovMD> section of the METS file within the AIP (Default: None)
- aip.disseminate.rightsMD - Lists the DSpace Crosswalks (by name) which should be called to populate the <rightsMD> section of the METS file within the AIP (Default: DSpaceDepositLicense:DSPACE_DEPLICENSE, CreativeCommonsRDF:DSPACE_CCRDF, CreativeCommonsText:DSPACE_CCTEXT)
  - The DSPACE_DEPLICENSE crosswalk ensures the DSpace Deposit License is referenced/stored in the AIP
  - The DSPACE_CCRDF crosswalk ensures any Creative Commons RDF Licenses are referenced/stored in the AIP
  - The DSPACE_CCTEXT crosswalk ensures any Creative Commons Textual Licenses are referenced/stored in the AIP
- aip.disseminate.dmd - Lists the DSpace Crosswalks (by name) which should be called to populate the <dmdSec> section of the METS file within the AIP (Default: MODS, DIM)
  - The MODS crosswalk translates the DSpace descriptive metadata (for this object) into MODS. As MODS is a relatively "standard" metadata schema, it may be useful to include a copy of MODS metadata in your AIPs if you should ever want to import them into another (non-DSpace) system.
  - The DIM crosswalk just translates the DSpace internal descriptive metadata into an XML format. This XML format is proprietary to DSpace, but stores the metadata in a format similar to Qualified Dublin Core.
AIP Ingestion Metadata Crosswalk Configurations
The following configurations allow you to specify what DSpace Crosswalks are used during the ingestion/restoration of AIPs. These configurations also allow you to ignore areas of the METS file (in the AIP) if you do not want that area to be restored.
In dspace.cfg, the general format for each of these settings is:
mets.dspaceAIP.ingest.crosswalk.<mdType> = <DSpace-crosswalk-name>
- <mdType> is the type of metadata as specified in the METS file. This corresponds to the value of the @MDTYPE attribute (of that metadata section in the METS). When the @MDTYPE attribute is "OTHER", then the <mdType> corresponds to the @OTHERMDTYPE attribute value.
- <DSpace-crosswalk-name> specifies the name of the DSpace Crosswalk which should be used to ingest this metadata into DSpace. You can specify the "NULLSTREAM" crosswalk if you specifically want this metadata to be ignored (and skipped over during ingestion).
By default, the settings in dspace.cfg are:
Code Block |
---|
mets.dspaceAIP.ingest.crosswalk.DSpaceDepositLicense = NULLSTREAM |
mets.dspaceAIP.ingest.crosswalk.CreativeCommonsRDF = NULLSTREAM |
mets.dspaceAIP.ingest.crosswalk.CreativeCommonsText = NULLSTREAM |
The above settings tell the ingester to ignore any metadata sections which reference DSpace Deposit Licenses or Creative Commons Licenses. These metadata sections can be safely ignored as long as the "LICENSE" and "CC_LICENSE" bundles are included in AIPs (which is the default setting). As the Licenses are included in those Bundles, they will already be restored when restoring the bundle contents.
AIP Ingestion EPerson Configurations
The following setting determines whether the AIP Ingester should create an EPerson (if necessary) when attempting to restore or ingest an Item whose Submitter cannot be located in the system. By default it is set to "false".
mets.dspaceAIP.ingest.createSubmitter = false
AIP Configurations To Improve Ingestion Speed while Validating
It is recommended to validate all AIPs on ingestion (when possible). But validation can be extremely slow, as each validation request must first download all referenced Schema documents from the web. To speed up validation, you can pull down a local copy of all schemas. Validation will then use this local cache, which can sometimes increase the speed up to 10X.
To use a local cache of XML schemas when validating, use the following settings in 'dspace.cfg'. The general format is:
mets.xsd.<abbreviation> = <namespace> <local-file-name>
- <abbreviation> is a unique abbreviation (of your choice) for this schema
- <namespace> is the Schema namespace
- <local-file-name> is the full name of the cached schema file (which should reside in your [dspace]/config/schemas/ directory)
The default settings are all commented out. But, they provide a full listing of all schemas currently used during validation of AIPs. In order to utilize them, uncomment the settings, download the appropriate schema file, and save it to your [dspace]/config/schemas/ directory using the specified file name:
Code Block |
---|
#mets.xsd.mets = http://www.loc.gov/METS/ mets.xsd |
#mets.xsd.xlink = http://www.w3.org/1999/xlink xlink.xsd |
#mets.xsd.mods = http://www.loc.gov/mods/v3 mods.xsd |
#mets.xsd.xml = http://www.w3.org/XML/1998/namespace xml.xsd |
#mets.xsd.dc = http://purl.org/dc/elements/1.1/ dc.xsd |
#mets.xsd.dcterms = http://purl.org/dc/terms/ dcterms.xsd |
#mets.xsd.premis = http://www.loc.gov/standards/premis PREMIS.xsd |
#mets.xsd.premisObject = http://www.loc.gov/standards/premis PREMIS-Object.xsd |
#mets.xsd.premisEvent = http://www.loc.gov/standards/premis PREMIS-Event.xsd |
#mets.xsd.premisAgent = http://www.loc.gov/standards/premis PREMIS-Agent.xsd |
#mets.xsd.premisRights = http://www.loc.gov/standards/premis PREMIS-Rights.xsd |
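Before uncommenting these settings, it may help to confirm that every schema file they name has actually been saved locally. The following is a minimal sketch of such a check. The function name missing_schemas is hypothetical and not part of DSpace, and the paths in the usage line assume an installation under /dspace. It reads the mets.xsd.* lines (commented out or not) and prints each local file name that is not present in the schema directory.

```shell
# Hypothetical helper (not part of DSpace): lists schema files that are named
# by mets.xsd.* settings (commented out or not) but are missing from the
# local schema directory.
# Usage: missing_schemas <path-to-dspace.cfg> <schema-dir>
missing_schemas() (
    cfg="$1"; dir="$2"
    # Setting format: mets.xsd.<abbreviation> = <namespace> <local-file-name>
    # The local file name is the last whitespace-separated field on the line.
    awk '/^[ \t]*[#]?[ \t]*mets\.xsd\./ { print $NF }' "$cfg" |
    while IFS= read -r file; do
        [ -f "$dir/$file" ] || echo "$file"
    done
)
```

A call such as `missing_schemas /dspace/config/dspace.cfg /dspace/config/schemas` prints nothing once every listed schema file is in place.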
To-Do List – What remains to be done!
Testing Special Cases during Restore/Replace
The below special cases need further testing, especially when performing a "Restore" or "Replace". Mostly, these are just notes for Tim (and other developers), to ensure that all these various "edge" cases can be restored properly (or deliberately not restored, if the decision is made that a case need not be restored). As each special case is implemented, we can check off the item in the below list. Special cases which have been fully tested & implemented are marked with a check mark. Feel free to add more special cases to this listing, if we missed anything.
Item Restoration/Replacement
Special Cases
- Restore existing Deposit License from AIP – i.e. do not add a new license (or change the license) during restore/replace
- Restore existing CC License(s)
- Restore item mappings to multiple collections (for items which are mapped to several collections)
- Restore withdrawal state
- Restore embargo state
- Restore permissions & roles (user/group permissions), if possible
- Options to restore just metadata or just particular bitstreams/bundles?
- Will not restore items which have not made it into the "archived" state. In other words, at this time, there are no plans to restore items which are still in an approval workflow (WorkflowItems) or items which are unfinished submissions (WorkspaceItems). WorkspaceItems and WorkflowItems are never exported as AIPs.
Collection Restoration/Replacement
Special Cases
- Restore permissions & roles (user/group permissions), if possible
- Restore Workflow approval groups
- Restore Collection-specific license
- Restore Collection's Item Template?
- Restore Collection's content source info? (e.g. OAI-Harvesting Collections versus normal Collections)
Community Restoration/Replacement
Special Cases
- Restore permissions & roles (user/group permissions), if possible
Admin UI work
As part of the CurationTaskProposal (led by Richard Rodgers & MIT), a new Curation Framework is in the works. This Curation Framework will have a Command Line interface initially. However, the goal for 1.7 is to also have Administrative UI tools which are able to kick off various "curation tools". Among these curation tools will be the ability to export/import AIPs via the Admin UI.
Notes on AIP ingest speed & improving it
Some very basic ingestion speed tests were performed on a set of 26 AIPs (which represented a Community containing a Collection containing 24 Items). These tests found that, by default, the parsing/ingest settings are currently not optimized for speed.
Here are the basic (non-scientific) results:
- Default Settings (validates all METS files using external Schemas): took about 1 minute, 12 seconds to ingest all 26 AIPs
- Locally cached all schemas (with validation turned on): took about 12 seconds to ingest all 26 AIPs
  - You can locally cache all schemas by using the mets.xsd.* settings in dspace.cfg
- No validation (-o validate=false flag): took about 11 seconds to ingest all 26 AIPs
Discussion / Use Cases
Please add your own potential use cases or discussion topics
- DuraCloud DSpace Interaction Notes - Notes/Discussion on how DSpace and DuraCloud may need to interact more directly. This page is specific to DuraCloud Use Cases.
- AIP Export Implementation Notes - Notes/Discussion on this specific AIP Backup/Restore Implementation (not specific to DuraCloud).
- MIT Use Cases - Notes on defining common operations in a replication system.
Questions / Comments?
Questions or comments – either add them inline above, or contact Tim Donohue