Note

This code is now available on the current DSpace SVN Trunk (http://scm.dspace.org/svn/repo/dspace/trunk/). It will be officially released as part of DSpace 1.7.0.

...

Note

Additional background information is available in the OR10 presentation entitled "Improving DSpace Backups, Restores & Migrations".

This work comes out of a requirement for DSpace integration with DuraCloud (http://www.duracloud.org). One of these requirements is to be able to essentially "backup" local DSpace contents into the cloud (as a type of offsite backup), and "restore" those contents at a later time.

Essentially, we need a way to export the entire hierarchy (i.e. bitstreams, metadata and relationships between Communities/Collections/Items) into a relatively standard format (e.g. METS or a similar structured packaging format). This entire hierarchy should also be able to be re-imported into DSpace in the same format, to allow for "round-tripping" of that content (essentially a restore of that content in the same or a different DSpace installation).

...

  • Would allow folks to more easily move entire Communities or Collections between DSpace instances.
  • Would allow for a potentially more consistent backup of this hierarchy (e.g. to DuraCloud, or just to your own local backup system), rather than relying on synchronizing a backup of your DB (metadata/relationships) and assetstore (bitstreams).
  • Would provide a way for people to more easily get their data out of DSpace (whatever the purpose may be).
  • Would provide a relatively standard format for people to migrate entire hierarchies (Communities/Collections) into DSpace (from another system).

Known Issues:

  • Exporting/Importing the Community/Collection/Item hierarchy technically doesn't cover all the "content" held in DSpace. There are also Groups, EPeople and permissions/rights (which would get you closer to a full export/import of all DSpace content). However, concentrating on just the hierarchy of Community/Collection/Item seems like a good first step.

This is related to (and a partial subset of) MIT's AipPrototype (http://jira.dspace.org/jira/browse/DS-465). However, the original AIP prototype did not make it very easy to re-import the exported AIPs for Communities or Collections. So, this prototype extends the old AIP prototype's packagers/crosswalks to allow for a full export and import of an entire DSpace hierarchy, or just a set of Communities, Collections or Items.

How does this work help DSpace interact with DuraCloud?

This work is entirely about exporting DSpace content objects to a location on a local filesystem. So, this work doesn't interact solely with DuraCloud, and could be used by any backup storage system to back up your DSpace contents.

...

For more specific details of AIP format / structure, along with examples, please see DSpaceAIPFormat

Where to get the Code

There is an SVN sandbox area for this work (so that others can help out, if it interests them). If anyone has comments, suggestions or feedback on this idea, or would like to be involved in this project, definitely let me know (or add comments to this wiki page). The latest code is available on DSpace Trunk (and will be released in DSpace 1.7.0).

Code Block
 svn co http://scm.dspace.org/svn/repo/dspace/trunk/

What code has really changed?

...

  1. org.dspace.content.packager.* - Packager classes
    • PackageIngester interface - Now ingests 'java.io.File' objects instead of InputStreams (to better support recursive imports of Communities/Collections)
    • PackageDisseminator interface - Now exports 'java.io.File' objects instead of OutputStreams (to better support recursive exports of Communities/Collections)
    • DSpaceAIPDisseminator - Disseminates/Exports AIP(s)
    • DSpaceAIPIngester - Ingests exported AIP(s)
    • Changes were also made to refactor / enhance the AbstractMETSDisseminator, AbstractMETSIngester, and METSManifest classes
  2. org.dspace.content.crosswalk.*
    • AIPDIMCrosswalk - Crosswalks DIM metadata for AIPs
    • AIPTechMDCrosswalk - Crosswalks METS TechMD sections for AIPs
    • There were also changes to the MODSDisseminationCrosswalk and XSLTDisseminationCrosswalk to support creating "Site" AIPs
Note

For a full list of code changes (including patches) see: AipCoreAPIChanges
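
To illustrate why a File-based packager interface makes recursive exports easier, here is a minimal sketch. It does not use the real DSpace classes: `Node`, `export` and the `.zip` file naming are hypothetical stand-ins chosen for this example. The idea is that a single output stream can only carry one package, while a target file gives the exporter a directory in which it can create one sibling package file per child object.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

public class RecursiveExportSketch {

    /** Hypothetical stand-in for a Community, Collection or Item. */
    static class Node {
        final String id;
        final List<Node> children = new ArrayList<>();

        Node(String id) {
            this.id = id;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /**
     * Writes a placeholder package for the node into pkgFile, then writes one
     * package per descendant into the same directory. Returns every file written.
     * (Assumption: a real packager would write METS content, not a text stub.)
     */
    static List<File> export(Node node, File pkgFile) throws IOException {
        List<File> written = new ArrayList<>();
        byte[] stub = ("package for " + node.id).getBytes(StandardCharsets.UTF_8);
        Files.write(pkgFile.toPath(), stub);
        written.add(pkgFile);

        // Because we hold a File (not a stream), we can derive sibling file names.
        for (Node child : node.children) {
            File childFile = new File(pkgFile.getParentFile(), child.id + ".zip");
            written.addAll(export(child, childFile));
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        Node community = new Node("community-1")
                .add(new Node("collection-1").add(new Node("item-1")))
                .add(new Node("collection-2"));

        File dir = Files.createTempDirectory("aip-sketch").toFile();
        List<File> files = export(community, new File(dir, community.id + ".zip"));
        for (File f : files) {
            System.out.println(f.getName());
        }
    }
}
```

The same reasoning applies in reverse for imports: given a package file on disk, an ingester can look for the child packages next to it, which is not possible when all it receives is an anonymous stream.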

Install Prototype

  1. Download the code from the SVN Sandbox (see above).
  2. Build & Install the prototype. This is just a modified version of DSpace 1.6.0 – so, follow the normal DSpace 1.6.0 Installation procedure.
    • If you have a DSpace 1.6.0 instance already running, you can just build the code and point it at your existing DSpace 1.6.0 database & assetstore.

...

Warning
Because of the changes to the PackageIngester and PackageDisseminator interfaces, you will need to refactor any custom, local Packagers you've created for your institution.
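
One possible way to approach such a refactoring is to wrap existing stream-based logic behind a File-based entry point. The sketch below is only an illustration under stated assumptions: `LegacyStreamIngester` and `FileIngester` are hypothetical, simplified interfaces invented for this example, and the actual DSpace interfaces may take different or additional arguments, so check the source before adapting it.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;

public class PackagerAdapterSketch {

    /** Hypothetical shape of an old, stream-based custom packager. */
    interface LegacyStreamIngester {
        String ingest(InputStream in) throws IOException;
    }

    /** Hypothetical shape of a new, File-based packager. */
    interface FileIngester {
        String ingest(File pkgFile) throws IOException;
    }

    /** Wraps stream-based logic so it can be called with a package file. */
    static FileIngester adapt(LegacyStreamIngester legacy) {
        return pkgFile -> {
            // Open the file, hand the stream to the old code, always close it.
            try (InputStream in = new BufferedInputStream(new FileInputStream(pkgFile))) {
                return legacy.ingest(in);
            }
        };
    }

    /** Example legacy logic: counts the bytes in the package. */
    static String countBytes(InputStream in) throws IOException {
        long count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count + " bytes";
    }

    public static void main(String[] args) throws IOException {
        File pkg = File.createTempFile("package", ".zip");
        Files.write(pkg.toPath(), new byte[] {1, 2, 3, 4, 5});

        FileIngester ingester = adapt(PackagerAdapterSketch::countBytes);
        System.out.println(ingester.ingest(pkg));
    }
}
```

An adapter like this keeps the existing parsing code untouched, though a packager that needs to handle child packages recursively would still have to be extended to use the file's location.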

Running the Code

Exporting AIPs

...