...
Source Code:
The Replication Task Suite source code is available at: http://scm.dspace.org/svn/repo/modules/dspace-replicate/
In addition, there is an associated JIRA Issue at: https://jira.duraspace.org/browse/DS-876
...
Prerequisites
Must be installed on a DSpace 1.8.x System
Warning
DSpace 1.8.0 contains a bug in the Curation System that causes a NullPointerException to be thrown whenever a curation task is run across the entire site (see DS-1077). This bug directly affects the Replication Task Suite: even when a replication task succeeds, a NullPointerException is still reported. Check the DSpace logs to determine whether the task actually succeeded. This bug will be resolved in DSpace 1.8.1.
Because of enhancements to the Curation System in DSpace 1.8.0, the Replication Suite is only compatible with a DSpace 1.8.x System.
User Interface Compatibility Notes
Because the Replication Suite is simply a set of Curation System tasks, it can be invoked (like any other curation task) from the following locations:
- From the Command Line
- From the Admin UI (XMLUI Only)
- From Approval Workflow
- From custom Java code
For more information see the Curation System details on Task Invocation.
Installation
Note
These instructions are still a work in progress.
Maven-based Installation
In your DSpace source directory ([dspace-src]), you will modify two Maven pom.xml files:
- [dspace-src]/dspace/pom.xml (This POM controls the dependencies of command-line scripts. Modifying it lets you run dspace-replicate from the command line.)
- [dspace-src]/dspace/modules/xmlui/pom.xml (This POM controls the dependencies of the XMLUI. Modifying it lets you run dspace-replicate from the XMLUI.)
- For both of these pom.xml files, add the following `<dependency>` section at the end of the existing `<dependencies>` section (just before the closing `</dependencies>` tag):
```xml
<dependency>
    <groupId>org.dspace</groupId>
    <artifactId>dspace-replicate</artifactId>
    <version>1.0-EA</version>
</dependency>
```
Once you've finished modifying both pom.xml files, rebuild DSpace by running the following from your [dspace-src]/dspace/ folder:
```shell
mvn clean package
```
Next, update your existing DSpace 1.8.x installation by running the following from your [dspace-src]/dspace/target/dspace-1.8.x-SNAPSHOT-build/ directory:
```shell
ant update
```
Note: Alternatively, if you don't want to do a full update, you can update just your existing binaries and web applications by running the following two commands:
- ant update_code (updates the existing [dspace]/lib/ directory)
- ant update_webapps (updates the existing [dspace]/webapps/ directory)
- Copy the Replication Suite's configuration files to your DSpace configuration directory
- Replication Suite configuration file: Copy [dspace-replicate]/config/modules/replicate.cfg to your [dspace]/config/modules/ directory
- METS-specific AIP configuration settings: Copy [dspace-replicate]/config/modules/replicate-mets.cfg to your [dspace]/config/modules/ directory
- DuraCloud configuration file: Copy [dspace-replicate]/config/modules/duracloud.cfg to your [dspace]/config/modules/ directory
- Finally, follow the Configuration settings instructions below to configure the Replication Suite based on your usage needs.
There is a sample curate.cfg file provided in [dspace-replicate]/config/modules/curate.cfg, which can be used as a reference. It is preconfigured to use the DSpace AIP Format (METS-based packaging).
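The configuration-copying steps above can be scripted. The following is a minimal Java sketch and is not part of the Replication Suite. The class name CopyReplicateConfig and its two command-line arguments (the [dspace-replicate] and [dspace] directories) are hypothetical. It copies the three files listed above into [dspace]/config/modules/. As a cautious choice, it skips any file that already exists, so an edited copy is not overwritten. It assumes Java 11 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: copies the Replication Suite's .cfg files into a DSpace install. */
final class CopyReplicateConfig {

    // The three configuration files named in the installation instructions.
    static final List<String> CONFIG_FILES =
            List.of("replicate.cfg", "replicate-mets.cfg", "duracloud.cfg");

    /** Copies each config file that is not already present; returns the names actually copied. */
    static List<String> copyConfigs(Path replicateSrc, Path dspaceHome) throws IOException {
        Path from = replicateSrc.resolve("config").resolve("modules");
        Path to = dspaceHome.resolve("config").resolve("modules");
        Files.createDirectories(to);

        List<String> copied = new ArrayList<>();
        for (String name : CONFIG_FILES) {
            Path target = to.resolve(name);
            if (Files.exists(target)) {
                // Do not clobber a file you may already have customized.
                System.out.println("Skipping existing file: " + target);
                continue;
            }
            Files.copy(from.resolve(name), target);
            copied.add(name);
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: java CopyReplicateConfig.java <dspace-replicate dir> <dspace dir>");
            System.exit(1);
        }
        List<String> copied = copyConfigs(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Copied: " + copied);
    }
}
```

For example: java CopyReplicateConfig.java /path/to/dspace-replicate /path/to/dspace (both paths are placeholders).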
Manual Installation
- Download the Replication Suite code
- "Early Access" release is available via SVN at: http://scm.dspace.org/svn/repo/modules/dspace-replicate/tags/dspace-replicate-1.0-EA/
- Experimental Trunk code is available via SVN at: http://scm.dspace.org/svn/repo/modules/dspace-replicate/
- Build the Replication Suite by running the following from its root directory:
```shell
mvn package
```
- Copy the generated JAR files to your DSpace 1.8 installation.
There are five JARs in total that need to be copied to your [dspace]/lib/ directory:
- [dspace-replicate]/target/dspace-replicate-[version].jar (the Replication Suite plugin)
- [dspace-replicate]/target/lib/common-[version].jar (DuraCloud common libraries, required for DuraCloud integration)
- [dspace-replicate]/target/lib/commons-compress-[version].jar (Apache Commons Compress, a prerequisite for the Replication Suite plugin)
- [dspace-replicate]/target/lib/storageprovider-[version].jar (DuraCloud storage provider libraries, required for DuraCloud integration)
- [dspace-replicate]/target/lib/storeclient-[version].jar (DuraCloud store client libraries, required for DuraCloud integration)
Also copy the same five JARs to your XMLUI web application's WEB-INF/lib directory (e.g. [dspace]/webapps/xmlui/WEB-INF/lib/).
- Copy the Replication Suite's configuration files to your DSpace configuration directory
- Replication Suite configuration file: Copy [dspace-replicate]/config/modules/replicate.cfg to your [dspace]/config/modules/ directory
- METS-specific AIP configuration settings: Copy [dspace-replicate]/config/modules/replicate-mets.cfg to your [dspace]/config/modules/ directory
- DuraCloud configuration file: Copy [dspace-replicate]/config/modules/duracloud.cfg to your [dspace]/config/modules/ directory
- Finally, follow the Configuration settings instructions below to configure the Replication Suite based on your usage needs.
There is a sample curate.cfg file provided in [dspace-replicate]/config/modules/curate.cfg, which can be used as a reference. It is preconfigured to use the DSpace AIP Format (METS-based packaging).
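The JAR-copying step of the manual installation above could be sketched in Java as follows. CopyReplicateJars is a hypothetical helper and is not part of the Replication Suite. The sketch makes two assumptions:
- The JAR names follow the patterns listed above, with glob wildcards standing in for [version].
- The XMLUI is deployed at [dspace]/webapps/xmlui/.

It copies each JAR into both [dspace]/lib/ and the XMLUI's WEB-INF/lib/ directory, and it stops if any expected JAR is missing.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: copies the five Replication Suite JARs into DSpace's lib directories. */
final class CopyReplicateJars {

    // Glob patterns standing in for the "[version]" placeholders.
    static final String PLUGIN_GLOB = "dspace-replicate-*.jar";   // lives in target/
    static final String[] LIB_GLOBS = {                            // live in target/lib/
        "common-*.jar", "commons-compress-*.jar", "storageprovider-*.jar", "storeclient-*.jar"
    };

    /** Locates the JARs produced by "mvn package", failing if any expected JAR is missing. */
    static List<Path> findJars(Path replicateSrc) throws IOException {
        Path target = replicateSrc.resolve("target");
        List<Path> jars = new ArrayList<>();
        addMatches(target, PLUGIN_GLOB, jars);
        for (String glob : LIB_GLOBS) {
            addMatches(target.resolve("lib"), glob, jars);
        }
        return jars;
    }

    private static void addMatches(Path dir, String glob, List<Path> out) throws IOException {
        int before = out.size();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                out.add(p);
            }
        }
        if (out.size() == before) {
            throw new IllegalStateException("No JAR matching " + glob + " in " + dir);
        }
    }

    /** Copies every JAR into each destination directory, replacing same-named copies. */
    static void copyAll(List<Path> jars, Path... destDirs) throws IOException {
        for (Path dest : destDirs) {
            Files.createDirectories(dest);
            for (Path jar : jars) {
                Files.copy(jar, dest.resolve(jar.getFileName()), StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: java CopyReplicateJars.java <dspace-replicate dir> <dspace dir>");
            System.exit(1);
        }
        Path dspace = Paths.get(args[1]);
        List<Path> jars = findJars(Paths.get(args[0]));
        copyAll(jars,
                dspace.resolve("lib"),
                dspace.resolve("webapps").resolve("xmlui").resolve("WEB-INF").resolve("lib"));
        System.out.println("Copied " + jars.size() + " JAR(s)");
    }
}
```

The sketch has two limitations:
- JARs with the same file name are overwritten, but an older JAR whose file name carries a different version number is left in place. Remove it by hand so that two versions do not end up on the classpath.
- If your build produced extra artifacts such as a sources JAR, the dspace-replicate-*.jar pattern would match those too.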
Configuration
Configuration of the Replication Task Suite is based entirely on your local institution's backup, restore and preservation needs.
Before getting started, you may wish to determine the answers to the following questions:
- AIP Format Options: Does your institution want to back up content using the default DSpace AIP format (METS packaging)? Or would you rather use the new BagIt AIP format?
- Storage Options: Does your institution plan to use the Replication Suite to back up to a local or mounted drive? Or would you like to connect it to a DuraCloud account?
- Additional Options: Do you plan to use Checkm manifests for checksum auditing?
AIP Format Options
One of the first decisions to make is which format to use for your AIPs.
There are two options:
- DSpace AIP Format (METS-based) (default) - This is the same AIP format used by the DSpace AIP Backup and Restore feature, so it is fully compatible with that existing feature. In fact, when using this format, the Replication Task Suite simply "wraps" calls to the AIP Backup and Restore feature itself.
- BagIt AIP Format - This is a new AIP format provided by the Replication Task Suite. It generates AIPs in the BagIt File Packaging Format. Institutions that are already familiar with BagIt, or use it elsewhere, may find this format preferable.
Configuring usage of DSpace default AIP Format (METS-based)
This section goes through the steps of configuring the Replication Suite to use the default DSpace AIP format, which utilizes METS packaging.
General Curation Configuration: First, in your [dspace]/config/modules/curate.cfg, enable and configure the METS-based replication tasks. (NOTE: A sample curate.cfg file is provided in [dspace-replicate]/config/modules/curate.cfg, which is preconfigured to use METS-based AIPs.)
- Enable the Replication Tasks: In the list of "Task Class implementations" (plugin.named.org.dspace.curate.CurationTask), add the following. REMEMBER to add a comma and backslash (", \") after each line (except the final line).
```properties
plugin.named.org.dspace.curate.CurationTask = \
    ... (YOUR EXISTING TASKS) ... , \
    org.dspace.ctask.replicate.EstimateAIPSize = estaipsize, \
    org.dspace.ctask.replicate.ReadOdometer = readodometer, \
    org.dspace.ctask.replicate.TransmitAIP = transmitaip, \
    org.dspace.ctask.replicate.VerifyAIP = verifyaip, \
    org.dspace.ctask.replicate.FetchAIP = fetchaip, \
    org.dspace.ctask.replicate.CompareWithAIP = auditaip, \
    org.dspace.ctask.replicate.RemoveAIP = removeaip, \
    org.dspace.ctask.replicate.METSRestoreFromAIP = restorefromaip, \
    org.dspace.ctask.replicate.METSRestoreFromAIP = replacewithaip, \
    org.dspace.ctask.replicate.METSRestoreFromAIP = restorekeepexisting, \
    org.dspace.ctask.replicate.METSRestoreFromAIP = restoresinglefromaip, \
    org.dspace.ctask.replicate.METSRestoreFromAIP = replacesinglewithaip
```
- Give Each Task a Human-Friendly Task Name: Under the ui.tasknames setting, give each of the above tasks a human-friendly name. Here are some recommended values, but you are welcome to tweak them. REMEMBER to add a comma and backslash (", \") after each line (except the final line).
```properties
ui.tasknames = \
    ... (YOUR EXISTING TASK NAMES) ... , \
    estaipsize = Estimate AIP(s) Size, \
    readodometer = Read Odometer, \
    transmitaip = Transmit AIP(s) to Storage, \
    verifyaip = Verify AIP(s) exist in Storage, \
    fetchaip = Fetch AIP(s) from Storage, \
    auditaip = Audit/Compare against AIP(s), \
    removeaip = Remove AIP(s) from Storage, \
    restorefromaip = Restore Missing Object(s) from AIP(s), \
    replacewithaip = Replace Existing Object(s) with AIP(s), \
    restorekeepexisting = Restore Missing Object(s) but Keep Existing Objects, \
    restoresinglefromaip = Restore Single Object from AIP, \
    replacesinglewithaip = Replace Single Object with AIP
```
- Optionally Create a Task Group: Finally, if you'd like to create a Task Group for these tasks, you can create a group named "replicate" and add them all to it. The following is just one example of how you may wish to set the ui.taskgroups and ui.taskgroup.* settings. It creates two Task Groups: (1) a "General Purpose Tasks" group for a few default DSpace Curation Tasks, and (2) a "Replication Suite Tasks" group for all of the new Replication tasks.
```properties
# Tasks may be organized into named groups which display together in UI drop-downs
ui.taskgroups = \
    general = General Purpose Tasks, replicate = Replication Suite Tasks

# Group membership is defined using comma-separated lists of task names, one property per group
ui.taskgroup.general = profileformats, requiredmetadata, checklinks
ui.taskgroup.replicate = estaipsize, readodometer, transmitaip, verifyaip, fetchaip, auditaip, removeaip, restorefromaip, replacewithaip, restorekeepexisting, restoresinglefromaip, replacesinglewithaip
```
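The task lists above span several properties that must agree with each other, so a typo in one of them is easy to miss. The following minimal Java sketch is a hypothetical CurateCfgCheck helper and is not part of DSpace. It loads curate.cfg with java.util.Properties, which joins the backslash-continued lines into a single value. It then reports:
- any task that has no ui.tasknames entry;
- any ui.tasknames entry without a matching task;
- any ui.taskgroup.* member that is not a defined task.

It assumes the "class = name" list format shown above, where an element without a class prefix is treated as an additional name for the preceding class. The "... (YOUR EXISTING TASKS) ..." placeholders must already have been replaced with real entries.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical consistency check for the task settings in curate.cfg. */
final class CurateCfgCheck {

    static final String TASK_KEY = "plugin.named.org.dspace.curate.CurationTask";
    // A list element starting with "some.ClassName =" switches to a new implementing class.
    private static final Pattern CLASS_PREFIX = Pattern.compile("([\\w$.]+)\\s*=");

    /** Splits a comma-separated property value into trimmed, non-empty elements. */
    static List<String> split(String value) {
        List<String> out = new ArrayList<>();
        for (String s : value.split(",")) {
            if (!s.trim().isEmpty()) {
                out.add(s.trim());
            }
        }
        return out;
    }

    /** Task name -> implementing class, from "class = name, class = name, ...". */
    static Map<String, String> tasks(Properties cfg) {
        Map<String, String> result = new LinkedHashMap<>();
        String currentClass = null;
        for (String part : split(cfg.getProperty(TASK_KEY, ""))) {
            Matcher m = CLASS_PREFIX.matcher(part);
            if (m.lookingAt()) {
                currentClass = m.group(1);
                part = part.substring(m.end()).trim();
            }
            if (currentClass != null && !part.isEmpty()) {
                result.put(part, currentClass);
            }
        }
        return result;
    }

    /** Parses "key = value, key = value" lists such as ui.tasknames and ui.taskgroups. */
    static Map<String, String> keyed(String value) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String part : split(value)) {
            int eq = part.indexOf('=');
            if (eq > 0) {
                out.put(part.substring(0, eq).trim(), part.substring(eq + 1).trim());
            }
        }
        return out;
    }

    /** Returns human-readable problems; an empty list means no inconsistencies were found. */
    static List<String> check(Properties cfg) {
        List<String> problems = new ArrayList<>();
        Set<String> defined = tasks(cfg).keySet();
        Map<String, String> labels = keyed(cfg.getProperty("ui.tasknames", ""));
        for (String task : defined) {
            if (!labels.containsKey(task)) {
                problems.add("task '" + task + "' has no ui.tasknames entry");
            }
        }
        for (String task : labels.keySet()) {
            if (!defined.contains(task)) {
                problems.add("ui.tasknames names undefined task '" + task + "'");
            }
        }
        for (String group : keyed(cfg.getProperty("ui.taskgroups", "")).keySet()) {
            for (String member : split(cfg.getProperty("ui.taskgroup." + group, ""))) {
                if (!defined.contains(member)) {
                    problems.add("group '" + group + "' lists undefined task '" + member + "'");
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        Properties cfg = new Properties();
        // java.util.Properties joins the backslash-continued lines into one value per key.
        try (Reader reader = Files.newBufferedReader(Paths.get(args[0]))) {
            cfg.load(reader);
        }
        List<String> problems = check(cfg);
        problems.forEach(System.out::println);
        System.out.println(problems.isEmpty() ? "OK" : problems.size() + " problem(s) found");
    }
}
```

Run it as java CurateCfgCheck.java [dspace]/config/modules/curate.cfg. The same check applies unchanged to the BagIt task configuration described below.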
Replication Suite Configuration: Next, in your [dspace]/config/modules/replicate.cfg, make sure it is set up to use METS-based AIPs. Under "AIP Packaging Settings", enable the following settings:
```properties
# Package type. Permitted values: 'mets', 'bagit'
#   mets = Generate default DSpace AIPs as described in: https://wiki.duraspace.org/display/DSDOC18/AIP+Backup+and+Restore
#   bagit = Generate AIPs based on the BagIt packaging format: https://wiki.ucop.edu/display/Curation/BagIt
packer.pkgtype = mets

# Format of package compression. Permitted values: 'zip' or 'tgz'
# For 'mets' packages, only 'zip' is supported
packer.archfmt = zip

# Whether or not to name packages with a DSpace type prefix.
# When 'true', package files are named [type]@[handle].[format] (e.g. ITEM@123456789-1.zip)
# When 'false', package files are named [handle].[format] (e.g. 123456789-1.zip)
# Defaults to 'true'. For 'mets' packages, this must be 'true'.
packer.typeprefix = true
```
Optionally tweak the AIP restore/replace settings: You can also adjust the way AIPs are restored or replaced (using AIP Backup and Restore). These settings normally should not need to be changed, but they are available in the [dspace]/config/modules/replicate-mets.cfg configuration file. See that configuration file for more details.
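To locate the AIP for a particular object in your storage location, it helps to know how its file will be named. The following Java sketch reproduces the naming rule described in the packer.typeprefix comments above. The helper name AipPackageName is hypothetical. The conversion of the handle's "/" into "-" is inferred from the example ITEM@123456789-1.zip rather than taken from the Replication Suite's code.

```java
/**
 * Hypothetical helper reproducing the AIP file-naming rule described in the
 * packer.typeprefix comments of replicate.cfg.
 */
final class AipPackageName {

    /**
     * @param type       DSpace object type prefix, e.g. "ITEM"
     * @param handle     object handle, e.g. "123456789/1"
     * @param archFmt    value of packer.archfmt ("zip" or "tgz")
     * @param typePrefix value of packer.typeprefix
     */
    static String packageName(String type, String handle, String archFmt, boolean typePrefix) {
        // Assumption: the '/' in a handle becomes '-' (as in the example ITEM@123456789-1.zip).
        String base = handle.replace('/', '-');
        String name = typePrefix ? type + "@" + base : base;
        return name + "." + archFmt;
    }

    public static void main(String[] args) {
        System.out.println(packageName("ITEM", "123456789/1", "zip", true));  // ITEM@123456789-1.zip
        System.out.println(packageName("ITEM", "123456789/1", "zip", false)); // 123456789-1.zip
    }
}
```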
Configuring usage of DSpace BagIt AIP Format
This section goes through the steps of configuring the Replication Suite to use BagIt-based AIPs. For more information on the BagIt packaging format, see: https://wiki.ucop.edu/display/Curation/BagIt
General Curation Configuration: First, in your [dspace]/config/modules/curate.cfg, enable and configure the BagIt-based replication tasks. (NOTE: A sample curate.cfg file with example settings is provided in [dspace-replicate]/config/modules/curate.cfg.)
- Enable the Replication Tasks: In the list of "Task Class implementations" (plugin.named.org.dspace.curate.CurationTask), add the following. REMEMBER to add a comma and backslash (", \") after each line (except the final line).
```properties
plugin.named.org.dspace.curate.CurationTask = \
    ... (YOUR EXISTING TASKS) ... , \
    org.dspace.ctask.replicate.EstimateAIPSize = estaipsize, \
    org.dspace.ctask.replicate.ReadOdometer = readodometer, \
    org.dspace.ctask.replicate.TransmitAIP = transmitaip, \
    org.dspace.ctask.replicate.VerifyAIP = verifyaip, \
    org.dspace.ctask.replicate.FetchAIP = fetchaip, \
    org.dspace.ctask.replicate.CompareWithAIP = auditaip, \
    org.dspace.ctask.replicate.RemoveAIP = removeaip, \
    org.dspace.ctask.replicate.BagItRestoreFromAIP = restorefromaip, \
    org.dspace.ctask.replicate.BagItReplaceWithAIP = replacewithaip
```
- Give Each Task a Human-Friendly Task Name: Under the ui.tasknames setting, give each of the above tasks a human-friendly name. Here are some recommended values, but you are welcome to tweak them. REMEMBER to add a comma and backslash (", \") after each line (except the final line).
```properties
ui.tasknames = \
    ... (YOUR EXISTING TASK NAMES) ... , \
    estaipsize = Estimate AIP(s) Size, \
    readodometer = Read Odometer, \
    transmitaip = Transmit AIP(s) to Storage, \
    verifyaip = Verify AIP(s) exist in Storage, \
    fetchaip = Fetch AIP(s) from Storage, \
    auditaip = Audit/Compare against AIP(s), \
    removeaip = Remove AIP(s) from Storage, \
    restorefromaip = Restore Missing Object(s) from AIP(s), \
    replacewithaip = Replace Existing Object(s) with AIP(s)
```
- Optionally Create a Task Group: Finally, if you'd like to create a Task Group for these tasks, you can create a group named "replicate" and add them all to it. The following is just one example of how you may wish to set the ui.taskgroups and ui.taskgroup.* settings. It creates two Task Groups: (1) a "General Purpose Tasks" group for a few default DSpace Curation Tasks, and (2) a "Replication Suite Tasks" group for all of the new Replication tasks.
```properties
# Tasks may be organized into named groups which display together in UI drop-downs
ui.taskgroups = \
    general = General Purpose Tasks, replicate = Replication Suite Tasks

# Group membership is defined using comma-separated lists of task names, one property per group
ui.taskgroup.general = profileformats, requiredmetadata, checklinks
ui.taskgroup.replicate = estaipsize, readodometer, transmitaip, verifyaip, fetchaip, auditaip, removeaip, restorefromaip, replacewithaip
```
Replication Suite Configuration: Next, in your [dspace]/config/modules/replicate.cfg, make sure it is set up to use BagIt-based AIPs. Under "AIP Packaging Settings", enable the following settings:
```properties
# Package type. Permitted values: 'mets', 'bagit'
#   mets = Generate default DSpace AIPs as described in: https://wiki.duraspace.org/display/DSDOC18/AIP+Backup+and+Restore
#   bagit = Generate AIPs based on the BagIt packaging format: https://wiki.ucop.edu/display/Curation/BagIt
packer.pkgtype = bagit
```
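The replicate.cfg comments for the packaging settings state several constraints:
- packer.pkgtype must be 'mets' or 'bagit'.
- packer.archfmt must be 'zip' or 'tgz', and only 'zip' is allowed for 'mets'.
- packer.typeprefix defaults to 'true' and must be 'true' for 'mets'.

The following hypothetical Java helper, PackerSettingsCheck, checks a replicate.cfg against exactly those rules and nothing more. An unset packer.archfmt is not flagged, because its default is not stated in those comments.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical sanity check for the "AIP Packaging Settings" in replicate.cfg. */
final class PackerSettingsCheck {

    static List<String> check(Properties cfg) {
        List<String> problems = new ArrayList<>();
        String pkgType = cfg.getProperty("packer.pkgtype", "").trim();
        String archFmt = cfg.getProperty("packer.archfmt", "").trim();
        // Per the replicate.cfg comments, packer.typeprefix defaults to 'true'.
        boolean typePrefix = Boolean.parseBoolean(cfg.getProperty("packer.typeprefix", "true").trim());

        if (!pkgType.equals("mets") && !pkgType.equals("bagit")) {
            problems.add("packer.pkgtype must be 'mets' or 'bagit', found '" + pkgType + "'");
        }
        // An unset packer.archfmt is not flagged; this sketch does not assume its default.
        if (!archFmt.isEmpty() && !archFmt.equals("zip") && !archFmt.equals("tgz")) {
            problems.add("packer.archfmt must be 'zip' or 'tgz', found '" + archFmt + "'");
        }
        if (pkgType.equals("mets") && archFmt.equals("tgz")) {
            problems.add("'mets' packages only support packer.archfmt = zip");
        }
        if (pkgType.equals("mets") && !typePrefix) {
            problems.add("'mets' packages require packer.typeprefix = true");
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        Properties cfg = new Properties();
        try (Reader reader = Files.newBufferedReader(Paths.get(args[0]))) {
            cfg.load(reader);
        }
        List<String> problems = check(cfg);
        problems.forEach(System.out::println);
        System.out.println(problems.isEmpty() ? "Packer settings look consistent" : problems.size() + " problem(s) found");
    }
}
```

Run it as java PackerSettingsCheck.java [dspace]/config/modules/replicate.cfg. It applies equally to the METS settings shown earlier.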
Storage Options
Where your AIPs will be stored is the next decision to make. There are three options currently available:
- Local Storage: Replicate/Backup content to another location (folder) on your local filesystem.
- Mountable Storage: Replicate/Backup content to a mounted external filesystem (e.g. NFS-mounted drive).
- DuraCloud Storage: Replicate/Backup content to an existing DuraCloud account.
Configuring Local Storage
Info
The local storage option may also be used for a mounted drive or SAN that simply appears to be a local filesystem folder. However, some mounted drives (e.g. NFS-mounted drives) may need to use the Mountable Storage option instead.
...
Before configuring a local storage option, please ensure you have enough space available on your local hard drive (or on the mounted drive/SAN, if your local folder is actually remote storage). You can use the "Estimate AIP(s) Size" (estaipsize) task to estimate the amount of new storage space you will need.
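The space check described above can be automated with a short pre-flight check. StorageSpaceCheck is a hypothetical helper, and this is a minimal Java sketch of it. You supply the target folder and the estimate produced by the estaipsize task, converted to bytes yourself, since the sketch makes no assumption about the task's report format. It compares that estimate plus an arbitrary 10% headroom against java.io.File.getUsableSpace() for the folder.

```java
import java.io.File;

/** Hypothetical pre-flight check: is there room for the AIPs in a local storage folder? */
final class StorageSpaceCheck {

    /**
     * @param dir           local (or locally mounted) folder that will hold the AIPs
     * @param requiredBytes estimated size of the AIPs, in bytes
     * @param margin        extra headroom as a fraction, e.g. 0.10 for 10%
     */
    static boolean hasRoom(File dir, long requiredBytes, double margin) {
        if (!dir.isDirectory()) {
            // getUsableSpace() returns 0 for a missing path, so fail loudly instead.
            throw new IllegalArgumentException("Not a directory: " + dir);
        }
        long needed = (long) Math.ceil(requiredBytes * (1.0 + margin));
        return dir.getUsableSpace() >= needed;
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("Usage: java StorageSpaceCheck.java <storage dir> <estimated bytes>");
            System.exit(1);
        }
        File dir = new File(args[0]);
        // Estimated AIP size in bytes, e.g. taken (and converted) from the estaipsize task's output.
        long estimate = Long.parseLong(args[1]);
        boolean ok = hasRoom(dir, estimate, 0.10);
        System.out.println("Usable space in " + dir + ": " + dir.getUsableSpace() + " bytes");
        System.out.println(ok ? "Enough room (with 10% headroom)" : "NOT enough room (with 10% headroom)");
    }
}
```

For example: java StorageSpaceCheck.java /mnt/aip-storage 53687091200 (the path and size are placeholders). The getUsableSpace() Javadoc describes the returned value as a hint rather than a guarantee, so leave some extra margin.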
...