Info | ||
---|---|---|
| ||
Based on the version of DSpace you are running, here are the compatible latest releases of the Replication Task Suite:
|
Note | ||
---|---|---|
| ||
For a quick overview of the various tasks offered in the Replication Task Suite, along with some real-life scenarios / examples of where each Replication task may come in handy, you may wish to skip directly to the Problem Statement and Usage Examples section at the bottom of this page. |
Replication Task Suite Version | Supported DSpace Version(s) | Supported Java Version | Supported Interfaces | Notes |
---|---|---|---|---|
7.6 | DSpace version 7.6.x | Java 11 or above | DSpace 7.6.x UI or command line | The 7.6 stable version of the Replication Task Suite offers no new functionality over the previous versions. It is simply a refactor of the code to ensure that the Replication Task Suite works with DSpace 7.6.x. |
6.1 | DSpace version 6.x | Java 8 or above | XMLUI and/or command line | The 6.1 stable version of the Replication Task Suite offers no new functionality over the previous versions. It is simply a refactor of the code to ensure that the Replication Task Suite works with DSpace 6.x. |
5.0 | DSpace version 5.x | Java 8 or above | XMLUI and/or command line | The 5.0 stable version of the Replication Task Suite offers no new functionality over the previous versions. It is simply a refactor of the code to ensure that the Replication Task Suite works with DSpace 5.x. |
3.5 | DSpace version 3.x or 4.x | Java 8 or above | XMLUI and/or command line | The 3.5 stable version of the Replication Task Suite is nearly identical to the 1.x stable version. It just includes minor bug fixes to ensure the Replication Task Suite is compatible with the newer DSpace APIs. |
1.3 | DSpace version 1.8.x | Java 6 or above | XMLUI and/or command line | Using DSpace 1.8.1 or above is highly recommended. DSpace 1.8.0 has a known bug where running a Replication Task will always return a NullPointerException (see DS-1077). |
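The compatibility table can also be read as a simple lookup from DSpace version to Replication Task Suite release. The following is a minimal sketch of that lookup, not part of the Replication Task Suite itself: the function name `rts_version_for` and the prefix-matching rule are assumptions made for illustration, and the 7.6 entry assumes the release number shown for DSpace 7.6.x.

```python
from typing import Optional

# Compatibility table from this page, keyed by DSpace version prefix.
# The dictionary and function names are hypothetical helpers.
RTS_COMPATIBILITY = {
    "7.6": "7.6",
    "6": "6.1",
    "5": "5.0",
    "4": "3.5",
    "3": "3.5",
    "1.8": "1.3",
}


def rts_version_for(dspace_version: str) -> Optional[str]:
    """Return the RTS release listed for a DSpace version, or None if unlisted."""
    parts = dspace_version.split(".")
    # Try the two-part prefix first ("7.6" before "7"), then the major version.
    for length in (2, 1):
        prefix = ".".join(parts[:length])
        if prefix in RTS_COMPATIBILITY:
            return RTS_COMPATIBILITY[prefix]
    return None
```

A version that the table does not list (for example DSpace 7.5 or 1.7) yields `None` rather than a guess.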
Installation instructions for each version are included below:
- Installation on DSpace 7.x
- Installation on DSpace 6.x
- Installation on DSpace 5.x
- Installation on DSpace 3.x or 4.x
- Installation on DSpace 1.8.x
User Interface Compatibility Notes
...
- From the Command Line
- From the Admin UI (in DSpace 7.x, or the XMLUI in DSpace 6.x and earlier)
- From Item Approval Workflow
- From custom Java code
For more information see the Curation System details on Task Invocation.
Installation on DSpace 7.x
Installation in the DSpace 7.x server (backend)
- In your DSpace Source directory ([dspace-src]), you will need to modify the following POM file:
  - [dspace-src]/dspace/modules/additions/pom.xml (This POM will ensure that the "dspace-replicate" dependency is made available to the command line and ALL DSpace interfaces)
- For this pom.xml file, add the following <dependency> section at the end of the existing <dependencies> section (just before the closing </dependencies> tag). NOTE: the exclusions are required to work around differences in DSpace and DuraCloud dependency versions.
Code Block |
---|
<dependencies>
   ...
   <!-- Adding this dependency will install the Replication Task Suite Addon -->
   <dependency>
      <groupId>org.dspace</groupId>
      <artifactId>dspace-replicate</artifactId>
      <version>7.6</version>
      <exclusions>
         <exclusion>
            <groupId>com.amazonaws</groupId>
            <artifactId>aws-java-sdk-core</artifactId>
         </exclusion>
         <exclusion>
            <groupId>com.amazonaws</groupId>
            <artifactId>aws-java-sdk-sqs</artifactId>
         </exclusion>
         <exclusion>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-compress</artifactId>
         </exclusion>
         <exclusion>
            <groupId>org.hibernate.javax.persistence</groupId>
            <artifactId>hibernate-jpa-2.1-api</artifactId>
         </exclusion>
         <exclusion>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpmime</artifactId>
         </exclusion>
         <exclusion>
            <groupId>org.springframework.security</groupId>
            <artifactId>spring-security-core</artifactId>
         </exclusion>
      </exclusions>
   </dependency>
</dependencies>
|
- Once you've finished modifying the pom.xml file, rebuild DSpace by running the following from your [dspace-src]/dspace/ folder:
Code Block |
---|
mvn clean package
|
- Update the default dspace.cfg to include the Replication Task Suite config files. This ensures these configs are loaded as part of your DSpace configuration. This also allows you to override the configurations in your own local.cfg file.
Code Block |
---|
include = ${module_dir}/replicate.cfg
include = ${module_dir}/replicate-mets.cfg
|
  - You should ensure these configurations exist in your [dspace-src]/dspace/config/ directory. That way they will be auto-installed/copied whenever you run "ant update" (see below).
- Follow the instructions in the Configuration section below in order to enable & configure the Replication Task Suite Add-On.
- Update your existing DSpace installation by running the following from your [dspace-src]/dspace/target/dspace-[version]-build/ directory:
Code Block |
---|
ant update
|
Note |
---|
Alternatively, if you don't want to do a full DSpace update, you can just update your existing binaries & webapps by running the following two commands:
- ant update_code (Updates the existing [dspace]/lib/ directory)
- ant update_webapps (Updates the existing [dspace]/webapp/ directory)
|
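Before rebuilding, it can be useful to confirm that the edited POM really declares the add-on. The sketch below is an illustrative check written for this page, not a tool shipped with DSpace; the function name `find_replicate_version` is hypothetical. It only looks for a `<dependency>` with groupId `org.dspace` and artifactId `dspace-replicate`, and compares local tag names so that POMs with or without the Maven XML namespace both work.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace such as '{http://maven.apache.org/POM/4.0.0}'."""
    return tag.rsplit("}", 1)[-1]


def find_replicate_version(pom_xml):
    """Return the declared dspace-replicate version.

    Returns None if the dependency is absent or declares no <version>.
    Accepts the POM content as str or bytes.
    """
    root = ET.fromstring(pom_xml)
    for elem in root.iter():
        if local_name(elem.tag) != "dependency":
            continue
        # Collect the direct children (groupId, artifactId, version, ...).
        fields = {local_name(child.tag): (child.text or "").strip() for child in elem}
        if (fields.get("groupId") == "org.dspace"
                and fields.get("artifactId") == "dspace-replicate"):
            return fields.get("version")
    return None
```

For example, passing the contents of [dspace-src]/dspace/modules/additions/pom.xml should return the version string you added, and `None` if the `<dependency>` section was placed outside the file or mistyped.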
Installation in the DSpace 7.x UI
In the DSpace 7.x UI, you will need to specify labels for the RTS tasks (so that descriptive names are displayed in the Curation Task list in the UI.) You can either add these directly to [dspace-angular]/src/assets/i18n/en.json5 or include them in the en.json5 file in your theme directory and execute the merge-i18n script. If your DSpace site supports languages other than English, you'll need to add these (and appropriate translations) to each language file available to users.
Code Block |
---|
"curation-task.task.estaipsize.label": "Estimate Storage Space for AIP(s)",
"curation-task.task.readodometer.label": "Read Odometer",
"curation-task.task.transmitaip.label": "Transmit AIP(s) to Storage",
"curation-task.task.transmitsingleaip.label": "Transmit Single Object AIP to Storage",
"curation-task.task.verifyaip.label": "Verify AIP(s) exist in Storage",
"curation-task.task.fetchaip.label": "Fetch AIP(s) from Storage",
"curation-task.task.auditaip.label": "Audit against AIP(s)",
"curation-task.task.removeaip.label": "Remove AIP(s) from Storage",
"curation-task.task.restorefromaip.label": "Restore Missing Object(s) from AIP(s)",
"curation-task.task.replacewithaip.label": "Replace Existing Object(s) with AIP(s)",
"curation-task.task.restorekeepexisting.label": "Restore Missing Object(s) but Keep Existing Objects",
"curation-task.task.restoresinglefromaip.label": "Restore Single Object from AIP",
"curation-task.task.replacesinglewithaip.label": "Replace Single Object with AIP",
|
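Because the labels have to be repeated in every language file, a quick scan for missing keys may help. The following is a rough sketch under stated assumptions: it treats the language file as plain text and searches for `"curation-task.task.<name>.label"` keys with a regular expression, so it does not parse JSON5 and would also count a key that is commented out. The names `RTS_TASK_NAMES` and `missing_labels` are hypothetical.

```python
import re

# Task names taken from the label keys listed on this page.
RTS_TASK_NAMES = [
    "estaipsize", "readodometer", "transmitaip", "transmitsingleaip",
    "verifyaip", "fetchaip", "auditaip", "removeaip", "restorefromaip",
    "replacewithaip", "restorekeepexisting", "restoresinglefromaip",
    "replacesinglewithaip",
]

LABEL_KEY = re.compile(r'"curation-task\.task\.([A-Za-z0-9_-]+)\.label"\s*:')


def missing_labels(i18n_text, task_names=RTS_TASK_NAMES):
    """Return the task names that have no label key in the given file text."""
    found = set(LABEL_KEY.findall(i18n_text))
    return [name for name in task_names if name not in found]
```

Running it over the text of each file under [dspace-angular]/src/assets/i18n/ gives, per language, the list of tasks that would show up in the Curation Task list without a descriptive name.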
Installation on DSpace 6.x
- In your DSpace Source directory ([dspace-src]), you will need to modify the following POM file:
  - [dspace-src]/dspace/modules/additions/pom.xml (This POM will ensure that the "dspace-replicate" dependency is made available to the command line and ALL DSpace interfaces)
- For this pom.xml file, add the following <dependency> section at the end of the existing <dependencies> section (just before the closing </dependencies> tag). NOTE: the exclusions are required to work around DS-3536.
Code Block |
---|
<dependencies>
   ...
   <!-- Adding this dependency will install the Replication Task Suite Addon -->
   <dependency>
      <groupId>org.dspace</groupId>
      <artifactId>dspace-replicate</artifactId>
      <version>6.1</version>
      <!-- These exclusions are currently necessary to resolve dependency mismatches with some
           dependencies pulled into RTS 6.0 to work with DuraCloud, see DS-3536 for details -->
      <exclusions>
         <exclusion>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-lang3</artifactId>
         </exclusion>
         <exclusion>
            <groupId>com.amazonaws</groupId>
            <artifactId>aws-java-sdk-core</artifactId>
         </exclusion>
         <exclusion>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpmime</artifactId>
         </exclusion>
         <exclusion>
            <groupId>org.springframework</groupId>
            <artifactId>spring-expression</artifactId>
         </exclusion>
         <exclusion>
            <groupId>org.springframework.security</groupId>
            <artifactId>spring-security-core</artifactId>
         </exclusion>
         <exclusion>
            <groupId>org.codehaus.jackson</groupId>
            <artifactId>jackson-mapper-asl</artifactId>
         </exclusion>
         <exclusion>
            <groupId>org.codehaus.jackson</groupId>
            <artifactId>jackson-core-asl</artifactId>
         </exclusion>
      </exclusions>
   </dependency>
</dependencies>
|
- Once you've finished modifying the pom.xml file, rebuild DSpace by running the following from your [dspace-src]/dspace/ folder:
Code Block |
---|
mvn clean package
|
- Update the default dspace.cfg to include the Replication Task Suite config files. This ensures these configs are loaded as part of your DSpace configuration. This also allows you to override the configurations in your own local.cfg file. Including the duracloud.cfg file is only required if you are planning to replicate/backup your content to DuraCloud.
Code Block |
---|
include = ${module_dir}/replicate.cfg
include = ${module_dir}/replicate-mets.cfg
include = ${module_dir}/replicate-bagit.cfg
include = ${module_dir}/duracloud.cfg
|
  - You should ensure these configurations exist in your [dspace-src]/dspace/config/modules directory. That way they will be auto-installed/copied whenever you run "ant update" (see below).
- Follow the instructions in the Configuration section below in order to enable & configure the Replication Task Suite Add-On.
- You will need to update your existing DSpace installation by running the following from your [dspace-src]/dspace/target/dspace-[version]-build/ directory:
Code Block |
---|
ant update
|
Note |
---|
Alternatively, if you don't want to do a full DSpace update, you can just update your existing binaries & webapps by running the following two commands:
- ant update_code (Updates the existing [dspace]/lib/ directory)
- ant update_webapps (Updates the existing [dspace]/webapp/ directory)
|
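If the Replication Task Suite settings do not seem to take effect after an update, one thing to check is whether the `include` lines actually made it into dspace.cfg. The sketch below is an illustrative helper, not part of DSpace: it scans the text of a configuration file for `include = ${module_dir}/<file>` lines and reports which expected files are not included. The function names and the default list of expected files are assumptions (duracloud.cfg is left out because it is only needed for DuraCloud).

```python
MODULE_PREFIX = "${module_dir}/"


def included_modules(cfg_text):
    """Return file names pulled in via 'include = ${module_dir}/<name>' lines."""
    names = []
    for raw in cfg_text.splitlines():
        line = raw.strip()
        # Skip blank lines and properties-style comments.
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if key.strip() == "include" and value.startswith(MODULE_PREFIX):
            names.append(value[len(MODULE_PREFIX):])
    return names


def missing_includes(cfg_text,
                     wanted=("replicate.cfg", "replicate-mets.cfg", "replicate-bagit.cfg")):
    """Return the expected module config files that are not included."""
    present = set(included_modules(cfg_text))
    return [name for name in wanted if name not in present]
```

An empty result from `missing_includes` means every expected file is referenced; it does not confirm that the files themselves exist under [dspace]/config/modules/.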
Installation on DSpace 5.x
- Follow the instructions for deployment on DSpace 6.x above, substituting version 5.0 of the dspace-replicate dependency.
Installation on DSpace 3.x or 4.x
- In your DSpace Source directory ([dspace-src]), you will need to modify the following POM file:
  - [dspace-src]/dspace/modules/additions/pom.xml (This POM will ensure that the "dspace-replicate" dependency is made available to the command line and ALL DSpace interfaces)
- For this pom.xml file, add the following <dependency> section at the end of the existing <dependencies> section (just before the closing </dependencies> tag).
Code Block |
---|
<dependencies>
   ...
   <!-- Adding this dependency will install the Replication Task Suite Addon -->
   <dependency>
      <groupId>org.dspace</groupId>
      <artifactId>dspace-replicate</artifactId>
      <version>3.4</version>
   </dependency>
</dependencies>
|
- Once you've finished modifying the pom.xml file, rebuild DSpace by running the following from your [dspace-src]/dspace/ folder:
Code Block |
---|
mvn clean package
|
- Follow the instructions in the Configuration section below in order to enable & configure the Replication Task Suite Add-On.
  - You may wish to ensure these configurations exist in your [dspace-src]/dspace/config/ directory. That way they will be auto-installed/copied whenever you run "ant update" (see next step).
- You will need to update your existing DSpace installation by running the following from your [dspace-src]/dspace/target/dspace-[version]-build/ directory:
Code Block |
---|
ant update
|
Note |
---|
Alternatively, if you don't want to do a full DSpace update, you can just update your existing binaries & webapps by running the following two commands:
- ant update_code (Updates the existing [dspace]/lib/ directory)
- ant update_webapps (Updates the existing [dspace]/webapp/ directory)
|
Installation on DSpace 1.8.x
Warning | ||
---|---|---|
| ||
DSpace 1.8.0 contains a bug in the Curation System which causes a NullPointerException error to be returned when any curation task is run across the entire site (see DS-1077). This bug directly affects the Replication Task Suite. Even when a replication task succeeds, it will still throw a NullPointerException. You can check the DSpace logs to tell whether the task actually succeeded or not. This bug was resolved in DSpace 1.8.1. |
- In your DSpace Source directory ([dspace-src]), you will modify two Maven pom.xml files:
  - [dspace-src]/dspace/pom.xml (This POM controls dependencies of command-line scripts. Modifying it will let you run dspace-replicate from the command line)
  - [dspace-src]/dspace/modules/xmlui/pom.xml (This POM controls dependencies of the XMLUI. Modifying it will let you run dspace-replicate from the XMLUI)
- For each of these pom.xml files, add the following <dependency> section at the end of the existing <dependencies> section (just before the closing </dependencies> tag).
Code Block |
---|
<dependencies>
   ...
   <!-- Adding this dependency will install the Replication Task Suite Addon -->
   <dependency>
      <groupId>org.dspace</groupId>
      <artifactId>dspace-replicate</artifactId>
      <version>1.3</version>
   </dependency>
</dependencies>
|
- Once you've finished modifying both pom.xml files, rebuild DSpace by running the following from your [dspace-src]/dspace/ folder:
Code Block |
---|
mvn clean package
|
- Follow the instructions in the Configuration section below in order to enable & configure the Replication Task Suite Add-On.
  - You may wish to ensure these configurations exist in your [dspace-src]/dspace/config/ directory. That way they will be auto-installed/copied whenever you run "ant update" (see next step).
- You will need to update your existing DSpace 1.8.x installation by running the following from your [dspace-src]/dspace/target/dspace-[version]-build/ directory:
Code Block |
---|
ant update
|
Note |
---|
Alternatively, if you don't want to do a full DSpace update, you can just update your existing binaries & webapps by running the following two commands:
- ant update_code (Updates the existing [dspace]/lib/ directory)
- ant update_webapps (Updates the existing [dspace]/webapp/ directory)
|
Upgrades
Upgrading the Replication Task Suite to a new version essentially involves a reinstallation of the add-on.
Follow the latest installation instructions above, based on the version of DSpace you are running.
Once you have reinstalled the Replication Task Suite, you should compare your existing configurations with the latest Replication Task Suite configurations. In most cases, your existing configurations should function perfectly, but you should review the differences just in case.
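The comparison of existing and latest configurations can be done with any diff tool. As one possible approach, the sketch below uses Python's standard `difflib` module to produce a unified diff between the text of a local config file and the text of the upstream copy. The function name `config_diff` and the `local/` and `upstream/` labels are illustrative choices, not something defined by the Replication Task Suite.

```python
import difflib


def config_diff(local_text, upstream_text, name="replicate.cfg"):
    """Return a unified diff of a local config against the upstream copy.

    An empty string means the two texts are identical.
    """
    return "".join(difflib.unified_diff(
        local_text.splitlines(keepends=True),
        upstream_text.splitlines(keepends=True),
        fromfile="local/" + name,
        tofile="upstream/" + name,
    ))
```

Lines prefixed with `-` exist only in your local file and lines prefixed with `+` exist only in the upstream file, so new upstream settings show up as `+` lines that you may want to review before deciding whether to adopt them.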
Configuration
Configuration of the Replication Task Suite is based entirely on your local institution's backup, restore and preservation needs.
Enabling Replication Task Suite
In order to enable the Replication Task Suite, you need to create / edit several configuration files.
- A copy of all configuration files utilized by the Replication Task Suite (RTS) can be found in the following locations:
- Configs for RTS version 1.x : https://github.com/DSpace/dspace-replicate/tree/dspace-replicate-1_x/config/modules
- Configs for RTS version 3.x : https://github.com/DSpace/dspace-replicate/tree/dspace-replicate-3_x/config/modules
- Configs for RTS version 5.x : https://github.com/DSpace/dspace-replicate/tree/dspace-replicate-5_x/config/modules
- Configs for RTS version 6.x : https://github.com/DSpace/dspace-replicate/tree/dspace-replicate-6_x/config/modules
- Configs for RTS version 7.x : https://github.com/DSpace/dspace-replicate/tree/master/config/modules
- Copy the following configuration files to your DSpace's [dspace]/config/modules/ directory:
  - replicate.cfg - This file contains the base settings for the Replication Task Suite
  - replicate-mets.cfg - This file provides a few additional replication options specific to METS-based AIPs (see below for more details)
  - replicate-bagit.cfg - This file provides additional configuration for BagIt AIPs (see below for more details)
  - duracloud.cfg - If you'd like to replicate/backup your content to DuraCloud, this file holds your DuraCloud account information
- Edit your [dspace]/config/modules/curate.cfg configuration file to define & enable all tasks. The list of tasks to add to this configuration file depends on which type of AIP (METS based or BagIt based) you wish to use. Please see the AIP Format Options section below for the details of what should be added to your curate.cfg file.
  - A sample, fully enabled curate.cfg configuration file is provided alongside the other Replication Task Suite config files listed above. This sample file is preconfigured to use METS-based AIPs.
- Recommended (but not required): Edit your [dspace]/config/modules/dspace.cfg and enable the Replication Task Suite 'listener' to perform automatic synchronization of your AIP backup store with what is in DSpace (see Automation Options for more info).
Overview of Configuration Options
Before getting started, you may wish to determine the answers to the following questions:
- AIP Format Options: Does your institution want to back up using the default DSpace AIP format (METS packaging)? Or would you rather utilize the new BagIt AIP Format?
- Storage Options: Does your institution plan to use the Replication Suite to back up to a local/mounted drive? Or would you like to connect it to a DuraCloud account?
- Automation Options (Recommended): Do you want to automatically sync your AIP backup store with what is in DSpace? (This is highly recommended, but not required.)
- Additional Options: Do you plan to use Checkm manifests for checksum auditing?
Info | ||
---|---|---|
| ||
For a higher level introduction to the Replication Task Suite, please see the Problem Statement and Usage Examples section below. It may provide you with a better idea of how you'd like to configure this task suite based on your institutional needs. |
AIP Format Options
One of the first questions to ask yourself is the format you wish to utilize for your AIPs.
There are two options:
- DSpace AIP Format (METS-based) (default) - This is the same AIP format utilized by the DSpace AIP Backup and Restore feature, so it is 100% compatible with that DSpace feature. In fact when using this format, the Replication Task Suite just "wraps" calls to the AIP Backup and Restore feature itself.
- BagIt AIP Format (beta) - This is a new AIP format provided by the Replication Task Suite. It generates AIPs in the BagIt File Packaging Format. Institutions which already are familiar with BagIt or use it elsewhere may find this format preferable. (Please note that this AIP format does not yet support all DSpace objects. See the below table for more information.)
These two AIP formats are not identical. The below table seeks to describe some of the differences.
Feature | DSpace AIP Format (METS-based AIPs) | BagIt AIP Format |
---|---|---|
Supported Backup/Restore Types | ||
Can Backup & Restore all DSpace Content easily | Yes | Yes |
Can Backup & Restore a Single Community/Collection/Item easily | Yes | Yes |
Backups can be used to move one or more Community/Collection/Items to another DSpace system easily. | Yes (Using the Replication Task Suite or using the command line AIP Backup and Restore tools) | Yes (though the Replication Task Suite add-on must be installed on both systems) |
Can Backup & Restore Item Versions (added in DSpace 3.x) | No (Item Versioning not yet compatible with AIP format. Only the most recent version of an Item is described in the AIP.) | No (Item Versioning not yet compatible with AIP format. Only the most recent version of an Item is described in the AIP.) |
Supported DSpace Object Types | ||
Supports backup/restore of all Communities/Collections/Items (including metadata, files, logos, etc.) | Yes | Yes |
Supports backup/restore of all People/Groups/Permissions | Yes | Yes |
Supports backup/restore of all Collection-specific Item Templates | Yes | No (Not yet supported) |
Supports backup/restore of all Collection Harvesting settings (only for Collections which pull in all Items via OAI-PMH or OAI-ORE) | No (The harvest settings are not preserved, but previously harvested items are preserved in their own AIPs) | No (The harvest settings are not preserved, but previously harvested items are preserved in their own AIPs) |
Supports backup/restore of all Withdrawn (but not deleted) Items | Yes | Yes |
Supports backup/restore of Item Mappings between Collections | Yes | Yes |
Supports backup/restore of all in-process, uncompleted Submissions (or those currently in an approval workflow) | No (AIPs are only generated for objects which are completed and considered "in archive") | No (AIPs are only generated for objects which are completed and considered "in archive") |
Supports backup/restore of Items using custom Metadata Schemas & Fields | Yes | Yes |
Supports backup/restore of all local DSpace Configurations and Customizations | No (You are expected to backup your DSpace configurations and customizations separately. AIPs only backup content held within DSpace.) | No (You are expected to backup your DSpace configurations and customizations separately. AIPs only backup content held within DSpace.) |
For more information on the tasks available based on your AIP format choice, please see the Problem Statement and Usage Examples section below. This section also provides good examples of how to use each of the tasks available to you in the Replication Task Suite.
Configuring usage of DSpace default AIP Format (METS-based)
This section goes through the steps of configuring the Replication Suite to use the default DSpace AIP format, which utilizes METS packaging. This is the default & recommended setting.
- General Curation Configuration: First, in your [dspace]/config/modules/curate.cfg you will want to enable & configure the METS-based replication tasks. (NOTE: there is a sample curate.cfg file provided in https://github.com/DSpace/dspace-replicate/tree/master/config/modules which is pre-configured to use METS-based AIPs.)
  - Enable the Replication Tasks: In the list of "Task Class implementations" (plugin.named.org.dspace.curate.CurationTask), add the following. REMEMBER to add a comma and backslash (", \") after each line (except the final line).
Code Block |
---|
plugin.named.org.dspace.curate.CurationTask = \
    ... (YOUR EXISTING TASKS) ..., \
    org.dspace.ctask.replicate.EstimateAIPSize = estaipsize, \
    org.dspace.ctask.replicate.ReadOdometer = readodometer, \
    org.dspace.ctask.replicate.TransmitAIP = transmitaip, \
    org.dspace.ctask.replicate.TransmitSingleAIP = transmitsingleaip, \
    org.dspace.ctask.replicate.VerifyAIP = verifyaip, \
    org.dspace.ctask.replicate.FetchAIP = fetchaip, \
    org.dspace.ctask.replicate.CompareWithAIP = auditaip, \
    org.dspace.ctask.replicate.RemoveAIP = removeaip, \
    org.dspace.ctask.replicate.METSRestoreFromAIP = restorefromaip, \
    org.dspace.ctask.replicate.METSRestoreFromAIP = replacewithaip, \
    org.dspace.ctask.replicate.METSRestoreFromAIP = restorekeepexisting, \
    org.dspace.ctask.replicate.METSRestoreFromAIP = restoresinglefromaip, \
    org.dspace.ctask.replicate.METSRestoreFromAIP = replacesinglewithaip
|
(Only for RTS versions prior to 7.0) Give Each Task a Human-Friendly Task Name: Under the ui.tasknames setting, give each of the above tasks a human-friendly name. Here are some recommended values, but you are welcome to tweak them.
REMEMBER to add a comma and backslash (", \") after each line (except the final line).
Code Block
ui.tasknames = \
    ... (YOUR EXISTING TASK NAMES) ... , \
    estaipsize = Estimate Storage Space for AIP(s), \
    readodometer = Read Odometer, \
    transmitaip = Transmit AIP(s) to Storage, \
    verifyaip = Verify AIP(s) exist in Storage, \
    fetchaip = Fetch AIP(s) from Storage, \
    auditaip = Audit against AIP(s), \
    removeaip = Remove AIP(s) from Storage, \
    restorefromaip = Restore Missing Object(s) from AIP(s), \
    replacewithaip = Replace Existing Object(s) with AIP(s), \
    restorekeepexisting = Restore Missing Object(s) but Keep Existing Objects, \
    restoresinglefromaip = Restore Single Object from AIP, \
    replacesinglewithaip = Replace Single Object with AIP
(Only for RTS versions prior to 7.0) Optionally Create a Task Group: Finally, if you'd like to create a Task Group for these tasks, you can create a group named "replicate" and add them all to it. The below is just an example of how you may wish to set the ui.taskgroups and ui.taskgroup.* settings. It creates two Task Groups: (1) a "General Purpose Tasks" group for a few default DSpace Curation Tasks, and (2) a "Replication Suite Tasks" group for all these new Replication tasks.
Code Block
# Tasks may be organized into named groups which display together in UI drop-downs
ui.taskgroups = \
    general = General Purpose Tasks, \
    replicate = Replication Suite Tasks
# Group membership is defined using comma-separated lists of task names, one property per group
ui.taskgroup.general = profileformats, requiredmetadata, checklinks
ui.taskgroup.replicate = estaipsize, readodometer, transmitaip, verifyaip, fetchaip, auditaip, removeaip, restorefromaip, replacewithaip, restorekeepexisting, restoresinglefromaip, replacesinglewithaip
- Replication Suite Configuration: Next, in your [dspace]/config/modules/replicate.cfg you will want to ensure it is set up to properly use METS-based AIPs. Under the "AIP Packaging Settings" you'll want the following settings enabled:
Code Block
# Package type. Permitted values: 'mets', 'bagit'
# mets = Generate default DSpace AIPs as described in: https://wiki.duraspace.org/display/DSDOC18/AIP+Backup+and+Restore
# bagit = Generate AIPs based on the BagIt packaging format: https://wiki.ucop.edu/display/Curation/BagIt
packer.pkgtype = mets
# Format of package compression. Permitted values: 'zip' or 'tgz'
# for 'mets' packages, only 'zip' is supported
packer.archfmt = zip
# Whether or not to name packages with a DSpace type prefix.
# When 'true', package files are named [type]@[handle].[format] (e.g. ITEM@123456789-1.zip)
# When 'false', package files are named [handle].[format] (e.g. 123456789-1.zip)
# Defaults to 'true'. For 'mets' packages, this must be 'true'.
packer.typeprefix = true
- Optionally tweak the AIP Restore/Replace settings: You can tweak the way AIPs are restored or replaced (using AIP Backup and Restore options). These settings normally should not need to be changed, but are available in the [dspace]/config/modules/replicate-mets.cfg configuration file. See that configuration file for more details.
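The curate.cfg lists above depend on every line except the last ending in a comma and backslash. The following is a minimal Python sketch, not part of the Replication Task Suite, that joins backslash-continued lines and flags entries that look like a missing ", \". It handles only the small subset of the configuration syntax shown on this page, and the function names are hypothetical.

```python
TASK_KEY = "plugin.named.org.dspace.curate.CurationTask"


def parse_properties(text):
    """Parse comments, 'key = value' pairs and backslash line continuations."""
    props = {}
    logical = ""
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments that are not inside a continued value.
        if not logical and (not line or line.startswith("#")):
            continue
        if line.endswith("\\"):
            logical += line[:-1].strip() + " "
            continue
        logical += line
        key, _, value = logical.partition("=")
        props[key.strip()] = value.strip()
        logical = ""
    return props


def task_entries(props, key=TASK_KEY):
    """Split a 'class = name, class = name' list into (class, name) pairs."""
    entries = []
    for entry in props.get(key, "").split(","):
        entry = entry.strip()
        if "=" in entry:
            cls, name = entry.split("=", 1)
            entries.append((cls.strip(), name.strip()))
    return entries


def suspect_entries(props, key=TASK_KEY):
    """Return task names containing spaces or '=', which suggests a missing comma."""
    return [name for _, name in task_entries(props, key) if " " in name or "=" in name]


SAMPLE = "\n".join([
    "# Task Class implementations",
    "plugin.named.org.dspace.curate.CurationTask = \\",
    "    org.dspace.ctask.replicate.TransmitAIP = transmitaip, \\",
    "    org.dspace.ctask.replicate.VerifyAIP = verifyaip, \\",
    "    org.dspace.ctask.replicate.METSRestoreFromAIP = restorefromaip",
    "packer.pkgtype = mets",
])

# Same list, but the comma after 'transmitaip' has been forgotten.
BROKEN = "\n".join([
    "plugin.named.org.dspace.curate.CurationTask = \\",
    "    org.dspace.ctask.replicate.TransmitAIP = transmitaip \\",
    "    org.dspace.ctask.replicate.VerifyAIP = verifyaip",
])
```

Running `suspect_entries(parse_properties(BROKEN))` returns a non-empty list, which is a cheap way to catch a forgotten comma before restarting DSpace.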
Configuring usage of DSpace BagIt AIP Format
This section goes through the steps of configuring the Replication Suite to use BagIt-based AIPs. The Replication Suite uses the BagIt Profiles specification in order to provide additional guarantees about the BagIt AIPs which are exported and ingested. The following profiles are supported:
BagIt Profile Identifier | Profile Information | Profile |
---|---|---|
aptrust | https://github.com/APTrust/bagit-profiles | https://github.com/duraspace/bagit-support/blob/master/src/main/resources/profiles/aptrust.json |
beyondtherepository | https://github.com/dpscollaborative/btr_bagit_profile | https://github.com/duraspace/bagit-support/blob/master/src/main/resources/profiles/beyondtherepository.json |
If no BagIt Profile is specified, the beyondtherepository profile will be used by default. For more information on the BagIt packaging format, see: https://wiki.ucop.edu/display/Curation/BagIt; the BagIt Profiles implementation used is DuraSpace's bagit-support.
- General Curation Configuration: First, in your [dspace]/config/modules/curate.cfg you will want to enable & configure the BagIt-based replication tasks. (NOTE: there is a sample curate.cfg file provided in https://github.com/DSpace/dspace-replicate/tree/master/config/modules which provides example settings, though they are all commented out by default).
Enable the Replication Tasks: In the list of "Task Class implementations" (plugin.named.org.dspace.curate.CurationTask), add the following.
REMEMBER to add a comma and backslash (", \") after each line (except the final line).
Code Block
plugin.named.org.dspace.curate.CurationTask = \
    ... (YOUR EXISTING TASKS) ... , \
    org.dspace.ctask.replicate.EstimateAIPSize = estaipsize, \
    org.dspace.ctask.replicate.ReadOdometer = readodometer, \
    org.dspace.ctask.replicate.TransmitAIP = transmitaip, \
    org.dspace.ctask.replicate.VerifyAIP = verifyaip, \
    org.dspace.ctask.replicate.FetchAIP = fetchaip, \
    org.dspace.ctask.replicate.CompareWithAIP = auditaip, \
    org.dspace.ctask.replicate.RemoveAIP = removeaip, \
    org.dspace.ctask.replicate.BagItRestoreFromAIP = restorefromaip, \
    org.dspace.ctask.replicate.BagItReplaceWithAIP = replacewithaip
(Only for RTS versions prior to 7.0) Give Each Task a Human-Friendly Task Name: Under the ui.tasknames setting, give each of the above tasks a human-friendly name. Here are some recommended values, but you are welcome to tweak them.
REMEMBER to add a comma and backslash (", \") after each line (except the final line).
Code Block
ui.tasknames = \
    ... (YOUR EXISTING TASK NAMES) ... , \
    estaipsize = Estimate Storage Space for AIP(s), \
    readodometer = Read Odometer, \
    transmitaip = Transmit AIP(s) to Storage, \
    verifyaip = Verify AIP(s) exist in Storage, \
    fetchaip = Fetch AIP(s) from Storage, \
    auditaip = Audit/Compare against AIP(s), \
    removeaip = Remove AIP(s) from Storage, \
    restorefromaip = Restore Missing Object(s) from AIP(s), \
    replacewithaip = Replace Existing Object(s) with AIP(s)
(Only for RTS versions prior to 7.0) Optionally Create a Task Group: Finally, if you'd like to create a Task Group for these tasks, you can create a group named "replicate" and add them all to it. The below is just an example of how you may wish to set the ui.taskgroups and ui.taskgroup.* settings. It creates two Task Groups: (1) a "General Purpose Tasks" group for a few default DSpace Curation Tasks, and (2) a "Replication Suite Tasks" group for all these new Replication tasks.
Code Block
# Tasks may be organized into named groups which display together in UI drop-downs
ui.taskgroups = \
    general = General Purpose Tasks, \
    replicate = Replication Suite Tasks
# Group membership is defined using comma-separated lists of task names, one property per group
ui.taskgroup.general = profileformats, requiredmetadata, checklinks
ui.taskgroup.replicate = estaipsize, readodometer, transmitaip, verifyaip, fetchaip, auditaip, removeaip, restorefromaip, replacewithaip
- Replication Suite Configuration: Next, in your [dspace]/config/modules/replicate.cfg you will want to ensure it is set up to properly use BagIt-based AIPs. Under the "AIP Packaging Settings" you'll want the following settings enabled:
Code Block
# Package type. Permitted values: 'mets', 'bagit'
# mets = Generate default DSpace AIPs as described in: https://wiki.duraspace.org/display/DSDOC18/AIP+Backup+and+Restore
# bagit = Generate AIPs based on the BagIt packaging format: https://wiki.ucop.edu/display/Curation/BagIt
packer.pkgtype = bagit
- BagIt Configuration: Finally, in [dspace]/config/modules/replicate-bagit.cfg, you will need to configure settings for the BagIt tasks:
Configure the BagIt Profile: Set the BagIt Profile which will be used.
Code Block
# The Bag Profile setting allows you to select a BagProfile which the RTS
# will create and read bags for. The RTS will check the conformance of a
# bag to a profile as part of both the packaging and restoration processes.
#
# See: https://github.com/duraspace/bagit-support/ for more information
#
# Available Options: aptrust, beyondtherepository
# Default: beyondtherepository
replicate-bagit.profile = beyondtherepository
Configure the Bag Metadata: Under replicate-bagit.tag, set appropriate values for additional bag metadata to be packaged with your DSpace AIPs. Each configuration property of this section follows the format of replicate-bagit.tag.tag-filename.metadata-key: metadata-value. See section 2.2.2 of the BagIt specification for more information on bag metadata.
Note: depending on the BagIt Profile specified there will be different required fields for the bag metadata files, so it is important to know what profile you're working with.
Code Block
#### BagIt Bag Metadata Settings ####
# These settings allow you to customize the bag-info.txt which
# is written by the BagIt packaging tools. By default no fields
# are used which will produce Bags which do not conform to any
# BagProfiles.
replicate-bagit.tag.bag-info.source-organization = dspace
replicate-bagit.tag.bag-info.organization-address = localhost
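To illustrate the replicate-bagit.tag.tag-filename.metadata-key naming pattern described above, here is a small Python sketch that groups such properties by tag file. It is only an illustration of the key layout. The function names are hypothetical, and the exact label casing the Replication Task Suite writes into bag-info.txt is not modelled here.

```python
TAG_PREFIX = "replicate-bagit.tag."


def group_bag_tags(props):
    """Group 'replicate-bagit.tag.<tag-file>.<metadata-key>' properties by tag file."""
    tag_files = {}
    for key, value in props.items():
        if not key.startswith(TAG_PREFIX):
            continue  # e.g. replicate-bagit.profile is not bag metadata
        tag_file, _, metadata_key = key[len(TAG_PREFIX):].partition(".")
        if not metadata_key:
            continue  # malformed key: no metadata key after the tag file name
        tag_files.setdefault(tag_file, {})[metadata_key] = value
    return tag_files


def render_tag_file(fields):
    """Render one tag file as 'key: value' lines (labels left exactly as configured)."""
    return "\n".join(f"{key}: {value}" for key, value in fields.items())


EXAMPLE_PROPS = {
    "replicate-bagit.profile": "beyondtherepository",
    "replicate-bagit.tag.bag-info.source-organization": "dspace",
    "replicate-bagit.tag.bag-info.organization-address": "localhost",
}
```

With the example values from this page, `group_bag_tags(EXAMPLE_PROPS)` yields a single `bag-info` entry holding the two configured fields, which makes it easy to compare against the fields your chosen BagIt Profile requires.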
Storage Options
Where your AIPs will be stored is the next decision to make. There are three options currently available:
- Local Storage: Replicate/Backup content to another location (folder) on your local filesystem.
- Mountable Storage: Replicate/Backup content to a mounted external filesystem (e.g. NFS-mounted drive).
- DuraCloud Storage: Replicate/Backup content to an existing DuraCloud account.
...
Info |
---|
The local storage option may also be used for a mounted drive / SAN which just appears as though it is a local filesystem folder. However, some mounted drives (e.g. NFS-mounted drives) may need to use the Mountable Storage option instead. |
Before configuring a local storage option, please ensure you have enough space available on your local hard drive (or mounted drive/SAN if your local folder is actually remote storage). You can use the "Estimate Storage Space for AIP(s)" (estaipsize) task to estimate the amount of new storage space you will need.
...
- Before you can use the DuraCloud Storage plugin, you first must sign up for a DuraCloud account (or sign up for a trial account).
- Once you have a DuraCloud account, you can configure the Replication Task Suite to use your DuraCloud Account Settings (as detailed below).
- In DuraCloud, you will also want to create one (or more) "DuraCloud Spaces" in which to store your DSpace AIPs. You'll then need to configure those space(s) in the DuraCloud Storage Settings of the Replication Task Suite (as detailed below). The DuraCloud Space represents the location in your DuraCloud account where you want DSpace to store its content. Having a separate DuraCloud Space for your DSpace content is recommended (though not required), as it allows you to separate your DSpace content from any other content you may wish to store in DuraCloud.
...
- For each DSpace object (Community, Collection, Item), an AIP zip file is generated on the server running DSpace. The AIP is temporarily stored in the server's [dspace]/replicate/[group.aip.name] directory, where "[group.aip.name]" is the value of the "group.aip.name" setting in your "replicate.cfg" configuration file (see DuraCloud Storage Settings below for more info). This "group.aip.name" setting also corresponds to the ID of the DuraCloud Space where the AIP will be stored.
- Once the AIP is generated, the Replication Task Suite determines whether a file of this same name already exists in the DuraCloud Space.
  - If this file does not exist in DuraCloud, the locally generated AIP is uploaded to DuraCloud.
  - If a file of this name already exists, then the Replication Task Suite checks to see if it differs from the locally generated AIP. It does so by verifying the DuraCloud reported checksum with the locally generated checksum.
    - If the AIP checksums differ, the locally generated AIP is uploaded to DuraCloud and it replaces the version that was previously in DuraCloud.
    - If the AIP checksums are identical, then the AIP is skipped. Nothing is uploaded to DuraCloud as the files are identical. This ensures that unnecessary uploads to DuraCloud are avoided.
- Once the local copy of the AIP is no longer needed, it is removed from the server's temporary location.
- If an upload to DuraCloud occurred, the local "odometer" is incremented to ensure it always details the total amount of content that has been uploaded (see Keeping Score section for more info on the "odometer").
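The upload decision described in the steps above can be sketched in Python as follows. This is an illustration only: it uses an in-memory dictionary in place of a real DuraCloud Space, MD5 as the checksum algorithm (an assumption), and a plain dictionary as a stand-in for the "odometer". None of the names below come from the Replication Task Suite code.

```python
import hashlib


class InMemoryStore:
    """Hypothetical stand-in for a DuraCloud Space: maps file names to bytes."""

    def __init__(self):
        self.objects = {}

    def checksum(self, name):
        """Return the stored file's checksum, or None if the file does not exist."""
        data = self.objects.get(name)
        return None if data is None else hashlib.md5(data).hexdigest()

    def put(self, name, data):
        self.objects[name] = data


def transmit(store, name, data, odometer):
    """Upload an AIP only if it is missing from, or differs from, the stored copy."""
    local_checksum = hashlib.md5(data).hexdigest()
    remote_checksum = store.checksum(name)
    if remote_checksum == local_checksum:
        return "skipped"  # identical file already in storage, nothing to upload
    store.put(name, data)
    # Only real uploads count towards the odometer totals.
    odometer["uploads"] = odometer.get("uploads", 0) + 1
    odometer["uploaded_bytes"] = odometer.get("uploaded_bytes", 0) + len(data)
    return "uploaded" if remote_checksum is None else "replaced"
```

Calling `transmit` twice with the same bytes uploads once and then skips, which mirrors why an unchanged AIP costs no transfer on a repeat run.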
...
- For each DSpace object (Community, Collection, Item), that object's AIP is downloaded from DuraCloud to the server running DSpace (the appropriate AIP is located in DuraCloud via its filename). The AIP is temporarily stored in the server's [dspace]/replicate/[group.aip.name] directory, where "[group.aip.name]" is the value of the "group.aip.name" setting in your "replicate.cfg" configuration file (see DuraCloud Storage Settings below for more info). This "group.aip.name" setting also corresponds to the ID of the DuraCloud Space where the AIP is stored.
- Once the download completes, the local "odometer" is incremented to ensure it always details the total amount of content that has been downloaded (see Keeping Score section for more info on the "odometer").
- The AIP is then "unzipped", and the DSpace object is restored/replaced as needed.
- Once the local copy of the AIP is no longer needed, it is removed from the server's temporary location.
...
Enable DuraCloud Storage Plugin: Ensure the Replication Suite is set up to use the 'DuraCloudObjectStore' plugin.
Code Block
# Replica store implementation class (specify one)
plugin.single.org.dspace.ctask.replicate.ObjectStore = \
    org.dspace.ctask.replicate.store.DuraCloudObjectStore
Configure DuraCloud Primary Space to use: Your DuraCloud account allows you to separate content into various "Spaces". You'll need to create a new DuraCloud Space that your AIPs will be stored within, and configure that as your group.aip.name (by default it's set to a DuraCloud Space with ID of "aip-store").
Code Block
# The primary storage group / folder where AIPs are stored/retrieved when AIP based tasks
# are executed (e.g. "Transmit AIP", "Restore from AIP")
group.aip.name = aip-store
Optionally, Configure Additional DuraCloud Spaces: If you have chosen to utilize Checkm manifest validation, you will need to create and configure a DuraCloud Space corresponding to the group.manifest.name setting below. Additionally, if you have chosen to enable the Automatic Replication, you will need to create and configure a DuraCloud Space corresponding to the group.delete.name setting below.
Code Block
# The storage group / folder where Checkm Manifests are stored/retrieved when Checkm Manifest
# based tasks are executed (org.dspace.ctask.replicate.checkm.*).
group.manifest.name = manifest-store
# The storage group / folder where deletion records are kept when an object deletion occurs
# and the ReplicationConsumer is enabled (see below). Each time an object is deleted in DSpace,
# a DELETION-RECORD@[handle] file is written to this location. The deletion record is always in
# BagIt format. It details basic info about the deleted object (along with any deleted child/member objects)
# This deletion record may be used to restore those deleted object(s) at a later time (using "Restore from AIP" tasks),
# or may be used to permanently remove their AIP(s) from storage (using "Remove AIP" task).
group.delete.name = deletions
Info: Using File Prefixes instead of separate DuraCloud Spaces
If you'd rather keep all your DSpace files in a single DuraCloud Space, you can tweak your "group.aip.name", "group.manifest.name" and "group.delete.name" settings to specify a file-prefix to use. For example:
group.aip.name = dspace-backup/aip-store
group.manifest.name = dspace-backup/manifest-store
group.delete.name = dspace-backup/deletions
With the above settings in place, all your DSpace content will be stored in the "dspace-backup" Space within DuraCloud. AIPs will all be stored with a file-prefix of "aip-store/" (e.g. "aip-store/ITEM@123456789-2.zip"). Manifests will all be stored with a file-prefix of "manifest-store/". And any object deletion records will be stored with a file-prefix of "deletions/". This allows you to keep all your content in a single DuraCloud Space while avoiding name conflicts between AIPs, Manifests and deletion records.
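The relationship between the group settings, the Space and the stored file name can be sketched in Python as follows. This is a rough illustration rather than the suite's actual code. It assumes that the part of the setting before the first "/" is the Space ID and the rest is the file prefix, and that the "/" in a handle becomes "-" in the file name, as the ITEM@123456789-2.zip example suggests. Both function names are hypothetical.

```python
def aip_filename(object_type, handle, fmt="zip", typeprefix=True):
    """Build '[type]@[handle].[format]' (or '[handle].[format]' without the type prefix)."""
    safe_handle = handle.replace("/", "-")  # assumption inferred from the documented examples
    name = f"{safe_handle}.{fmt}"
    return f"{object_type}@{name}" if typeprefix else name


def storage_location(group_name, filename):
    """Split a group setting into (space, content id), applying any file prefix."""
    space, _, prefix = group_name.partition("/")
    content_id = f"{prefix}/{filename}" if prefix else filename
    return space, content_id
```

For example, `storage_location("dspace-backup/aip-store", aip_filename("ITEM", "123456789/2"))` gives the Space "dspace-backup" and the file "aip-store/ITEM@123456789-2.zip", while a plain "aip-store" setting yields no prefix at all.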
...
The Replication Task Suite offers several options to automate replication of content to your backup storage location of choice.
- Automatically Sync Changes (via Queue): Any changes that happen in DSpace (new objects, changed objects, deleted objects) are automatically added to a "queue". This queue can then be processed on a schedule (via cron).
- Scheduled Site Auditing/Replication: You may also wish to perform a full site audit or backup on a scheduled basis.
...
METS-based AIP Replicate Consumer: This consumer will listen for changes to any DSpace Communities, Collections, Items, Groups, or EPeople. It should be utilized if you have chosen to use METS-based AIPs. See AIP Format Options above for more details.
Code Block
#### Event System Configuration ####
# ADD the "replicate" consumer to the end of the list of 'default.consumers' (This enables the consumer)
event.dispatcher.default.consumers = versioning, search, browse, discovery, eperson, harvester, replicate
....
# Configure consumer to manage METS AIP content replication
event.consumer.replicate.class = org.dspace.ctask.replicate.METSReplicateConsumer
event.consumer.replicate.filters = Community|Collection|Item|Group|EPerson+All
- In human terms, this configuration essentially means: listen for all changes to Communities, Collections, Items, Groups and EPeople. If a change is detected, run the "METSReplicateConsumer" (which adds that object to the queue).
BagIt-based AIP Consumer: This consumer will ONLY listen for changes to DSpace Communities, Collections and Items as those are the only types of objects which are stored in BagIt-based AIPs. See AIP Format Options above for more details.
Code Block
#### Event System Configuration ####
# ADD the "replicate" consumer to the end of the list of 'default.consumers' (This enables the consumer)
event.dispatcher.default.consumers = versioning, search, browse, discovery, eperson, harvester, replicate
....
# Configure consumer to manage BagIt AIP content replication
event.consumer.replicate.class = org.dspace.ctask.replicate.BagItReplicateConsumer
event.consumer.replicate.filters = Community|Collection|Item+Install|Modify|Modify_Metadata|Delete
In human terms, this configuration essentially means: listen for any new, modified or deleted Items, Collections and Communities. If you do not care about Community or Collection AIPs, just remove 'Community' or 'Collection' from the list. When one of the specified changes is detected, run the "BagItReplicateConsumer" (which adds that object to the queue).
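To make the filter syntax easier to read, here is a minimal Python sketch that splits a filter value into its object types (before the "+") and event types (after it), then checks whether a given change would match. It only models the two filter values shown on this page and treats "All" as matching every event, in line with the description above. It is not the DSpace event system's own parser.

```python
METS_FILTER = "Community|Collection|Item|Group|EPerson+All"
BAGIT_FILTER = "Community|Collection|Item+Install|Modify|Modify_Metadata|Delete"


def parse_filter(spec):
    """Split 'Type|Type+Event|Event' into a set of object types and a set of events."""
    object_part, _, event_part = spec.partition("+")
    return set(object_part.split("|")), set(event_part.split("|"))


def matches(spec, object_type, event_type):
    """Return True if a change to object_type with event_type passes the filter."""
    object_types, event_types = parse_filter(spec)
    if object_type not in object_types:
        return False
    return "All" in event_types or event_type in event_types
```

Under these rules, a modified Group matches the METS filter but not the BagIt one, and removing 'Community' from the BagIt filter would stop Community changes from being queued.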
...
- Both "add" and "modification" events add the "transmitsingleaip" task (which will regenerate & transmit the object AIP to replica storage) to the queue of tasks to perform. Please ensure you are scheduling this queue to be processed, as detailed in Processing the Consumer Queue below.
- The "delete" event triggers a special "catalog" task. This "catalog" task does the following:
- First, it creates a plaintext "catalog" file which lists all the objects that were deleted.
- Second, it moves the AIPs for those deleted objects to the "group.delete.name" storage area (this is essentially putting them in a "trash" folder, where they can be cleaned up later, or potentially restored if the deletion was accidental).
- By default, the queue used for all replication events is located at: [dspace]/ctqueues/replication (this is a plaintext file which just lists all actions that should be performed the next time the queue is processed)
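The event-to-task mapping described in the list above can be summarised in a short Python sketch. The queue here is a plain in-memory list of (task, handle) pairs, which is a simplification: the real queue file format is not reproduced, and the function names are hypothetical.

```python
def task_for_event(event):
    """Map a consumer event to the task name that gets queued, per the rules above."""
    if event in ("add", "modify"):
        return "transmitsingleaip"  # regenerate and transmit the object's AIP
    if event == "delete":
        return "catalog"  # record the deletion and move AIPs to the delete group
    return None  # other events are ignored in this sketch


def enqueue(queue, event, handle):
    """Append a (task, handle) pair to the queue if the event maps to a task."""
    task = task_for_event(event)
    if task is not None:
        queue.append((task, handle))
    return queue
```

Nothing in this sketch processes the queue. As noted above, queued tasks only run once the queue itself is processed on a schedule.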
...
Warning | ||
---|---|---|
| ||
By default, just configuring the Consumer will only generate a queue of tasks in the location specified by the |
Processing the Sync Consumer Queue
...
Note | ||
---|---|---|
| ||
Even if you are processing the "sync queue" on a daily or weekly basis, you still may want to perform a full site-wide audit and/or backup on a less frequent basis. For example, if you are processing the sync queue on a daily basis, you might want to perform a weekly or monthly site audit/backup. Although this full site audit/backup is not required, it helps to ensure that all of your AIPs are simultaneously up-to-date at a given point in time. It's worth noting that only AIPs that have changed (i.e. have a different checksum) will be transferred to your backup location. So, if all AIPs are already up-to-date in your backup location, no AIPs would even be transferred. More information on performing such an "audit" or full-site backup (including cron job examples) can be found in the section on Scheduled Site Auditing / Replication |
Enhancing the Performance of the Queue Processing (optional)
...
In DSpace, by default, duplicate tasks in a Curation System queue will each be processed individually. So, that means if an Item is updated 10 times, it will appear in the queue 10 times, and its AIP will be (re-)generated and (re-)transmitted to storage 10 times when that queue is processed. (DuraCloud Note: Some storage platforms, e.g. DuraCloud, provide a way to determine whether a newly generated AIP actually differs from the one in replica storage. So, in the case of DuraCloud storage, the AIP will be re-generated 10 times, but it will only be transmitted to DuraCloud ONCE. The other 9 times, the DuraCloud storage plugin will determine that the checksum of the new AIP is identical to the one in DuraCloud and skip the transmission step. See How DuraCloud storage works section above for more info.)
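The cost of duplicate queue entries, and the general idea of collapsing them before processing, can be illustrated with a few lines of Python. This is a conceptual sketch only: it treats the queue as a list of hypothetical (task, handle) pairs and is not a description of how DSpace or the Replication Task Suite implements any such optimisation.

```python
def dedupe_queue(entries):
    """Keep the first occurrence of each (task, handle) pair, preserving order."""
    seen = set()
    unique = []
    for entry in entries:
        if entry not in seen:
            seen.add(entry)
            unique.append(entry)
    return unique


# An Item updated 10 times appears in the queue 10 times...
queue = [("transmitsingleaip", "123456789/2")] * 10 + [("transmitsingleaip", "123456789/5")]
# ...but only needs its AIP regenerated once per processing run.
to_process = dedupe_queue(queue)
```

In this example the 11 queued entries collapse to 2, so the AIP for 123456789/2 would be regenerated once instead of 10 times.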
...
More information about each of these storage options (and how to configure them) is available in the Storage Options configuration section above.
...
- Download the Replication Suite code from GitHub: https://github.com/DSpace/dspace-replicate
- Check out the branch you wish to develop against. For example, to check out the 1.x branch of the codebase:
git checkout dspace-replicate-1.x
- Build/compile the Replication Suite by running the following from the root directory:
mvn package
- The code will be compiled into a JAR and all its dependencies will also be copied to your "target" directory
- The main dspace-replicate.jar will be compiled to:
[dspace-replicate]/target/dspace-replicate-[version].jar
(The Replication Suite Plugin)
- There will also be a total of 4 dependency JARs that will be copied to:
[dspace-replicate]/target/lib/common-[version].jar
(DuraCloud common libraries - required for DuraCloud integration)
[dspace-replicate]/target/lib/commons-compress-[version].jar
(Apache Commons Compress - prerequisite for Replication Suite plugin)
[dspace-replicate]/target/lib/storageprovider-[version].jar
(DuraCloud storage provider libraries - required for DuraCloud integration)
[dspace-replicate]/target/lib/storeclient-[version].jar
(DuraCloud store client libraries - required for DuraCloud integration)
- Once the codebase is compiled, you can install it by following the Installation instructions above.
- Alternatively, you may temporarily copy all 5 JARs (dspace-replicate + dependency JARs) to the following locations for testing purposes only:
- DSpace "lib" folder (e.g. [dspace]/lib/) - This will make the Replication Task Suite available via the commandline.
- DSpace XMLUI "lib" folder (e.g. [dspace]/webapps/xmlui/WEB-INF/lib/) - This will make the Replication Task Suite available via the XMLUI.
- You will also need to follow the Configuration instructions above in order to properly enable & configure the Replication Task Suite.
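For the temporary, testing-only copy described above, a small shell helper along these lines may save some typing. It is a sketch under assumptions: `install_replicate_jars` is a hypothetical function name, and it assumes `mvn package` has already populated the "target" and "target/lib" directories of your checkout as listed earlier.

```shell
# Hypothetical helper: copy the main dspace-replicate JAR plus every
# dependency JAR into one or more "lib" folders.
# Usage: install_replicate_jars <dspace-replicate checkout> <lib dir> [<lib dir> ...]
install_replicate_jars() {
    src=$1
    shift
    for lib_dir in "$@"; do
        # Main plugin JAR
        cp "$src"/target/dspace-replicate-*.jar "$lib_dir"/ || return 1
        # Dependency JARs
        cp "$src"/target/lib/*.jar "$lib_dir"/ || return 1
    done
}
```

For example, if your checkout lived in a directory named `dspace-replicate` and `/dspace` stood in for your actual [dspace] installation directory (both placeholders), you might run `install_replicate_jars dspace-replicate /dspace/lib /dspace/webapps/xmlui/WEB-INF/lib`.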