...
The Replication Task Suite is a DSpace 1.8 Add-On which provides a set of curation system tasks to assist in performing replication (backup/restore/audit) of DSpace contents to other locations. The DSpace content is packaged in containers known as AIPs (OAIS speak: 'archival information packages'). By default, AIPs are generated in the default DSpace AIP Format (the same format used by the AIP Backup and Restore tool). If desired, there is an option to generate BagIt-based AIPs instead of using the default DSpace AIP format.
This Add-On integrates DSpace 1.8 with DuraCloud for users that wish to easily back up their content into DuraCloud directly from their DSpace administrative interface.
Info: An "Early Access" release of the Replication Task Suite is available to install via:
...
Info: New development of the Replication Task Suite has moved to GitHub: https://github.com/DSpace/dspace-replicate The older SVN code repository still exists, but it has not been updated since the 1.0-EA (Early Access) release.
Prerequisites
Must be installed on a DSpace 1.8.x System
...
- In your DSpace Source directory ([dspace-src]), you will modify two Maven pom.xml files:
  - [dspace-src]/dspace/pom.xml (This POM controls the dependencies of the command-line scripts. Modifying it will let you run dspace-replicate from the command line)
  - [dspace-src]/dspace/modules/xmlui/pom.xml (This POM controls the dependencies of the XMLUI. Modifying it will let you run dspace-replicate from the XMLUI)
  For both of these pom.xml files, add the following <dependency> section at the end of the existing <dependencies> section (just before the closing </dependencies> tag):
  Code Block
  <dependency>
    <groupId>org.dspace</groupId>
    <artifactId>dspace-replicate</artifactId>
    <version>1.0-EA</version>
  </dependency>
Once you've finished modifying both pom.xml files, rebuild DSpace by running the following from your [dspace-src]/dspace/ folder:
Code Block
mvn clean package
You will need to update your existing DSpace 1.8.x installation by running the following from your [dspace-src]/dspace/target/dspace-1.8.x-SNAPSHOT-build/ directory:
Code Block
ant update
Note: Alternatively, if you don't want to do a full DSpace update, you can just update your existing binaries & webapps by running the following two commands:
ant update_code (Updates the existing [dspace]/lib/ directory)
ant update_webapps (Updates the existing [dspace]/webapps/ directory)
- Copy the Replication Suite's configuration files to your DSpace configuration directory
  - Replication Suite Configuration File: Copy [dspace-replicate]/config/modules/replicate.cfg to your [dspace]/config/modules/ directory
  - METS-specific AIP Configuration Settings: Copy [dspace-replicate]/config/modules/replicate-mets.cfg to your [dspace]/config/modules/ directory
  - DuraCloud Configuration File: Copy [dspace-replicate]/config/modules/duracloud.cfg to your [dspace]/config/modules/ directory
- Finally, follow the Configuration settings instructions below to configure the Replication Suite based on your usage needs.
  - There is a sample curate.cfg file provided in [dspace-replicate]/config/modules/curate.cfg which can be used as a reference. It is pre-configured to use the DSpace AIP Format (METS-based packaging).
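The pom.xml change described in the Maven-based install steps can be sanity-checked before rebuilding. The following is a minimal sketch, assuming Python is available on the build machine; the function name `has_dependency` and the trimmed-down sample POM are hypothetical and only illustrate the idea. It looks only at the `<dependencies>` section directly under the project root.

```python
import xml.etree.ElementTree as ET

# Maven POM files declare this default XML namespace.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def has_dependency(root, group_id, artifact_id):
    """Return True if the top-level <dependencies> section lists the artifact."""
    for dep in root.findall("m:dependencies/m:dependency", POM_NS):
        if (dep.findtext("m:groupId", namespaces=POM_NS) == group_id
                and dep.findtext("m:artifactId", namespaces=POM_NS) == artifact_id):
            return True
    return False


# Hypothetical, trimmed-down POM used only to exercise the check.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <dependencies>
    <dependency>
      <groupId>org.dspace</groupId>
      <artifactId>dspace-replicate</artifactId>
      <version>1.0-EA</version>
    </dependency>
  </dependencies>
</project>"""

sample_root = ET.fromstring(SAMPLE_POM)
print(has_dependency(sample_root, "org.dspace", "dspace-replicate"))
```

To check a real file, the root element could be loaded with `ET.parse(path).getroot()` for each of the two pom.xml files named above.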
...
- Download the Replication Suite code
- Downloadable Zip: https://github.com/DSpace/dspace-replicate/tags
- Download "Early Access" release via SVN: http://scm.dspace.org/svn/repo/modules/dspace-replicate/tags/dspace-replicate-1.0-EA/
- Experimental Trunk code is available via GitHub at: https://github.com/DSpace/dspace-replicate
Build/Compile the Replication Suite by running the following from the root directory:
Code Block
mvn package
- Copy the generated JAR files to your DSpace 1.8 installation.
  - There are a total of 5 JARs that will need to be copied to your [dspace]/lib/ directory:
    - [dspace-replicate]/target/dspace-replicate-[version].jar (The Replication Suite Plugin)
    - [dspace-replicate]/target/lib/common-[version].jar (DuraCloud common libraries - required for DuraCloud integration)
    - [dspace-replicate]/target/lib/commons-compress-[version].jar (Apache Commons Compress - prerequisite for Replication Suite plugin)
    - [dspace-replicate]/target/lib/storageprovider-[version].jar (DuraCloud storage provider libraries - required for DuraCloud integration)
    - [dspace-replicate]/target/lib/storeclient-[version].jar (DuraCloud store client libraries - required for DuraCloud integration)
  - Also copy the above 5 JARs to your XMLUI web application's WEB-INF/lib directory (e.g. [dspace]/webapps/xmlui/WEB-INF/lib/)
- Copy the Replication Suite's configuration files to your DSpace configuration directory
  - Replication Suite Configuration File: Copy [dspace-replicate]/config/modules/replicate.cfg to your [dspace]/config/modules/ directory
  - METS-specific AIP Configuration Settings: Copy [dspace-replicate]/config/modules/replicate-mets.cfg to your [dspace]/config/modules/ directory
  - DuraCloud Configuration File: Copy [dspace-replicate]/config/modules/duracloud.cfg to your [dspace]/config/modules/ directory
- Finally, follow the Configuration settings instructions below to configure the Replication Suite based on your usage needs.
  - There is a sample curate.cfg file provided in [dspace-replicate]/config/modules/curate.cfg which can be used as a reference. It is pre-configured to use the DSpace AIP Format (METS-based packaging).
...
- General Curation Configuration: First, in your [dspace]/config/modules/curate.cfg you will want to enable & configure the METS-based replication tasks. (NOTE: there is a sample curate.cfg file provided in [dspace-replicate]/config/modules/curate.cfg which is pre-configured to use METS-based AIPs).
  Enable the Replication Tasks: In the list of "Task Class implementations" (plugin.named.org.dspace.curate.CurationTask), add the following.
  REMEMBER to add a comma and backslash (", \") after each line (except the final line).
  Code Block
  plugin.named.org.dspace.curate.CurationTask = \
      ... (YOUR EXISTING TASKS) ... , \
      org.dspace.ctask.replicate.EstimateAIPSize = estaipsize, \
      org.dspace.ctask.replicate.ReadOdometer = readodometer, \
      org.dspace.ctask.replicate.TransmitAIP = transmitaip, \
      org.dspace.ctask.replicate.VerifyAIP = verifyaip, \
      org.dspace.ctask.replicate.FetchAIP = fetchaip, \
      org.dspace.ctask.replicate.CompareWithAIP = auditaip, \
      org.dspace.ctask.replicate.RemoveAIP = removeaip, \
      org.dspace.ctask.replicate.METSRestoreFromAIP = restorefromaip, \
      org.dspace.ctask.replicate.METSRestoreFromAIP = replacewithaip, \
      org.dspace.ctask.replicate.METSRestoreFromAIP = restorekeepexisting, \
      org.dspace.ctask.replicate.METSRestoreFromAIP = restoresinglefromaip, \
      org.dspace.ctask.replicate.METSRestoreFromAIP = replacesinglewithaip
Give Each Task a Human-Friendly Task Name: Under the ui.tasknames setting, give each of the above tasks a human-friendly name. Here are some recommended values, but you are welcome to tweak them.
REMEMBER to add a comma and backslash (", \") after each line (except the final line).
Code Block
ui.tasknames = \
    ... (YOUR EXISTING TASK NAMES) ... , \
    estaipsize = Estimate Storage Space for AIP(s), \
    readodometer = Read Odometer, \
    transmitaip = Transmit AIP(s) to Storage, \
    verifyaip = Verify AIP(s) exist in Storage, \
    fetchaip = Fetch AIP(s) from Storage, \
    auditaip = Audit against AIP(s), \
    removeaip = Remove AIP(s) from Storage, \
    restorefromaip = Restore Missing Object(s) from AIP(s), \
    replacewithaip = Replace Existing Object(s) with AIP(s), \
    restorekeepexisting = Restore Missing Object(s) but Keep Existing Objects, \
    restoresinglefromaip = Restore Single Object from AIP, \
    replacesinglewithaip = Replace Single Object with AIP
Optionally Create a Task Group: Finally, if you'd like to create a Task Group for these tasks, you can create a group named "replicate" and add them all to it. The below is just an example of how you may wish to set the ui.taskgroups and ui.taskgroup.* settings. It creates two Task Groups: (1) a "General Purpose Tasks" group for a few default DSpace Curation Tasks, and (2) a "Replication Suite Tasks" group for all these new Replication tasks.
Code Block
# Tasks may be organized into named groups which display together in UI drop-downs
ui.taskgroups = \
    general = General Purpose Tasks, replicate = Replication Suite Tasks
# Group membership is defined using comma-separated lists of task names, one property per group
ui.taskgroup.general = profileformats, requiredmetadata, checklinks
ui.taskgroup.replicate = estaipsize, readodometer, transmitaip, verifyaip, fetchaip, auditaip, removeaip, restorefromaip, replacewithaip, restorekeepexisting, restoresinglefromaip, replacesinglewithaip
Replication Suite Configuration: Next, in your [dspace]/config/modules/replicate.cfg you will want to ensure it is set up to properly use METS-based AIPs. Under the "AIP Packaging Settings" you'll want the following settings enabled:
Code Block
# Package type. Permitted values: 'mets', 'bagit'
# mets = Generate default DSpace AIPs as described in: https://wiki.duraspace.org/display/DSDOC18/AIP+Backup+and+Restore
# bagit = Generate AIPs based on the BagIt packaging format: https://wiki.ucop.edu/display/Curation/BagIt
packer.pkgtype = mets
# Format of package compression. Permitted values: 'zip' or 'tgz'
# for 'mets' packages, only 'zip' is supported
packer.archfmt = zip
# Whether or not to name packages with a DSpace type prefix.
# When 'true', package files are named [type]@[handle].[format] (e.g. ITEM@123456789-1.zip)
# When 'false', package files are named [handle].[format] (e.g. 123456789-1.zip)
# Defaults to 'true'. For 'mets' packages, this must be 'true'.
packer.typeprefix = true
- Optionally tweak the AIP Restore/Replace settings: Optionally, you can decide to tweak the way AIPs are restored or replaced (using AIP Backup and Restore). These settings normally should not need to be tweaked, but are available in the [dspace]/config/modules/replicate-mets.cfg configuration file. See that configuration file for more details.
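The effect of `packer.archfmt` and `packer.typeprefix` on package file names can be sketched in a few lines. This is only an illustration of the naming pattern quoted in the replicate.cfg comments, not the suite's own code; the function name `package_name` is hypothetical, and the replacement of the handle's "/" with "-" is an assumption inferred from the example `ITEM@123456789-1.zip`.

```python
def package_name(obj_type, handle, archfmt="zip", typeprefix=True):
    """Build a package file name following the [type]@[handle].[format] pattern."""
    # Assumption: the '/' in a handle becomes '-' in the file name,
    # as the example "ITEM@123456789-1.zip" suggests.
    safe_handle = handle.replace("/", "-")
    if typeprefix:
        base = f"{obj_type.upper()}@{safe_handle}"
    else:
        base = safe_handle
    return f"{base}.{archfmt}"


print(package_name("item", "123456789/1"))                    # ITEM@123456789-1.zip
print(package_name("item", "123456789/1", typeprefix=False))  # 123456789-1.zip
```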
...
- General Curation Configuration: First, in your [dspace]/config/modules/curate.cfg you will want to enable & configure the BagIt-based replication tasks. (NOTE: there is a sample curate.cfg file provided in [dspace-replicate]/config/modules/curate.cfg which provides example settings).
  Enable the Replication Tasks: In the list of "Task Class implementations" (plugin.named.org.dspace.curate.CurationTask), add the following.
  REMEMBER to add a comma and backslash (", \") after each line (except the final line).
  Code Block
  plugin.named.org.dspace.curate.CurationTask = \
      ... (YOUR EXISTING TASKS) ... , \
      org.dspace.ctask.replicate.EstimateAIPSize = estaipsize, \
      org.dspace.ctask.replicate.ReadOdometer = readodometer, \
      org.dspace.ctask.replicate.TransmitAIP = transmitaip, \
      org.dspace.ctask.replicate.VerifyAIP = verifyaip, \
      org.dspace.ctask.replicate.FetchAIP = fetchaip, \
      org.dspace.ctask.replicate.CompareWithAIP = auditaip, \
      org.dspace.ctask.replicate.RemoveAIP = removeaip, \
      org.dspace.ctask.replicate.BagItRestoreFromAIP = restorefromaip, \
      org.dspace.ctask.replicate.BagItReplaceWithAIP = replacewithaip
Give Each Task a Human-Friendly Task Name: Under the ui.tasknames setting, give each of the above tasks a human-friendly name. Here are some recommended values, but you are welcome to tweak them.
REMEMBER to add a comma and backslash (", \") after each line (except the final line).
Code Block
ui.tasknames = \
    ... (YOUR EXISTING TASK NAMES) ... , \
    estaipsize = Estimate Storage Space for AIP(s), \
    readodometer = Read Odometer, \
    transmitaip = Transmit AIP(s) to Storage, \
    verifyaip = Verify AIP(s) exist in Storage, \
    fetchaip = Fetch AIP(s) from Storage, \
    auditaip = Audit/Compare against AIP(s), \
    removeaip = Remove AIP(s) from Storage, \
    restorefromaip = Restore Missing Object(s) from AIP(s), \
    replacewithaip = Replace Existing Object(s) with AIP(s)
Optionally Create a Task Group: Finally, if you'd like to create a Task Group for these tasks, you can create a group named "replicate" and add them all to it. The below is just an example of how you may wish to set the ui.taskgroups and ui.taskgroup.* settings. It creates two Task Groups: (1) a "General Purpose Tasks" group for a few default DSpace Curation Tasks, and (2) a "Replication Suite Tasks" group for all these new Replication tasks.
Code Block
# Tasks may be organized into named groups which display together in UI drop-downs
ui.taskgroups = \
    general = General Purpose Tasks, replicate = Replication Suite Tasks
# Group membership is defined using comma-separated lists of task names, one property per group
ui.taskgroup.general = profileformats, requiredmetadata, checklinks
ui.taskgroup.replicate = estaipsize, readodometer, transmitaip, verifyaip, fetchaip, auditaip, removeaip, restorefromaip, replacewithaip
Replication Suite Configuration: Next, in your [dspace]/config/modules/replicate.cfg you will want to ensure it is set up to properly use BagIt-based AIPs. Under the "AIP Packaging Settings" you'll want the following settings enabled:
Code Block
# Package type. Permitted values: 'mets', 'bagit'
# mets = Generate default DSpace AIPs as described in: https://wiki.duraspace.org/display/DSDOC18/AIP+Backup+and+Restore
# bagit = Generate AIPs based on the BagIt packaging format: https://wiki.ucop.edu/display/Curation/BagIt
packer.pkgtype = bagit
...
To configure local storage, please change the following settings in your [dspace]/config/modules/replicate.cfg configuration file:
Enable Local Storage Plugin: Ensure the Replication Suite is set up to use the 'LocalObjectStore' plugin
Code Block
# Replica store implementation class (specify one)
plugin.single.org.dspace.ctask.replicate.ObjectStore = \
    org.dspace.ctask.replicate.store.LocalObjectStore
Configure Local Storage Folder: Configure the location where you want all AIPs to be stored on your local filesystem. This defaults to the [dspace]/repstore folder. However, we recommend changing this to at least a separate hard drive from your existing DSpace installation directory! This ensures that your content will not all be lost in the case of a hard drive failure.
Code Block
# Location of local (e.g. local, mountable, sync) object store
# ignored for non-local stores (e.g. DuraCloud)
store.dir = ${dspace.dir}/repstore
Optionally Configure Subfolder Settings: Optionally, you can configure the sub-folder names (under store.dir) which will be used to store AIPs, Checkm manifests (if enabled), etc.
Code Block
# The storage group / folder where AIPs are stored/retrieved when AIP based tasks
# (e.g. "Transmit AIP", "Recover from AIP") are executed.
# For Local object stores, this group name corresponds to a subfolder in the 'store.dir'
# For DuraCloud object stores, this group name corresponds to a DuraCloud Space ID (Space must already exist)
group.aip.name = aips
# The storage group / folder where Checkm Manifests are stored/retrieved when Checkm Manifest based tasks are executed
# (org.dspace.ctask.replicate.checkm.*).
# For Local object stores, this group name corresponds to a subfolder in the 'store.dir'
# For DuraCloud object stores, this group name corresponds to a DuraCloud Space ID (Space must already exist)
group.manifest.name = manifests
# The storage group / folder where AIPs are temporarily stored/retrieved when an object deletion occurs
# and the ReplicationConsumer is enabled (see below). Essentially, this 'delete' group provides a
# location where AIPs can be temporarily kept in case the deletion needs to be reverted and the object restored.
# WARNING: THIS MUST NOT BE SET TO THE SAME VALUE AS 'group.aip.name'. If it is set to the
# same value, then your AIP backup processes will be UNSTABLE and restoration may be difficult or impossible.
# For Local object stores, this group name corresponds to a subfolder in the 'store.dir'
# For DuraCloud object stores, this group name corresponds to a DuraCloud Space ID (Space must already exist)
group.delete.name = deletes
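Because the WARNING in the settings above is easy to overlook, a small pre-flight check can help. The sketch below is an assumption-laden illustration, not part of the Replication Task Suite: the helper names `parse_properties` and `check_groups` are hypothetical, and the parser only handles simple single-line `key = value` entries (no backslash continuations).

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_groups(props):
    """Return a list of problems found in the storage group settings."""
    problems = []
    aip = props.get("group.aip.name")
    delete = props.get("group.delete.name")
    if aip is not None and aip == delete:
        problems.append("group.delete.name must differ from group.aip.name")
    return problems


# Hypothetical misconfiguration used only to exercise the check.
BAD = """
group.aip.name = aips
group.manifest.name = manifests
group.delete.name = aips
"""
print(check_groups(parse_properties(BAD)))
```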
...
To configure mounted storage, please change the following settings in your [dspace]/config/modules/replicate.cfg configuration file:
Enable Mounted Storage Plugin: Ensure the Replication Suite is set up to use the 'MountableObjectStore' plugin
Code Block
# Replica store implementation class (specify one)
plugin.single.org.dspace.ctask.replicate.ObjectStore = \
    org.dspace.ctask.replicate.store.MountableObjectStore
Configure Mounted Folder: Configure the location where you want all AIPs to be stored. The folder should already be mounted on your local filesystem. This defaults to the [dspace]/repstore folder.
Code Block
# Location of local (e.g. local, mountable, sync) object store
# ignored for non-local stores (e.g. DuraCloud)
store.dir = ${dspace.dir}/repstore
Optionally Configure Subfolder Settings: Optionally, you can configure the sub-folder names (under store.dir) which will be used to store AIPs, Checkm manifests (if enabled), etc.
Code Block
# The storage group / folder where AIPs are stored/retrieved when AIP based tasks
# (e.g. "Transmit AIP", "Recover from AIP") are executed.
# For Local object stores, this group name corresponds to a subfolder in the 'store.dir'
# For DuraCloud object stores, this group name corresponds to a DuraCloud Space ID (Space must already exist)
group.aip.name = aips
# The storage group / folder where Checkm Manifests are stored/retrieved when Checkm Manifest based tasks are executed
# (org.dspace.ctask.replicate.checkm.*).
# For Local object stores, this group name corresponds to a subfolder in the 'store.dir'
# For DuraCloud object stores, this group name corresponds to a DuraCloud Space ID (Space must already exist)
group.manifest.name = manifests
# The storage group / folder where AIPs are temporarily stored/retrieved when an object deletion occurs
# and the ReplicationConsumer is enabled (see below). Essentially, this 'delete' group provides a
# location where AIPs can be temporarily kept in case the deletion needs to be reverted and the object restored.
# WARNING: THIS MUST NOT BE SET TO THE SAME VALUE AS 'group.aip.name'. If it is set to the
# same value, then your AIP backup processes will be UNSTABLE and restoration may be difficult or impossible.
# For Local object stores, this group name corresponds to a subfolder in the 'store.dir'
# For DuraCloud object stores, this group name corresponds to a DuraCloud Space ID (Space must already exist)
group.delete.name = deletes
...
In order to configure DuraCloud Storage, you must first have an existing DuraCloud account. This account's settings should be configured in your [dspace]/config/modules/duracloud.cfg file as follows:
DuraCloud HostName: This is the location of your DuraCloud instance (the URL you normally use to access your account). Just provide the hostname.
Code Block
# DuraCloud service location (just the hostname)
host = demo.duracloud.org
DuraCloud Service Port: This is the port that DuraCloud is running on. It is almost always "443" unless you have installed DuraCloud yourself and configured it differently.
Code Block
# DuraCloud service port (usually 443 for https)
port = 443
DuraCloud's "DuraStore" path: This is the path to DuraCloud's "DuraStore" service. It is almost always "durastore" unless you have installed DuraCloud yourself and configured it differently.
Code Block
context = durastore
DuraCloud Username & Password: Finally, fill out your account username & password in these settings. Please note that, as this file now contains your DuraCloud account information, we recommend securing it (if possible). Just ensure it is still readable by the system user that DSpace runs as.
Code Block
# DuraCloud user name
username = rep-agent
# DuraCloud password
password = passw0rd
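To see how the host, port and context settings relate, it can help to combine them into the base address they describe. The sketch below assumes the three values simply form an HTTPS URL of the shape `https://host:port/context`; that composition, and the function name `durastore_base_url`, are illustrative assumptions rather than something taken from the suite's code.

```python
def durastore_base_url(host, port="443", context="durastore"):
    """Combine the three duracloud.cfg values into one base URL (HTTPS assumed)."""
    # Assumption: the service is reached over HTTPS at host:port/context.
    return f"https://{host}:{port}/{context.strip('/')}"


print(durastore_base_url("demo.duracloud.org"))  # https://demo.duracloud.org:443/durastore
```

Opening the resulting address in a browser is one informal way to confirm that the hostname and port are reachable before running any replication task.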
...
Now, to configure DuraCloud as your storage location, please change the following settings in your [dspace]/config/modules/replicate.cfg configuration file:
Enable DuraCloud Storage Plugin: Ensure the Replication Suite is set up to use the 'DuraCloudObjectStore' plugin
Code Block
# Replica store implementation class (specify one)
plugin.single.org.dspace.ctask.replicate.ObjectStore = \
    org.dspace.ctask.replicate.store.DuraCloudObjectStore
Configure DuraCloud Primary Space to use: Your DuraCloud account allows you to separate content into various "Spaces". You'll need to create a new DuraCloud Space that your AIPs will be stored within, and configure that as your group.aip.name (by default it's set to a DuraCloud Space with ID of "aips"). You should also create a new DuraCloud Space that your AIPs will be moved to if they are ever removed, and configure that as your group.delete.name. Optionally, if you are using Checkm manifests, you can also create and configure a group.manifest.name DuraCloud Space.
Code Block
# The storage group / folder where AIPs are stored/retrieved when AIP based tasks
# (e.g. "Transmit AIP", "Recover from AIP") are executed.
# For Local object stores, this group name corresponds to a subfolder in the 'store.dir'
# For DuraCloud object stores, this group name corresponds to a DuraCloud Space ID (Space must already exist)
group.aip.name = aips
Optionally, Configure Additional DuraCloud Spaces: If you have chosen to utilize Checkm manifest validation, you will need to create and configure a DuraCloud Space corresponding to the group.manifest.name setting below. Additionally, if you have chosen to enable Automatic Replication, you will need to create and configure a DuraCloud Space corresponding to the group.delete.name setting below.
Code Block
# The storage group / folder where Checkm Manifests are stored/retrieved when Checkm Manifest based tasks are executed
# (org.dspace.ctask.replicate.checkm.*).
# For Local object stores, this group name corresponds to a subfolder in the 'store.dir'
# For DuraCloud object stores, this group name corresponds to a DuraCloud Space ID (Space must already exist)
group.manifest.name = manifests
# The storage group / folder where AIPs are temporarily stored/retrieved when an object deletion occurs
# and the ReplicationConsumer is enabled (see below). Essentially, this 'delete' group provides a
# location where AIPs can be temporarily kept in case the deletion needs to be reverted and the object restored.
# WARNING: THIS MUST NOT BE SET TO THE SAME VALUE AS 'group.aip.name'. If it is set to the
# same value, then your AIP backup processes will be UNSTABLE and restoration may be difficult or impossible.
# For Local object stores, this group name corresponds to a subfolder in the 'store.dir'
# For DuraCloud object stores, this group name corresponds to a DuraCloud Space ID (Space must already exist)
group.delete.name = deletes
...
- General Curation Configuration: First, in your [dspace]/config/modules/curate.cfg you will want to enable & configure the Checkm Manifest tasks. (NOTE: there is a sample curate.cfg file provided in [dspace-replicate]/config/modules/curate.cfg which provides example settings).
  Enable the Checkm Tasks: In the list of "Task Class implementations" (plugin.named.org.dspace.curate.CurationTask), add the following.
  REMEMBER to add a comma and backslash (", \") after each line (except the final line).
  Code Block
  plugin.named.org.dspace.curate.CurationTask = \
      ... (YOUR EXISTING TASKS) ... , \
      org.dspace.ctask.replicate.checkm.TransmitManifest = transmitmanifest, \
      org.dspace.ctask.replicate.checkm.VerifyManifest = verifymanifest, \
      org.dspace.ctask.replicate.checkm.FetchManifest = fetchmanifest, \
      org.dspace.ctask.replicate.checkm.CompareWithManifest = auditmanifest, \
      org.dspace.ctask.replicate.checkm.RemoveManifest = removemanifest
Give Each Task a Human-Friendly Task Name: Under the ui.tasknames setting, give each of the above tasks a human-friendly name. Here are some recommended values, but you are welcome to tweak them.
REMEMBER to add a comma and backslash (", \") after each line (except the final line).
Code Block
ui.tasknames = \
    ... (YOUR EXISTING TASK NAMES) ... , \
    transmitmanifest = Transmit Checkm Manifest to Storage, \
    verifymanifest = Verify Checkm Manifest exists in Storage, \
    fetchmanifest = Fetch Checkm Manifest from Storage, \
    auditmanifest = Audit against Checkm Manifest, \
    removemanifest = Remove Checkm Manifest from Storage
Optionally Create a Task Group: Finally, if you'd like to create a Task Group for these tasks, you can create a group named "checkm" and add them all to it. The below is just an example of how you may wish to set the ui.taskgroups and ui.taskgroup.* settings. It creates two Task Groups: (1) a "General Purpose Tasks" group for a few default DSpace Curation Tasks, and (2) a "Checkm Validation Tasks" group for all these new Checkm tasks.
Code Block
# Tasks may be organized into named groups which display together in UI drop-downs
ui.taskgroups = \
    general = General Purpose Tasks, checkm = Checkm Validation Tasks
# Group membership is defined using comma-separated lists of task names, one property per group
ui.taskgroup.general = profileformats, requiredmetadata, checklinks
ui.taskgroup.checkm = transmitmanifest, verifymanifest, fetchmanifest, auditmanifest, removemanifest
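A common slip when editing these settings is to list a task in a `ui.taskgroup.*` property without declaring it under `ui.tasknames`. The following is a rough sketch of a consistency check, assuming a simplified reading of the file format (backslash continuations, `#` comments, and no commas inside task labels). The helper names and the sample text are hypothetical and are not part of DSpace.

```python
def load_properties(text):
    """Parse 'key = value' entries, joining lines that end with a backslash."""
    props, logical = {}, ""
    for raw in text.splitlines():
        line = raw.strip()
        if not logical and (not line or line.startswith("#")):
            continue
        if line.endswith("\\"):
            logical += line[:-1]  # drop the continuation marker, keep reading
            continue
        logical += line
        if "=" in logical:
            key, value = logical.split("=", 1)
            props[key.strip()] = value.strip()
        logical = ""
    return props


def undeclared_group_members(props):
    """Map each task group to the members missing from ui.tasknames."""
    declared = {pair.split("=", 1)[0].strip()
                for pair in props.get("ui.tasknames", "").split(",") if "=" in pair}
    missing = {}
    for key, value in props.items():
        if key.startswith("ui.taskgroup."):
            unknown = [t.strip() for t in value.split(",")
                       if t.strip() and t.strip() not in declared]
            if unknown:
                missing[key[len("ui.taskgroup."):]] = unknown
    return missing


# Hypothetical excerpt: 'fetchmanifest' is grouped but never declared.
SAMPLE = r"""
ui.tasknames = \
    transmitmanifest = Transmit Checkm Manifest to Storage, \
    verifymanifest = Verify Checkm Manifest exists in Storage
ui.taskgroups = \
    checkm = Checkm Validation Tasks
ui.taskgroup.checkm = transmitmanifest, verifymanifest, fetchmanifest
"""
print(undeclared_group_members(load_properties(SAMPLE)))
```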
...
To install this task, edit [dspace]/config/modules/curate.cfg (NB: all curation configuration is 'modular' in the sense that the configuration properties live outside of dspace.cfg, in named files. This means that if a given suite of tasks is unused, its configuration is never installed). First, add the task to the list of curation tasks.
Code Block
plugin.named.org.dspace.curate.CurationTask = \
    .... other curation tasks, \
    org.dspace.ctask.replicate.EstimateAIPSize = estaipsize
Next, in the same file, add this task to the list that appears in the administrative UI:
Code Block
ui.tasknames = \
    .... other tasks, \
    estaipsize = Estimate Storage Space for AIP(s)
...
Since we are now working with AIPs, we should examine how they are configured for the tasks. Most configuration specific to the replication task suite is found in [dspace]/config/modules/replicate.cfg. There are two main properties to set (or accept default values):
Code Block
# Package type. Permitted values: 'mets', 'bagit'
packer.pkgtype = mets
# Format of package compression. Permitted values: 'zip' or 'tgz'
# for 'mets' packages, only zip is supported
packer.archfmt = zip
...
The replication code includes a so-called 'event consumer' that can 'listen for' any changes to objects in the repository. Event consumers are documented elsewhere, but all we need to do to activate this consumer is add it to the list of consumers (in dspace.cfg):
Code Block
#### Event System Configuration ####
# default synchronous dispatcher (same behavior as traditional DSpace)
event.dispatcher.default.class = org.dspace.event.BasicDispatcher
event.dispatcher.default.consumers = search, browse, eperson, harvester, replicate
....
# consumer to manage content replication
event.consumer.replicate.class = org.dspace.ctask.replicate.ReplicateConsumer
event.consumer.replicate.filters = Community|Collection|Item+Install|Modify|Modify_Metadata|Delete
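The `event.consumer.replicate.filters` value packs two lists into one string. As a reading aid, the sketch below splits it under the assumption that the part before the `+` names object types and the part after it names event types, each separated by `|`. The function name `parse_filter` is hypothetical, and only a single object/event pair is handled.

```python
def parse_filter(value):
    """Split 'TypeA|TypeB+EventA|EventB' into (object types, event types)."""
    objects, _, events = value.partition("+")
    object_types = [o.strip() for o in objects.split("|") if o.strip()]
    event_types = [e.strip() for e in events.split("|") if e.strip()]
    return object_types, event_types


object_types, event_types = parse_filter(
    "Community|Collection|Item+Install|Modify|Modify_Metadata|Delete"
)
print(object_types)  # ['Community', 'Collection', 'Item']
print(event_types)   # ['Install', 'Modify', 'Modify_Metadata', 'Delete']
```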
...
If the event is an addition of a new DSpace object (actually for Items, an 'installation', i.e. when the item exits workflow), then a request for an AIP transmission is queued. The same occurs whenever an object has changed (so-called modify events). When an object is deleted, a 'catalog' of the deletion is transmitted to the replication service. The catalog just lists all the parts of the deletion: if an item, then just the handle of the item; if a collection, then all the item handles that were in it. This way, if the deletion was mistaken, the catalog can be used to recover all the contents. This represents the default behavior of the consumer. You may configure it in [dspace]/config/modules/replicate.cfg:
Code Block
### ReplicateConsumer settings ###
# ReplicateConsumer must be properly declared/configured in dspace.cfg
# All tasks defined will be queued, unless the '+p' suffix is appended, when
# they will be immediately performed. Exercise considerable caution when using
# +p, as lengthy tasks can adversely affect UI or other responsiveness.
# Replicate event consumer tasks upon install/add events.
# A comma separated list of valid task plugin names (with optional '+p' suffix)
consumer.tasks.add = transmitaip
# Replicate event consumer tasks upon modification events.
# A comma separated list of valid task plugin names (with optional '+p' suffix)
consumer.tasks.mod = transmitaip
# Replicate event consumer tasks upon delete/remove events.
# A comma separated list of valid task plugin names (with optional '+p' suffix)
consumer.tasks.del = catalog+p
# Replicate event consumer queue name - where all queued tasks are placed
consumer.queue = replication
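The '+p' convention in the `consumer.tasks.*` settings can be illustrated with a short parser. This is a sketch of how such a value might be interpreted, based only on the comments in the configuration above; the function name `parse_consumer_tasks` is hypothetical and this is not the consumer's actual implementation.

```python
def parse_consumer_tasks(value):
    """Turn 'taska, taskb+p' into [(task name, perform immediately?), ...]."""
    tasks = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        immediate = entry.endswith("+p")  # '+p' means perform now instead of queueing
        name = entry[:-2] if immediate else entry
        tasks.append((name, immediate))
    return tasks


print(parse_consumer_tasks("transmitaip"))  # [('transmitaip', False)]
print(parse_consumer_tasks("catalog+p"))    # [('catalog', True)]
```

With the default values shown above, additions and modifications are queued, while the deletion catalog is performed immediately.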
Using the event consumer, the curator can essentially operate replication in 'auto-pilot' after the first complete transmission of AIPs.
One important configuration detail to be aware of: by default, the consumer will process all events it receives, regardless of collection. But in our current case, we intend for only the 'Amazing Images' collection to be replicated. To achieve this, we must create a file in the directory defined by the following [dspace]/config/modules/replicate.cfg property:
Code Block
# Base directory for replication operations
base.dir = ${dspace.dir}/replicate
...
For the replication of AIPs to be of any significant value, they must be stored in a safe, persistent, reliable, accessible, and available location. The replication tasks (transmitting, fetching, etc.) all rely on the configured storage provider. This and related properties are found in replicate.cfg:
Code Block
# Replica store implementation class
plugin.single.org.dspace.ctask.replicate.ObjectStore = \
    org.dspace.ctask.replicate.store.LocalObjectStore
# Location of local (e.g. local, mountable, sync) object store
# ignored for non-local stores (e.g. DuraCloud)
store.dir = ${dspace.dir}/repstore
...