...

The Replication Task Suite is a DSpace 1.8 Add-On which provides a set of curation system tasks to assist in performing replication (backup/restore/audit) of DSpace contents to other locations. The DSpace content is packaged in containers known as AIPs (OAIS speak: 'archival information packages'). By default, AIPs are generated in the default DSpace AIP Format (the same format used by the AIP Backup and Restore tool). If desired, there is an option to generate BagIt-based AIPs instead of using the default DSpace AIP format.

This Add-On integrates DSpace 1.8 with DuraCloud for users that wish to easily back up their content into DuraCloud directly from their DSpace administrative interface.

Info: Early Access Release Available

An "Early Access" release of the Replication Task Suite is available to install via:

...

Info: New Development has moved to GitHub

New Development of the Replication Task Suite has been moved to GitHub: https://github.com/DSpace/dspace-replicate

The older SVN code repository still exists, but it has not been updated since the 1.0-EA (Early Access) Release.

Prerequisites

Must be installed on a DSpace 1.8.x System

...

  1. In your DSpace Source directory ([dspace-src]), you will modify two Maven pom.xml files:
    • [dspace-src]/dspace/pom.xml (This POM controls the dependencies of the command-line scripts. Modifying it will let you run dspace-replicate from the command line)
    • [dspace-src]/dspace/modules/xmlui/pom.xml (This POM controls the dependencies of the XMLUI. Modifying it will let you run dspace-replicate from the XMLUI)
  2. For both of these pom.xml files, add the following <dependency> section at the end of the existing <dependencies> section (just before the closing </dependencies> tag):

    Code Block
    <dependency>
       <groupId>org.dspace</groupId>
       <artifactId>dspace-replicate</artifactId>
       <version>1.0-EA</version>
    </dependency>
    
  3. Once you've finished modifying both pom.xml files, rebuild DSpace by running the following from your [dspace-src]/dspace/ folder:

    Code Block
    
    mvn clean package
    
  4. Next, update your existing DSpace 1.8.x installation by running the following from your [dspace-src]/dspace/target/dspace-1.8.x-SNAPSHOT-build/ directory:

    Code Block
    
    ant update
    
    Note

    Alternatively, if you don't want to do a full DSpace update, you can just update your existing binaries & webapps by running the following two commands:

    • ant update_code (Updates the existing [dspace]/lib/ directory)
    • ant update_webapps (Updates the existing [dspace]/webapp/ directory)
  5. Copy the Replication Suite's configuration files to your DSpace configuration directory
  6. Finally, follow the Configuration settings instructions below to configure the Replication Suite based on your usage needs.

...

  1. Download the Replication Suite code
  2. Build/Compile the Replication Suite, by running the following from the root directory

    Code Block
    mvn package
  3. Copy the generated JAR files to your DSpace 1.8 installation.
    1. A total of 5 JARs need to be copied to your [dspace]/lib/ directory:
      • [dspace-replicate]/target/dspace-replicate-[version].jar (The Replication Suite Plugin)
      • [dspace-replicate]/target/lib/common-[version].jar (DuraCloud common libraries - required for DuraCloud integration)
      • [dspace-replicate]/target/lib/commons-compress-[version].jar (Apache Commons Compress - prerequisite for Replication Suite plugin)
      • [dspace-replicate]/target/lib/storageprovider-[version].jar (DuraCloud storage provider libraries - required for DuraCloud integration)
      • [dspace-replicate]/target/lib/storeclient-[version].jar (DuraCloud store client libraries - required for DuraCloud integration)
    2. Copy the same 5 JARs to your XMLUI web application's WEB-INF/lib directory as well (e.g. [dspace]/webapps/xmlui/WEB-INF/lib/)
  4. Copy the Replication Suite's configuration files to your DSpace configuration directory
    • Replication Suite Configuration File: Copy [dspace-replicate]/config/modules/replicate.cfg to your [dspace]/config/modules/ directory
    • METS-specific AIP Configuration Settings: Copy [dspace-replicate]/config/modules/replicate-mets.cfg to your [dspace]/config/modules/ directory
    • DuraCloud Configuration File: Copy [dspace-replicate]/config/modules/duracloud.cfg to your [dspace]/config/modules/ directory
  5. Finally, follow the Configuration settings instructions below to configure the Replication Suite based on your usage needs.
    • There is a sample curate.cfg file provided in [dspace-replicate]/config/modules/curate.cfg which can be used as a reference. It is pre-configured to use the DSpace AIP Format (METS-based packaging).
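Steps 3 and 4 above can be scripted. The following is only a minimal sketch: the function name install_replicate and its two arguments are our own invention, and it assumes the directory layout described in the list above (a built [dspace-replicate] checkout and an existing [dspace] installation). Note that it overwrites same-named files in the target directories, so back up any customized .cfg files first.

```shell
# Sketch: copy the Replication Suite JARs and config files into a DSpace install.
# Usage: install_replicate <dspace-replicate build dir> <dspace install dir>
# (install_replicate is a hypothetical helper, not part of DSpace.)
install_replicate() {
    src=$1
    dspace=$2

    # Step 3: the five JARs go to both [dspace]/lib and the XMLUI webapp.
    for dest in "$dspace/lib" "$dspace/webapps/xmlui/WEB-INF/lib"; do
        if [ ! -d "$dest" ]; then
            echo "missing directory: $dest" >&2
            return 1
        fi
        # The globs stand in for [version]; a glob that matches nothing makes cp fail.
        cp "$src"/target/dspace-replicate-*.jar \
           "$src"/target/lib/common-*.jar \
           "$src"/target/lib/commons-compress-*.jar \
           "$src"/target/lib/storageprovider-*.jar \
           "$src"/target/lib/storeclient-*.jar \
           "$dest"/ || return 1
    done

    # Step 4: the three module configuration files.
    if [ ! -d "$dspace/config/modules" ]; then
        echo "missing directory: $dspace/config/modules" >&2
        return 1
    fi
    for cfg in replicate.cfg replicate-mets.cfg duracloud.cfg; do
        cp "$src/config/modules/$cfg" "$dspace/config/modules/" || return 1
    done
}
```

For example, `install_replicate /path/to/dspace-replicate /path/to/dspace` (both paths are placeholders for your own locations). The function stops at the first failed copy, so a partially built checkout will not silently produce an incomplete install.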

...

  1. General Curation Configuration: First, in your [dspace]/config/modules/curate.cfg you will want to enable & configure the METS-based replication tasks. (NOTE: there is a sample curate.cfg file provided in [dspace-replicate]/config/modules/curate.cfg which is pre-configured to use METS-based AIPs).
    • Enable the Replication Tasks: In the list of "Task Class implementations" (plugin.named.org.dspace.curate.CurationTask), add the following.
      REMEMBER to add a comma and backslash (", \") after each line (except the final line).

      Code Block
      
      plugin.named.org.dspace.curate.CurationTask = \
          ... (YOUR EXISTING TASKS) ... , \
          org.dspace.ctask.replicate.EstimateAIPSize = estaipsize, \
          org.dspace.ctask.replicate.ReadOdometer = readodometer, \
          org.dspace.ctask.replicate.TransmitAIP = transmitaip, \
          org.dspace.ctask.replicate.VerifyAIP = verifyaip, \
          org.dspace.ctask.replicate.FetchAIP = fetchaip, \
          org.dspace.ctask.replicate.CompareWithAIP = auditaip, \
          org.dspace.ctask.replicate.RemoveAIP = removeaip, \
          org.dspace.ctask.replicate.METSRestoreFromAIP = restorefromaip, \
          org.dspace.ctask.replicate.METSRestoreFromAIP = replacewithaip, \
          org.dspace.ctask.replicate.METSRestoreFromAIP = restorekeepexisting, \
          org.dspace.ctask.replicate.METSRestoreFromAIP = restoresinglefromaip, \
          org.dspace.ctask.replicate.METSRestoreFromAIP = replacesinglewithaip
      
    • Give Each Task a Human-Friendly Task Name: Under the ui.tasknames setting, give each of the above Tasks a human-friendly name. Here are some recommended values, but you are welcome to tweak them.
      REMEMBER to add a comma and backslash (", \") after each line (except the final line).

      Code Block
      
      ui.tasknames = \
          ... (YOUR EXISTING TASK NAMES) ... , \
          estaipsize = Estimate Storage Space for AIP(s), \
          readodometer = Read Odometer, \
          transmitaip = Transmit AIP(s) to Storage, \
          verifyaip = Verify AIP(s) exist in Storage, \
          fetchaip = Fetch AIP(s) from Storage, \
          auditaip = Audit against AIP(s), \
          removeaip = Remove AIP(s) from Storage, \
          restorefromaip = Restore Missing Object(s) from AIP(s), \
          replacewithaip = Replace Existing Object(s) with AIP(s), \
          restorekeepexisting = Restore Missing Object(s) but Keep Existing Objects, \
          restoresinglefromaip = Restore Single Object from AIP, \
          replacesinglewithaip = Replace Single Object with AIP
      
    • Optionally Create a Task Group: Finally, if you'd like to create a Task Group for these tasks, you can create a group named "replicate" and add them all to it. The below is just an example of how you may wish to set the ui.taskgroups and ui.taskgroup.* settings. It creates two Task Groups: (1) a "General Purpose Tasks" group for a few default DSpace Curation Tasks, and (2) a "Replication Suite Tasks" group for all these new Replication tasks.

      Code Block
      
      # Tasks may be organized into named groups which display together in UI drop-downs
      ui.taskgroups = \
         general = General Purpose Tasks, \
         replicate = Replication Suite Tasks
      
      # Group membership is defined using comma-separated lists of task names, one property per group
      ui.taskgroup.general = profileformats, requiredmetadata, checklinks
      ui.taskgroup.replicate = estaipsize, readodometer, transmitaip, verifyaip, fetchaip, auditaip, removeaip, restorefromaip, replacewithaip, restorekeepexisting, restoresinglefromaip, replacesinglewithaip
      
  2. Replication Suite Configuration: Next, in your [dspace]/config/modules/replicate.cfg you will want to ensure it is set up to properly use METS-based AIPs. Under the "AIP Packaging Settings" you'll want the following settings enabled:

    Code Block
    
    # Package type. Permitted values: 'mets', 'bagit'
    # mets = Generate default DSpace AIPs as described in: https://wiki.duraspace.org/display/DSDOC18/AIP+Backup+and+Restore
    # bagit = Generate AIPs based on the BagIt packaging format: https://wiki.ucop.edu/display/Curation/BagIt
    packer.pkgtype = mets
    
    # Format of package compression. Permitted values: 'zip' or 'tgz'
    # for 'mets' packages, only 'zip' is supported
    packer.archfmt = zip
    
    # Whether or not to name packages with a DSpace type prefix.
    # When 'true', package files are named [type]@[handle].[format] (e.g. ITEM@123456789-1.zip)
    # When 'false', package files are named [handle].[format] (e.g. 123456789-1.zip)
    # Defaults to 'true'. For 'mets' packages, this must be 'true'.
    packer.typeprefix = true
    
  3. Optionally tweak the AIP Restore/Replace settings: Optionally, you can decide to tweak the way AIPs are restored or replaced (using AIP Backup and Restore). These settings normally should not need to be tweaked, but are available in the [dspace]/config/modules/replicate-mets.cfg configuration file. See that configuration file for more details.
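The package naming rule documented for packer.typeprefix in step 2 can be illustrated with a small shell sketch. The function package_name is hypothetical, and the replacement of the handle's "/" with "-" is inferred from the examples in the configuration comments (ITEM@123456789-1.zip) rather than taken from the Replication Suite code.

```shell
# Sketch: derive an AIP file name from an object type, a handle and an archive format.
# Usage: package_name <type> <handle> <format> [typeprefix]
# (package_name is a hypothetical helper; typeprefix defaults to "true".)
package_name() {
    objtype=$1
    handle=$2
    format=$3
    typeprefix=${4:-true}

    # Assumption: the "/" in a handle is written as "-" in the file name,
    # as the examples in replicate.cfg suggest.
    safe_handle=$(printf '%s' "$handle" | tr '/' '-')

    if [ "$typeprefix" = "true" ]; then
        printf '%s@%s.%s\n' "$objtype" "$safe_handle" "$format"
    else
        printf '%s.%s\n' "$safe_handle" "$format"
    fi
}
```

With these assumptions, `package_name ITEM 123456789/1 zip` prints `ITEM@123456789-1.zip`, and `package_name ITEM 123456789/1 zip false` prints `123456789-1.zip`.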

...

  1. General Curation Configuration: First, in your [dspace]/config/modules/curate.cfg you will want to enable & configure the BagIt-based replication tasks. (NOTE: there is a sample curate.cfg file provided in [dspace-replicate]/config/modules/curate.cfg which provides example settings).
    • Enable the Replication Tasks: In the list of "Task Class implementations" (plugin.named.org.dspace.curate.CurationTask), add the following.
      REMEMBER to add a comma and backslash (", \") after each line (except the final line).

      Code Block
      
      plugin.named.org.dspace.curate.CurationTask = \
          ... (YOUR EXISTING TASKS) ... , \
          org.dspace.ctask.replicate.EstimateAIPSize = estaipsize, \
          org.dspace.ctask.replicate.ReadOdometer = readodometer, \
          org.dspace.ctask.replicate.TransmitAIP = transmitaip, \
          org.dspace.ctask.replicate.VerifyAIP = verifyaip, \
          org.dspace.ctask.replicate.FetchAIP = fetchaip, \
          org.dspace.ctask.replicate.CompareWithAIP = auditaip, \
          org.dspace.ctask.replicate.RemoveAIP = removeaip, \
          org.dspace.ctask.replicate.BagItRestoreFromAIP = restorefromaip, \
          org.dspace.ctask.replicate.BagItReplaceWithAIP = replacewithaip
      
    • Give Each Task a Human-Friendly Task Name: Under the ui.tasknames setting, give each of the above Tasks a human-friendly name. Here are some recommended values, but you are welcome to tweak them.
      REMEMBER to add a comma and backslash (", \") after each line (except the final line).

      Code Block
      
      ui.tasknames = \
          ... (YOUR EXISTING TASK NAMES) ... , \
          estaipsize = Estimate Storage Space for AIP(s), \
          readodometer = Read Odometer, \
          transmitaip = Transmit AIP(s) to Storage, \
          verifyaip = Verify AIP(s) exist in Storage, \
          fetchaip = Fetch AIP(s) from Storage, \
          auditaip = Audit/Compare against AIP(s), \
          removeaip = Remove AIP(s) from Storage, \
          restorefromaip = Restore Missing Object(s) from AIP(s), \
          replacewithaip = Replace Existing Object(s) with AIP(s)
      
    • Optionally Create a Task Group: Finally, if you'd like to create a Task Group for these tasks, you can create a group named "replicate" and add them all to it. The below is just an example of how you may wish to set the ui.taskgroups and ui.taskgroup.* settings. It creates two Task Groups: (1) a "General Purpose Tasks" group for a few default DSpace Curation Tasks, and (2) a "Replication Suite Tasks" group for all these new Replication tasks.

      Code Block
      
      # Tasks may be organized into named groups which display together in UI drop-downs
      ui.taskgroups = \
         general = General Purpose Tasks, \
         replicate = Replication Suite Tasks
      
      # Group membership is defined using comma-separated lists of task names, one property per group
      ui.taskgroup.general = profileformats, requiredmetadata, checklinks
      ui.taskgroup.replicate = estaipsize, readodometer, transmitaip, verifyaip, fetchaip, auditaip, removeaip, restorefromaip, replacewithaip
      
  2. Replication Suite Configuration: Next, in your [dspace]/config/modules/replicate.cfg you will want to ensure it is set up to properly use BagIt-based AIPs. Under the "AIP Packaging Settings" you'll want the following settings enabled:

    Code Block
    
    # Package type. Permitted values: 'mets', 'bagit'
    # mets = Generate default DSpace AIPs as described in: https://wiki.duraspace.org/display/DSDOC18/AIP+Backup+and+Restore
    # bagit = Generate AIPs based on the BagIt packaging format: https://wiki.ucop.edu/display/Curation/BagIt
    packer.pkgtype = bagit
    

...

To configure local storage, please change the following settings in your [dspace]/config/modules/replicate.cfg configuration file:

  1. Enable Local Storage Plugin: Ensure the Replication Suite is set up to use the 'LocalObjectStore' plugin

    Code Block
    
    # Replica store implementation class (specify one)
    plugin.single.org.dspace.ctask.replicate.ObjectStore = \
        org.dspace.ctask.replicate.store.LocalObjectStore
    
  2. Configure Local Storage Folder: Configure the location where you want all AIPs to be stored on your local filesystem. This defaults to the [dspace]/repstore folder. However, we recommend changing it to a location that is, at minimum, on a separate hard drive from your existing DSpace installation directory! This ensures that your content will not all be lost in the case of a hard drive failure.

    Code Block
    
    # Location of local (e.g. local, mountable, sync) object store
    # ignored for non-local stores (e.g. DuraCloud)
    store.dir = ${dspace.dir}/repstore
    
  3. Optionally Configure Subfolder Settings: Optionally, you can configure the sub-folder names (under store.dir) which will be used to store AIPs, checkm manifests (if enabled), etc.

    Code Block
    
    # The storage group / folder where AIPs are stored/retrieved when AIP based tasks 
    # (e.g. "Transmit AIP", "Recover from AIP") are executed.
    # For Local object stores, this group name corresponds to a subfolder in the 'store.dir'
    # For DuraCloud object stores, this group name corresponds to a DuraCloud Space ID (Space must already exist)
    group.aip.name = aips
    
    # The storage group / folder where Checkm Manifests are stored/retrieved when Checkm Manifest based tasks are executed
    # (org.dspace.ctask.replicate.checkm.*).
    # For Local object stores, this group name corresponds to a subfolder in the 'store.dir'
    # For DuraCloud object stores, this group name corresponds to a DuraCloud Space ID (Space must already exist)
    group.manifest.name = manifests
    
    # The storage group / folder where AIPs are temporarily stored/retrieved when an object deletion occurs
    # and the ReplicationConsumer is enabled (see below). Essentially, this 'delete' group provides a 
    # location where AIPs can be temporarily kept in case the deletion needs to be reverted and the object restored.
    # WARNING: THIS MUST NOT BE SET TO THE SAME VALUE AS 'group.aip.name'. If it is set to the 
    # same value, then your AIP backup processes will be UNSTABLE and restoration may be difficult or impossible.
    # For Local object stores, this group name corresponds to a subfolder in the 'store.dir'
    # For DuraCloud object stores, this group name corresponds to a DuraCloud Space ID (Space must already exist)
    group.delete.name = deletes
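Because the warning above is easy to overlook, you may want to check a replicate.cfg before relying on it. The following is a sketch only: get_prop and check_groups are hypothetical helpers, and the parser handles only simple single-line "key = value" entries, not the full Java properties syntax.

```shell
# Sketch: verify that group.aip.name and group.delete.name differ.
# get_prop <file> <key>: print the value of a single-line "key = value" entry
# (the last one wins if the key appears more than once).
get_prop() {
    awk -v key="$2" '
        /^[ \t]*#/ { next }                          # skip comment lines
        {
            eq = index($0, "=")
            if (eq == 0) next                        # not a key = value line
            k = substr($0, 1, eq - 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)           # trim the key
            if (k == key) {
                v = substr($0, eq + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", v)       # trim the value
                val = v
            }
        }
        END { print val }
    ' "$1"
}

# check_groups <replicate.cfg>: returns 0 if both names are set and differ.
check_groups() {
    aip=$(get_prop "$1" group.aip.name)
    del=$(get_prop "$1" group.delete.name)
    if [ -z "$aip" ] || [ -z "$del" ]; then
        echo "group.aip.name or group.delete.name is not set in $1" >&2
        return 2
    fi
    if [ "$aip" = "$del" ]; then
        echo "group.aip.name and group.delete.name are both '$aip'" >&2
        return 1
    fi
    echo "OK: aip=$aip delete=$del"
}
```

Running `check_groups [dspace]/config/modules/replicate.cfg` (with [dspace] replaced by your installation path) against the settings shown above would report the two distinct names, aips and deletes.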
    

...

To configure mounted storage, please change the following settings in your [dspace]/config/modules/replicate.cfg configuration file:

  1. Enable Mountable Storage Plugin: Ensure the Replication Suite is set up to use the 'MountableObjectStore' plugin

    Code Block
    
    # Replica store implementation class (specify one)
    plugin.single.org.dspace.ctask.replicate.ObjectStore = \
        org.dspace.ctask.replicate.store.MountableObjectStore
    
  2. Configure Mounted Folder: Configure the location where you want all AIPs to be stored. The folder should already be mounted on your local filesystem. This defaults to the [dspace]/repstore folder.

    Code Block
    
    # Location of local (e.g. local, mountable, sync) object store
    # ignored for non-local stores (e.g. DuraCloud)
    store.dir = ${dspace.dir}/repstore
    
  3. Optionally Configure Subfolder Settings: Optionally, you can configure the sub-folder names (under store.dir) which will be used to store AIPs, checkm manifests (if enabled), etc.

    Code Block
    
    # The storage group / folder where AIPs are stored/retrieved when AIP based tasks 
    # (e.g. "Transmit AIP", "Recover from AIP") are executed.
    # For Local object stores, this group name corresponds to a subfolder in the 'store.dir'
    # For DuraCloud object stores, this group name corresponds to a DuraCloud Space ID (Space must already exist)
    group.aip.name = aips
    
    # The storage group / folder where Checkm Manifests are stored/retrieved when Checkm Manifest based tasks are executed
    # (org.dspace.ctask.replicate.checkm.*).
    # For Local object stores, this group name corresponds to a subfolder in the 'store.dir'
    # For DuraCloud object stores, this group name corresponds to a DuraCloud Space ID (Space must already exist)
    group.manifest.name = manifests
    
    # The storage group / folder where AIPs are temporarily stored/retrieved when an object deletion occurs
    # and the ReplicationConsumer is enabled (see below). Essentially, this 'delete' group provides a 
    # location where AIPs can be temporarily kept in case the deletion needs to be reverted and the object restored.
    # WARNING: THIS MUST NOT BE SET TO THE SAME VALUE AS 'group.aip.name'. If it is set to the 
    # same value, then your AIP backup processes will be UNSTABLE and restoration may be difficult or impossible.
    # For Local object stores, this group name corresponds to a subfolder in the 'store.dir'
    # For DuraCloud object stores, this group name corresponds to a DuraCloud Space ID (Space must already exist)
    group.delete.name = deletes
    

...

In order to configure DuraCloud Storage, you first must have an existing DuraCloud Account. This account's settings should be configured in your [dspace]/config/modules/duracloud.cfg file as follows:

  1. DuraCloud HostName: This is the location of your DuraCloud instance (the URL you normally use to access your account). Just provide the hostname.

    Code Block
    
    # DuraCloud service location (just the hostname)
    host = demo.duracloud.org
    
  2. DuraCloud Service Port: This is the port that DuraCloud is running on. It is almost always "443" unless you have installed DuraCloud yourself and configured it differently.

    Code Block
    
    # DuraCloud service port (usually 443 for https)
    port = 443
    
  3. DuraCloud's "DuraStore" path: This is the path to DuraCloud's "DuraStore" service. It is almost always "durastore" unless you have installed DuraCloud yourself and configured it differently.

    Code Block
    
    context = durastore
    
  4. DuraCloud Username & Password: Finally, fill out your account username & password in these settings. Please note that, as this file now contains your DuraCloud account information, we recommend restricting access to it (if possible). Just ensure it is still readable by the system user that DSpace runs as.

    Code Block
    
    # DuraCloud user name
    username = rep-agent
    # DuraCloud password
    password = passw0rd
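One way to secure the file, sketched below, is to restrict it to its owner. This assumes a Unix-like system and that duracloud.cfg is already owned by the system user that DSpace runs as; secure_cfg is a hypothetical helper name.

```shell
# Sketch: make a config file readable and writable by its owner only.
# Usage: secure_cfg <path to duracloud.cfg>
# Assumption: the file is owned by the user DSpace runs as; otherwise
# DSpace would lose read access after this change.
secure_cfg() {
    cfg=$1
    if [ ! -f "$cfg" ]; then
        echo "no such file: $cfg" >&2
        return 1
    fi
    chmod 600 "$cfg" || return 1
    # Show the resulting permissions and owner for a quick visual check.
    ls -l "$cfg"
}
```

After running it, the listing should start with `-rw-------` and show the DSpace system user as the owner. If a different user owns the file, change the ownership first (for example with chown) rather than loosening the permissions.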
    

...

Now, to configure DuraCloud as your storage location please change the following settings in your [dspace]/config/modules/replicate.cfg configuration file:

  1. Enable DuraCloud Storage Plugin: Ensure the Replication Suite is set up to use the 'DuraCloudObjectStore' plugin

    Code Block
    
    # Replica store implementation class (specify one)
    plugin.single.org.dspace.ctask.replicate.ObjectStore = \
        org.dspace.ctask.replicate.store.DuraCloudObjectStore
    
  2. Configure DuraCloud Primary Space to use: Your DuraCloud account allows you to separate content into various "Spaces". You'll need to create a new DuraCloud Space that your AIPs will be stored within, and configure that as your group.aip.name (by default it's set to a DuraCloud Space with ID of "aips"). You should also create a new DuraCloud Space that your AIPs will be moved to if they are ever removed, and configure that as your group.delete.name. Optionally, if you are using Checkm manifests, you can also create and configure a group.manifest.name DuraCloud Space.

    Code Block
    
    # The storage group / folder where AIPs are stored/retrieved when AIP based tasks 
    # (e.g. "Transmit AIP", "Recover from AIP") are executed.
    # For Local object stores, this group name corresponds to a subfolder in the 'store.dir'
    # For DuraCloud object stores, this group name corresponds to a DuraCloud Space ID (Space must already exist)
    group.aip.name = aips
    
  3. Optionally, Configure Additional DuraCloud Spaces: If you have chosen to utilize Checkm manifest validation, you will need to create and configure a DuraCloud Space corresponding to the group.manifest.name setting below. Additionally, if you have chosen to enable Automatic Replication, you will need to create and configure a DuraCloud Space corresponding to the group.delete.name setting below.

    Code Block
    
    # The storage group / folder where Checkm Manifests are stored/retrieved when Checkm Manifest based tasks are executed
    # (org.dspace.ctask.replicate.checkm.*).
    # For Local object stores, this group name corresponds to a subfolder in the 'store.dir'
    # For DuraCloud object stores, this group name corresponds to a DuraCloud Space ID (Space must already exist)
    group.manifest.name = manifests
    
    # The storage group / folder where AIPs are temporarily stored/retrieved when an object deletion occurs
    # and the ReplicationConsumer is enabled (see below). Essentially, this 'delete' group provides a 
    # location where AIPs can be temporarily kept in case the deletion needs to be reverted and the object restored.
    # WARNING: THIS MUST NOT BE SET TO THE SAME VALUE AS 'group.aip.name'. If it is set to the 
    # same value, then your AIP backup processes will be UNSTABLE and restoration may be difficult or impossible.
    # For Local object stores, this group name corresponds to a subfolder in the 'store.dir'
    # For DuraCloud object stores, this group name corresponds to a DuraCloud Space ID (Space must already exist)
    group.delete.name = deletes
    

...

  1. General Curation Configuration: First, in your [dspace]/config/modules/curate.cfg you will want to enable & configure the Checkm Manifest tasks. (NOTE: there is a sample curate.cfg file provided in [dspace-replicate]/config/modules/curate.cfg which provides example settings).
    • Enable the Checkm Tasks: In the list of "Task Class implementations" (plugin.named.org.dspace.curate.CurationTask), add the following.
      REMEMBER to add a comma and backslash (", \") after each line (except the final line).

      Code Block
      
      plugin.named.org.dspace.curate.CurationTask = \
          ... (YOUR EXISTING TASKS) ... , \
          org.dspace.ctask.replicate.checkm.TransmitManifest = transmitmanifest, \
          org.dspace.ctask.replicate.checkm.VerifyManifest = verifymanifest, \
          org.dspace.ctask.replicate.checkm.FetchManifest = fetchmanifest, \
          org.dspace.ctask.replicate.checkm.CompareWithManifest = auditmanifest, \
          org.dspace.ctask.replicate.checkm.RemoveManifest = removemanifest
      
    • Give Each Task a Human-Friendly Task Name: Under the ui.tasknames setting, give each of the above Tasks a human-friendly name. Here are some recommended values, but you are welcome to tweak them.
      REMEMBER to add a comma and backslash (", \") after each line (except the final line).

      Code Block
      
      ui.tasknames = \
          ... (YOUR EXISTING TASK NAMES) ... , \
          transmitmanifest = Transmit Checkm Manifest to Storage, \
          verifymanifest = Verify Checkm Manifest exists in Storage, \
          fetchmanifest = Fetch Checkm Manifest from Storage, \
          auditmanifest = Audit against Checkm Manifest, \
          removemanifest = Remove Checkm Manifest from Storage
      
    • Optionally Create a Task Group: Finally, if you'd like to create a Task Group for these tasks, you can create a group named "checkm" and add them all to it. The below is just an example of how you may wish to set the ui.taskgroups and ui.taskgroup.* settings. It creates two Task Groups: (1) a "General Purpose Tasks" group for a few default DSpace Curation Tasks, and (2) a "Checkm Validation Tasks" group for all these new Replication tasks.

      Code Block
      
      # Tasks may be organized into named groups which display together in UI drop-downs
      ui.taskgroups = \
         general = General Purpose Tasks, \
         checkm = Checkm Validation Tasks
      
      # Group membership is defined using comma-separated lists of task names, one property per group
      ui.taskgroup.general = profileformats, requiredmetadata, checklinks
      ui.taskgroup.checkm = transmitmanifest, verifymanifest, fetchmanifest, auditmanifest, removemanifest
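The "comma and backslash" reminder above is the most common source of curate.cfg mistakes, since a missing backslash silently ends the property early. The following lint is a rough sketch under our own assumptions: lint_continuations is a hypothetical helper, it treats every non-comment line ending in a comma as an error, and it expects every continued line to end in either "= \" or ", \". It does not implement the full Java properties grammar, so treat its findings as hints.

```shell
# Sketch: flag list entries that break the ", \" continuation convention.
# Usage: lint_continuations <properties file>
# Returns non-zero if any suspicious line is found.
lint_continuations() {
    awk '
        /^[ \t]*#/ { next }                     # ignore comment lines
        /,[ \t]*$/ {                            # ends with a comma: backslash is missing
            printf "%s:%d: comma without trailing backslash\n", FILENAME, NR
            bad = 1
        }
        /\\$/ && !/[,=][ \t]*\\$/ {             # continued line without "," or "=" before the backslash
            printf "%s:%d: continuation without comma\n", FILENAME, NR
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```

For example, `lint_continuations [dspace]/config/modules/curate.cfg` (with [dspace] replaced by your installation path) prints one line per suspect entry, with the file name and line number, and exits with status 1 if it found any.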
      

...

To install this task, edit [dspace]/config/modules/curate.cfg. (NB: all curation configuration is 'modular' in the sense that the configuration properties live outside of dspace.cfg, in named files. This means that if a given suite of tasks is unused, its configuration is never installed.) First, add the task to the list of curation tasks.

Code Block

plugin.named.org.dspace.curate.CurationTask = \
    ... (YOUR EXISTING TASKS) ... , \
    org.dspace.ctask.replicate.EstimateAIPSize = estaipsize

Next, in the same file, add this task to the list that appears in the administrative UI:

Code Block

ui.tasknames = \
    ... (YOUR EXISTING TASK NAMES) ... , \
    estaipsize = Estimate Storage Space for AIP(s)

...

Since we are now working with AIPs, we should examine how the tasks are configured to package them. Most configuration specific to the Replication Task Suite is found in [dspace]/config/modules/replicate.cfg. There are two main properties to set (or leave at their default values):

Code Block

# Package type. Permitted values: 'mets', 'bagit'
packer.pkgtype = mets
# Format of package compression. Permitted values: 'zip' or 'tgz'
# for 'mets' packages, only zip is supported
packer.archfmt = zip

...

The replication code includes a so-called 'event consumer', that can 'listen for' any changes to objects in the repository. Event consumers are documented elsewhere, but all we need to do to activate this consumer is add it to the list of consumers (in dspace.cfg):

Code Block

#### Event System Configuration ####

# default synchronous dispatcher (same behavior as traditional DSpace)
event.dispatcher.default.class = org.dspace.event.BasicDispatcher
event.dispatcher.default.consumers = search, browse, eperson, harvester, replicate
....
# consumer to manage content replication
event.consumer.replicate.class = org.dspace.ctask.replicate.ReplicateConsumer
event.consumer.replicate.filters = Community|Collection|Item+Install|Modify|Modify_Metadata|Delete

...

If the event is an addition of a new DSpace object (actually for Items, an 'installation', i.e. when the item exits workflow), then a request for an AIP transmission is queued. The same occurs whenever an object has changed (so-called modify events). When an object is deleted, a 'catalog' of the deletion is transmitted to the replication service. The catalog just lists all the parts of the deletion: for an item, just the handle of the item; for a collection, the handles of all the items that were in it. This way, if the deletion was mistaken, the catalog can be used to recover all the contents. This represents the default behavior of the consumer. You may configure it in [dspace]/config/modules/replicate.cfg:

Code Block

###  ReplicateConsumer settings ###
# ReplicateConsumer must be properly declared/configured in dspace.cfg
# All tasks defined will be queued, unless the '+p' suffix is appended, when
# they will be immediately performed. Exercise considerable caution when using
# +p, as lengthy tasks can adversely affect UI or other responsiveness.

# Replicate event consumer tasks upon install/add events.
# A comma separated list of valid task plugin names (with optional '+p' suffix)
consumer.tasks.add = transmitaip

# Replicate event consumer tasks upon modification events.
# A comma separated list of valid task plugin names (with optional '+p' suffix)
consumer.tasks.mod = transmitaip

# Replicate event consumer tasks upon a delete/remove events.
# A comma separated list of valid task plugin names (with optional '+p' suffix)
consumer.tasks.del = catalog+p

# Replicate event consumer queue name - where all queued tasks are placed
consumer.queue = replication
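To make the '+p' convention in the settings above concrete, here is a small sketch that splits a consumer.tasks.* value into task names and reports how each would be handled. The function list_consumer_tasks is purely illustrative; it mirrors the rule stated in the configuration comments and is not how the consumer itself is implemented.

```shell
# Sketch: show which tasks in a consumer.tasks.* value are queued and which
# are performed immediately (those carrying the "+p" suffix).
# Usage: list_consumer_tasks "<comma separated task list>"
list_consumer_tasks() {
    printf '%s\n' "$1" | tr ',' '\n' | while read -r task; do
        [ -n "$task" ] || continue              # skip empty entries
        case $task in
            *+p) printf '%s immediate\n' "${task%+p}" ;;
            *)   printf '%s queued\n' "$task" ;;
        esac
    done
}
```

For instance, `list_consumer_tasks "transmitaip, catalog+p"` prints `transmitaip queued` followed by `catalog immediate`, which matches the defaults above: transmissions are queued, while the deletion catalog is written immediately.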

Using the event consumer, the curator can essentially operate replication on 'auto-pilot' after the first complete transmission of AIPs.
One important configuration detail to be aware of is this: by default, the consumer will process all events it receives, regardless of collection. But in our current case, we intend for only the 'Amazing Images' collection to be replicated. To effect this, we must create a file in the directory defined by the following [dspace]/config/modules/replicate.cfg property:

Code Block

# Base directory for replication operations
base.dir = ${dspace.dir}/replicate

...

For the replication of AIPs to be of any significant value, they must be stored in a safe, persistent, reliable, accessible, and available location. The replication tasks (transmitting, fetching, etc.) all rely on the configured storage provider. This and related properties are found in replicate.cfg:

Code Block

# Replica store implementation class
plugin.single.org.dspace.ctask.replicate.ObjectStore = \
    org.dspace.ctask.replicate.store.LocalObjectStore

# Location of local (e.g. local, mountable, sync) object store
# ignored for non-local stores (e.g. DuraCloud)
store.dir = ${dspace.dir}/repstore

...