...

Info: Early Access Release Available

An "Early Access" release of the Replication Task Suite is available to install via:

Note: More Information

More information on the Replication Task Suite is available from the following webinars/screencasts:

The "Problem Statement & Usage Examples" section below also provides some real-life scenarios/examples of where each Replication task may come in handy.

Info: New Development has moved to GitHub

New development of the Replication Task Suite has moved to GitHub: https://github.com/DSpace/dspace-replicate

The older SVN code repository still exists, but it has not been updated since the 1.0-EA (Early Access) Release.

...

Warning: Known Curation System bug in 1.8.0

DSpace 1.8.0 contains a bug in the Curation System which causes a NullPointerException error to be returned when any curation task is run across the entire site (see DS-1077). This bug directly affects the Replication Task Suite. Even when a replication task succeeds, it will still throw a NullPointerException. You can check the DSpace logs to tell whether the task actually succeeded or not. This bug will be resolved in DSpace 1.8.1.
Because of the above bug, we recommend running the Replication Suite on DSpace 1.8.1 or above.

Developers may obtain an early version of the soon-to-be DSpace 1.8.1 release by accessing the 1.8 Bug-fix Branch in the DSpace SVN: http://scm.dspace.org/svn/repo/dspace/branches/dspace-1_8_x/

Because of enhancements to the Curation System in DSpace 1.8.0, the Replication Suite is only compatible with a DSpace 1.8.x System.

...

Note: Maven-based Installation is recommended

At this time, it's recommended to install the DSpace Replication Suite via the Maven-based Installation. This form of installation ensures that the DSpace Replication Suite doesn't require re-installation during your next DSpace upgrade.

Maven-based Installation

  1. In your DSpace Source directory ([dspace-src]), you will modify two Maven pom.xml files:
    • [dspace-src]/dspace/pom.xml (This POM controls dependencies of command-line scripts. Modifying it will let you run dspace-replicate from the command line)
    • [dspace-src]/dspace/modules/xmlui/pom.xml (This POM controls dependencies of the XMLUI. Modifying it will let you run dspace-replicate from the XMLUI)
  2. For both of these pom.xml files, add the following <dependency> section at the end of the existing <dependencies> section (just before the closing </dependencies> tag):
    Code Block
    <dependency>
       <groupId>org.dspace</groupId>
       <artifactId>dspace-replicate</artifactId>
       <version>1.0-EA</version>
    </dependency>
    
  3. Once you've finished modifying both pom.xml files, rebuild DSpace by running the following from your [dspace-src]/dspace/ folder:
    Code Block
    mvn clean package
    
  4. Update your existing DSpace 1.8.x installation by running the following from your [dspace-src]/dspace/target/dspace-1.8.x-SNAPSHOT-build/ directory:
    Code Block
    ant update
    
    Note: Alternatively, if you don't want to do a full DSpace update, you can just update your existing binaries & webapps by running the following two commands:
      • ant update_code (Updates the existing [dspace]/lib/ directory)
      • ant update_webapps (Updates the existing [dspace]/webapps/ directory)
  5. Copy the Replication Suite's configuration files to your DSpace configuration directory:
    • Replication Suite Configuration File: Copy [dspace-replicate]/config/modules/replicate.cfg (http://scm.dspace.org/svn/repo/modules/dspace-replicate/tags/dspace-replicate-1.0-EA/config/modules/replicate.cfg) to your [dspace]/config/modules/ directory
    • METS-specific AIP Configuration Settings: Copy [dspace-replicate]/config/modules/replicate-mets.cfg (http://scm.dspace.org/svn/repo/modules/dspace-replicate/tags/dspace-replicate-1.0-EA/config/modules/replicate-mets.cfg) to your [dspace]/config/modules/ directory
    • DuraCloud Configuration File: Copy [dspace-replicate]/config/modules/duracloud.cfg (http://scm.dspace.org/svn/repo/modules/dspace-replicate/tags/dspace-replicate-1.0-EA/config/modules/duracloud.cfg) to your [dspace]/config/modules/ directory
  6. Finally, follow the Configuration settings instructions below to configure the Replication Suite based on your usage needs.

...

  1. Download the Replication Suite code
  2. Build/Compile the Replication Suite, by running the following from the root directory
    Code Block
    mvn package
  3. Copy the generated JAR files to your DSpace 1.8 installation.
    1. There are a total of 5 JARs that will need to be copied to your [dspace]/lib/ directory:
      • [dspace-replicate]/target/dspace-replicate-[version].jar (The Replication Suite Plugin)
      • [dspace-replicate]/target/lib/common-[version].jar (DuraCloud common libraries - required for DuraCloud integration)
      • [dspace-replicate]/target/lib/commons-compress-[version].jar (Apache Commons Compress - prerequisite for Replication Suite plugin)
      • [dspace-replicate]/target/lib/storageprovider-[version].jar (DuraCloud storage provider libraries - required for DuraCloud integration)
      • [dspace-replicate]/target/lib/storeclient-[version].jar (DuraCloud store client libraries - required for DuraCloud integration)
    2. Also copy the above 5 JARs to your XMLUI web application's WEB-INF/lib directory (e.g. [dspace]/webapps/xmlui/WEB-INF/lib/)
  4. Copy the Replication Suite's configuration files to your DSpace configuration directory
    • Replication Suite Configuration File: Copy [dspace-replicate]/config/modules/replicate.cfg to your [dspace]/config/modules/ directory
    • METS-specific AIP Configuration Settings: Copy [dspace-replicate]/config/modules/replicate-mets.cfg to your [dspace]/config/modules/ directory
    • DuraCloud Configuration File: Copy [dspace-replicate]/config/modules/duracloud.cfg to your [dspace]/config/modules/ directory
  5. Finally, follow the Configuration settings instructions below to configure the Replication Suite based on your usage needs.
    • There is a sample curate.cfg file provided in [dspace-replicate]/config/modules/curate.cfg which can be used as a reference. It is pre-configured to use the DSpace AIP Format (METS-based packaging).
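
The JAR-copying step of the manual installation can be scripted. The following is only a minimal sketch, assuming the five JARs sit where the list above says they do; the function name and the glob patterns are illustrative and are not part of the Replication Suite.

```python
import shutil
from pathlib import Path

# Glob patterns for the five JARs, relative to [dspace-replicate]/target.
# The patterns are an assumption derived from the file names listed above.
JAR_PATTERNS = [
    "dspace-replicate-*.jar",
    "lib/common-*.jar",
    "lib/commons-compress-*.jar",
    "lib/storageprovider-*.jar",
    "lib/storeclient-*.jar",
]


def copy_replication_jars(replicate_dir, dspace_dir):
    """Copy the JARs into [dspace]/lib and the XMLUI WEB-INF/lib directory.

    Returns the names of the copied JAR files.
    """
    target = Path(replicate_dir) / "target"
    destinations = [
        Path(dspace_dir) / "lib",
        Path(dspace_dir) / "webapps" / "xmlui" / "WEB-INF" / "lib",
    ]
    jars = []
    for pattern in JAR_PATTERNS:
        matches = sorted(target.glob(pattern))
        if not matches:
            # Fail early instead of installing an incomplete set of JARs.
            raise FileNotFoundError(f"No JAR matching {pattern} under {target}")
        jars.extend(matches)
    for dest in destinations:
        dest.mkdir(parents=True, exist_ok=True)
        for jar in jars:
            shutil.copy2(jar, dest / jar.name)
    return [jar.name for jar in jars]
```

Stopping on a missing JAR is a deliberate choice here: a partial copy would leave the command line and the XMLUI with different sets of libraries.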

Configuration

Configuration of the Replication Task Suite is based entirely on your local institution's backup, restore and preservation needs.

...

This section goes through the steps of configuring the Replication Suite to use the default DSpace AIP format, which utilizes METS packaging.

  1. General Curation Configuration: First, in your [dspace]/config/modules/curate.cfg you will want to enable & configure the METS-based replication tasks. (NOTE: there is a sample curate.cfg file provided in [dspace-replicate]/config/modules/curate.cfg which is pre-configured to use METS-based AIPs).
    • Enable the Replication Tasks: In the list of "Task Class implementations" (plugin.named.org.dspace.curate.CurationTask), add the following.
      REMEMBER to add a comma and backslash (", \") after each line (except the final line).
      Code Block
      plugin.named.org.dspace.curate.CurationTask = \
          ... (YOUR EXISTING TASKS) ... , \
          org.dspace.ctask.replicate.EstimateAIPSize = estaipsize, \
          org.dspace.ctask.replicate.ReadOdometer = readodometer, \
          org.dspace.ctask.replicate.TransmitAIP = transmitaip, \
          org.dspace.ctask.replicate.VerifyAIP = verifyaip, \
          org.dspace.ctask.replicate.FetchAIP = fetchaip, \
          org.dspace.ctask.replicate.CompareWithAIP = auditaip, \
          org.dspace.ctask.replicate.RemoveAIP = removeaip, \
          org.dspace.ctask.replicate.METSRestoreFromAIP = restorefromaip, \
          org.dspace.ctask.replicate.METSRestoreFromAIP = replacewithaip, \
          org.dspace.ctask.replicate.METSRestoreFromAIP = restorekeepexisting, \
          org.dspace.ctask.replicate.METSRestoreFromAIP = restoresinglefromaip, \
          org.dspace.ctask.replicate.METSRestoreFromAIP = replacesinglewithaip
      
    • Give Each Task a Human-Friendly Task Name: Under the ui.tasknames setting, give each of the above Tasks a human-friendly name. Here are some recommended values, but you are welcome to tweak them.
      REMEMBER to add a comma and backslash (", \") after each line (except the final line).
      Code Block
      ui.tasknames = \
          ... (YOUR EXISTING TASK NAMES) ... , \
          estaipsize = Estimate Storage Space for AIP(s), \
          readodometer = Read Odometer, \
          transmitaip = Transmit AIP(s) to Storage, \
          verifyaip = Verify AIP(s) exist in Storage, \
          fetchaip = Fetch AIP(s) from Storage, \
          auditaip = Audit against AIP(s), \
          removeaip = Remove AIP(s) from Storage, \
          restorefromaip = Restore Missing Object(s) from AIP(s), \
          replacewithaip = Replace Existing Object(s) with AIP(s), \
          restorekeepexisting = Restore Missing Object(s) but Keep Existing Objects, \
          restoresinglefromaip = Restore Single Object from AIP, \
          replacesinglewithaip = Replace Single Object with AIP
      
    • Optionally Create a Task Group: Finally, if you'd like to create a Task Group for these tasks, you can create a group named "replicate" and add them all to it. The below is just an example for how you may wish to set the ui.taskgroups and ui.taskgroup.* settings. It creates two Task Groups: (1) a "General Purpose Tasks" group for a few default DSpace Curation Tasks, and (2) a "Replication Suite Tasks" group for all these new Replication tasks.
      Code Block
      # Tasks may be organized into named groups which display together in UI drop-downs
      ui.taskgroups = \
         general = General Purpose Tasks, \
         replicate = Replication Suite Tasks
      
      # Group membership is defined using comma-separated lists of task names, one property per group
      ui.taskgroup.general = profileformats, requiredmetadata, checklinks
      ui.taskgroup.replicate = estaipsize, readodometer, transmitaip, verifyaip, fetchaip, auditaip, removeaip, restorefromaip, replacewithaip, restorekeepexisting, restoresinglefromaip, replacesinglewithaip
      
  2. Replication Suite Configuration: Next, in your [dspace]/config/modules/replicate.cfg you will want to ensure it is set up to properly use METS-based AIPs. Under the "AIP Packaging Settings" you'll want the following settings enabled:
    Code Block
    # Package type. Permitted values: 'mets', 'bagit'
    # mets = Generate default DSpace AIPs as described in: https://wiki.duraspace.org/display/DSDOC18/AIP+Backup+and+Restore
    # bagit = Generate AIPs based on the BagIt packaging format: https://wiki.ucop.edu/display/Curation/BagIt
    packer.pkgtype = mets
    
    # Format of package compression. Permitted values: 'zip' or 'tgz'
    # for 'mets' packages, only 'zip' is supported
    packer.archfmt = zip
    
    # Whether or not to name packages with a DSpace type prefix.
    # When 'true', package files are named [type]@[handle].[format] (e.g. ITEM@123456789-1.zip)
    # When 'false', package files are named [handle].[format] (e.g. 123456789-1.zip)
    # Defaults to 'true'. For 'mets' packages, this must be 'true'.
    packer.typeprefix = true
    
  3. Optionally tweak the AIP Restore/Replace settings: You can optionally tweak the way AIPs are restored or replaced (using AIP Backup and Restore). These settings normally should not need to be tweaked, but are available in the [dspace]/config/modules/replicate-mets.cfg configuration file. See that configuration file for more details.
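
The packaging settings in step 2 determine how package files are named. The sketch below restates the rules from the configuration comments as code. It is an illustration only, not the Replication Suite's implementation: the function names are hypothetical, and replacing the "/" in a handle with "-" is inferred from the ITEM@123456789-1.zip example rather than stated outright.

```python
def aip_filename(obj_type, handle, archfmt="zip", typeprefix=True):
    """Return the package file name described by the packer.* comments."""
    # Assumption: '/' in the handle becomes '-' (inferred from the example
    # file name ITEM@123456789-1.zip).
    safe_handle = handle.replace("/", "-")
    base = f"{obj_type}@{safe_handle}" if typeprefix else safe_handle
    return f"{base}.{archfmt}"


def check_mets_settings(pkgtype, archfmt, typeprefix):
    """List violations of the METS constraints stated in the comments."""
    problems = []
    if pkgtype == "mets":
        if archfmt != "zip":
            problems.append("for 'mets' packages, packer.archfmt must be 'zip'")
        if not typeprefix:
            problems.append("for 'mets' packages, packer.typeprefix must be 'true'")
    return problems


print(aip_filename("ITEM", "123456789/1"))                    # ITEM@123456789-1.zip
print(aip_filename("ITEM", "123456789/1", typeprefix=False))  # 123456789-1.zip
print(check_mets_settings("mets", "zip", True))               # []
```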

Configuring usage of DSpace BagIt AIP Format

This section goes through the steps of configuring the Replication Suite to use BagIt-based AIPs. For more information on the BagIt packaging format, see: https://wiki.ucop.edu/display/Curation/BagIt

  1. General Curation Configuration: First, in your [dspace]/config/modules/curate.cfg you will want to enable & configure the BagIt-based replication tasks. (NOTE: there is a sample curate.cfg file provided in [dspace-replicate]/config/modules/curate.cfg which provides example settings).
    • Enable the Replication Tasks: In the list of "Task Class implementations" (plugin.named.org.dspace.curate.CurationTask), add the following.
      REMEMBER to add a comma and backslash (", \") after each line (except the final line).
      Code Block
      plugin.named.org.dspace.curate.CurationTask = \
          ... (YOUR EXISTING TASKS) ... , \
          org.dspace.ctask.replicate.EstimateAIPSize = estaipsize, \
          org.dspace.ctask.replicate.ReadOdometer = readodometer, \
          org.dspace.ctask.replicate.TransmitAIP = transmitaip, \
          org.dspace.ctask.replicate.VerifyAIP = verifyaip, \
          org.dspace.ctask.replicate.FetchAIP = fetchaip, \
          org.dspace.ctask.replicate.CompareWithAIP = auditaip, \
          org.dspace.ctask.replicate.RemoveAIP = removeaip, \
          org.dspace.ctask.replicate.BagItRestoreFromAIP = restorefromaip, \
          org.dspace.ctask.replicate.BagItReplaceWithAIP = replacewithaip
      
    • Give Each Task a Human-Friendly Task Name: Under the ui.tasknames setting, give each of the above Tasks a human-friendly name. Here are some recommended values, but you are welcome to tweak them.
      REMEMBER to add a comma and backslash (", \") after each line (except the final line).
      Code Block
      ui.tasknames = \
          ... (YOUR EXISTING TASK NAMES) ... , \
          estaipsize = Estimate Storage Space for AIP(s), \
          readodometer = Read Odometer, \
          transmitaip = Transmit AIP(s) to Storage, \
          verifyaip = Verify AIP(s) exist in Storage, \
          fetchaip = Fetch AIP(s) from Storage, \
          auditaip = Audit/Compare against AIP(s), \
          removeaip = Remove AIP(s) from Storage, \
          restorefromaip = Restore Missing Object(s) from AIP(s), \
          replacewithaip = Replace Existing Object(s) with AIP(s)
      
    • Optionally Create a Task Group: Finally, if you'd like to create a Task Group for these tasks, you can create a group named "replicate" and add them all to it. The below is just an example for how you may wish to set the ui.taskgroups and ui.taskgroup.* settings. It creates two Task Groups: (1) a "General Purpose Tasks" group for a few default DSpace Curation Tasks, and (2) a "Replication Suite Tasks" group for all these new Replication tasks.
      Code Block
      # Tasks may be organized into named groups which display together in UI drop-downs
      ui.taskgroups = \
         general = General Purpose Tasks, \
         replicate = Replication Suite Tasks
      
      # Group membership is defined using comma-separated lists of task names, one property per group
      ui.taskgroup.general = profileformats, requiredmetadata, checklinks
      ui.taskgroup.replicate = estaipsize, readodometer, transmitaip, verifyaip, fetchaip, auditaip, removeaip, restorefromaip, replacewithaip
      
  2. Replication Suite Configuration: Next, in your [dspace]/config/modules/replicate.cfg you will want to ensure it is set up to properly use BagIt-based AIPs. Under the "AIP Packaging Settings" you'll want the following settings enabled:
    Code Block
    # Package type. Permitted values: 'mets', 'bagit'
    # mets = Generate default DSpace AIPs as described in: https://wiki.duraspace.org/display/DSDOC18/AIP+Backup+and+Restore
    # bagit = Generate AIPs based on the BagIt packaging format: https://wiki.ucop.edu/display/Curation/BagIt
    packer.pkgtype = bagit
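
Because the task lists in curate.cfg rely on a comma and backslash after every line except the last, a missing backslash silently ends the property early. The sketch below is a rough way to check an edited file. It assumes a simplified reading of the Java properties format (only "#" comments, "key = value" pairs and backslash continuations), and all function names are hypothetical.

```python
def parse_properties(text):
    """Parse a small subset of the Java properties format."""
    props = {}
    logical = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not logical and (not line or line.startswith("#")):
            continue  # skip blank lines and comments between properties
        if line.endswith("\\"):
            # A trailing backslash continues the value on the next line.
            logical += line[:-1].strip() + " "
            continue
        logical += line
        key, _, value = logical.partition("=")
        props[key.strip()] = value.strip()
        logical = ""
    return props


def task_names(props):
    """Collect the short task names from the CurationTask plugin list."""
    value = props.get("plugin.named.org.dspace.curate.CurationTask", "")
    return {e.partition("=")[2].strip() for e in value.split(",") if "=" in e}


def undefined_group_members(props, group):
    """Return task names listed in ui.taskgroup.<group> but never defined."""
    members = [m.strip() for m in props.get(f"ui.taskgroup.{group}", "").split(",")]
    defined = task_names(props)
    return [m for m in members if m and m not in defined]


SAMPLE = r"""
plugin.named.org.dspace.curate.CurationTask = \
    org.dspace.ctask.replicate.TransmitAIP = transmitaip, \
    org.dspace.ctask.replicate.VerifyAIP = verifyaip
ui.taskgroup.replicate = transmitaip, verifyaip, fetchaip
"""
print(undefined_group_members(parse_properties(SAMPLE), "replicate"))  # ['fetchaip']
```

In this sample, fetchaip is reported because it appears in the task group but was never added to the task list.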
    

...

Before configuring a local storage option, please ensure you have enough space available on your local hard drive (or mounted drive/SAN if your local folder is actually remote storage). You can use the "Estimate Storage Space" (estaipsize) task to estimate the amount of new storage space you will need.

To configure local storage, please change the following settings in your [dspace]/config/modules/replicate.cfg configuration file:

  1. Enable Local Storage Plugin: Ensure the Replication Suite is set up to use the 'LocalObjectStore' plugin:
    Code Block
    # Replica store implementation class (specify one)
    plugin.single.org.dspace.ctask.replicate.ObjectStore = \
        org.dspace.ctask.replicate.store.LocalObjectStore
    
  2. Configure Local Storage Folder: Configure the location where you want all AIPs to be stored on your local filesystem. This defaults to the [dspace]/repstore folder. However, we recommend changing this to at least a separate hard drive from your existing DSpace installation directory, so that a single hard drive failure cannot destroy both your DSpace content and its AIP backups.
    Code Block
    # Location of local (e.g. local, mountable, sync) object store
    # ignored for non-local stores (e.g. DuraCloud)
    store.dir = ${dspace.dir}/repstore
    
  3. Optionally Configure Subfolder Settings: Optionally, you can configure the sub-folder names (under store.dir) which will be used to store AIPs, checkm manifests (if enabled), etc.
    Code Block
    # The storage group / folder where AIPs are stored/retrieved when AIP based tasks 
    # (e.g. "Transmit AIP", "Recover from AIP") are executed.
    # For Local object stores, this group name corresponds to a subfolder in the 'store.dir'
    # For DuraCloud object stores, this group name corresponds to a DuraCloud Space ID (Space must already exist)
    group.aip.name = aips
    
    # The storage group / folder where Checkm Manifests are stored/retrieved when Checkm Manifest based tasks are executed
    # (org.dspace.ctask.replicate.checkm.*).
    # For Local object stores, this group name corresponds to a subfolder in the 'store.dir'
    # For DuraCloud object stores, this group name corresponds to a DuraCloud Space ID (Space must already exist)
    group.manifest.name = manifests
    
    # The storage group / folder where AIPs are temporarily stored/retrieved when an object deletion occurs
    # and the ReplicationConsumer is enabled (see below). Essentially, this 'delete' group provides a 
    # location where AIPs can be temporarily kept in case the deletion needs to be reverted and the object restored.
    # WARNING: THIS MUST NOT BE SET TO THE SAME VALUE AS 'group.aip.name'. If it is set to the 
    # same value, then your AIP backup processes will be UNSTABLE and restoration may be difficult or impossible.
    # For Local object stores, this group name corresponds to a subfolder in the 'store.dir'
    # For DuraCloud object stores, this group name corresponds to a DuraCloud Space ID (Space must already exist)
    group.delete.name = deletes
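
The folder settings above can be sanity-checked before running any tasks. The sketch below is an illustration under stated assumptions: the settings are passed in as a plain dictionary, the function names are hypothetical, and the required size would come from the "Estimate Storage Space" task. It resolves each group to its subfolder under store.dir, rejects the configuration the WARNING comment forbids, and compares free disk space against the estimate.

```python
import shutil
from pathlib import Path


def storage_group_dirs(settings):
    """Map each group.*.name setting to its subfolder under store.dir."""
    aip = settings["group.aip.name"]
    delete = settings["group.delete.name"]
    if aip == delete:
        # The configuration comments warn that sharing one group for both
        # makes backups unstable.
        raise ValueError("group.delete.name must not equal group.aip.name")
    store = Path(settings["store.dir"])
    return {
        "aip": store / aip,
        "manifest": store / settings["group.manifest.name"],
        "delete": store / delete,
    }


def has_enough_space(path, needed_bytes):
    """Compare the free space on the volume holding `path` with an estimate."""
    return shutil.disk_usage(path).free >= needed_bytes


dirs = storage_group_dirs({
    "store.dir": "/dspace/repstore",
    "group.aip.name": "aips",
    "group.manifest.name": "manifests",
    "group.delete.name": "deletes",
})
print(dirs["aip"].as_posix())  # /dspace/repstore/aips
```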
    

...

Before configuring a mounted storage option, please ensure you have enough space available on your external, mounted drive/SAN. You can use the "Estimate Storage Space" (estaipsize) task to estimate the amount of new storage space you will need.

To configure mounted storage, please change the following settings in your [dspace]/config/modules/replicate.cfg configuration file:

  1. Enable Mountable Storage Plugin: Ensure the Replication Suite is set up to use the 'MountableObjectStore' plugin:
    Code Block
    # Replica store implementation class (specify one)
    plugin.single.org.dspace.ctask.replicate.ObjectStore = \
        org.dspace.ctask.replicate.store.MountableObjectStore
    
  2. Configure Mounted Folder: Configure the location where you want all AIPs to be stored. The folder should already be mounted on your local filesystem. This defaults to the [dspace]/repstore folder.
    Code Block
    # Location of local (e.g. local, mountable, sync) object store
    # ignored for non-local stores (e.g. DuraCloud)
    store.dir = ${dspace.dir}/repstore
    
  3. Optionally Configure Subfolder Settings: Optionally, you can configure the sub-folder names (under store.dir) which will be used to store AIPs, checkm manifests (if enabled), etc.
    Code Block
    # The storage group / folder where AIPs are stored/retrieved when AIP based tasks 
    # (e.g. "Transmit AIP", "Recover from AIP") are executed.
    # For Local object stores, this group name corresponds to a subfolder in the 'store.dir'
    # For DuraCloud object stores, this group name corresponds to a DuraCloud Space ID (Space must already exist)
    group.aip.name = aips
    
    # The storage group / folder where Checkm Manifests are stored/retrieved when Checkm Manifest based tasks are executed
    # (org.dspace.ctask.replicate.checkm.*).
    # For Local object stores, this group name corresponds to a subfolder in the 'store.dir'
    # For DuraCloud object stores, this group name corresponds to a DuraCloud Space ID (Space must already exist)
    group.manifest.name = manifests
    
    # The storage group / folder where AIPs are temporarily stored/retrieved when an object deletion occurs
    # and the ReplicationConsumer is enabled (see below). Essentially, this 'delete' group provides a 
    # location where AIPs can be temporarily kept in case the deletion needs to be reverted and the object restored.
    # WARNING: THIS MUST NOT BE SET TO THE SAME VALUE AS 'group.aip.name'. If it is set to the 
    # same value, then your AIP backup processes will be UNSTABLE and restoration may be difficult or impossible.
    # For Local object stores, this group name corresponds to a subfolder in the 'store.dir'
    # For DuraCloud object stores, this group name corresponds to a DuraCloud Space ID (Space must already exist)
    group.delete.name = deletes
    

...

DuraCloud Account Settings

In order to configure DuraCloud Storage, you must first have an existing DuraCloud Account (http://www.duracloud.org/). This account's settings should be configured in your [dspace]/config/modules/duracloud.cfg file as follows:

  1. DuraCloud HostName: This is the location of your DuraCloud instance (the URL you tend to access for your account). Just provide the hostname.
    Code Block
    # DuraCloud service location (just the hostname)
    host = demo.duracloud.org
    
  2. DuraCloud Service Port: This is the port that DuraCloud is running on. It is almost always "443" unless you have installed DuraCloud yourself and configured it differently.
    Code Block
    # DuraCloud service port (usually 443 for https)
    port = 443
    
  3. DuraCloud's "DuraStore" path: This is the path to DuraCloud's "DuraStore" service. It is almost always "durastore" unless you have installed DuraCloud yourself and configured it differently.
    Code Block
    context = durastore
    
  4. DuraCloud Username & Password: Finally, fill out your account username & password in these settings. Please note, as this file now contains your DuraCloud account information, we recommend securing it (if possible). Just ensure it is still readable by the system user that DSpace runs as.
    Code Block
    # DuraCloud user name
    username = rep-agent
    # DuraCloud password
    password = passw0rd
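
As a rough illustration of how these settings fit together, the sketch below combines host, port and context into a service URL and restricts the configuration file to its owner. Both helper names are hypothetical. The URL layout and the HTTPS default are assumptions based on the "usually 443 for https" comment, not on how the Replication Suite itself builds the address. The permission change only helps if the file is owned by the system user that DSpace runs as.

```python
import os
import stat


def durastore_base_url(host, port, context, scheme="https"):
    """Combine the three duracloud.cfg settings into one base URL.

    Assumption: the service is reached at scheme://host:port/context.
    """
    return f"{scheme}://{host}:{port}/{context}"


def restrict_to_owner(path):
    """Make a file readable and writable by its owner only (POSIX mode 0600)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


print(durastore_base_url("demo.duracloud.org", 443, "durastore"))
# https://demo.duracloud.org:443/durastore
```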
    
DuraCloud Storage Settings

Now, to configure DuraCloud as your storage location, please change the following settings in your [dspace]/config/modules/replicate.cfg configuration file:

  1. Enable DuraCloud Storage Plugin: Ensure the Replication Suite is set up to use the 'DuraCloudObjectStore' plugin:
    Code Block
    # Replica store implementation class (specify one)
    plugin.single.org.dspace.ctask.replicate.ObjectStore = \
        org.dspace.ctask.replicate.store.DuraCloudObjectStore
    
  2. Configure DuraCloud Primary Space to use: Your DuraCloud account allows you to separate content into various "Spaces". You'll need to create a new DuraCloud Space that your AIPs will be stored within, and configure that as your group.aip.name (by default it's set to a DuraCloud Space with ID of "aips"). You should also create a new DuraCloud Space that your AIPs will be moved to if they are ever removed, and configure that as your group.delete.name. Optionally, if you are using Checkm manifests, you can also create and configure a group.manifest.name DuraCloud Space
    Code Block
    # The storage group / folder where AIPs are stored/retrieved when AIP based tasks 
    # (e.g. "Transmit AIP", "Recover from AIP") are executed.
    # For Local object stores, this group name corresponds to a subfolder in the 'store.dir'
    # For DuraCloud object stores, this group name corresponds to a DuraCloud Space ID (Space must already exist)
    group.aip.name = aips
    
  3. Optionally, Configure Additional DuraCloud Spaces: If you have chosen to utilize Checkm manifest validation, you will need to create and configure a DuraCloud Space corresponding to the group.manifest.name setting below. Additionally, if you have chosen to enable the Automatic Replication, you will need to create and configure a DuraCloud Space corresponding to the group.delete.name setting below.
    Code Block
    # The storage group / folder where Checkm Manifests are stored/retrieved when Checkm Manifest based tasks are executed
    # (org.dspace.ctask.replicate.checkm.*).
    # For Local object stores, this group name corresponds to a subfolder in the 'store.dir'
    # For DuraCloud object stores, this group name corresponds to a DuraCloud Space ID (Space must already exist)
    group.manifest.name = manifests
    
    # The storage group / folder where AIPs are temporarily stored/retrieved when an object deletion occurs
    # and the ReplicationConsumer is enabled (see below). Essentially, this 'delete' group provides a 
    # location where AIPs can be temporarily kept in case the deletion needs to be reverted and the object restored.
    # WARNING: THIS MUST NOT BE SET TO THE SAME VALUE AS 'group.aip.name'. If it is set to the 
    # same value, then your AIP backup processes will be UNSTABLE and restoration may be difficult or impossible.
    # For Local object stores, this group name corresponds to a subfolder in the 'store.dir'
    # For DuraCloud object stores, this group name corresponds to a DuraCloud Space ID (Space must already exist)
    group.delete.name = deletes
    

...

However, as this is an optional set of tasks, they are disabled by default. Should you wish to enable these tasks, just do the following:

  1. General Curation Configuration: First, in your [dspace]/config/modules/curate.cfg you will want to enable & configure the Checkm Manifest tasks. (NOTE: there is a sample curate.cfg file provided in [dspace-replicate]/config/modules/curate.cfg which provides example settings).
    • Enable the Checkm Tasks: In the list of "Task Class implementations" (plugin.named.org.dspace.curate.CurationTask), add the following.
      REMEMBER to add a comma and backslash (", \") after each line (except the final line).
      Code Block
      plugin.named.org.dspace.curate.CurationTask = \
          ... (YOUR EXISTING TASKS) ... , \
          org.dspace.ctask.replicate.checkm.TransmitManifest = transmitmanifest, \
          org.dspace.ctask.replicate.checkm.VerifyManifest = verifymanifest, \
          org.dspace.ctask.replicate.checkm.FetchManifest = fetchmanifest, \
          org.dspace.ctask.replicate.checkm.CompareWithManifest = auditmanifest, \
          org.dspace.ctask.replicate.checkm.RemoveManifest = removemanifest
      
    • Give Each Task a Human-Friendly Task Name: Under the ui.tasknames setting, give each of the above Tasks a human-friendly name. Here are some recommended values, but you are welcome to tweak them.
      REMEMBER to add a comma and backslash (", \") after each line (except the final line).
      Code Block
      ui.tasknames = \
          ... (YOUR EXISTING TASK NAMES) ... , \
          transmitmanifest = Transmit Checkm Manifest to Storage, \
          verifymanifest = Verify Checkm Manifest exists in Storage, \
          fetchmanifest = Fetch Checkm Manifest from Storage, \
          auditmanifest = Audit against Checkm Manifest, \
          removemanifest = Remove Checkm Manifest from Storage
      
    • Optionally Create a Task Group: Finally, if you'd like to create a Task Group for these tasks, you can create a group named "checkm" and add them all to it. The below is just an example for how you may wish to set the ui.taskgroups and ui.taskgroup.* settings. It creates two Task Groups: (1) a "General Purpose Tasks" group for a few default DSpace Curation Tasks, and (2) a "Checkm Validation Tasks" group for all these new Replication tasks.
      Code Block
      # Tasks may be organized into named groups which display together in UI drop-downs
      ui.taskgroups = \
         general = General Purpose Tasks, \
         checkm = Checkm Validation Tasks
      
      # Group membership is defined using comma-separated lists of task names, one property per group
      ui.taskgroup.general = profileformats, requiredmetadata, checklinks
      ui.taskgroup.checkm = transmitmanifest, verifymanifest, fetchmanifest, auditmanifest, removemanifest
      

...

In order to budget for replication storage, she needs to know the 'size' of the collection. When she asks her sysadmin, he replies that it is easy to give her figures for the whole asset store, but since collections aren't stored separately, she would have to add up each item's bitstreams in the collection, a rather tedious process. Thus the first task: a reporting tool which operates on natural DSpace objects, rather than storage volumes.

To install this task, edit [dspace]/config/modules/curate.cfg (NB: all curation configuration is 'modular' in the sense that the configuration properties live outside of dspace.cfg, in named files. This means that if a given suite of tasks is unused, its configuration is never installed). First, add the task to the list of curation tasks.

Code Block
plugin.named.org.dspace.curate.CurationTask = \
    ... (YOUR EXISTING TASKS) ... , \
    org.dspace.ctask.replicate.EstimateAIPSize = estaipsize
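
To make the idea of sizing a "natural" DSpace object concrete, the sketch below adds up bitstream sizes beneath a collection. It is an illustration of the kind of sum the curator would otherwise compute by hand, not the EstimateAIPSize implementation: the nested dictionaries, the item names and the byte counts are all made up, and a real AIP would also carry packaging overhead.

```python
def estimate_size(obj):
    """Sum bitstream sizes (in bytes) beneath a community, collection or item."""
    total = sum(obj.get("bitstreams", []))
    for child in obj.get("children", []):
        total += estimate_size(child)
    return total


# Hypothetical collection with two items and three bitstreams in total.
collection = {
    "name": "Amazing Images",
    "children": [
        {"name": "item-1", "bitstreams": [2_000_000, 350_000]},
        {"name": "item-2", "bitstreams": [4_100_000]},
    ],
}
print(estimate_size(collection))  # 6450000
```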

...

Having secured approval to replicate the 'Amazing Images' collection, our curator needs a task to generate the AIP representation of each item in the collection and transmit these archive files to the replication storage site (which may be service-backed, local, in the cloud, etc., as will be explored below). Adding this task is just like the previous step: add its configuration properties to curate.cfg. (We won't repeat a description of this process each time, but note that you may always add a task and elect not to display it in the administrative UI.) This task is 'org.dspace.ctask.replicate.TransmitAIP'.

Since we are now working with AIPs, we should examine how they are configured for the tasks. Most configuration specific to the Replication Task Suite is found in [dspace]/config/modules/replicate.cfg. There are two main properties to set (or accept default values):

Code Block
# Package type. Permitted values: 'mets', 'bagit'
packer.pkgtype = mets
# Format of package compression. Permitted values: 'zip' or 'tgz'
# for 'mets' packages, only zip is supported
packer.archfmt = zip

...

The odometer statistics are stored in a small text file located at [base.dir]/odometer, where [base.dir] is the value of the base.dir setting in your [dspace]/config/modules/replicate.cfg configuration file. Should you ever need to reset your odometer, you can do so by moving or removing the existing odometer file.
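
A reset by moving the file aside could look like the sketch below. It assumes only what is stated above (a file named odometer directly under base.dir); the function name and the backup naming scheme are hypothetical. Moving rather than deleting keeps the old statistics available if they are needed later.

```python
import time
from pathlib import Path


def reset_odometer(base_dir):
    """Move [base.dir]/odometer aside.

    Returns the backup path, or None if there was no odometer file.
    """
    odometer = Path(base_dir) / "odometer"
    if not odometer.exists():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = odometer.with_name(f"odometer.{stamp}.bak")
    odometer.rename(backup)
    return backup
```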

Info: More Information on where Odometer statistics are kept

Automation (optional)

While the coordinated use of the tasks described above can provide the basis for a solid replication strategy and practice, there are several processes that could necessitate a fair amount of curatorial work. For example, in the discussion on ensuring integrity of AIPs over time, we remarked that vigilance was required by the curator to transmit new AIPs whenever Items change. It is possible to leverage existing facilities in DSpace to substantially reduce this effort through automation.

...

For replicating in earnest, a service like DuraCloud is recommended (DuraCloudObjectStore). Such a service provides offsite storage/replication along with additional preservation management tools. Note that this service must be established and provisioned prior to use. For more information on DuraCloud see: http://www.duracloud.org

Alternatively, the MountableObjectStore option may be used if you wish to keep your AIP storage more "local" (e.g. on a local SAN or storage network). This option acts similarly to the default configuration (in that it writes to the local directory configured by the 'store.dir' property in replicate.cfg). However, the expectation is that this directory is actually a mounted storage drive, so AIPs are written in such a way as to support more complex storage architectures (e.g. an NFS-mounted store).

...