The Replication Task Suite is a DSpace 1.8 Add-On which provides a set of curation system tasks to assist in performing replication (backup/restore/audit) of DSpace contents to other locations. The DSpace content is packaged in containers known as AIPs (OAIS speak: 'archival information packages'). You can read much more about how AIPs are constituted here: AIP Backup and Restore. This add-on is also built on the DSpace curation system, which is described here: CurationSystem. We will describe a concrete situation facing a repository data curator, and introduce each task as the need arises. We will also describe some of the technical configuration details to enable these tasks.

Info
Early Access Release Available

An "Early Access" release of the Replication Task Suite source code is available at: http://scm.dspace.org/svn/repo/modules/dspace-replicate/tags/dspace-replicate-1.0-EA/

This 1.0-EA (Early Access) release may also be installed via Maven.

Note
More Information

More information on the Replication Task Suite is available from the following webinars/screencasts:

...

Source Code:
The Replication Task Suite source code is available at: http://scm.dspace.org/svn/repo/modules/dspace-replicate/
In addition, there is an associated JIRA Issue at: https://jira.duraspace.org/browse/DS-876

...

Prerequisites

Must be installed on a DSpace 1.8.x System

Warning
Known Curation System bug in 1.8.0

DSpace 1.8.0 contains a bug in the Curation System which causes a NullPointerException error to be returned when any curation task is run across the entire site (see DS-1077). This bug directly affects the Replication Task Suite. Even when a replication task succeeds, it will still throw a NullPointerException. You can check the DSpace logs to tell whether the task actually succeeded or not. This bug will be resolved in DSpace 1.8.1.
Because of the above bug, we recommend running the Replication Suite on DSpace 1.8.1 or above.

Because of enhancements to the Curation System in DSpace 1.8.0, the Replication Suite is only compatible with a DSpace 1.8.x System.

User Interface Compatibility Notes

As the Replication Suite is just a suite of Curation System tasks, it may be called (like any Curation Tasks) from the following locations:

  • From the Command Line
  • From the Admin UI (XMLUI only)
  • From Approval Workflow
  • From custom Java code

For more information see the Curation System details on Task Invocation.
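
As an illustration of command-line invocation, the sketch below assembles a `dspace curate` call for one task and one object. It assumes the launcher lives at `[dspace]/bin/dspace`, that the task name (here `transmitaip`) has already been registered in `curate.cfg` as described under Configuration, and that `123456789/1` is a placeholder handle. The helper name `curate_cmd` and the `DSPACE_HOME` variable are hypothetical. The function only prints the command so that it can be reviewed before being run.

```shell
#!/bin/sh
# Hypothetical install location; replace with your real [dspace] directory.
DSPACE_HOME="${DSPACE_HOME:-/dspace}"

# Print (do not run) the curate command for a task name and an object handle.
# -t names the curation task, -i identifies the object to run it on.
curate_cmd() {
    task="$1"
    handle="$2"
    printf '%s/bin/dspace curate -t %s -i %s\n' "$DSPACE_HOME" "$task" "$handle"
}

# Example: show the command that would transmit the AIP for one object.
curate_cmd transmitaip 123456789/1
```

Printing rather than executing keeps the sketch safe to try on a machine without DSpace; paste the printed line into a shell on the DSpace server once it looks right.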

Installation

Note
WORK IN PROGRESS

These instructions are still a work in progress.

Maven-based Installation

  1. In your DSpace source directory ([dspace-src]), you will modify two Maven pom.xml files:
    • [dspace-src]/dspace/pom.xml (This POM controls the dependencies of the command-line scripts. Modifying it lets you run dspace-replicate from the command line.)
    • [dspace-src]/dspace/modules/xmlui/pom.xml (This POM controls the dependencies of the XMLUI. Modifying it lets you run dspace-replicate from the XMLUI.)
  2. For both of these pom.xml files, add the following <dependency> section at the end of the existing <dependencies> section (just before the closing </dependencies> tag):
    Code Block
    <dependency>
       <groupId>org.dspace</groupId>
       <artifactId>dspace-replicate</artifactId>
       <version>1.0-EA</version>
    </dependency>
    
  3. Once you've finished modifying both pom.xml files, rebuild DSpace by running the following from your [dspace-src]/dspace/ folder:
    Code Block
    
    mvn clean package
    
  4. Update your existing DSpace 1.8.x installation by running the following from your [dspace-src]/dspace/target/dspace-1.8.x-SNAPSHOT-build/ directory:
    Code Block
    
    ant update
    
    Note

    Alternatively, if you don't want to do a full update, you can update just your existing binaries and webapps by running the following two commands:

    1. ant update_code (Updates the existing [dspace]/lib/ directory)
    2. ant update_webapps (Updates the existing [dspace]/webapps/ directory)
  5. Copy the Replication Suite's configuration files to your DSpace configuration directory:
    • Replication Suite Configuration File: Copy [dspace-replicate]/config/modules/replicate.cfg to your [dspace]/config/modules/ directory
    • METS-specific AIP Configuration Settings: Copy [dspace-replicate]/config/modules/replicate-mets.cfg to your [dspace]/config/modules/ directory
    • DuraCloud Configuration File: Copy [dspace-replicate]/config/modules/duracloud.cfg to your [dspace]/config/modules/ directory
  6. Finally, follow the Configuration settings instructions below to configure the Replication Suite based on your usage needs.
    • There is a sample curate.cfg file provided in [dspace-replicate]/config/modules/curate.cfg which can be used as a reference. It is pre-configured to use the DSpace AIP Format (METS-based packaging).
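
The configuration-copy step above can be sketched as a small shell function. This is only an illustration: the function name `copy_replicate_configs` and the two directory arguments are hypothetical, and the sketch assumes `[dspace]/config/modules/` already exists. Note that `cp` overwrites files of the same name, so back up any existing copies first.

```shell
#!/bin/sh
# Copy the three Replication Suite config files into [dspace]/config/modules/.
# $1 = [dspace-replicate] source directory, $2 = [dspace] install directory.
copy_replicate_configs() {
    src="$1/config/modules"
    dest="$2/config/modules"
    for f in replicate.cfg replicate-mets.cfg duracloud.cfg; do
        # Stop early if an expected file is missing from the source tree.
        if [ ! -f "$src/$f" ]; then
            echo "missing: $src/$f" >&2
            return 1
        fi
        cp "$src/$f" "$dest/" || return 1
    done
}

# Usage (paths are placeholders):
#   copy_replicate_configs /path/to/dspace-replicate /path/to/dspace
```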

Manual Installation

  1. Download the Replication Suite code

    ...

    1. Build/Compile the Replication Suite, by running the following from the root directory
      Code Block
      mvn package
    2. Copy the generated JAR files to your DSpace 1.8 installation.
      1. A total of 5 JARs need to be copied to your [dspace]/lib/ directory:
        • [dspace-replicate]/target/dspace-replicate-[version].jar (The Replication Suite Plugin)
        • [dspace-replicate]/target/lib/common-[version].jar (DuraCloud common libraries - required for DuraCloud integration)
        • [dspace-replicate]/target/lib/commons-compress-[version].jar (Apache Commons Compress - prerequisite for Replication Suite plugin)
        • [dspace-replicate]/target/lib/storageprovider-[version].jar (DuraCloud storage provider libraries - required for DuraCloud integration)
        • [dspace-replicate]/target/lib/storeclient-[version].jar (DuraCloud store client libraries - required for DuraCloud integration)
      2. Also copy the same 5 JARs to your XMLUI web application's WEB-INF/lib directory (e.g. [dspace]/webapps/xmlui/WEB-INF/lib/)
    3. Copy the Replication Suite's configuration files to your DSpace configuration directory:
      • Replication Suite Configuration File: Copy [dspace-replicate]/config/modules/replicate.cfg to your [dspace]/config/modules/ directory
      • METS-specific AIP Configuration Settings: Copy [dspace-replicate]/config/modules/replicate-mets.cfg to your [dspace]/config/modules/ directory
      • DuraCloud Configuration File: Copy [dspace-replicate]/config/modules/duracloud.cfg to your [dspace]/config/modules/ directory
    4. Finally, follow the Configuration settings instructions below to configure the Replication Suite based on your usage needs.
      • There is a sample curate.cfg file provided in [dspace-replicate]/config/modules/curate.cfg which can be used as a reference. It is pre-configured to use the DSpace AIP Format (METS-based packaging).
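
After a manual installation it can be useful to confirm that all five JARs reached both library directories. The following is a minimal sketch of such a check, not part of the Replication Suite itself: the function name `missing_jars` is hypothetical, and it matches JARs by file-name prefix because the version suffix varies.

```shell
#!/bin/sh
# List which of the five expected JARs have no match in a lib directory.
# $1 = directory to inspect, e.g. [dspace]/lib or [dspace]/webapps/xmlui/WEB-INF/lib
missing_jars() {
    dir="$1"
    for prefix in dspace-replicate common commons-compress storageprovider storeclient; do
        found=no
        # The version suffix varies, so match on "<prefix>-*.jar".
        for jar in "$dir/$prefix"-*.jar; do
            if [ -f "$jar" ]; then
                found=yes
            fi
        done
        if [ "$found" = no ]; then
            echo "$prefix"
        fi
    done
}

# Usage (paths are placeholders); no output means nothing is missing:
#   missing_jars /path/to/dspace/lib
#   missing_jars /path/to/dspace/webapps/xmlui/WEB-INF/lib
```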

    Configuration

    Configuration of the Replication Task Suite is based entirely on your local institution's backup, restore and preservation needs.

    Before getting started, you may wish to determine the answers to the following questions:

    1. AIP Format Options: Does your institution want to back up using the default DSpace AIP format (METS packaging)? Or would you rather use the new BagIt AIP format?
    2. Storage Options: Does your institution plan to use the Replication Suite to back up to a local/mounted drive? Or would you like to connect it to a DuraCloud account?
    3. Additional Options: Do you plan to use Checkm manifests for checksum auditing?

    AIP Format Options

    One of the first questions to ask yourself is the format you wish to utilize for your AIPs.

    There are two options:

    1. DSpace AIP Format (METS-based) (default) - This is the same AIP format utilized by the DSpace AIP Backup and Restore feature, so it is 100% compatible with that existing feature. In fact when using this format, the Replication Task Suite just "wraps" calls to the AIP Backup and Restore feature itself.
    2. BagIt AIP Format - This is a new AIP format provided by the Replication Task Suite. It generates AIPs in the BagIt File Packaging Format. Institutions which are already familiar with BagIt or use it elsewhere may find this format preferable.

    Configuring usage of DSpace default AIP Format (METS-based)

    This section goes through the steps of configuring the Replication Suite to use the default DSpace AIP format, which utilizes METS packaging.

    1. General Curation Configuration: First, in your [dspace]/config/modules/curate.cfg you will want to enable and configure the METS-based replication tasks. (NOTE: a sample curate.cfg file is provided in [dspace-replicate]/config/modules/curate.cfg, pre-configured to use METS-based AIPs.)
      • Enable the Replication Tasks: In the list of "Task Class implementations" (plugin.named.org.dspace.curate.CurationTask), add the following.
        REMEMBER to add a comma and backslash (", \") after each line (except the final line).
        Code Block
        
        plugin.named.org.dspace.curate.CurationTask = \
            ... (YOUR EXISTING TASKS) ... , \
            org.dspace.ctask.replicate.EstimateAIPSize = estaipsize, \
            org.dspace.ctask.replicate.ReadOdometer = readodometer, \
            org.dspace.ctask.replicate.TransmitAIP = transmitaip, \
            org.dspace.ctask.replicate.VerifyAIP = verifyaip, \
            org.dspace.ctask.replicate.FetchAIP = fetchaip, \
            org.dspace.ctask.replicate.CompareWithAIP = auditaip, \
            org.dspace.ctask.replicate.RemoveAIP = removeaip, \
            org.dspace.ctask.replicate.METSRestoreFromAIP = restorefromaip, \
            org.dspace.ctask.replicate.METSRestoreFromAIP = replacewithaip, \
            org.dspace.ctask.replicate.METSRestoreFromAIP = restorekeepexisting, \
            org.dspace.ctask.replicate.METSRestoreFromAIP = restoresinglefromaip, \
            org.dspace.ctask.replicate.METSRestoreFromAIP = replacesinglewithaip
        
      • Give Each Task a Human-Friendly Task Name: Under the ui.tasknames setting, give each of the above tasks a human-friendly name. Here are some recommended values, but you are welcome to tweak them.
        REMEMBER to add a comma and backslash (", \") after each line (except the final line).
        Code Block
        
        ui.tasknames = \
            ... (YOUR EXISTING TASK NAMES) ... , \
            estaipsize = Estimate AIP(s) Size, \
            readodometer = Read Odometer, \
            transmitaip = Transmit AIP(s) to Storage, \
            verifyaip = Verify AIP(s) exist in Storage, \
            fetchaip = Fetch AIP(s) from Storage, \
            auditaip = Audit/Compare against AIP(s), \
            removeaip = Remove AIP(s) from Storage, \
            restorefromaip = Restore Missing Object(s) from AIP(s), \
            replacewithaip = Replace Existing Object(s) with AIP(s), \
            restorekeepexisting = Restore Missing Object(s) but Keep Existing Objects, \
            restoresinglefromaip = Restore Single Object from AIP, \
            replacesinglewithaip = Replace Single Object with AIP
        
      • Optionally Create a Task Group: Finally, if you'd like to create a Task Group for these tasks, you can create a group named "replicate" and add them all to it. The below is just an example for how you may wish to set the ui.taskgroups and ui.taskgroup.* settings. It creates two Task Groups: (1) a "General Purpose Tasks" group for a few default DSpace Curation Tasks, and (2) a "Replication Suite Tasks" group for all these new Replication tasks.
        Code Block
        
        # Tasks may be organized into named groups which display together in UI drop-downs
        ui.taskgroups = \
           general = General Purpose Tasks, \
           replicate = Replication Suite Tasks
        
        # Group membership is defined using comma-separated lists of task names, one property per group
        ui.taskgroup.general = profileformats, requiredmetadata, checklinks
        ui.taskgroup.replicate = estaipsize, readodometer, transmitaip, verifyaip, fetchaip, auditaip, removeaip, restorefromaip, replacewithaip, restorekeepexisting, restoresinglefromaip, replacesinglewithaip
        
    2. Replication Suite Configuration: Next, in your [dspace]/config/modules/replicate.cfg, ensure it is set up to use METS-based AIPs. Under the "AIP Packaging Settings" you'll want the following settings enabled:
      Code Block
      
      # Package type. Permitted values: 'mets', 'bagit'
      # mets = Generate default DSpace AIPs as described in: https://wiki.duraspace.org/display/DSDOC18/AIP+Backup+and+Restore
      # bagit = Generate AIPs based on the BagIt packaging format: https://wiki.ucop.edu/display/Curation/BagIt
      packer.pkgtype = mets
      
      # Format of package compression. Permitted values: 'zip' or 'tgz'
      # for 'mets' packages, only 'zip' is supported
      packer.archfmt = zip
      
      # Whether or not to name packages with a DSpace type prefix.
      # When 'true', package files are named [type]@[handle].[format] (e.g. ITEM@123456789-1.zip)
      # When 'false', package files are named [handle].[format] (e.g. 123456789-1.zip)
      # Defaults to 'true'. For 'mets' packages, this must be 'true'.
      packer.typeprefix = true
      
    3. Optionally tweak the AIP Restore/Replace settings: You can adjust the way AIPs are restored or replaced (using AIP Backup and Restore). These settings normally should not need to be tweaked, but are available in the [dspace]/config/modules/replicate-mets.cfg configuration file. See that configuration file for more details.
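
To make the effect of `packer.typeprefix` concrete, here is a small sketch that derives a package file name from the naming patterns quoted in the `replicate.cfg` comments above. The function name `aip_package_name` is hypothetical, and the replacement of the handle's "/" with "-" is an assumption inferred from the example `ITEM@123456789-1.zip`, not a documented rule.

```shell
#!/bin/sh
# Build an AIP package file name from an object type, handle and archive format.
# ASSUMPTION: the "/" in a handle becomes "-" in the file name, inferred from
# the example ITEM@123456789-1.zip in the replicate.cfg comments.
# $1 = type (e.g. ITEM), $2 = handle, $3 = format (zip|tgz), $4 = typeprefix (true|false)
aip_package_name() {
    objtype="$1"
    handle="$2"
    fmt="$3"
    typeprefix="${4:-true}"
    safe_handle=$(printf '%s' "$handle" | tr '/' '-')
    if [ "$typeprefix" = true ]; then
        printf '%s@%s.%s\n' "$objtype" "$safe_handle" "$fmt"
    else
        printf '%s.%s\n' "$safe_handle" "$fmt"
    fi
}

aip_package_name ITEM 123456789/1 zip true    # prints ITEM@123456789-1.zip
aip_package_name ITEM 123456789/1 zip false   # prints 123456789-1.zip
```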

    Configuring usage of DSpace BagIt AIP Format

    This section goes through the steps of configuring the Replication Suite to use BagIt-based AIPs. For more information on the BagIt packaging format, see: https://wiki.ucop.edu/display/Curation/BagIt

    1. General Curation Configuration: First, in your [dspace]/config/modules/curate.cfg you will want to enable and configure the BagIt-based replication tasks. (NOTE: a sample curate.cfg file is provided in [dspace-replicate]/config/modules/curate.cfg, which provides example settings.)
      • Enable the Replication Tasks: In the list of "Task Class implementations" (plugin.named.org.dspace.curate.CurationTask), add the following.
        REMEMBER to add a comma and backslash (", \") after each line (except the final line).
        Code Block
        
        plugin.named.org.dspace.curate.CurationTask = \
            ... (YOUR EXISTING TASKS) ... , \
            org.dspace.ctask.replicate.EstimateAIPSize = estaipsize, \
            org.dspace.ctask.replicate.ReadOdometer = readodometer, \
            org.dspace.ctask.replicate.TransmitAIP = transmitaip, \
            org.dspace.ctask.replicate.VerifyAIP = verifyaip, \
            org.dspace.ctask.replicate.FetchAIP = fetchaip, \
            org.dspace.ctask.replicate.CompareWithAIP = auditaip, \
            org.dspace.ctask.replicate.RemoveAIP = removeaip, \
            org.dspace.ctask.replicate.BagItRestoreFromAIP = restorefromaip, \
            org.dspace.ctask.replicate.BagItReplaceWithAIP = replacewithaip
        
      • Give Each Task a Human-Friendly Task Name: Under the ui.tasknames setting, give each of the above tasks a human-friendly name. Here are some recommended values, but you are welcome to tweak them.
        REMEMBER to add a comma and backslash (", \") after each line (except the final line).
        Code Block
        
        ui.tasknames = \
            ... (YOUR EXISTING TASK NAMES) ... , \
            estaipsize = Estimate AIP(s) Size, \
            readodometer = Read Odometer, \
            transmitaip = Transmit AIP(s) to Storage, \
            verifyaip = Verify AIP(s) exist in Storage, \
            fetchaip = Fetch AIP(s) from Storage, \
            auditaip = Audit/Compare against AIP(s), \
            removeaip = Remove AIP(s) from Storage, \
            restorefromaip = Restore Missing Object(s) from AIP(s), \
            replacewithaip = Replace Existing Object(s) with AIP(s)
        
      • Optionally Create a Task Group: Finally, if you'd like to create a Task Group for these tasks, you can create a group named "replicate" and add them all to it. The below is just an example for how you may wish to set the ui.taskgroups and ui.taskgroup.* settings. It creates two Task Groups: (1) a "General Purpose Tasks" group for a few default DSpace Curation Tasks, and (2) a "Replication Suite Tasks" group for all these new Replication tasks.
        Code Block
        
        # Tasks may be organized into named groups which display together in UI drop-downs
        ui.taskgroups = \
           general = General Purpose Tasks, \
           replicate = Replication Suite Tasks
        
        # Group membership is defined using comma-separated lists of task names, one property per group
        ui.taskgroup.general = profileformats, requiredmetadata, checklinks
        ui.taskgroup.replicate = estaipsize, readodometer, transmitaip, verifyaip, fetchaip, auditaip, removeaip, restorefromaip, replacewithaip
        
    2. Replication Suite Configuration: Next, in your [dspace]/config/modules/replicate.cfg, ensure it is set up to use BagIt-based AIPs. Under the "AIP Packaging Settings" you'll want the following settings enabled:
      Code Block
      
      # Package type. Permitted values: 'mets', 'bagit'
      # mets = Generate default DSpace AIPs as described in: https://wiki.duraspace.org/display/DSDOC18/AIP+Backup+and+Restore
      # bagit = Generate AIPs based on the BagIt packaging format: https://wiki.ucop.edu/display/Curation/BagIt
      packer.pkgtype = bagit
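
Once `replicate.cfg` has been edited, a quick way to confirm which package type is active is to read the property back. The sketch below is a simplified reader, not the parser DSpace itself uses: the function name `get_prop` is hypothetical, and it only handles single-line `key = value` entries (no backslash continuations).

```shell
#!/bin/sh
# Print the value of a single-line "key = value" property from a .cfg file.
# Comment lines (starting with #) are ignored; the last matching line wins.
# Exits non-zero if the key is not present.
get_prop() {
    file="$1"
    key="$2"
    awk -v k="$key" '
        /^[ \t]*#/ { next }
        {
            eq = index($0, "=")
            if (eq == 0) next
            name = substr($0, 1, eq - 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            if (name == k) {
                val = substr($0, eq + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", val)
                found = 1
            }
        }
        END { if (found) print val; else exit 1 }
    ' "$file"
}

# Usage (path is a placeholder):
#   get_prop /path/to/dspace/config/modules/replicate.cfg packer.pkgtype
```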
      

    Storage Options

    Where your AIPs will be stored is the next decision to make. There are three options currently available:

    1. Local Storage: Replicate/Backup content to another location (folder) on your local filesystem.
    2. Mountable Storage: Replicate/Backup content to a mounted external filesystem (e.g. NFS-mounted drive).
    3. DuraCloud Storage: Replicate/Backup content to an existing DuraCloud account.

    Configuring Local Storage

    ...

    Info

    The local storage option may also be used for a mounted drive / SAN which just appears as though it is a local filesystem folder. However, some mounted drives (e.g. NFS-mounted drives) may need to use the Mountable Storage option instead.

    Info

    Before configuring a local storage option, please ensure you have enough space available on your local hard drive (or mounted drive/SAN if your local folder is actually remote storage). You can use the "Estimate Storage Space" (estaipsize) task to estimate the amount of new storage space you will need.
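
As a rough companion to the size estimate, the sketch below checks whether the filesystem behind a target folder has at least a given amount of free space. It is an illustration only: the function name `has_free_space` is hypothetical, and the required size must be supplied by hand in kilobytes, since the format of the estaipsize report is not assumed here.

```shell
#!/bin/sh
# Succeed if the filesystem holding a directory has at least the needed space.
# $1 = target directory, $2 = required space in kilobytes
has_free_space() {
    dir="$1"
    needed_kb="$2"
    # "df -Pk" prints one portable line per filesystem; column 4 is available KB.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$needed_kb" ]
}

# Usage (path and size are placeholders):
#   has_free_space /path/to/replica/store 5000000 && echo "enough space"
```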

    ...