...

Info
Latest Releases of Replication Task Suite

Based on the version of DSpace you are running, here are the compatible latest releases of the Replication Task Suite:

  • RTS, version 6.1 - compatible only with DSpace 6.x releases
    • Upgrading: To upgrade to RTS 6.1 from a previous version, change your pom.xml (see the installation instructions below) to reference 'dspace-replicate' version 6.1. Then rebuild DSpace & re-run 'ant update'. You should verify that your configurations are still compatible with DSpace 6.x, as the DSpace Configuration System was overhauled in DSpace 6.
      • After upgrading the RTS software, it is recommended to run a full backup to ensure all your AIP packages are also updated (if necessary).
    • Version 6.1 release notes
  • RTS, version 5.0 - compatible only with DSpace 5.x releases
  • RTS, version 3.5 - bug-fix release, compatible with all DSpace 3.x and 4.x releases
    • Upgrading: To upgrade to RTS 3.5 from a previous version, change your pom.xml (see Installation on DSpace 3.x or 4.x) to reference 'dspace-replicate' version 3.5. Then rebuild DSpace & re-run 'ant update'. Your existing RTS 3.x configuration files will still work with RTS 3.5.
      • After upgrading the RTS software, it is recommended to run a full backup to ensure all your AIP packages are also updated (if necessary).
    • Version 3.5 release notes
  • RTS, version 1.3 - bug-fix release, compatible with all DSpace 1.8.x releases.
    • Upgrading: To upgrade to RTS version 1.3 from a previous release, change your pom.xml (see the installation instructions below) to reference 'dspace-replicate' version 1.3. Then rebuild DSpace & re-run 'ant update'. Your existing RTS 1.x configuration files will still work with RTS 1.3.
      • After upgrading the RTS software, it is recommended to run a full backup to ensure all your AIP packages are also updated (if necessary).
    • 1.3 Bug Fixes: This fixes a DuraCloud v2.4.0 connection error with version 1.2.
    • 1.2 Bug Fixes: This fixes a Java 6 incompatibility bug in version 1.1.  Previously version 1.1 required Java 7 when using DuraCloud.
    • 1.1 Bug Fixes: Fixes for several small bugs in 1.0 (namely with the event consumer utilized during Automatic Replication).
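
The compatibility list above can be restated as a small lookup. This is only an illustrative Python sketch, not part of RTS: the function name `latest_rts_for` and the mapping table are hypothetical helpers that repeat the releases listed above and nothing more.

```python
from typing import Optional

# Latest RTS release per DSpace release line, restating the list above.
# DSpace 1.x is keyed as "1.8" because RTS 1.3 is listed only for 1.8.x.
_RTS_BY_DSPACE = {
    "6": "6.1",
    "5": "5.0",
    "4": "3.5",
    "3": "3.5",
    "1.8": "1.3",
}


def latest_rts_for(dspace_version: str) -> Optional[str]:
    """Return the latest listed RTS version, or None if none is listed."""
    parts = dspace_version.strip().split(".")
    # For 1.x releases the minor version matters (only 1.8.x is listed).
    key = ".".join(parts[:2]) if parts[0] == "1" else parts[0]
    return _RTS_BY_DSPACE.get(key)
```

A result of `None` means only that the list above names no release for that DSpace version.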

...

Installation instructions for each version are included below:

User Interface Compatibility Notes

...

  1. In your DSpace Source directory ([dspace-src]), you will need to modify the following POM file:
    • [dspace-src]/dspace/modules/additions/pom.xml (This POM will ensure that the "dspace-replicate" dependency is made available to commandline and ALL DSpace interfaces)

  2. For this pom.xml file, add the following <dependency> section at the end of the existing <dependencies> section (just before the closing </dependencies> tag). NOTE: the exclusions are required to work around DS-3536.

    Code Block
    <dependencies>
        ...
    
        <!-- Adding this dependency will install the Replication Task Suite Addon -->
        <dependency>
            <groupId>org.dspace</groupId>
            <artifactId>dspace-replicate</artifactId>
            <version>6.1</version>
            <!-- These exclusions are currently necessary to resolve dependency mismatches with some dependencies pulled into RTS 6.0 to work with DuraCloud, see DS-3536 for details -->
            <exclusions>
                <exclusion>
                    <groupId>org.apache.commons</groupId>
                    <artifactId>commons-lang3</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>com.amazonaws</groupId>
                    <artifactId>aws-java-sdk-core</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.httpcomponents</groupId>
                    <artifactId>httpmime</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.springframework</groupId>
                    <artifactId>spring-expression</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.springframework.security</groupId>
                    <artifactId>spring-security-core</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.codehaus.jackson</groupId>
                    <artifactId>jackson-mapper-asl</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.codehaus.jackson</groupId>
                    <artifactId>jackson-core-asl</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
    
    </dependencies> 


  3. Once you've finished modifying the pom.xml file, rebuild DSpace by running the following from your [dspace-src]/dspace/ folder:

    Code Block
    mvn clean package
    


  4. Update the default dspace.cfg to include the Replication Task Suite config files. This ensures these configs are loaded as part of your DSpace configuration. This also allows you to override the configurations in your own local.cfg file. Including the duracloud.cfg file is only required if you are planning to replicate/backup your content to DuraCloud.

    Code Block
    include = ${module_dir}/replicate.cfg
    include = ${module_dir}/replicate-mets.cfg
    include = ${module_dir}/replicate-bagit.cfg
    include = ${module_dir}/duracloud.cfg
    1. You should ensure these configurations exist in your [dspace-src]/dspace/config/modules directory.  That way they will be auto-installed/copied whenever you run "ant update" (see next step).
  5. Follow the configuration instructions below in order to enable & configure the Replication Task Suite Add-On.
  6. Update your existing DSpace installation by running the following from your [dspace-src]/dspace/target/dspace-[version]-build/ directory:

    Code Block
    ant update
    


    Note

    Alternatively, if you don't want to do a full DSpace update, you can just update your existing binaries & webapps by running the following two commands:

    • ant update_code (Updates the existing [dspace]/lib/ directory)
    • ant update_webapps (Updates the existing [dspace]/webapps/ directory)
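
Before running "ant update", it can help to confirm that the RTS config files from step 4 are present in the source tree. Here is a minimal Python sketch of such a check. The helper `missing_rts_configs` is hypothetical (it is not shipped with DSpace or RTS), and the directory path is whatever your own [dspace-src]/dspace/config/modules resolves to.

```python
from pathlib import Path
from typing import List

# Config files named in step 4. duracloud.cfg is only needed when
# replicating to DuraCloud, so it is checked only on request.
REQUIRED = ["replicate.cfg", "replicate-mets.cfg", "replicate-bagit.cfg"]
DURACLOUD = "duracloud.cfg"


def missing_rts_configs(modules_dir: str, use_duracloud: bool = False) -> List[str]:
    """Return the names of expected RTS config files absent from modules_dir."""
    wanted = REQUIRED + ([DURACLOUD] if use_duracloud else [])
    base = Path(modules_dir)
    return [name for name in wanted if not (base / name).is_file()]
```

An empty list means every expected file was found. Any names returned should be copied into the modules directory before rebuilding.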


...

  1. A copy of all configuration files utilized by the Replication Task Suite (RTS) can be found in the following locations:
    1. Configs for RTS version 1.x : https://github.com/DSpace/dspace-replicate/tree/dspace-replicate-1_x/config/modules
    2. Configs for RTS version 3.x : https://github.com/DSpace/dspace-replicate/tree/dspace-replicate-3_x/config/modules
    3. Configs for RTS version 6.x : https://github.com/DSpace/dspace-replicate/tree/master/config/modules
  2. Copy the following configuration files to your DSpace's [dspace]/config/modules/ directory:
    1. replicate.cfg - This file contains the base settings for the Replication Task Suite
    2. replicate-mets.cfg - This file provides a few additional replication options specific to METS-based AIPs (see below for more details)
    3. replicate-bagit.cfg - This file provides additional configuration for BagIt AIPs (see below for more details)
    4. duracloud.cfg - If you'd like to replicate/backup your content to DuraCloud, this file holds your DuraCloud account information
  3. Edit your [dspace]/config/modules/curate.cfg configuration file to define & enable all tasks. The list of tasks to add to this configuration file depends on which type of AIP (METS-based or BagIt-based) you wish to use. Please see the configuration section below for your chosen AIP type for the details of what should be added to your curate.cfg file.
    1. A sample, fully enabled curate.cfg configuration file is provided alongside the other Replication Task Suite config files listed above.  This sample file is preconfigured to use METS-based AIPs.
  4. Recommended (but not required):  Edit your [dspace]/config/modules/dspace.cfg and enable the Replication Task Suite 'listener' to perform automatic synchronization of your AIP backup store with what is in DSpace (see Automation Options for more info).

...


| | DSpace AIP Format (METS-based AIPs) | BagIt AIP Format |
|---|---|---|
| Supported Backup/Restore Types | | |
| Can Backup & Restore all DSpace Content easily | Yes | Yes |
| Can Backup & Restore a Single Community/Collection/Item easily | Yes | Yes |
| Backups can be used to move one or more Community/Collection/Items to another DSpace system easily | Yes (Using the Replication Task Suite or using the command line AIP Backup and Restore tools) | Yes (though the Replication Task Suite add-on must be installed on both systems) |
| Can Backup & Restore Item Versions (added in DSpace 3.x) | No (Item Versioning not yet compatible with AIP format. Only the most recent version of an Item is described in the AIP.) | No (Item Versioning not yet compatible with AIP format. Only the most recent version of an Item is described in the AIP.) |
| Supported DSpace Object Types | | |
| Supports backup/restore of all Communities/Collections/Items (including metadata, files, logos, etc.) | Yes | Yes |
| Supports backup/restore of all People/Groups/Permissions | Yes | No (Not yet supported) |
| Supports backup/restore of all Collection-specific Item Templates | Yes | No (Not yet supported) |
| Supports backup/restore of all Collection Harvesting settings (only for Collections which pull in all Items via OAI-PMH or OAI-ORE) | No (The harvest settings are not preserved, but previously harvested items are preserved in their own AIPs) | No (The harvest settings are not preserved, but previously harvested items are preserved in their own AIPs) |
| Supports backup/restore of all Withdrawn (but not deleted) Items | Yes | Yes |
| Supports backup/restore of Item Mappings between Collections | Yes | Yes |
| Supports backup/restore of all in-process, uncompleted Submissions (or those currently in an approval workflow) | No (AIPs are only generated for objects which are completed and considered "in archive") | No (AIPs are only generated for objects which are completed and considered "in archive") |
| Supports backup/restore of Items using custom Metadata Schemas & Fields | Yes | Yes |
| Supports backup/restore of all local DSpace Configurations and Customizations | No (You are expected to backup your DSpace configurations and customizations separately. AIPs only backup content held within DSpace.) | No (You are expected to backup your DSpace configurations and customizations separately. AIPs only backup content held within DSpace.) |

...

This section goes through the steps of configuring the Replication Suite to use BagIt-based AIPs. The Replication Suite uses the BagIt Profiles specification in order to provide additional guarantees about the BagIt AIPs which are exported and ingested. The following profiles are supported:

| BagIt Profile Identifier | External Link |
|---|---|
| aptrust | https:// |

...

If no BagIt Profile is specified, the beyondtherepository profile is used by default. For more information on the BagIt packaging format, see: https://wiki.ucop.edu/display/Curation/BagIt. The BagIt Profiles implementation used is DuraSpace's bagit-support.
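
The default-profile rule can be illustrated with a short Python sketch. This is not RTS code. The function `resolve_profile` is hypothetical, the list of supported names is taken from the "Available Options" comment in replicate-bagit.cfg, and rejecting unknown names with an error is an assumption made for illustration rather than documented RTS behavior.

```python
from typing import Optional

# Profile identifiers listed as "Available Options" in replicate-bagit.cfg.
SUPPORTED_PROFILES = ("aptrust", "beyondtherepository")
DEFAULT_PROFILE = "beyondtherepository"


def resolve_profile(configured: Optional[str]) -> str:
    """Return the profile to use, falling back to the default when unset."""
    if configured is None or not configured.strip():
        return DEFAULT_PROFILE
    name = configured.strip()
    if name not in SUPPORTED_PROFILES:
        # Assumption for this sketch: unknown names are treated as errors.
        raise ValueError("Unsupported BagIt profile: " + name)
    return name
```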

  1. General Curation Configuration: First, in your [dspace]/config/modules/curate.cfg you will want to enable & configure the BagIt-based replication tasks. (NOTE: there is a sample curate.cfg file provided in https://github.com/DSpace/dspace-replicate/tree/master/config/modules which provides example settings, though they are all commented out by default).
    • Enable the Replication Tasks: In the list of "Task Class implementations" (plugin.named.org.dspace.curate.CurationTask), add the following.
      REMEMBER to add a comma and backslash (", \") after each line (except the final line).

      Code Block
      plugin.named.org.dspace.curate.CurationTask = \
          ... (YOUR EXISTING TASKS) ... , \
          org.dspace.ctask.replicate.EstimateAIPSize = estaipsize, \
          org.dspace.ctask.replicate.ReadOdometer = readodometer, \
          org.dspace.ctask.replicate.TransmitAIP = transmitaip, \
          org.dspace.ctask.replicate.VerifyAIP = verifyaip, \
          org.dspace.ctask.replicate.FetchAIP = fetchaip, \
          org.dspace.ctask.replicate.CompareWithAIP = auditaip, \
          org.dspace.ctask.replicate.RemoveAIP = removeaip, \
          org.dspace.ctask.replicate.BagItRestoreFromAIP = restorefromaip, \
          org.dspace.ctask.replicate.BagItReplaceWithAIP = replacewithaip
      


    • Give Each Task a Human-Friendly Task Name: Under the ui.tasknames setting, give each of the above Tasks a human-friendly name. Here are some recommended values, but you are welcome to tweak them.
      REMEMBER to add a comma and backslash (", \") after each line (except the final line).

      Code Block
      ui.tasknames = \
          ... (YOUR EXISTING TASK NAMES) ... , \
          estaipsize = Estimate Storage Space for AIP(s), \
          readodometer = Read Odometer, \
          transmitaip = Transmit AIP(s) to Storage, \
          verifyaip = Verify AIP(s) exist in Storage, \
          fetchaip = Fetch AIP(s) from Storage, \
          auditaip = Audit/Compare against AIP(s), \
          removeaip = Remove AIP(s) from Storage, \
          restorefromaip = Restore Missing Object(s) from AIP(s), \
          replacewithaip = Replace Existing Object(s) with AIP(s)
      


    • Optionally Create a Task Group: Finally, if you'd like to create a Task Group for these tasks, you can create a group named "replicate" and add them all to it. The below is just an example of how you may wish to set the ui.taskgroups and ui.taskgroup.* settings. It creates two Task Groups: (1) a "General Purpose Tasks" group for a few default DSpace Curation Tasks, and (2) a "Replication Suite Tasks" group for all these new Replication tasks.

      Code Block
      # Tasks may be organized into named groups which display together in UI drop-downs
      ui.taskgroups = \
         general = General Purpose Tasks, \
         replicate = Replication Suite Tasks
      
      # Group membership is defined using comma-separated lists of task names, one property per group
      ui.taskgroup.general = profileformats, requiredmetadata, checklinks
      ui.taskgroup.replicate = estaipsize, readodometer, transmitaip, verifyaip, fetchaip, auditaip, removeaip, restorefromaip, replacewithaip
      


  2. Replication Suite Configuration: Next, in your [dspace]/config/modules/replicate.cfg, ensure it is set up to use BagIt-based AIPs. Under the "AIP Packaging Settings" you'll want the following settings enabled:

    Code Block
    # Package type. Permitted values: 'mets', 'bagit'
    # mets = Generate default DSpace AIPs as described in: https://wiki.duraspace.org/display/DSDOC18/AIP+Backup+and+Restore
    # bagit = Generate AIPs based on the BagIt packaging format: https://wiki.ucop.edu/display/Curation/BagIt
    packer.pkgtype = bagit
    


  3. BagIt Configuration: Finally, in [dspace]/config/modules/replicate-bagit.cfg, you will need to configure settings for the BagIt tasks:

    • Configure the BagIt Profile: Set the BagIt Profile which will be used

      Code Block
      # The Bag Profile setting allows you to select a BagProfile which the RTS
      # will create and read bags for. The RTS will check the conformance of a
      # bag to a profile as part of both the packaging and restoration processes.
      #    
      # See: https://github.com/duraspace/bagit-support/ for more information
      #                          
      # Available Options: aptrust, beyondtherepository
      # Default: beyondtherepository
                                    
      replicate-bagit.profile = beyondtherepository


    • Configure the Bag Metadata: Under the replicate-bagit.tag prefix, set appropriate values for additional bag metadata to be packaged with your DSpace AIPs. Each configuration property in this section follows the format replicate-bagit.tag.tag-filename.metadata-key: metadata-value. See section 2.2.2 of the BagIt specification for more information on bag metadata.
      Note: depending on the BagIt Profile specified there will be different required fields for the bag metadata files, so it is important to know what profile you're working with.

      Code Block
      #### BagIt Bag Metadata Settings ####
                   
      # These settings allow you to customize the bag-info.txt which
      # is written by the BagIt packaging tools. By default no fields
      # are used which will produce Bags which do not conform to any
      # BagProfiles.
      
      replicate-bagit.tag.bag-info.source-organization = dspace
      replicate-bagit.tag.bag-info.organization-address = localhost
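
To make the replicate-bagit.tag.tag-filename.metadata-key naming concrete, the following Python sketch groups such properties by tag file. It is an illustration only, not how RTS itself reads its configuration: the function `group_bag_tags` is hypothetical, and it assumes the `name = value` form used in the sample above.

```python
from collections import defaultdict
from typing import Dict, Iterable

PREFIX = "replicate-bagit.tag."


def group_bag_tags(lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Group replicate-bagit.tag.* properties by tag file name."""
    tags: Dict[str, Dict[str, str]] = defaultdict(dict)
    for raw in lines:
        line = raw.strip()
        # Skip blanks, comments, and anything that is not 'name = value'.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        if not name.startswith(PREFIX):
            continue
        # The first segment after the prefix is the tag file; the rest is the key.
        tag_file, _, key = name[len(PREFIX):].partition(".")
        if key:
            tags[tag_file][key] = value
    return dict(tags)
```

Applied to the two sample properties above, both keys end up under the single tag file name "bag-info", which corresponds to the bag-info.txt mentioned in the comments.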


Storage Options

Where your AIPs will be stored is the next decision to make. There are three options currently available:

...

  • METS-based AIP Replicate Consumer: This consumer will listen for changes to any DSpace Communities, Collections, Items, Groups, or EPeople. It should be utilized if you have chosen to use METS-based AIPs. See the METS-based AIP configuration above for more details.

    Code Block
    #### Event System Configuration ####
    
    # ADD the "replicate" consumer to the end of the list of 'default.consumers' (This enables the consumer)
    event.dispatcher.default.consumers = versioning, search, browse, discovery, eperson, harvester, replicate
    
    ....
    
    # Configure consumer to manage METS AIP content replication
    event.consumer.replicate.class = org.dspace.ctask.replicate.METSReplicateConsumer
    event.consumer.replicate.filters = Community|Collection|Item|Group|EPerson+All
    


    • In human terms, this configuration essentially means: listen for all changes to Communities, Collections, Items, Groups and EPeople. If a change is detected, run the "METSReplicateConsumer" (which adds that object to the queue).
  • BagIt-based AIP Consumer: This consumer will ONLY listen for changes to DSpace Communities, Collections and Items, as those are the only types of objects which are stored in BagIt-based AIPs. See the BagIt-based AIP configuration above for more details.

    Code Block
    #### Event System Configuration ####
    
    # ADD the "replicate" consumer to the end of the list of 'default.consumers' (This enables the consumer)
    event.dispatcher.default.consumers = versioning, search, browse, discovery, eperson, harvester, replicate
    
    ....
    
    # Configure consumer to manage BagIt AIP content replication
    event.consumer.replicate.class = org.dspace.ctask.replicate.BagItReplicateConsumer
    event.consumer.replicate.filters = Community|Collection|Item+Install|Modify|Modify_Metadata|Delete
    


    • In human terms, this configuration essentially means: listen for any new, modified or deleted Items, Collections and Communities. If you do not care about Community or Collection AIPs, just remove 'Community' or 'Collection' from the list. When one of the specified changes is detected, run the "BagItReplicateConsumer" (which adds that object to the queue).
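
The two filter values shown above share the same shape: object types, a "+", then event types, with "|" separating alternatives on each side. The Python sketch below splits a value of that shape. It is an illustration only: `parse_consumer_filter` is a hypothetical helper, not a DSpace API, and it assumes a single types+events group exactly as in the examples above.

```python
from typing import List, Tuple


def parse_consumer_filter(value: str) -> Tuple[List[str], List[str]]:
    """Split 'Types+Events' into (object types, event types)."""
    objects, _, events = value.partition("+")
    return (
        [name.strip() for name in objects.split("|") if name.strip()],
        [name.strip() for name in events.split("|") if name.strip()],
    )
```

For the BagIt filter this yields three object types and four event types, so removing 'Community' or 'Collection' from the left-hand side narrows which objects are queued, as described above.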

...