Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: extra details about syncing deleted objects

...

  1. A copy of all configuration files utilized by the Replication Task Suite (RTS) can be found in the following locations:
    1. Configs for RTS version 1.x : https://github.com/DSpace/dspace-replicate/tree/dspace-replicate-1_x/config/modules
    2. Configs for RTS version 3.x : https://github.com/DSpace/dspace-replicate/tree/master/config/modules
  2. Copy the following configuration files to your DSpace's [dspace]/config/modules/ directory:
    1. replicate.cfg - This file contains the base settings for the Replication Task Suite
    2. replicate-mets.cfg - This file provides a few additional replication options specific to METS-based AIPs (see below for more details)
    3. duracloud.cfg - If you'd like to replicate/backup your content to DuraCloud, this file holds your DuraCloud account information
  3. Edit your [dspace]/config/modules/curate.cfg configuration file to define & enable all tasks. The list of tasks to add to this configuration file depends on which type of AIP (METS based or BagIt based) you wish to use. Please see the AIP Format Options section below for the details of what should be added to your curate.cfg file
    1. A sample, fully enabled curate.cfg configuration file is provided alongside the other Replication Task Suite config files listed above.  This sample file is preconfigured to use METS-based AIPs.
  4. Recommended (but not required):  Edit your [dspace]/config/modules/dspace.cfg and enable the Replication Task Suite 'listener' to perform automatic synchronization of your AIP backup store with what is in DSpace (see Automation Options for more info).

Overview of Configuration Options

...

  1. AIP Format Options: Does you institution want to backup using the default DSpace AIP format (METS packaging)? Or would you rather utilize the new BagIt AIP Format?
  2. Storage Options: Does you institution plan to use the Replication Suite to backup to a local/mounted drive? Or would you like to connect it to a DuraCloud account?
  3. Automation Options: Do you want to automatically sync your AIP backup store with what is in DSpace? (this is highly recommended, but not required)
  4. Additional Options: Do you plan to use Checkm manifests for checksum auditing?

...

  1. Automatically Sync Changes (via Queue) : Any changes that happen in DSpace (new objects, changed objects, deleted objects) are automatically added to a "queue". This queue can then be processed on a schedule (via cron).
  2. Scheduled Site Auditing/Replication : You may also wish to perform a full site audit or backup on a scheduled basis.

...

In order to enable/activate synchronization, you will need to add a new consumer to the list of DSpace consumers (in dspace.cfg).  It is recommended to add this new configuration to the end of the list of existing "event.consumer." options in your dspace.cfg file.

  • METS-based AIP Replicate Consumer: This consumer will listen for changes to any DSpace Communities, Collections, Items, Groups, or EPeople.  It should be utilized if you have chosen to use METS-based AIPs. See AIP Format Options above for more details.

    Code Block
    #### Event System Configuration ####
    
    # ADD the "replicate" consumer to the end of the list of 'default.consumers' (This enables the consumer)
    event.dispatcher.default.consumers = versioning, search, browse, discovery, eperson, harvester, replicate
    
    ....
    
    # Configure consumer to manage METS AIP content replication
    event.consumer.replicate.class = org.dspace.ctask.replicate.METSReplicateConsumer
    event.consumer.replicate.filters = Community|Collection|Item|Group|EPerson+All
    

     

    • In human terms, this configuration essentially means: listen for all changes to Communities, Collections, Items, Groups and EPeople. If a change is detected, run the "METSReplicateConsumer" (which adds that object to the queue).
  • BagIt-based AIP Consumer : This consumer will ONLY listen for changes to DSpace Communities, Collections and Items as those are the only types of objects which are stored in BagIt-based AIPs. See AIP Format Options above for more details

    Code Block
    #### Event System Configuration ####
    
    # ADD the "replicate" consumer to the end of the list of 'default.consumers' (This enables the consumer)
    event.dispatcher.default.consumers = versioning, search, browse, discovery, eperson, harvester, replicate
    
    ....
    
    # Configure consumer to manage BagIt AIP content replication
    event.consumer.replicate.class = org.dspace.ctask.replicate.BagItReplicateConsumer
    event.consumer.replicate.filters = Community|Collection|Item+Install|Modify|Modify_Metadata|Delete
    

     

    • In human terms, this configuration essentially means: listen for any new, modified or deleted Items, Collections and Communities. If you do not care about Community or Collection AIPs, just remove 'Community' or 'Collection' from the list. When one of the specified changes is detected, run the "BagItReplicateConsumer" (which adds that object to the queue).

...

  • Newly Added Objects: If the event is an addition of a new DSpace object (for items this only occurs once the item exits approval workflow), then a request for an AIP transmission is queued.
  • Changed Objects: The same occurs whenever an object has changed (so-called modify events). The modified object is queued for AIP transmission.
  • Deleted Objects: When an object is deleted, a 'catalogrecord' of the deletion is transmitted to the replication service. The deletion record is stored in your configured "group.delete.name" (in replicate.cfg) and named DELETION-RECORD@[handle].zip. The catalog just   The deletion record is a BagIt package which simply lists all the objects that were deleted: if an item, then just the handle of the item, if a collection, then all the item handles that were in it. This way, if the deletion was mistakena mistake, the catalog "deletion record" can be used to recover all the contents. This represents the default behavior of the consumer. However, you may configure it in [dspace]/modules/replicate.cfg.
    • It is worth noting that when you delete an object in DSpace, the Sync Consumer will NOT delete that object's AIP from storage.  All it does is write a "DELETION-RECORD@[handle]" file to your configured storage location.  This ensures that you can review those "deletion records" at a later time, and decide whether to permanently delete the AIP(s) from storage, or alternatively restore the deleted object(s) in DSpace from their AIP(s).
    • How to restore deleted objects: Simply run one of the "Restore from AIP" commands available in the Replication Task Suite, passing it the handle of the deleted object.  That command will locate the associated "DELETION-RECORD" file and appropriately restore any objects listed in that deletion record.
    • How to permanently delete objects' AIPs from storage:  If an object deletion was found to be valid, you may wish to permanently remove the deleted object's AIP from remote storage (to save storage space).  Simply run the "Remove AIP" command, passing it the handle of the deleted object.  That command will permanently delete the object's AIP along with any associated "DELETION-RECORD" file from your storage location.
Configuring the Sync Consumer

...