...

The Replication Task Suite is a DSpace Add-On that provides a set of curation system tasks to assist in replicating (backing up, restoring, and auditing) DSpace content to other locations. The DSpace content is packaged in containers known as AIPs (in OAIS terminology, "Archival Information Packages"). By default, AIPs are generated in the default DSpace AIP format (the same format used by the AIP Backup and Restore tool). Optionally, BagIt-based AIPs can be generated instead of the default DSpace AIP format.

This Add-On also integrates DSpace with DuraCloud, for users who wish to easily back up their content to DuraCloud directly from their DSpace administrative interface.

More Information

More information on the Replication Task Suite is available from the following webinars / screencasts:

The Problem Statement and Usage Examples section below also provides some real-life scenarios / examples of where each Replication task may come in handy.

...

  1. Automatically Sync Changes (via Queue): Any changes that happen in DSpace (new, changed, or deleted objects) are automatically added to a "queue". This queue can then be processed on a schedule.
  2. Scheduled Site Auditing/Replication: You may also wish to perform a full site audit or backup on a scheduled basis.
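The following is a minimal sketch of how the queue from option 1 might be processed on a schedule with cron. It assumes the standard DSpace curation command-line tool, whose "-q" option processes a named curation queue. The install path "/dspace" and the queue name "replication" are hypothetical placeholders; substitute your own [dspace] directory and whichever queue name your Replication Task Suite configuration actually writes to.

```shell
#!/bin/sh
# Sketch only: print a crontab entry that processes a curation queue nightly.
# Hypothetical defaults: /dspace as the [dspace] directory and "replication"
# as the queue name. Override them via DSPACE_DIR and QUEUE_NAME.

# queue_cron_entry DSPACE_DIR QUEUE_NAME
# Prints a crontab line that runs "dspace curate -q QUEUE_NAME" at 01:30 daily.
queue_cron_entry() {
    dspace_dir="$1"
    queue_name="$2"
    printf '30 1 * * * %s/bin/dspace curate -q %s\n' "$dspace_dir" "$queue_name"
}

queue_cron_entry "${DSPACE_DIR:-/dspace}" "${QUEUE_NAME:-replication}"
```

The printed line can be added to the crontab of the user that owns your DSpace installation (for example with "crontab -e"); adjust the schedule fields to match how often you want queued changes replicated.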

...

Please note: this change to curate.cfg will cause the entire DSpace Curation System (not just the Replication Task Suite tasks) to use the FilteredFileTaskQueue for queuing. This should not cause any issues with other tasks, as the FilteredFileTaskQueue is an extension of FileTaskQueue, but be aware that this change will affect all curation tasks.
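Because this setting affects every curation task, it can help to confirm which TaskQueue implementation curate.cfg actually configures. The shell sketch below assumes the property key "plugin.single.org.dspace.curate.TaskQueue" and the file location "[dspace]/config/modules/curate.cfg" used by DSpace 1.8-era installations; both are assumptions to verify against your own configuration, and "/dspace" is a hypothetical install path. It only reads the file and reports whether the configured class name ends in "FilteredFileTaskQueue".

```shell
#!/bin/sh
# Sketch only: report which TaskQueue class curate.cfg configures.
# Assumed (verify locally): the property key below and the default path,
# where /dspace is a hypothetical [dspace] directory.
CURATE_CFG="${1:-/dspace/config/modules/curate.cfg}"

# configured_task_queue FILE
# Prints the value of the last uncommented TaskQueue property in FILE,
# with trailing whitespace removed.
configured_task_queue() {
    sed -n -e 's/[[:space:]]*$//' \
        -e 's/^[[:space:]]*plugin\.single\.org\.dspace\.curate\.TaskQueue[[:space:]]*=[[:space:]]*//p' "$1" |
        tail -n 1
}

# check_task_queue FILE
# Returns 0 if the configured class ends in FilteredFileTaskQueue, 1 otherwise.
check_task_queue() {
    value="$(configured_task_queue "$1")"
    case "$value" in
        *FilteredFileTaskQueue)
            echo "OK: TaskQueue is $value"
            return 0 ;;
        *)
            echo "WARNING: TaskQueue is '${value:-unset}', not FilteredFileTaskQueue"
            return 1 ;;
    esac
}

# Only run the check when the configuration file is readable.
if [ -r "$CURATE_CFG" ]; then
    check_task_queue "$CURATE_CFG"
fi
```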

Scheduled Site Auditing/Replication

Whether you decide to automatically synchronize your replica (backup) store or not, you may also wish to schedule some occasional auditing or even a full "refresh" of your backup.

  • Auditing for site differences: An audit checks whether there are differences between the DSpace content and the AIP backup content. A full site AIP audit can be run from the command line, or scheduled via a cron job (or similar). For example, the following command runs a site-wide audit for a DSpace site with a handle.prefix of "10673" and writes the results of the audit to a "siteaudit.log" file.

    [dspace]/bin/dspace curate -t auditaip -i 10673/0 -r - > siteaudit.log
    • This command runs the "auditaip" task on your entire site. The identifier "[handle-prefix]/0" refers to the entire DSpace Site; in this case we used the example handle-prefix of "10673".
    • The "-r -" part of this command ensures that the results of the audit are reported back to you on the command line (standard output), rather than being logged to the dspace.log file.
    • The "> siteaudit.log" redirection then writes those reported results to a "siteaudit.log" file.
  • A refresh of your full backup: If you decided not to synchronize your backup automatically (as described above), you will want to re-run your entire site replication on a scheduled basis. For example, the following command regenerates and transmits AIPs for every object in a DSpace site with a handle.prefix of "10673" and writes the results of the replication to a "sitebackup.log" file. It is nearly identical to the "auditaip" command above, except that it executes the "transmitaip" task and writes to a different log file.

    [dspace]/bin/dspace curate -t transmitaip -i 10673/0 -r - > sitebackup.log
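Both commands above lend themselves to scheduling. The sketch below wraps them in one shell script that runs either task for the whole site and writes the report to a date-stamped log file, so successive runs do not overwrite each other. The /dspace install path, the log naming scheme, and the DRY_RUN switch are hypothetical choices for illustration; only the "dspace curate -t ... -i [handle-prefix]/0 -r -" invocation comes from the commands above.

```shell
#!/bin/sh
# Sketch only: run a site-wide replication task and keep a dated report.
# Hypothetical choices: /dspace as the [dspace] directory, the log file
# naming scheme, and the DRY_RUN switch (set DRY_RUN=1 to print, not run).
DSPACE_DIR="${DSPACE_DIR:-/dspace}"

# run_site_task TASK HANDLE_PREFIX LOG_DIR
# Runs TASK (e.g. auditaip or transmitaip) on "HANDLE_PREFIX/0", the whole
# site, with "-r -" so the report goes to stdout, redirected to a dated log.
run_site_task() {
    task="$1"
    prefix="$2"
    log="$3/site-$task-$(date +%Y%m%d).log"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$DSPACE_DIR/bin/dspace curate -t $task -i $prefix/0 -r - > $log"
    else
        "$DSPACE_DIR/bin/dspace" curate -t "$task" -i "$prefix/0" -r - > "$log"
    fi
}

# Called with three arguments (for example from cron): run the requested task.
if [ "$#" -eq 3 ]; then
    run_site_task "$1" "$2" "$3"
fi
```

For example, if the script were saved as "site-replication.sh" (a hypothetical name), a crontab line such as "0 2 * * 0 /path/to/site-replication.sh auditaip 10673 /var/log/dspace" would run a weekly site audit, and a similar line with "transmitaip" would schedule the full refresh. Keeping the date formatting inside the script avoids having to escape "%" characters, which cron treats specially in crontab lines.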

Additional Options

Configuring usage of Checkm manifest validation

...