...

In the end, it sounds like our best option may be to ask the DuraCloud Team to create a way to turn off the "auto-delete" option of the DuraCloud Sync Tool. Bill Branan seemed open to this idea.

Here's a basic example of a future DSpace/DuraCloud interaction workflow that may allow us to perform a "trickle" synchronization of content:

  1. Create a folder, then start the DuraCloud Sync Tool and point it at that folder. Make sure the "auto-delete" option of the Sync Tool is turned OFF.
  2. Export the first 10GB of AIP content from DSpace into that folder. DuraCloud will automatically notice the new files and sync them up into the cloud storage.
  3. Next, remove the already synced 10GB of content from the folder (if the "auto-delete" option is turned OFF, DuraCloud will retain that content in cloud storage).
  4. Export the next 10GB of AIP content from DSpace into that folder. DuraCloud will automatically notice the new files and sync them up into the cloud storage. This will bring the total storage in DuraCloud up to 20GB (even though your local sync folder only holds 10GB).
  5. Repeat in batches of 10GB until all DSpace content AIPs are loaded into DuraCloud.
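
The batching in the steps above could be planned with something like the following rough Python sketch. The function name `plan_batches` and the (name, size) input format are hypothetical, chosen only for illustration. The sketch covers only the grouping of AIPs into batches of roughly 10GB. Exporting the AIPs from DSpace, copying them into the sync folder, and deciding when the Sync Tool has finished uploading a batch are all left out, since none of those mechanisms are settled yet.

```python
from typing import Iterable, Iterator, List, Tuple

# 10GB per batch, matching the workflow above (assumed to mean 10 * 1024^3 bytes).
BATCH_LIMIT_BYTES = 10 * 1024**3


def plan_batches(
    files: Iterable[Tuple[str, int]],
    limit: int = BATCH_LIMIT_BYTES,
) -> Iterator[List[str]]:
    """Group (name, size_in_bytes) pairs into consecutive batches.

    Each batch's total size stays within `limit`. A single file that is
    larger than `limit` ends up in a batch of its own.
    """
    batch: List[str] = []
    total = 0
    for name, size in files:
        # Close the current batch if adding this file would exceed the limit.
        if batch and total + size > limit:
            yield batch
            batch, total = [], 0
        batch.append(name)
        total += size
    if batch:
        yield batch
```

Each yielded batch would correspond to one pass through steps 2 and 3: copy the batch into the sync folder, wait for the sync to finish, then clear the folder.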

...

  1. DuraCloud will never remove any content until you explicitly tell it to. This means we may need a "cleanup" script (or an "audit" script with a cleanup mode) that finds and removes files which still exist in DuraCloud but have since been removed from DSpace.
  2. DuraCloud will accept updates to files. If you place a file into your sync folder with the name "ITEM-123454678-1.zip", and DuraCloud already has a file of that name in its storage space, DuraCloud will compare the files (via checksum). If they are different, the new file will overwrite the old file.
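
As a minimal sketch of the two behaviors above, the following Python shows the comparisons involved. Both function names are hypothetical, and the inputs (a set of file names held in DuraCloud, a set of file names DSpace currently exports, and a name-to-checksum mapping) are assumed to have been gathered elsewhere. This is not how DuraCloud itself is implemented, only an illustration of the logic a cleanup script might use.

```python
from typing import Dict, List, Set


def find_cleanup_candidates(duracloud_names: Set[str], dspace_names: Set[str]) -> List[str]:
    """Return files that still exist in DuraCloud but no longer exist in DSpace."""
    return sorted(duracloud_names - dspace_names)


def would_be_overwritten(
    name: str,
    local_checksum: str,
    remote_checksums: Dict[str, str],
) -> bool:
    """Mirror the update rule described above.

    True only when a file of the same name is already stored and its
    checksum differs from the local one. A file that is not stored yet
    is a new upload, not an overwrite.
    """
    remote = remote_checksums.get(name)
    return remote is not None and remote.lower() != local_checksum.lower()
```

A cleanup script would likely report the candidates first and delete only after confirmation, since a deletion in DuraCloud is explicit and permanent from the point of view of this workflow.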

...

Auditing Functionality

Once content is in DuraCloud, we need a way to audit that content and compare it to what is currently in DSpace.

A very simple DuraCloud/DSpace auditing workflow may be as follows:

  1. Export an AIP for a random DSpace Object (or a chosen one) to the local filesystem
  2. Generate a local checksum of the exported AIP
  3. Using the DuraCloud REST API, compare that local checksum with the checksum for that item as stored in DuraCloud
    • If the checksums match, then the content is identical (successful audit)
    • If they don't match, then you know one or the other is out of sync
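
The audit steps above might look roughly like the following Python sketch. Several things here are assumptions rather than confirmed details: that the stored checksum is MD5, that it can be read from a `Content-MD5` response header on a HEAD request to the content URL, and that the URL is supplied by the caller. Authentication is omitted entirely. The function names are hypothetical, and the DuraCloud REST API documentation should be checked before relying on any of this.

```python
import hashlib
import urllib.request
from typing import Optional


def local_md5(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Compute the MD5 hex digest of a local file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_remote_checksum(content_url: str, header: str = "Content-MD5") -> Optional[str]:
    """Read a checksum header via a HEAD request.

    ASSUMPTION: the header name and URL layout are not confirmed here,
    and no credentials are sent.
    """
    request = urllib.request.Request(content_url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return response.headers.get(header)


def audit(path: str, remote_checksum: Optional[str]) -> bool:
    """Return True when the local AIP matches the checksum stored remotely."""
    if remote_checksum is None:
        return False
    return local_md5(path) == remote_checksum.strip().lower()
```

Keeping `audit` separate from the network call means the comparison can be tested without access to DuraCloud. A failed audit only says the two copies differ; deciding which side is stale would need additional information, such as modification dates.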