
Workflows for DDP/Bridge interactions from the Specification Flow Diagrams, tailored specifically to Chronopolis.

Terminology/assumptions:

  • File: A bitstream
  • Preservation Package: A set of related bitstreams (metadata or binaries) sharing a unique identifier, to be handed off to a distributed digital preservation (DDP) repository
    • Is the unique identifier still in use? Does it relate to FileGroups?
  • Stage | Staging Filesystem: A filesystem used to stage data temporarily. This is coupled with a Time To Live (TTL) which tells the user how long they have to retrieve the data.

Needs:

  • Understanding about groupings of files and what they are to be called. We currently have some synonymous terms floating around: Object and FileGroup.
    • We need to know which identifier is expected to persist (object-id/filegroup-id), and whether it is acceptable to use it as a way to "organize" incoming content
  • Delineation between ephemeral and persistent identifiers
    • ephemeral: deposit-id, delete-id, restore-id
    • persistent: filegroup-id
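
The ephemeral/persistent split above could be modeled roughly as follows. This is only a sketch: the type and field names are hypothetical, and the only assumption taken from these notes is that the persistent filegroup-id is what a DDP should use to organize content long term.

```python
from dataclasses import dataclass

# Ephemeral IDs (deposit-id, delete-id, restore-id) scope a single
# Bridge operation; the persistent filegroup-id outlives any one
# request. All names here are hypothetical stand-ins.

@dataclass(frozen=True)
class EphemeralId:
    kind: str   # "deposit" | "delete" | "restore"
    value: str

@dataclass(frozen=True)
class PersistentId:
    value: str  # filegroup-id

def package_root(filegroup: PersistentId) -> str:
    # Per the open assumption in Send Content step 2: use the
    # persistent identifier as the root name of the package.
    return filegroup.value
```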

Send Content

  1. Query OTM Bridge for deposits which Chronopolis needs to preserve
    1. OTM Bridge API Specification#ListDeposits
  2. For a given deposit: Create a package consistent with other Chronopolis packages
    1. This could be BagIt, OCFL, etc.
    2. We will likely use the DepositId/FileGroupId/ObjectId as the root name of the package
    3. One package per Object/FileGroup? We need a better understanding of how groupings of files are defined.
    4. If groupings are not used, it is up to the DDP to determine how files are stored. In the case of Chronopolis, we would likely subdivide arbitrarily.
  3. Perform other Chronopolis tasks for the package
    1. Generate ACE Tokens for a preservation package
    2. Could output logging information
  4. Notify Chronopolis Ingest that a preservation package is ready to be ingested
  5. Wait for notification that the preservation package has been successfully preserved
  6. Update OTM Bridge with notification that the deposit has been preserved
    1. Needs to be specced out
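
The six steps above can be sketched as a single loop. This is a hedged sketch: the `bridge` and `chronopolis` client objects and all method names are hypothetical; only the ordering of the steps comes from the workflow.

```python
def send_content(bridge, chronopolis):
    # Step 1: query the OTM Bridge for deposits to preserve (ListDeposits)
    for deposit in bridge.list_deposits():
        # Step 2: build a package consistent with other Chronopolis
        # packages (BagIt, OCFL, etc.)
        package = chronopolis.create_package(deposit)
        # Step 3: generate ACE Tokens for the preservation package
        chronopolis.generate_ace_tokens(package)
        # Step 4: notify Chronopolis Ingest that the package is ready
        chronopolis.ingest(package)
        # Step 5: wait for notification of successful preservation
        chronopolis.wait_until_preserved(package)
        # Step 6: report back to the Bridge (API not yet specced)
        bridge.mark_preserved(deposit)
```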


Questions

  • Should the different stages generate audit events? e.g. creating the package, generating ACE Tokens, notifying Chronopolis, etc.
  • In step 2, what assumptions can be made about the data?
    • Has the Bridge done any verification of the hashes for the staged data?
    • What can be said about duplication of data: anything, or nothing at all?

Delete Content

  1. Query OTM Bridge for deletions
    1. OTM Bridge API Specification#ListDeletes
    2. Should be able to identify the file based on the requesting OTM Bridge user
  2. If a preservation package is being removed
    1. Notify Chronopolis staff about removal
    2. Create tickets for removing the package
    3. Chron staff removes packages through our deprecation process
    4. Could be automated; might want some verification before pushing the deprecation through the system
  3. If a file is being removed
    1. Identify where the file is located
    2. Remove the file by...
      1. removing it from its filegroup/object
      2. OR generating a new package without said file
  4. Update the status of the delete to the OTM Bridge
    1. Need to spec this out
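
The branching above can be sketched as follows. All names are hypothetical; only the two branches (whole preservation package vs. single file) and the final status update mirror steps 2–4.

```python
def process_delete(delete, chronopolis, bridge):
    if delete["whole_package"]:
        # Step 2: whole-package removal goes through staff review and
        # the existing deprecation process, not automatic deletion.
        chronopolis.open_deprecation_ticket(delete["package_id"])
    else:
        # Step 3: remove the file from its filegroup/object, or rebuild
        # the package without it (the versioning question is still open).
        chronopolis.remove_file(delete["package_id"], delete["file_id"])
    # Step 4: report the outcome to the OTM Bridge (API not yet specced)
    bridge.update_delete_status(delete["delete_id"], "complete")
```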

Questions

  • Similar questions about how much we know about the request, and whether there is any extra validation the DDP should do
    • At the very least, do we know whether the files/objects/filegroups exist?
  • If we opt for removal of files within an individual object, we run into the issue of needing versioning
    • So far we have avoided discussing this, but now might be a good time

Retrieve Content

  1. Query OTM Bridge for Restores to be processed
    1. OTM Bridge API Specification#ListRestores
  2. Identify space where the restore can be staged
    1. Needs to be accessible by the OTM Gateway
    2. What if there is insufficient space available?
    3. Needs to guarantee that space will be available while restoring
  3. Restaging in Chronopolis
    1. Current process
      1. A read only mount is available which contains the preservation storage
      2. Symbolic links are created from the ro mount to the DuracloudVault restore area
    2. For OTM Bridge with RO mount
      1. Could perform a similar process and create symlinks
      2. Quick, but we don't want to guarantee that the mount will remain available (it might be replaced by an object store in the future)
    3. For OTM Bridge without RO mount
      1. Contact Chronopolis and request the content be staged
      2. Could be re-staged through rsync, http, etc. Flexible.
    4. The process which handled the deposit could potentially handle content retrieval as well
  4. Notify the OTM Bridge that the Restore is staged and is accessible for a given TTL
  5. Upon expiration of the TTL, remove the staged content
    1. When does the status get updated in the Bridge?
    2. Or does the restore cease to exist?
    3. Can the OTM Bridge be polled for Restores past their TTL?
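
Steps 2–5 with the "RO mount + symlink" option could look roughly like this. A sketch only: paths, function names, and the TTL handling are illustrative; the symlink approach follows the current process described in step 3.

```python
import os
import time

def stage_restore(ro_file, restore_area, ttl_seconds, now=None):
    """Symlink content from the read-only preservation mount into the
    restore area; return the expiry time after which the caller should
    remove the staged link (step 5)."""
    now = time.time() if now is None else now
    # Link from the RO preservation mount into the restore area,
    # mirroring the DuracloudVault symlink process above.
    link = os.path.join(restore_area, os.path.basename(ro_file))
    if not os.path.lexists(link):
        os.symlink(ro_file, link)
    # Caller removes `link` once this time passes (TTL expiry).
    return now + ttl_seconds
```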


Questions

  • Is there a limit on the amount of data that can be restaged at once?

