Workflows for DDP/Bridge interactions in Specification Flow Diagrams - Version 1, tailored to Chronopolis specifically.

Terminology/Assumptions:

  • File: A bitstream
  • Preservation Package: A set of related bitstreams (metadata or binaries) which share a unique identifier to be handed off to a distributed digital preservation repository
    • is the unique identifier still a thing? does this relate to filegroups?
  • Stage | Staging Filesystem: A filesystem used to stage data temporarily. This is coupled with a Time To Live (TTL) which tells the user how long they have to retrieve the data.

Needs:

...

  • ObjectId | FilegroupId: An id associated with a group of files which is expected to persist for the lifetime of the repository and the DDP. Can be used by a repository to reconstitute itself.

Needs:

...

  • Delineation between ephemeral and persistent identifiers
    • ephemeral: deposit-id, delete-id, restore-id
    • persistent: filegroup-id
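
One way to make this delineation explicit is to tag every identifier with its kind. The following is a minimal sketch in Python, not part of any specification: the names IdKind, BridgeId and ids_to_retain are hypothetical, and only the four identifier kinds come from the list above.

```python
from dataclasses import dataclass
from enum import Enum


class IdKind(Enum):
    DEPOSIT = "deposit-id"
    DELETE = "delete-id"
    RESTORE = "restore-id"
    FILEGROUP = "filegroup-id"


# Assumption: only the filegroup-id is expected to outlive a single request.
PERSISTENT_KINDS = frozenset({IdKind.FILEGROUP})


@dataclass(frozen=True)
class BridgeId:
    kind: IdKind
    value: str

    @property
    def is_persistent(self) -> bool:
        return self.kind in PERSISTENT_KINDS


def ids_to_retain(ids):
    """Return only the identifiers a DDP would need to keep long term."""
    return [i for i in ids if i.is_persistent]
```

With this shape, a DDP could drop deposit, delete and restore ids once the corresponding request completes and keep only what ids_to_retain returns.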

...

  1. Query OTM Bridge for deposits which Chronopolis needs to preserve
    1. OTM Bridge API Specification#ListDeposits
  2. For a given deposit: Create a package consistent with other Chronopolis packages
    1. This could be BagIt, OCFL, etc
    2. Likely will use the DepositId/FileGroupId/ObjectId for the root name of the package
    3. Package per Object/Filegroup? Need better understanding of how we are defining groupings of files.
    4. If groupings are not used, it is up to the DDP to determine how files are stored. In the case of Chronopolis, we would likely subdivide arbitrarily.
  3. Perform other Chronopolis tasks for the package
    1. Generate ACE Tokens for a preservation package
    2. Could output logging information
  4. Notify Chronopolis Ingest that a preservation package is ready to be ingested
  5. Wait for notification that the preservation package has been successfully preserved
  6. Update OTM Bridge with notification that the deposit has been preserved
    1. Needs to be specced out
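
The six steps above could be wired together roughly as follows. This is a sketch under stated assumptions: BridgeClient is a hypothetical wrapper (only the ListDeposits name appears in the OTM Bridge API Specification, and the update call in step 6 is not yet specced), and packaging, token generation, ingest notification and waiting are passed in as callables because their implementations are not defined here.

```python
from typing import Callable, Iterable, List, Protocol


class BridgeClient(Protocol):
    # Hypothetical wrapper; only the ListDeposits name comes from the spec.
    def list_deposits(self) -> Iterable[str]: ...

    # Assumption: the "deposit preserved" update still needs to be specced.
    def mark_preserved(self, deposit_id: str) -> None: ...


def process_deposits(
    bridge: BridgeClient,
    build_package: Callable[[str], str],
    generate_tokens: Callable[[str], None],
    notify_ingest: Callable[[str], None],
    wait_until_preserved: Callable[[str], bool],
) -> List[str]:
    """Run steps 1-6 for each deposit; return the deposit ids preserved."""
    preserved = []
    for deposit_id in bridge.list_deposits():      # step 1
        package = build_package(deposit_id)        # step 2
        generate_tokens(package)                   # step 3
        notify_ingest(package)                     # step 4
        if wait_until_preserved(package):          # step 5
            bridge.mark_preserved(deposit_id)      # step 6
            preserved.append(deposit_id)
    return preserved
```

A deposit whose package is not confirmed as preserved is simply left alone here, so it would show up again on the next query; whether that is the desired behavior depends on how the Bridge reports deposit status.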

...

  1. Query OTM Bridge for deletions
    1. OTM Bridge API Specification#ListDeletes
    2. Should be able to identify file based on the requesting OTM Bridge user
  2. If a preservation package is being removed, notify Chronopolis staff about the removal of the preservation package
  3. Create tickets for removing the package
  4. Chronopolis staff remove packages through our deprecation process
    1. Could be automated; might want some verification before pushing the deletion through the system
  5. Update the status of the delete to the OTM Bridge
    1. Need to spec this out


  1. If a file is being removed
    1. Identify where the file is located
    2. Remove the file by...
      1. removing it from its filegroup/object
      2. OR generate a new package without said file
  2. Update the status of the delete to the OTM Bridge
    1. Need to spec this out
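
The two branches (whole package vs. single file) could be told apart with a small planning step before any removal happens. A minimal sketch, assuming a hypothetical DeleteRequest shape; the fields a ListDeletes entry actually carries are not specified here, and the existence check is an assumption about what validation the DDP might do.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class DeleteRequest:
    # Hypothetical shape; not taken from the OTM Bridge API Specification.
    delete_id: str
    filegroup_id: str
    file_path: Optional[str] = None  # None means the whole package


def plan_delete(request: DeleteRequest, package_files: Set[str]) -> str:
    """Return a label for the branch a delete request would take."""
    if request.file_path is None:
        # Notify staff, create a ticket, remove through deprecation.
        return "remove-package"
    if request.file_path not in package_files:
        # Nothing to remove; the status reported back is still to be specced.
        return "unknown-file"
    # Remove from the filegroup/object, or generate a new package without it.
    return "remove-file"
```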


Questions:

  • Similar questions about how much we know about the request and if there's any extra validation the DDP should do
    • At the very least do we know if files/objects/filegroups exist?
  • I believe the discussion so far has centered around removing an entire file group (identified by an ObjectId) from the bridge – is this true?
  • Discussion about expectations of deletion from the system
    • Should any information remain about the Object?
      • Audit events
      • Fixity information
  • If we opt for removal of files in an individual object, then we run into an issue of needing versioning
    • so far we have avoided talking about this, but might be a good time

Retrieve Content

  1. Query OTM Bridge for Restores to be processed
    1. OTM Bridge API Specification#ListRestores
  2. Identify space for the restore to be staged on
    1. Needs to be accessible by the OTM Gateway
    2. What if there is insufficient space available?
    3. Needs to guarantee that space will be available while restoring
  3. Restaging in Chronopolis
    1. Current process
      1. A read-only mount is available which contains the preservation storage
      2. Symbolic links are created from the read-only mount to the DuracloudVault restore area
    2. For OTM Bridge with RO mount
      1. Could perform a similar process and create symlinks
      2. Quick, but don't want to make guarantees about that mount being available (might be an object store in the future)
    3. For OTM Bridge without RO mount
      1. Contact Chronopolis and request the content be staged
      2. Could be re-staged through rsync, http, etc. Flexible.
    4. The process which handled the deposit could potentially handle content retrieval as well
  4. Notify the OTM Bridge that the Restore is staged and is accessible for a given TTL
  5. Upon expiration of the TTL, remove the staged content
    1. When does the status get updated in the Bridge? 
    2. Or does the restore cease to exist?
    3. Can the OTM Bridge be polled for Restores past their TTL?
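
For the read-only mount case, staging via symlinks and cleaning up after the TTL could look roughly like this. It is a sketch only: the function names and directory layout are hypothetical, it assumes a POSIX filesystem where symlinks are permitted, and it does not cover the space check or the status update to the Bridge.

```python
import os
import time
from pathlib import Path


def stage_restore(source_dir: Path, stage_dir: Path, ttl_seconds: int, now=None) -> float:
    """Symlink every file under source_dir into stage_dir; return the expiry time."""
    now = time.time() if now is None else now
    for src in source_dir.rglob("*"):
        if src.is_file():
            dest = stage_dir / src.relative_to(source_dir)
            dest.parent.mkdir(parents=True, exist_ok=True)
            os.symlink(src.resolve(), dest)
    return now + ttl_seconds


def expire_restore(stage_dir: Path, expires_at: float, now=None) -> bool:
    """Remove staged links once the TTL has passed; return True if removed."""
    now = time.time() if now is None else now
    if now < expires_at:
        return False
    # Reverse-sorted order visits children before their parent directories.
    for entry in sorted(stage_dir.rglob("*"), reverse=True):
        if entry.is_symlink():
            entry.unlink()   # removes the link only, never the preserved file
        elif entry.is_dir():
            entry.rmdir()
    return True
```

Because only links are removed, the preservation copy is never touched on expiry. If the mount is later replaced by an object store, the same two entry points could copy and delete data instead of linking.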


Questions:

  • How to handle errors for insufficient space
  • If individual files are requested, does the bridge handle that?
    • Restaging an entire Object could take time, might want the DDP to pull some of its own weight here as well.
  • Is there a limit on the amount of data that can be restaged at once?
    • Restaging large files which are not requested is also wasteful of staging space
  • Many options available for returning content, possibly even proxying data
  • Is a Restore ephemeral in the OTM Bridge?