Workflows for DDP/Bridge interactions in Specification Flow Diagrams - Version 1, tailored to Chronopolis specifically.
Terminology/Assumptions:
- File: A bitstream
- Preservation Package: A set of related bitstreams (metadata or binaries) which share a unique identifier to be handed off to a distributed digital preservation repository
- Open question: is the unique identifier still part of the design? Does it relate to filegroups?
- Stage | Staging Filesystem: A filesystem used to stage data temporarily. This is coupled with a Time To Live (TTL) which tells the user how long they have to retrieve the data.
- ObjectId | FilegroupId: An id associated with a group of files which is expected to persist for the lifetime of the repository and the DDP. Can be used by a repository to reconstitute itself.
Needs:
- Delineation between ephemeral and persistent identifiers
- ephemeral: deposit-id, delete-id, restore-id
- persistent: filegroup-id
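One way to make the ephemeral/persistent split explicit is to tag each identifier type in code. The following is a minimal sketch only: the identifier names come from the list above, while `IdKind`, `ID_KINDS`, and `persistent_ids` are hypothetical names chosen for illustration and are not part of any OTM Bridge or Chronopolis API.

```python
from enum import Enum


class IdKind(Enum):
    EPHEMERAL = "ephemeral"    # scoped to a single request (deposit, delete, restore)
    PERSISTENT = "persistent"  # expected to outlive individual requests


# Classification taken from the "Needs" list; the mapping itself is an assumption.
ID_KINDS = {
    "deposit-id": IdKind.EPHEMERAL,
    "delete-id": IdKind.EPHEMERAL,
    "restore-id": IdKind.EPHEMERAL,
    "filegroup-id": IdKind.PERSISTENT,
}


def persistent_ids(record: dict) -> dict:
    """Return only the identifiers that are safe to store long term."""
    return {
        name: value
        for name, value in record.items()
        if ID_KINDS.get(name) is IdKind.PERSISTENT
    }
```

With this in place, a DDP-side record could drop request-scoped identifiers before anything is written to long-term storage, for example `persistent_ids({"deposit-id": "d1", "filegroup-id": "fg1"})`.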
Send Content
- Query OTM Bridge for deposits which Chronopolis needs to preserve
- For a given deposit: Create a package consistent with other Chronopolis packages
- This could be BagIt, OCFL, etc
- Use the FilegroupId/ObjectId for the root name of the preservation package
- Perform other Chronopolis tasks for the package
- Generate ACE Tokens for a preservation package
- Could output logging information
- Notify Chronopolis Ingest that a preservation package is ready to be ingested
- Wait for notification that the preservation package has been successfully preserved
- Update OTM Bridge with notification that the deposit has been preserved
- Needs to be specced out
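The packaging step could look roughly like the sketch below. It is not BagIt or OCFL; it only illustrates using the FilegroupId as the root directory name and recording a SHA-256 digest per file. The function names, the flat `data/` layout, and the `manifest-sha256.txt` file name are all assumptions made for illustration, and ACE token generation, the bridge query, and the ingest notification are deliberately left out because they are not yet specified.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_package(filegroup_id: str, staged_files: list[Path], out_dir: Path) -> Path:
    """Copy staged files under <out_dir>/<filegroup_id>/data and write a manifest.

    Assumes file names are unique within the filegroup (hypothetical flat layout).
    """
    root = out_dir / filegroup_id
    data = root / "data"
    # Fail loudly if a package with this id already exists.
    data.mkdir(parents=True, exist_ok=False)

    lines = []
    for src in sorted(staged_files):
        dest = data / src.name
        shutil.copy2(src, dest)
        lines.append(f"{sha256_of(dest)}  data/{src.name}")

    (root / "manifest-sha256.txt").write_text("\n".join(lines) + "\n")
    return root
```

Hashing the copied file rather than the staged source means the manifest describes what actually landed in the package, which may help answer the question of how much verification the bridge has already done.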
Questions
- Should the different stages generate audit events (e.g. creating the package, generating tokens, notifying Chronopolis)?
- In step 2 (creating the package), what assumptions can be made about the data? Has the bridge:
- done any verification of the hashes for the staged data?
- What can be said about duplication of data? Anything, or nothing at all?
Delete Content
- Query OTM Bridge for deletions
- OTM Bridge API Specification#ListDeletes
- Should be able to identify file based on the requesting OTM Bridge user
- OTM Bridge API Specification#ListDeletes
- Notify Chronopolis staff about removal of the preservation package
- Create tickets for removing the package
- Chronopolis staff remove packages through the existing deprecation process
- Could be automated; might want some verification before pushing the deletion through the system
- Update the status of the delete to the OTM Bridge
- Need to spec this out
- If a file is being removed
- Identify where the file is located
- Remove the file by...
- removing it from its filegroup/object
- OR generate a new package without said file
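Since the delete steps in this section include a manual ticket and a possible verification gate before removal, the status reported back to the OTM Bridge could be modeled as a small state machine. This is a sketch under assumptions: the state names and the `advance` helper are hypothetical, and the real status values still need to be specced out.

```python
from enum import Enum


class DeleteState(Enum):
    REQUESTED = "requested"  # seen in the bridge's list of deletes
    TICKETED = "ticketed"    # ticket created for Chronopolis staff
    APPROVED = "approved"    # verification done, removal may proceed
    REMOVED = "removed"      # package removed from preservation storage


# Each state may only move to the next one; no step can be skipped.
ALLOWED = {
    DeleteState.REQUESTED: {DeleteState.TICKETED},
    DeleteState.TICKETED: {DeleteState.APPROVED},
    DeleteState.APPROVED: {DeleteState.REMOVED},
    DeleteState.REMOVED: set(),
}


def advance(current: DeleteState, target: DeleteState) -> DeleteState:
    """Return the new state, or raise if the transition skips a step."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Forcing every delete through `APPROVED` keeps the option of automating the process later while still guaranteeing a verification point before anything is removed.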
Questions
- Similar questions about how much we know about the request and if there's any extra validation the DDP should do
- At the very least do we know if files/objects/filegroups exist?
- I believe the discussion so far has centered around removing an entire file group (identified by an ObjectId) from the bridge – is this true?
- Discussion about expectations of deletion from the system
- Should any information remain about the Object?
- Audit events
- Fixity information
Retrieve Content
- Query OTM Bridge for Restores to be processed
- Identify space for the restore to be staged on
- Needs to be accessible by the OTM Gateway
- What if there is insufficient space available?
- Needs to guarantee that space will be available while restoring
- Restaging in Chronopolis
- Current process
- A read only mount is available which contains the preservation storage
- Symbolic links are created from the read-only mount to the DuracloudVault restore area
- For OTM Bridge with RO mount
- Could perform a similar process and create symlinks
- Quick, but don't want to make guarantees about that mount being available (might be an object store in the future)
- For OTM Bridge without RO mount
- Contact Chronopolis and request the content be staged
- Could be re-staged through rsync, http, etc. Flexible.
- The process which handled the deposit could potentially handle content retrieval as well
- Notify the OTM Bridge that the Restore is staged and is accessible for a given TTL
- Upon expiration of the TTL, remove the staged content
- When does the status get updated in the Bridge?
- Or does the restore cease to exist?
- Can the OTM Bridge be polled for Restores past their TTL?
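The symlink-based restaging and TTL cleanup described above could be sketched as follows. This assumes a POSIX filesystem and a read-only mount that exposes each object as a directory; the function names and the `expirations` mapping (restore id to expiry time in epoch seconds) are hypothetical, and how expiry is reported back to the OTM Bridge is still an open question.

```python
import shutil
from pathlib import Path


def has_space(stage_root: Path, needed_bytes: int) -> bool:
    """Check free space on the staging filesystem before restaging."""
    return shutil.disk_usage(stage_root).free >= needed_bytes


def stage_with_symlinks(source: Path, stage_root: Path, restore_id: str) -> Path:
    """Expose the contents of a preserved object through symlinks.

    Nothing is copied, so this is quick but depends on the read-only
    mount staying available for the whole TTL.
    """
    target = stage_root / restore_id
    target.mkdir(parents=True)
    for item in source.iterdir():
        (target / item.name).symlink_to(item.resolve())
    return target


def remove_expired(stage_root: Path, expirations: dict[str, float], now: float) -> list[str]:
    """Remove staged restores whose TTL has passed; return their ids."""
    removed = []
    for restore_id, expires_at in expirations.items():
        staged = stage_root / restore_id
        if now >= expires_at and staged.is_dir():
            # rmtree unlinks the symlinks themselves; preserved data is untouched.
            shutil.rmtree(staged)
            removed.append(restore_id)
    return removed
```

A periodic job could call `remove_expired(stage_root, expirations, time.time())` and then report the returned ids to the bridge, if the bridge turns out to track expired restores at all. Note that a symlink-only restore needs almost no staging space, so `has_space` matters mainly for the copy-based (rsync, http) path.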
Questions
- How to handle errors for insufficient space
- If individual files are requested, does the bridge handle that?
- Restaging an entire Object could take time, might want the DDP to pull some of its own weight here as well.
- Restaging large files which are not requested is also wasteful of staging space
- Many options available for returning content, possibly even proxying data
- Is a Restore ephemeral in the OTM Bridge?