Workflows for DDP/Bridge interactions in Specification Flow Diagrams, tailored to Chronopolis specifically.
Terminology/assumptions:
- File: A bitstream
- Preservation Package: A set of related bitstreams (metadata or binaries) which share a unique identifier to be handed off to a distributed digital preservation repository
- Is the unique identifier still in use? How does it relate to filegroups?
- Stage | Staging Filesystem: A filesystem used to stage data temporarily. This is coupled with a Time To Live (TTL) which tells the user how long they have to retrieve the data.
Needs:
- Understanding about groupings of files and what they are to be called. Currently have some synonymous terms floating around - Object and FileGroup.
- Need to know what is expected to persist (objectid/filegroup-id), and whether it is acceptable to use that as a way to "organize" incoming content
- Delineation between ephemeral and persistent identifiers
- ephemeral: deposit-id, delete-id, restore-id
- persistent: filegroup-id
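The ephemeral/persistent split above can be sketched as a pair of types. This is a minimal illustration, not part of any agreed specification; all names here are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileGroupId:
    """Persistent identifier; safe to use when organizing stored content."""
    value: str


@dataclass(frozen=True)
class RequestId:
    """Ephemeral identifier (deposit-id, delete-id, restore-id); only
    meaningful for the lifetime of the request it tracks."""
    kind: str   # "deposit" | "delete" | "restore"
    value: str


def storage_path(root: str, filegroup: FileGroupId) -> str:
    # Only the persistent identifier participates in on-disk layout;
    # ephemeral ids should never leak into storage paths.
    return f"{root}/{filegroup.value}"
```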
Send Content
- Query OTM Bridge for deposits which Chronopolis needs to preserve
- For a given deposit: Create a package consistent with other Chronopolis packages
- This could be BagIt, OCFL, etc
- Likely will use the DepositId/FileGroupId/ObjectId for the root name of the package
- Package per Object/Filegroup? Need better understanding of how we are defining groupings of files.
- If groupings not used, up to the DDP to determine how files are stored. In the case of Chronopolis, we would likely subdivide arbitrarily.
- Perform other Chronopolis tasks for the package
- Generate ACE Tokens for a preservation package
- Could output logging information
- Notify Chronopolis Ingest that a preservation package is ready to be ingested
- Wait for notification that the preservation package has been successfully preserved
- Update OTM Bridge with notification that the deposit has been preserved
- Needs to be specced out
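The Send Content steps above could be sketched roughly as follows. The helper names and the deposit dictionary shape are hypothetical (the real OTM Bridge and Chronopolis Ingest calls are still to be specced); the BagIt layout shown is one of the package options mentioned above.

```python
from pathlib import Path
import shutil


def process_deposit(deposit: dict, work_root: Path) -> Path:
    """Package one deposit for Chronopolis preservation (sketch only)."""
    # Use the deposit/filegroup/object id as the root name of the package
    # (BagIt shown here; OCFL would work the same way).
    package_root = work_root / deposit["filegroup_id"]
    data_dir = package_root / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    for staged_file in deposit["files"]:
        shutil.copy(staged_file, data_dir)
    # Placeholder for the remaining Chronopolis tasks: ACE token
    # generation, logging, and notifying Chronopolis Ingest that the
    # package is ready.
    (package_root / "bagit.txt").write_text("BagIt-Version: 1.0\n")
    return package_root
```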
Questions
- Should the different stages generate audit events? i.e. creating the package, generating tokens, notification of chronopolis, etc
- In step 2, what assumptions can be made about the data? Has the Bridge done any verification of the hashes for the staged data?
- What can be said about duplication of data: anything, or nothing at all?
Delete Content
- Query OTM Bridge for deletions
- OTM Bridge API Specification#ListDeletes
- Should be able to identify the file based on the requesting OTM Bridge user
- If a preservation package is being removed
- Notify Chronopolis staff about removal
- Create tickets for removing the package
- Chron staff removes packages through our deprecation process
- Could be automated; might want some verification before pushing the deprecation through the system
- If a file is being removed
- Identify where the file is located
- Remove the file by...
- removing it from its filegroup/object
- OR generating a new package without said file
- Update the status of the delete to the OTM Bridge
- Need to spec this out
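The delete branches above could be summarized as a small decision function. This is an illustrative sketch; the request shape and action names are assumptions, not part of the OTM Bridge API specification.

```python
def plan_delete(delete_request: dict) -> str:
    """Return the action a delete request implies (sketch only)."""
    if delete_request.get("whole_package"):
        # Whole-package removal: notify Chronopolis staff and go through
        # the existing ticketed deprecation process (with verification
        # before pushing the deprecation through the system).
        return "deprecate-package"
    # Removing one file from an immutable package means generating a new
    # package without that file - which is where the versioning question
    # from the notes below comes in.
    return "repackage-without-file"
```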
Questions
- Similar questions about how much we know about the request and whether there's any extra validation the DDP should do
- At the very least do we know if files/objects/filegroups exist?
- If we opt for removal of files within an individual object, then we run into the issue of needing versioning
- so far we have avoided talking about this, but now might be a good time
Retrieve Content
- Query OTM Bridge for Restores to be processed
- Identify space for the restore to be staged on
- Needs to be accessible by the OTM Gateway
- What if there is insufficient space available?
- Needs to guarantee that space will be available while restoring
- Restaging in Chronopolis
- Current process
- A read only mount is available which contains the preservation storage
- Symbolic links are created from the RO mount to the DuracloudVault restore area
- For OTM Bridge with RO mount
- Could perform a similar process and create symlinks
- Quick, but we don't want to make guarantees about that mount being available (might be an object store in the future)
- For OTM Bridge without RO mount
- Contact Chronopolis and request the content be staged
- Could be re-staged through rsync, http, etc. Flexible.
- The process which handled the deposit could potentially handle content retrieval as well
- Notify the OTM Bridge that the Restore is staged and is accessible for a given TTL
- Upon expiration of the TTL, remove the staged content
- When does the status get updated in the Bridge?
- Or does the restore cease to exist?
- Can the OTM Bridge be polled for Restores past their TTL?
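Under the read-only-mount assumption above, restaging plus TTL cleanup could look roughly like this. All function names are hypothetical, and the symlink approach only applies while a POSIX RO mount exists (it would not survive a move to an object store).

```python
import os


def stage_restore(ro_path: str, restore_dir: str, name: str) -> str:
    """Symlink preserved content from the RO mount into the restore area."""
    os.makedirs(restore_dir, exist_ok=True)
    link = os.path.join(restore_dir, name)
    os.symlink(ro_path, link)
    return link


def sweep_expired(staged: dict[str, float], ttl_seconds: float, now: float) -> list[str]:
    """Return names of staged restores whose TTL has elapsed.

    `staged` maps restore name -> time it was staged (epoch seconds).
    Whether the Bridge is also notified at this point is an open question.
    """
    return [name for name, t in staged.items() if now - t > ttl_seconds]
```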
Questions
- Is there a limit on the amount of data that can be restaged at once?