As the software which interacts with the Packager, the Intake services need to be updated to apply the new file layout to any incoming data in order to support new functionality.
Tokenization DECISION
Intake is one of the processes that creates ACE Tokens for our collections, so any change to when we create tokens, and for which files, would be made in Intake. This is a good time to revisit our current practices: which files do we need to create tokens for, and is it wasteful to create tokens for packaging metadata?
In general, the payload files are what we care most about keeping synchronized between nodes, since the metadata for the file layout is used for verification on transfer, not for long-term preservation. We might therefore consider a workflow in which we create ACE Tokens only for the payload files of a package and let each replication service validate the metadata.
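If the package is an OCFL Object, one way to limit tokenization to payload files is to take the file list from the manifest block of inventory.json, which lists content paths only and leaves out the namaste file, the inventories and their sidecars. A minimal sketch in Python, assuming the inventory has already been parsed into a dict; payload_paths is a hypothetical helper and the digests are placeholders, not real values:

```python
def payload_paths(inventory: dict) -> list[str]:
    """Return every content path named in the OCFL manifest, sorted.

    The manifest maps a digest to the content paths holding those bytes,
    so packaging metadata never appears in the result.
    """
    return sorted(
        path for paths in inventory["manifest"].values() for path in paths
    )


# Placeholder inventory fragment; only the manifest block matters here.
example_inventory = {
    "manifest": {
        "digest-of-a": ["v1/content/data/a.tif"],
        "digest-of-b": ["v1/content/data/b.tif", "v2/content/data/b-copy.tif"],
    }
}

for path in payload_paths(example_inventory):
    print(path)
```

How the resulting paths are handed to ACE for token creation is left out, since that depends on which tokenization client Intake uses.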
Additionally, if the ACE Tokens are embedded in the collection, it is an open question whether we still need to store them in the Ingest database. Currently they are stored in the database so that they can be distributed to each Chronopolis Node.
OCFL Packaging
The workflows defined here are written specifically for the OTM Bridge, but they are general enough that they will need to be applied to other Chronopolis Intake workflows as well.
New Packages
The workflow for creating a new OCFL Object should be very similar to the current flow, with the exception that we are now applying a different packaging format. There are additional steps which may be taken depending on what we decide is included.
Workflow
- Query OTM Bridge service for a deposit waiting to be ingested
- Query Ingest to check for existence of the OCFL Object
  - For this workflow, assume we see that it does not exist
- Use OCFL Packager to create a new OCFL Storage Root and OCFL Object
  - Handles creation of the namaste, inventory.json, version directory, etc
  - Handles moving files into the content directory
  - Checks fixity for files during movement of data
- Create optional files
  - Generate ACE Tokens and put them in a Token Store
  - Generate logging information if needed
- Finalize the OCFL Object
  - Packager generates inventory.json.{alg}
- Register the OCFL Object with the Ingest Server
  - Load Files
  - Load Fixity
  - Load ACE Tokens
    - If these are embedded in the package, is it still needed?
- When the OCFL Object is marked as PRESERVED
  - Create Audit Events for Ingest and Replication
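The packaging and finalization steps above can be sketched as follows. This is not the OCFL Packager itself, only a minimal illustration that assumes OCFL 1.0, sha512 as the digest algorithm, and payload bytes already held in memory. The names create_object and write_inventory are hypothetical. Storage Root creation, token generation and Ingest registration are omitted.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

ALGORITHM = "sha512"  # assumed digest algorithm for this sketch


def write_inventory(directory: Path, inventory: dict) -> None:
    """Write inventory.json and its inventory.json.{alg} sidecar."""
    body = json.dumps(inventory, indent=2).encode("utf-8")
    (directory / "inventory.json").write_bytes(body)
    digest = hashlib.new(ALGORITHM, body).hexdigest()
    sidecar = directory / f"inventory.json.{ALGORITHM}"
    sidecar.write_text(f"{digest} inventory.json\n")


def create_object(object_root: Path, object_id: str, files: dict[str, bytes]) -> dict:
    """Create version v1 of an OCFL 1.0 object and return its inventory."""
    version_dir = object_root / "v1"
    version_dir.mkdir(parents=True, exist_ok=True)
    manifest: dict[str, list[str]] = {}
    state: dict[str, list[str]] = {}
    for logical_path, data in sorted(files.items()):
        expected = hashlib.new(ALGORITHM, data).hexdigest()
        target = version_dir / "content" / logical_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
        # Fixity check: re-read what landed on disk and compare digests.
        actual = hashlib.new(ALGORITHM, target.read_bytes()).hexdigest()
        if actual != expected:
            raise IOError(f"fixity mismatch for {logical_path}")
        manifest.setdefault(actual, []).append(f"v1/content/{logical_path}")
        state.setdefault(actual, []).append(logical_path)

    # Namaste file declaring this directory to be an OCFL 1.0 object.
    (object_root / "0=ocfl_object_1.0").write_text("ocfl_object_1.0\n")
    inventory = {
        "id": object_id,
        "type": "https://ocfl.io/1.0/spec/#inventory",
        "digestAlgorithm": ALGORITHM,
        "head": "v1",
        "manifest": manifest,
        "versions": {
            "v1": {
                "created": datetime.now(timezone.utc).isoformat(),
                "state": state,
            }
        },
    }
    # Finalize: inventory plus sidecar in the object root, with a copy in v1.
    write_inventory(object_root, inventory)
    write_inventory(version_dir, inventory)
    return inventory
```

Writing the sidecar last mirrors the "Finalize" step: once inventory.json.{alg} exists, the object can be validated as a whole before it is registered with the Ingest Server.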
Applying Versions
Depending on how much work we want to do, the workflow for applying a new payload to a package has additional steps which must be taken. In addition, this process requires more communication as data will need to be re-staged so that it can be modified.
Workflow
- Query Bridge service for a preservation package waiting to be ingested
- Query Ingest to check for existence of the OCFL Object
  - For this workflow, we assume that Ingest tells us that the package exists
- Request staging of the OCFL Object
  - Ideally should go through the same Chronopolis node that ingested the package to begin with
  - Could be a workflow of its own, potentially through a staging service
  - We should aim to stage the minimum amount of data needed
    - The payload of the OCFL Object should not be necessary as we aren't computing differences on files and have access to the manifest through the inventory.json
- Modify the preservation OCFL Object
  - Validate fixity
  - Add new version timestamp, number, files
  - Move the payload files into place
  - Update package metadata for versioning
- Create optional files
  - Generate ACE Tokens for new files
  - Generate logging information if needed
- Finalize the OCFL Object
  - Regenerate the inventory.json.{alg}
- Update the OCFL Object in the Ingest Server
  - Create new version of the Collection
    - Registers new fixity for files
    - Registers new tokens for files
    - Files which do not change are not re-distributed
- When the OCFL Object is marked as PRESERVED
  - Create Audit Events for Ingest and Replication
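The point that the existing payload does not need to be staged can be illustrated with a small sketch: the next version is computed from the current inventory.json and the incoming files alone. It assumes a parsed OCFL inventory, unpadded version names (v1, v2, ...), and incoming files held in memory. apply_version is a hypothetical name, not an existing Packager call.

```python
import copy
import hashlib


def apply_version(inventory: dict, new_files: dict[str, bytes], created: str):
    """Return (updated inventory, content files that must be written).

    Only the prior inventory is consulted; previously stored payload
    files are never read.
    """
    inv = copy.deepcopy(inventory)
    previous = inv["head"]
    version = f"v{int(previous[1:]) + 1}"
    algorithm = inv["digestAlgorithm"]

    # Start from the previous state, inverted to logical path -> digest.
    path_to_digest = {
        path: digest
        for digest, paths in inv["versions"][previous]["state"].items()
        for path in paths
    }

    to_write: dict[str, bytes] = {}
    for logical_path, data in sorted(new_files.items()):
        digest = hashlib.new(algorithm, data).hexdigest()
        path_to_digest[logical_path] = digest
        # Content already in the manifest is referenced, not stored again.
        if digest not in inv["manifest"]:
            content_path = f"{version}/content/{logical_path}"
            inv["manifest"][digest] = [content_path]
            to_write[content_path] = data

    state: dict[str, list[str]] = {}
    for path, digest in sorted(path_to_digest.items()):
        state.setdefault(digest, []).append(path)

    inv["versions"][version] = {"created": created, "state": state}
    inv["head"] = version
    return inv, to_write
```

The to_write mapping is also the set of files that would need new fixity values and new ACE Tokens, which matches the expectation that unchanged files are not re-distributed.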
Deleting Versions
Workflow Decisions DECISION
Before implementing a workflow for deleting versions and version data from a collection, we first need to decide how that will occur and what implications it has for the system.
Repackaging Considerations
In OCFL, when removing a file it is recommended to "create a new object that excludes the offending file, with a revised version history taking this into account." (OCFL Implementation Note - File Purging). We should abide by this if possible, in order to keep our packages consistent with the OCFL best practices. When creating a new package, there is the option to either remove the file entirely, or provide a stub which replaces the file but keeps the identifier.
It is conceivable that repackaging can be done solely on the inventory.json and inventory.json.{alg} files, together with overwriting the content of the deleted file so that a stub can be transferred throughout the Chronopolis network.
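As a rough illustration of inventory-only repackaging, the sketch below removes a file from a parsed inventory by dropping its digest from the manifest and from every version state. It assumes the file is identified by its digest and that full removal is chosen rather than a stub. purge_digest is a hypothetical helper.

```python
import copy


def purge_digest(inventory: dict, digest: str):
    """Return (revised inventory, content paths to remove or replace).

    The input inventory is left untouched so the original version
    history stays available until the new object is accepted.
    """
    inv = copy.deepcopy(inventory)
    removed_paths = inv["manifest"].pop(digest, [])
    for version in inv["versions"].values():
        # Drop the logical paths that pointed at the purged content.
        version["state"].pop(digest, None)
    return inv, removed_paths
```

With the stub option, the returned content paths would be overwritten instead of deleted, and the stub's own digest would have to be added to the manifest and to each affected version state.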
Distributed Repackaged Collections
Once a collection is repackaged, it will need to be redistributed throughout Chronopolis. Because the changes to the OCFL Object modify every inventory.json from the version in which the deleted file was introduced onward, we will want to look at the best way to distribute these changes throughout the system.
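To estimate how much would have to be redistributed, we can list the versions whose inventory.json references the deleted file. A small sketch, assuming a parsed root inventory and a file identified by its digest; versions_to_rewrite is a hypothetical name:

```python
def versions_to_rewrite(inventory: dict, digest: str) -> list[str]:
    """List the versions whose inventory.json must change, oldest first.

    Each version's inventory carries the manifest as it stood at that
    version, so every version from the one that introduced the digest
    onward is affected.
    """
    ordered = sorted(inventory["versions"], key=lambda name: int(name[1:]))
    for index, name in enumerate(ordered):
        if digest in inventory["versions"][name]["state"]:
            return ordered[index:]
    return []
```

An empty result means the digest was never part of the object, so nothing would need to be redistributed.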
Overwrites
By overwriting files, we can transfer less overall data to the nodes in Chronopolis. However, we would need to make modifications to each inventory.json and inventory.json.{alg} in the Ingest Server as well as each ACE AM.
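On disk, an overwrite of this kind means every edited inventory.json needs a matching inventory.json.{alg}. The sketch below recomputes the sidecars for the object root and each version directory. It assumes version directories are named v followed by digits and that the digest algorithm named in each inventory is one hashlib supports (sha512 and sha256 are). refresh_sidecars is a hypothetical helper, and updating the Ingest Server and ACE AM records is not covered.

```python
import hashlib
import json
from pathlib import Path


def refresh_sidecars(object_root: Path) -> list[Path]:
    """Rewrite inventory.json.{alg} next to every inventory.json found."""
    version_dirs = sorted(
        path
        for path in object_root.iterdir()
        if path.is_dir() and path.name.startswith("v") and path.name[1:].isdigit()
    )
    updated = []
    for directory in [object_root, *version_dirs]:
        inventory_path = directory / "inventory.json"
        if not inventory_path.is_file():
            continue
        body = inventory_path.read_bytes()
        algorithm = json.loads(body)["digestAlgorithm"]
        digest = hashlib.new(algorithm, body).hexdigest()
        sidecar = directory / f"inventory.json.{algorithm}"
        sidecar.write_text(f"{digest} inventory.json\n")
        updated.append(sidecar)
    return updated
```

Only the object root and version directories are visited, so a payload file that happens to be named inventory.json inside a content directory is left alone.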
Deprecation and Redistribute
This is similar to the current Chronopolis workflow in which collections are marked as DEPRECATED and removed from the ACE AM instances at each site in Chronopolis. By doing this, we can re-ingest a collection and distribute it as one whole operation. We would need to ensure that all versions of the collection are still available in the Ingest Server, and that replications are able to grab the entire OCFL Object.