Todo
  • Update Version workflow cleanup
  • Delete Version workflow
  • Recommendation for files to tokenize
  • check for cross pollination between this and Part 3: OCFL Implementation

As the software which interacts with the Packager, the Intake services need to be updated to apply the new file layout to any incoming data in order to support new functionality.

Tokenization
Status: decision

As one of the processes which creates ACE Tokens for our collections, intake is where any change to when we create tokens, and over which files, would take effect. This is a good time to revisit our current practices: which files do we need to create tokens for, and is it wasteful to create tokens for packaging metadata?

In general, the payload files are what we care most about keeping synchronized, as the metadata for the file layout is used for verification on transfer, not for long-term preservation. We might therefore consider a workflow where we create ACE Tokens only for the payload files of a package and let each replication service validate the metadata.

Additionally, if the ACE Tokens are embedded in the collection, it is an open question whether we still need to store them in the Ingest database. Currently they are stored in the database so that they can be distributed to each Chronopolis Node.
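To make the payload-only option concrete, here is a minimal sketch of how intake could pick the files to tokenize from an OCFL inventory. The function names and the example inventory are hypothetical; the only thing assumed from OCFL is that the inventory's manifest maps each digest to a list of content paths.

```python
def payload_paths(inventory):
    """Return every content path listed in the inventory manifest, sorted."""
    paths = []
    for content_paths in inventory["manifest"].values():
        paths.extend(content_paths)
    return sorted(paths)


def files_to_tokenize(all_paths, inventory):
    """Keep only payload files; packaging metadata is left to replication checks."""
    payload = set(payload_paths(inventory))
    return [p for p in all_paths if p in payload]


# Hypothetical inventory fragment and object listing, for illustration only.
example_inventory = {
    "manifest": {
        "aaa111": ["v1/content/a.txt"],
        "bbb222": ["v1/content/images/b.tif"],
    }
}
example_listing = [
    "0=ocfl_object_1.0",
    "inventory.json",
    "inventory.json.sha512",
    "v1/content/a.txt",
    "v1/content/images/b.tif",
]
selected = files_to_tokenize(example_listing, example_inventory)
```

Under this approach the inventory.json, its digest sidecar, and the NAMASTE file would never receive ACE Tokens, which is exactly the trade-off the decision above needs to settle.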

OCFL Packaging

Info

The workflows defined here are written specifically for the OTM Bridge, but are general enough that they will need to be applied to other Chronopolis Intake workflows.

...

  1. Query OTM Bridge service for a preservation package deposit waiting to be ingested
    1. List Deposits
  2. Query Ingest to check for existence of the OCFL Object
    1. For this workflow, assume we see that it does not exist
  3. Use OCFL Packager to create a new OCFL Storage Root and OCFL Object
    1. Handles creation of the NAMASTE file, inventory.json, version directory, etc.
    2. Handles moving files into the content directory
    3. Checks fixity for files during movement of data
  4. Create optional packaging files
    1. Generate ACE Tokens and put them in a Token Store
    2. Generate logging information if needed
  5. Finalize the OCFL Object
    1. Packager generates inventory.json.{alg}
  6. Register the OCFL Object with the Ingest Server
    1. Load Files
    2. Load Fixity
      1. Is it necessary to track fixity of payload files in the Ingest Server?
    3. Load ACE Tokens
      1. If these are embedded in the package, is it still needed?
  7. When the OCFL Object is marked as PRESERVED
    1. Create Audit Events for Ingest and Replication
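As a rough illustration of steps 3 and 5 above, the following sketch lays out a single-version object by hand. It is not the OCFL Packager's API: `create_object` and `sha512_of` are hypothetical helpers, sha512 is assumed as the digest algorithm, and the sketch skips the storage root and the per-version copy of the inventory.

```python
import hashlib
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


def sha512_of(path):
    """Stream a file through SHA-512 and return the hex digest."""
    h = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def create_object(object_root, object_id, sources):
    """Build a v1 object. `sources` maps logical path -> source file (Path)."""
    content_dir = object_root / "v1" / "content"
    content_dir.mkdir(parents=True)
    # NAMASTE conformance declaration for the object root.
    (object_root / "0=ocfl_object_1.0").write_text("ocfl_object_1.0\n")

    manifest, state = {}, {}
    for logical, src in sorted(sources.items()):
        expected = sha512_of(src)
        dest = content_dir / logical
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dest)
        # Check fixity while the data is being moved into place.
        if sha512_of(dest) != expected:
            raise IOError(f"fixity mismatch after copying {logical}")
        manifest.setdefault(expected, []).append(f"v1/content/{logical}")
        state.setdefault(expected, []).append(logical)

    inventory = {
        "id": object_id,
        "type": "https://ocfl.io/1.0/spec/#inventory",
        "digestAlgorithm": "sha512",
        "head": "v1",
        "manifest": manifest,
        "versions": {
            "v1": {
                "created": datetime.now(timezone.utc).isoformat(),
                "state": state,
            }
        },
    }
    inv_path = object_root / "inventory.json"
    inv_path.write_text(json.dumps(inventory, indent=2))
    # Finalize: the sidecar holds the digest of inventory.json.
    sidecar = object_root / "inventory.json.sha512"
    sidecar.write_text(f"{sha512_of(inv_path)} inventory.json\n")
    return inventory
```

Writing the sidecar last mirrors the finalize step: any optional packaging files added before it do not invalidate the inventory digest.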

Applying Versions

Depending on how much work we want to do, the workflow for applying a new payload to a package has additional steps which must be taken. In addition, this process requires more communication as data will need to be re-staged so that it can be modified.

...

  1. Query Bridge service for a preservation package waiting to be ingested
    1. List Deposits
  2. Query Ingest to check for existence of the OCFL Object
    1. For this workflow, we assume that Ingest tells us that the package does exist
  3. Request staging of the OCFL Object
    1. Ideally should go through the same Chronopolis node that ingested the package to begin with
    2. Could be a workflow of its own, potentially through a staging service
    3. We should aim to stage the minimum amount of data needed
      1. The payload of the OCFL Object should not be necessary, as we aren't computing differences on files and have access to the manifest through the inventory.json
  4. Modify the OCFL Object
    1. Validate fixity
    2. Add new version timestamp, number, files
    3. Ensure any additional metadata is added as required
  5. Optional: Create deltas for newest versions of a file
    1. Move the payload files into place
    2. Update package metadata for versioning
  6. Create optional packaging files
    1. Files which are not modified do not need new tokens
    2. Generate ACE Tokens for new files
    3. Generate logging information if needed
  7. Finalize the OCFL Object
    1. Regenerate the inventory.json.{alg}
    2. Apply content addressing if wanted
  8. Update the OCFL Object in the Ingest Server
    1. Create new version of the Collection
      1. Registers new fixity for files
      2. Registers new tokens for files
    2. Files which do not change are not re-distributed
  9. When the OCFL Object is marked as PRESERVED
    1. Create Audit Events for Ingest and Replication
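The version-update steps above can be sketched against the inventory alone, which is why staging the payload should not be necessary. This is an illustrative sketch, not the Packager's behaviour: `add_version` is a hypothetical helper, version names are assumed to be unpadded (`v1`, `v2`, ...), and digests for the incoming files are assumed to be already computed.

```python
import copy


def add_version(inventory, new_state, created):
    """Append a version to a copy of the inventory.

    `new_state` maps logical path -> digest for the complete state of the
    new version. Returns the updated inventory and the content paths that
    are new, i.e. the only files that need ACE Tokens and redistribution.
    """
    inv = copy.deepcopy(inventory)
    new_head = f"v{int(inv['head'].lstrip('v')) + 1}"

    state = {}
    new_content_paths = []
    for logical, digest in sorted(new_state.items()):
        state.setdefault(digest, []).append(logical)
        if digest not in inv["manifest"]:
            # Unseen content: store it under the new version directory.
            content_path = f"{new_head}/content/{logical}"
            inv["manifest"][digest] = [content_path]
            new_content_paths.append(content_path)

    inv["versions"][new_head] = {"created": created, "state": state}
    inv["head"] = new_head
    return inv, new_content_paths
```

Files whose digest is already in the manifest are referenced from the new version's state without being copied again, so they need neither new tokens nor another transfer.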

Deleting Versions

Workflow Decisions
Status: decision

Before implementing a workflow for deleting versions and version data from a collection, we first need to decide how that will occur and what implications it has for the system.

Repackaging Considerations

In OCFL, when removing a file it is recommended to "create a new object that excludes the offending file, with a revised version history taking this into account." (OCFL Implementation Note - File Purging). We should abide by this if possible, in order to keep our packages consistent with OCFL best practices. When creating a new package, there is the option to either remove the file entirely or provide a stub which replaces the file but keeps the identifier.

It is conceivable that repackaging can be done solely on the inventory.json and inventory.json.{alg} files, as well as overwriting the content of the deleted file so that a stub can be transferred throughout the Chronopolis network.
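A minimal sketch of the inventory-only variant, assuming the file is removed entirely and no stub is kept. `purge_digest` is a hypothetical helper; a real repackaging would also have to handle any `fixity` block, regenerate every inventory.json.{alg} sidecar, and decide what to do with versions that end up identical to their predecessor.

```python
import copy


def purge_digest(inventory, digest):
    """Return a copy of the inventory with one piece of content removed.

    The digest is dropped from the manifest and from the state of every
    version, so no version still refers to the purged file.
    """
    inv = copy.deepcopy(inventory)
    inv["manifest"].pop(digest, None)
    for version in inv["versions"].values():
        version["state"].pop(digest, None)
    return inv
```

The stub option would instead keep the logical path in each state and point it at replacement content, which means a new digest and a new manifest entry in place of the removal above.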

Distributing Repackaged Collections

Once a collection is repackaged, it will need to be redistributed throughout Chronopolis. As the changes to the OCFL Object will modify every inventory.json from the version in which the deleted file was introduced onward, we will want to look at the best way to distribute these changes throughout the system.
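To estimate how much has to be redistributed, the affected versions can be read off the inventory. The sketch below uses a hypothetical helper, `affected_versions`, and assumes unpadded version names and that each version directory carries its own copy of the inventory.

```python
def affected_versions(inventory, digest):
    """List the versions whose inventory.json would change after a purge.

    That is the version in which the content first appears in a state,
    plus every later version, since later inventories still carry the
    earlier version history and the manifest entry.
    """
    names = sorted(inventory["versions"], key=lambda v: int(v.lstrip("v")))
    for i, name in enumerate(names):
        if digest in inventory["versions"][name]["state"]:
            return names[i:]
    return []
```

A file introduced in the first version touches every inventory in the object, while a file added only in the latest version touches a single one; that spread is what the two distribution options have to cover.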

Overwrites

By overwriting files, we can transfer less overall data to the nodes in Chronopolis. However, we would need to make modifications to each inventory.json and inventory.json.{alg} in the Ingest Server as well as in each ACE AM.

Deprecation and Redistribution

This is similar to the current Chronopolis workflow in which collections are marked as DEPRECATED and removed from the ACE AM instances at each site in Chronopolis. By doing this, we can re-ingest a collection and distribute it as one whole operation. We would need to ensure that all versions of the collection are still available in the Ingest Server, and that replications are able to grab the entire OCFL Object.

todo