todo:

  • Focus on OCFL
  • Revise updated workflows
  • Review ACE-AM Updates + trim if unnecessary
  • s/package_layout/file_layout

Replication

As Replication interacts minimally with the packages distributed through the network, it should not require as many updates as other services. We do need to know which files should be validated for a given packaging format, and run checks accordingly.

Post-Transfer Processes

After a transfer completes, we will need to know additional information about how a collection is packaged in order to validate and complete the replication.

Validation

For BagIt packages, we will likely want to do additional validation on the manifest and tagmanifest to make sure that we have the number of files we expect. In an OCFL package, we might do the same with the contents of the inventory, using either the state block of a given version or the manifest covering the entire package.
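As a rough sketch of the BagIt side of this check, the hypothetical helper below (BagManifestCheck is our name, not an existing class) compares the paths listed in a payload manifest such as manifest-sha256.txt with the regular files actually present under data/. It assumes one "<checksum> <path>" entry per line and ignores the percent-encoding BagIt allows in paths; the OCFL equivalent would need a JSON parser for inventory.json and is not shown.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: checks that a bag's payload manifest and its data/ directory agree. */
final class BagManifestCheck {

    /** Reads the relative paths listed in a BagIt manifest ("<checksum> <path>" per line). */
    static Set<String> manifestPaths(Path manifest) throws IOException {
        Set<String> paths = new HashSet<>();
        for (String line : Files.readAllLines(manifest, StandardCharsets.UTF_8)) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate blank lines
            }
            // Split on the first run of whitespace; everything after it is the path.
            String[] parts = trimmed.split("\\s+", 2);
            if (parts.length == 2) {
                paths.add(parts[1]);
            }
        }
        return paths;
    }

    /** Lists every regular file under data/, relative to the bag root, with '/' separators. */
    static Set<String> payloadPaths(Path bagRoot) throws IOException {
        try (Stream<Path> files = Files.walk(bagRoot.resolve("data"))) {
            return files.filter(Files::isRegularFile)
                    .map(p -> bagRoot.relativize(p).toString().replace('\\', '/'))
                    .collect(Collectors.toSet());
        }
    }

    /** True when the manifest lists exactly the files found on disk, no more and no fewer. */
    static boolean isComplete(Path bagRoot, String manifestName) throws IOException {
        return manifestPaths(bagRoot.resolve(manifestName)).equals(payloadPaths(bagRoot));
    }
}
```

Comparing sets rather than just counts also catches the case where a missing file and an unexpected file cancel each other out.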

Hashing

During replication, certain files are hashed so that the successful transfer of content can be validated with the Ingest Server. So far, the files we consider important are the top-level hash files: tagmanifest-sha256.txt for BagIt and inventory.json.sha512 for OCFL. We also hash the ACE Token Store to ensure that it transferred successfully.
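A minimal sketch of the hashing step, assuming the Ingest Server's value arrives as a hex string: the hypothetical TransferHash class streams a file through java.security.MessageDigest and compares the result. The digest algorithm is a parameter because this page does not say which one is used for each file; "SHA-256" in the usage below is only an example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: hashes a top-level validation file and compares it with the Ingest Server's value. */
final class TransferHash {

    /** Streams the file through the named digest (e.g. "SHA-256") and returns lowercase hex. */
    static String hexDigest(Path file, String algorithm) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance(algorithm);
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b)); // negative bytes format as two unsigned hex digits
        }
        return hex.toString();
    }

    /** True when the local digest matches the registered one, ignoring case and surrounding whitespace. */
    static boolean matchesIngest(Path file, String algorithm, String ingestDigest)
            throws IOException, NoSuchAlgorithmException {
        return hexDigest(file, algorithm).equalsIgnoreCase(ingestDigest.trim());
    }
}
```

For example, TransferHash.matchesIngest(root.resolve("tagmanifest-sha256.txt"), "SHA-256", valueFromIngest) would be called once the transfer finishes, with the same call made for the ACE Token Store.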

We will likely want to store information about our “validation” files in an enum for each Package Layout. As we do not expect any variation in the location of these files until we deal with versioning, this should be a good way to avoid magic values in the classes doing the work.
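One possible shape for that enum is sketched below; PackageLayout and its methods are hypothetical names. Each constant records the top-level validation file named in the Hashing section, and callers resolve it against a package root instead of hard-coding the file name.

```java
import java.nio.file.Path;

/** Hypothetical enum: one constant per package layout, holding its top-level validation file. */
enum PackageLayout {
    BAGIT("tagmanifest-sha256.txt"),
    OCFL("inventory.json.sha512");

    private final String validationFile;

    PackageLayout(String validationFile) {
        this.validationFile = validationFile;
    }

    /** Name of the validation file, relative to the package root. */
    String validationFile() {
        return validationFile;
    }

    /** Resolves the validation file against a concrete package root. */
    Path validationPath(Path packageRoot) {
        return packageRoot.resolve(validationFile);
    }
}
```

When versioning arrives and the location can vary, the constants could be given a method that computes the path instead of a fixed string, without changing the callers.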

ACE Token Upload

If the ACE Tokens are transferred as part of the preservation package instead of as a separate file, the location used when retrieving them will need to be updated. When uploading files, the ByteStream of a file is retrieved from the Bucket class by passing the relative path of the TokenStore. This design means that if we need to use a non-POSIX storage layer, we can (hopefully) adapt without changing too much code.
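To illustrate why passing only a relative path keeps the storage layer swappable, here is a minimal stand-in for the Bucket class. StorageBucket and PosixBucket are hypothetical names and this is not the real Bucket API; the point is that the token uploader depends only on the interface, so a non-POSIX implementation could be dropped in later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for the Bucket class: callers only ever pass paths relative to the bucket. */
interface StorageBucket {
    InputStream stream(String relativePath) throws IOException;
}

/** POSIX-backed implementation; another storage layer would implement the same interface. */
final class PosixBucket implements StorageBucket {
    private final Path root;

    PosixBucket(Path root) {
        this.root = root.normalize();
    }

    @Override
    public InputStream stream(String relativePath) throws IOException {
        Path resolved = root.resolve(relativePath).normalize();
        // Refuse paths such as "../x" that would escape the bucket root.
        if (!resolved.startsWith(root)) {
            throw new IOException("path escapes bucket: " + relativePath);
        }
        return Files.newInputStream(resolved);
    }
}
```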

To support ACE Tokens stored within a package, we would only need to update the path passed to the Bucket. Where an ACE TokenStore lives might be information we store internally for each type of package (OCFL, BagIt). In addition, if any Content Addressing is used, we would first need to read the index to discover the path of the TokenStore.
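The lookup described above could look roughly like this hypothetical TokenStoreLocator. Because the in-package TokenStore locations are not fixed yet, the caller supplies them per package type; when the package uses content addressing, an already-parsed index (logical path to stored path) is consulted first. The index representation is an assumption, not the actual index format.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: works out the Bucket-relative path of a TokenStore shipped inside a package. */
final class TokenStoreLocator {

    enum PackageType { BAGIT, OCFL }

    // Location of the TokenStore inside each package type, supplied by the caller
    // because this design has not fixed those locations yet.
    private final Map<PackageType, String> tokenStorePaths;

    TokenStoreLocator(Map<PackageType, String> tokenStorePaths) {
        this.tokenStorePaths = new HashMap<>(tokenStorePaths);
    }

    /**
     * Returns the relative path to hand to the Bucket. For content-addressed packages,
     * contentIndex maps logical paths to stored paths and must contain the TokenStore;
     * pass an empty map for packages without an index.
     */
    String resolve(String packageRoot, PackageType type, Map<String, String> contentIndex) {
        String logical = tokenStorePaths.get(type);
        if (logical == null) {
            throw new IllegalArgumentException("no TokenStore location configured for " + type);
        }
        String stored = logical;
        if (!contentIndex.isEmpty()) {
            stored = contentIndex.get(logical);
            if (stored == null) {
                throw new IllegalStateException("TokenStore missing from content index: " + logical);
            }
        }
        return packageRoot + "/" + stored;
    }
}
```

Failing loudly when an index exists but lacks the TokenStore avoids silently reading a path that does not exist in a content-addressed layout.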

ACE Audit Manager

The ACE Audit Manager provides utilities for handling most of the actions we want to take on files when creating updates; however, it has no built-in notion of versioning. This means that if a file changes on disk, ACE will report the file as corrupt until its checksum once again matches the one ACE has stored. In addition, since some files will need to have their tokens updated, ACE will mark the token as corrupt until the stored checksum of the file is updated. While this will primarily affect package metadata, we will still want to take steps to resolve these issues.

Updating Files

The ACE AM will need a new API that allows the checksum stored for a given file to be updated. This will require additional entries in the ACE AM's event log: a FILE_CHECKSUM_UPDATE event recording both the old and the new checksum. Updating the checksum for a file may also need to set the state of the MonitoredItem to INVALID, which would also happen when the new ACE Token is ingested.
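An in-memory sketch of the proposed operation follows; ChecksumUpdater, its nested classes, and the ACTIVE/INVALID enum are hypothetical and do not reflect the ACE AM's actual entities or persistence. It shows the three effects described above: the stored checksum is replaced, a FILE_CHECKSUM_UPDATE event carrying both values is logged, and the item is flagged INVALID.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical in-memory sketch of a checksum-update API for the ACE AM. */
final class ChecksumUpdater {

    // Placeholder states; the real MonitoredItem state model may differ.
    enum State { ACTIVE, INVALID }

    static final class MonitoredItem {
        final String path;
        String checksum;
        State state = State.ACTIVE;

        MonitoredItem(String path, String checksum) {
            this.path = path;
            this.checksum = checksum;
        }
    }

    static final class LogEvent {
        final String type;
        final String path;
        final String description;
        final Instant date;

        LogEvent(String type, String path, String description, Instant date) {
            this.type = type;
            this.path = path;
            this.description = description;
            this.date = date;
        }
    }

    private final List<LogEvent> eventLog = new ArrayList<>();

    /** Replaces the stored checksum and records the old and new values in the event log. */
    void updateChecksum(MonitoredItem item, String newChecksum) {
        String oldChecksum = item.checksum;
        item.checksum = newChecksum;
        // The token on record was issued for the old checksum, so the item is
        // flagged INVALID until a token for the new checksum is ingested.
        item.state = State.INVALID;
        eventLog.add(new LogEvent("FILE_CHECKSUM_UPDATE", item.path,
                "old=" + oldChecksum + " new=" + newChecksum, Instant.now()));
    }

    /** Read-only view of the events recorded so far. */
    List<LogEvent> events() {
        return Collections.unmodifiableList(eventLog);
    }
}
```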

Content Address Storage Driver

Depending on how much we wish to understand about a package and the mappings of its files, we could provide a StorageDriver for the ACE AM which displays a file's logical address. This would accompany the Content Addressing packaging, which uses index files such as idx_v1.idx that the ACE AM could parse and apply when browsing collections. This isn't strictly necessary; it would serve to add a human-readable element when looking at the MonitoredItems of a Collection. In all likelihood it would require a database migration to capture the mappings efficiently.
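The display side of such a driver could be as small as the hypothetical ContentIndex below. Its line format is an assumption made purely for illustration (one tab-separated stored path and logical path per line); the real idx_v1.idx format may differ, and only the parsing step would need to change. Unknown stored paths fall back to themselves so browsing never fails on an unmapped item.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical index reader mapping stored (content-addressed) paths to logical paths for display. */
final class ContentIndex {
    private final Map<String, String> logicalByStored = new HashMap<>();

    /** Parses an ASSUMED "storedPath<TAB>logicalPath" per-line format; blank lines are skipped. */
    static ContentIndex parse(Reader source) throws IOException {
        ContentIndex index = new ContentIndex();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue;
                }
                String[] fields = line.split("\t", 2);
                if (fields.length != 2) {
                    throw new IOException("malformed index line: " + line);
                }
                index.logicalByStored.put(fields[0], fields[1]);
            }
        }
        return index;
    }

    /** Human-readable name for a MonitoredItem; falls back to the stored path when unmapped. */
    String displayName(String storedPath) {
        return logicalByStored.getOrDefault(storedPath, storedPath);
    }
}
```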

Versioning for MonitoredItems

One optional course of action would be to give the ACE Audit Manager some knowledge of versions in its database. This would require a migration in the form of an ace-{version}.sql file, with fields similar to those in the migrations outlined in the Ingest section. The versioned information the ACE AM would need to track is the file’s digest and token. Additional events, such as FILE_VERSION_UPDATE, should be generated and persisted to the logevent table.
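In memory, the idea amounts to appending rather than overwriting, as in the hypothetical VersionedItem sketch below. It keeps a digest and token per version and emits a FILE_VERSION_UPDATE event for each new version; in the ACE AM these would instead be rows created by the migration and entries in the logevent table, whose schema is not shown here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of version-aware digest/token history for one MonitoredItem. */
final class VersionedItem {

    static final class Version {
        final int number;
        final String digest;
        final String token;

        Version(int number, String digest, String token) {
            this.number = number;
            this.digest = digest;
            this.token = token;
        }
    }

    private final String path;
    private final List<Version> versions = new ArrayList<>();
    private final List<String> events = new ArrayList<>();

    VersionedItem(String path, String digest, String token) {
        this.path = path;
        versions.add(new Version(1, digest, token));
    }

    /** Latest version; audits would compare the file on disk against this digest. */
    Version current() {
        return versions.get(versions.size() - 1);
    }

    /** Appends a new version instead of overwriting, and records a FILE_VERSION_UPDATE event. */
    Version addVersion(String digest, String token) {
        Version next = new Version(current().number + 1, digest, token);
        versions.add(next);
        events.add("FILE_VERSION_UPDATE " + path + " v" + next.number);
        return next;
    }

    List<Version> history() {
        return Collections.unmodifiableList(versions);
    }

    List<String> events() {
        return Collections.unmodifiableList(events);
    }
}
```

Keeping earlier versions means a changed file is audited against its newest digest rather than reported as corrupt, while the older digest and token remain available.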