BagIt is a specification for packaging and storing content for preservation and transfer. For Chronopolis, we use the BagIt specification as a base, changing it only to make certain files required.
Structure
Bag Structure
<Collection-Name>/
|   bagit.txt
|   manifest-sha256.txt
|   bag-info.txt
|   tagmanifest-sha256.txt
\--- data/
      |   [payload files]
Currently we require SHA-256 digests for Chronopolis, so all manifest file names carry the "-sha256" suffix.
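The layout above can be checked mechanically. The following is a minimal sketch, not the Chronopolis implementation: the function name `missing_required_entries` and the constant `REQUIRED_TAG_FILES` are hypothetical, and the list of required files is taken only from the structure shown on this page.

```python
from pathlib import Path

# Tag files this page lists as required for a Chronopolis bag (assumption:
# the list mirrors the structure diagram and nothing else is mandatory).
REQUIRED_TAG_FILES = (
    "bagit.txt",
    "bag-info.txt",
    "manifest-sha256.txt",
    "tagmanifest-sha256.txt",
)


def missing_required_entries(bag_root):
    """Return the names of required entries that are absent from the bag."""
    root = Path(bag_root)
    missing = [name for name in REQUIRED_TAG_FILES if not (root / name).is_file()]
    if not (root / "data").is_dir():
        missing.append("data/")
    return missing
```

An empty list means every required entry is present; it says nothing yet about whether the digests inside the manifests are correct.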
Data Files
The data files (the payload of a bag) in the BagIt spec are all the files found under the data directory. If there are orphaned files (files that exist in the data directory but are not listed in the manifest), the bag is deemed invalid.
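The orphan check described above amounts to a set difference between the files on disk and the paths in the manifest. This is a hedged sketch: the helper names `read_manifest_paths` and `find_orphans` are hypothetical, and the manifest is assumed to hold one "<digest> <path>" entry per line with forward-slash paths.

```python
from pathlib import Path


def read_manifest_paths(manifest_file):
    """Return the set of paths listed in a manifest file."""
    paths = set()
    for line in Path(manifest_file).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        _digest, path = line.split(None, 1)
        if path.startswith("*"):
            path = path[1:]  # sha256sum marks binary-mode entries with '*'
        paths.add(path)
    return paths


def find_orphans(bag_root):
    """Return payload files under data/ that the manifest does not list."""
    root = Path(bag_root)
    listed = read_manifest_paths(root / "manifest-sha256.txt")
    on_disk = {
        p.relative_to(root).as_posix()
        for p in (root / "data").rglob("*")
        if p.is_file()
    }
    return sorted(on_disk - listed)
```

A non-empty result from `find_orphans` would mean the bag is invalid under the rule above.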
Tag Files
Tag files are all files that exist outside of the data directory. The bagit.txt and bag-info.txt files are standard, and we follow the same requirements as the BagIt specification for them.
manifest-sha256.txt
- Required as part of the bagit spec
- Contains the digest of each payload file along with its path relative to the bag's root directory
- Uses the same line format as the output of the md5sum/sha256sum command line utilities
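As an illustration of that format, the sketch below builds manifest lines the way sha256sum prints them in text mode: the hex digest, two spaces, then the path relative to the bag root. The function names `sha256_hex` and `manifest_lines` are hypothetical and not part of any Chronopolis tooling.

```python
import hashlib
from pathlib import Path


def sha256_hex(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_lines(bag_root):
    """Return one '<digest>  <relative path>' line per payload file."""
    root = Path(bag_root)
    files = sorted(p for p in (root / "data").rglob("*") if p.is_file())
    return [f"{sha256_hex(p)}  {p.relative_to(root).as_posix()}" for p in files]
```

Writing the returned lines, each followed by a newline, to manifest-sha256.txt gives a file that `sha256sum -c` can check when run from the bag's root directory.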
tagmanifest-sha256.txt
- We require the existence of a tagmanifest, which contains the digest of each tag file and the manifest.
- This lets us validate not only the tag files but also the manifest we were given.
- When we do the bagging ourselves, we create this only after every file in the manifest has been validated.
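The rule in the last bullet can be sketched as a two-step routine: validate every manifest entry first, and write the tagmanifest only if that succeeds. This is an assumption-laden illustration rather than the actual bagging code; `manifest_is_valid` and `write_tagmanifest` are hypothetical names, and every file outside data/ (including files in subdirectories such as dpn-tags/) is treated as a tag file.

```python
import hashlib
from pathlib import Path


def sha256_hex(path):
    """Return the SHA-256 hex digest of a file (whole file read at once; sketch only)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def manifest_is_valid(root):
    """True if every manifest entry exists and matches its recorded digest."""
    text = (root / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in text.splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(None, 1)
        target = root / rel_path
        if not target.is_file() or sha256_hex(target) != expected.lower():
            return False
    return True


def write_tagmanifest(bag_root):
    """Write tagmanifest-sha256.txt only when the payload manifest validates."""
    root = Path(bag_root)
    if not manifest_is_valid(root):
        return False
    tag_files = sorted(
        p for p in root.rglob("*")
        if p.is_file()
        and p.relative_to(root).parts[0] != "data"  # skip the payload
        and p.relative_to(root).as_posix() != "tagmanifest-sha256.txt"
    )
    lines = [f"{sha256_hex(p)}  {p.relative_to(root).as_posix()}" for p in tag_files]
    (root / "tagmanifest-sha256.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return True
```

Because manifest-sha256.txt is itself outside data/, its digest ends up in the tagmanifest, which is what allows a receiver to validate the manifest it was given.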
Optional Tag Files
Duraspace:
- manifest-md5.txt
- content-properties.json
- .collection-snapshot.properties
DPN:
- dpn-tags/dpn-info.txt
- Possibly other node-related tag files
Each of these files is digested and added to the tagmanifest. DPN has its own constraints on the dpn-info.txt file, which we conform to.