
Old version of this page: BagIt Specification OLD

This page details the specification of DPN Content Packages.

DPN BagIt Bags (Content Packages)

BagIt is a hierarchical file packaging format designed to support disk-based storage and network transfer of arbitrary digital content. A "bag" consists of a "payload" (the arbitrary content) and "tags", which are metadata files intended to document the storage and transfer of the bag. A required tag file contains a manifest listing every file in the payload together with its corresponding secure hash/message digest (checksum).

Specification

  1. DPN packages will conform to the BagIt packaging format (spec)
  2. DPN packages may be either:
    1. serialized (e.g. a single tar)
    2. un-serialized (e.g. exploded directory structure)

DPN BagIt Structure

DPN requires content to be packaged for transport, but does not specify how each replicating node incorporates digital objects into its own repository.

For example, SDR will serialize the content of the /data directory in tar format; however, this may not be feasible for some repositories because the serialized bag may be too large. We may want to consider bag grouping, i.e. a bag directory that holds many serialized bags, all with the same profile and with a sequence number associated with each component bag. DPN will not support holey bags, i.e. bags that use fetch.txt.

DPN Bag Structure

<DPN-Object-ID>/
         |   bagit.txt
         |   manifest-sha256.txt
         |   bag-info.txt
         |   tagmanifest-sha256.txt
         \--- data/
               |   [payload files]
         \--- dpn-tags/
               |   dpn-info.txt
         \--- [optional node tag directories]/
               |   [optional node tag files]
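A minimal Python sketch of a layout check for an unserialized bag follows. The required entries are read off the diagram above, and the root-name and fetch.txt checks reflect the UUID naming and no-holey-bags rules described on this page; `check_dpn_bag_layout` and `REQUIRED_TAG_FILES` are hypothetical names, not part of any DPN tool.

```python
import uuid
from pathlib import Path

# Hypothetical helper, not an official DPN tool. The required entries are
# read off the DPN bag structure diagram.
REQUIRED_TAG_FILES = [
    "bagit.txt",
    "manifest-sha256.txt",
    "bag-info.txt",
    "tagmanifest-sha256.txt",
    "dpn-tags/dpn-info.txt",
]


def check_dpn_bag_layout(bag_dir: str) -> list[str]:
    """Return human-readable layout problems for an unserialized DPN bag."""
    root = Path(bag_dir)
    problems = []
    # The root directory is named after the bag's DPN-Object-ID (a UUID).
    try:
        uuid.UUID(root.name)
    except ValueError:
        problems.append(f"root directory name is not a UUID: {root.name}")
    for rel in REQUIRED_TAG_FILES:
        if not (root / rel).is_file():
            problems.append(f"missing file: {rel}")
    if not (root / "data").is_dir():
        problems.append("missing directory: data/")
    # DPN does not support holey bags, so fetch.txt must be absent.
    if (root / "fetch.txt").exists():
        problems.append("fetch.txt present: holey bags are not supported")
    return problems
```

An empty list means the directory matches the expected shape; it says nothing yet about checksums or tag file contents.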

Description

DPN-Object-ID (directory)
  • Name of the bag's root directory, which is required by the BagIt spec
  • The unique DPN UUID of the object; the same as the DPN-Object-ID value in dpn-tags/dpn-info.txt
bagit.txt
  • Required element, as listed in the BagIt spec.
BagIt-Version: M.N
Tag-File-Character-Encoding: UTF-8
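As a sketch, assuming the bag is assembled with plain Python, bagit.txt can be generated as below. The `render_bagit_txt` and `write_bagit_txt` helpers are hypothetical, and because this page does not fix which BagIt version DPN uses, any concrete version string passed in (such as "0.97") is only illustrative.

```python
import re
from pathlib import Path


def render_bagit_txt(version: str) -> str:
    """Build bagit.txt contents; `version` is the BagIt spec version in M.N form."""
    if not re.fullmatch(r"\d+\.\d+", version):
        raise ValueError(f"BagIt-Version must have the form M.N, got {version!r}")
    return f"BagIt-Version: {version}\nTag-File-Character-Encoding: UTF-8\n"


def write_bagit_txt(bag_dir: str, version: str) -> Path:
    """Write bagit.txt at the bag root, encoded as UTF-8 as the file declares."""
    path = Path(bag_dir) / "bagit.txt"
    path.write_text(render_bagit_txt(version), encoding="utf-8")
    return path
```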
manifest-sha256.txt
  • Required element from the BagIt spec. SHA256 is the DPN standard for fixity checksums (secure hash/message digest)
  • Lists each payload file (content under /data) with its checksum (secure hash)
  • Note: As we update our fixity algorithms in the future, we should keep old versions of this manifest file for auditing and historical purposes
    • There is some debate on the retention of legacy manifest-<alg>.txt files
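The following sketch shows one way to produce the contents of manifest-sha256.txt: every file under data/ is hashed with SHA256 and listed by its path relative to the bag root. The helper names are hypothetical; separating checksum and path with two spaces is a common choice, since the BagIt spec allows one or more whitespace characters there.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large payload files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_payload_manifest(bag_dir: str) -> str:
    """Return manifest-sha256.txt contents: '<sha256>  data/<path>' per payload file."""
    root = Path(bag_dir)
    files = [p for p in (root / "data").rglob("*") if p.is_file()]
    # Sort by bag-relative path so the output is deterministic.
    files.sort(key=lambda p: p.relative_to(root).as_posix())
    return "".join(
        f"{sha256_file(p)}  {p.relative_to(root).as_posix()}\n" for p in files
    )
```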
bag-info.txt
  • bagit spec section 2.2.2
  • Used to add information that helps with succession
  • Fields that would be redundant with fields in dpn-info.txt should be kept in dpn-info.txt only, to avoid confusion
  • DPN requires the presence of the following fields, although their values may be empty. Do not use "null" or "nil" as values; an empty field should still include the colon (:). After further discussion, we determined that empty fields are consistent with both human readability and current BagIt community best practices (08/20/2014).
   Source-Organization
   Organization-Address
   Contact-Name
   Contact-Phone
   Contact-Email
   Bagging-Date
   Bag-Size
   Bag-Group-Identifier
   Bag-Count
  • "Bag-Group-Identifier" and "Bag-Count" are currently included, but may be removed if they do not come into practical use
  • Other fields are optional for use by the First Node but are ignored by all common DPN processes.
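A minimal sketch of checking the bag-info.txt rules above: each required field must appear, values may be empty, and "null"/"nil" placeholders are rejected. The parser also joins indented continuation lines, which BagIt allows for long values. `parse_tag_file` and `validate_bag_info` are hypothetical names, and optional fields are deliberately left unchecked since common DPN processes ignore them.

```python
# Hypothetical validator for the bag-info.txt rules on this page.
REQUIRED_BAG_INFO_FIELDS = [
    "Source-Organization", "Organization-Address", "Contact-Name",
    "Contact-Phone", "Contact-Email", "Bagging-Date", "Bag-Size",
    "Bag-Group-Identifier", "Bag-Count",
]
FORBIDDEN_PLACEHOLDERS = {"null", "nil"}


def parse_tag_file(text: str) -> dict[str, list[str]]:
    """Parse 'Label: value' lines into label -> list of values (labels may repeat)."""
    fields: dict[str, list[str]] = {}
    last_label = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and last_label is not None:
            # Indented line: continuation of the previous value.
            previous = fields[last_label][-1]
            fields[last_label][-1] = f"{previous} {line.strip()}".strip()
            continue
        label, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"tag line has no colon: {line!r}")
        last_label = label.strip()
        fields.setdefault(last_label, []).append(value.strip())
    return fields


def validate_bag_info(text: str) -> list[str]:
    """Return problems; empty values are allowed, 'null'/'nil' are not."""
    fields = parse_tag_file(text)
    problems = []
    for name in REQUIRED_BAG_INFO_FIELDS:
        if name not in fields:
            problems.append(f"missing field: {name}")
            continue
        for value in fields[name]:
            if value.lower() in FORBIDDEN_PLACEHOLDERS:
                problems.append(f"{name} uses placeholder {value!r}; leave it empty")
    return problems
```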
tagmanifest-sha256.txt
  • bagit spec section 2.2.1
  • Contains secure hash of tag files
  • This lets the metadata stored with the bag be checked for integrity, just as the payload manifest does for the payload
  • As with manifest-sha256.txt, we can keep old versions as we update our fixity algorithms
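tagmanifest-sha256.txt can be sketched the same way as the payload manifest, but over tag files: every file outside data/, excluding the tag manifest itself. This assumes, as in the BagIt spec, that a tag manifest does not list tag manifest files; `build_tag_manifest` is a hypothetical name.

```python
import hashlib
from pathlib import Path


def build_tag_manifest(bag_dir: str) -> str:
    """Return tagmanifest-sha256.txt contents covering every tag file in the bag."""
    root = Path(bag_dir)
    lines = []
    candidates = [p for p in root.rglob("*") if p.is_file()]
    for path in sorted(candidates, key=lambda p: p.relative_to(root).as_posix()):
        rel = path.relative_to(root)
        if rel.parts[0] == "data":
            continue  # payload files belong in manifest-sha256.txt
        if len(rel.parts) == 1 and rel.name.startswith("tagmanifest-"):
            continue  # a tag manifest does not list tag manifests
        # Tag files are small, so reading them whole is acceptable here.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {rel.as_posix()}\n")
    return "".join(lines)
```

Because manifest-sha256.txt is itself a tag file, its hash appears in this output, so the tag manifest indirectly covers the payload checksums as well.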
fetch.txt
  • Not supported by DPN, as DPN does not support holey bags.
data (directory)
  • Required directory for payload items
  • May be encrypted for dark content.
dpn-tags (directory)
  • Directory for DPN specific tag files (covered under optional tag directories of the bagit spec section 2)
  • All DPN tag files go under this directory with the naming convention ‘dpn-<filename>.txt’ following bagit text tag file specifications
dpn-tags/dpn-info.txt
  • DPN tag file containing the following fields:
DPN-Object-ID: Unique ID generated by Ingest Node. 
Local-ID:  Local identifier from originating repository.
Ingest-Node-Name:  Name of the ingest node or source repository
Ingest-Node-Address:
Ingest-Node-Contact-Name:
Ingest-Node-Contact-Email:
Version-Number: Sequential positive integer
First-Version-Object-ID: Object-ID of the first version of the item
Interpretive-Object-ID: DPN UUID of Interpretive bag for this object
Rights-Object-ID: Reference to DPN and repository agreements
Bag-Type: data | interpretive | rights # Bags will be only one of these three types of objects.
  • The naming convention of fields that hold DPN UUIDs have the suffix "Object-ID"
    • Alternative naming conventions under consideration include: "OID", "DPN-ID", "DID", "Reference-ID", "Ref-ID", etc.
  • Every field must appear.  If a field does not have a value, it should still appear but be left blank.  
  • All fields must have a value, except for:
    • Previous-Version-Object-ID
    • Brightening-Object-ID ("rights" and "brightening" only)*
    • Rights-Object-ID ("rights" and "brightening" only)*
    • * Currently, "data" bags also do not require this field.
  • Fields that could contain more than one value should be repeated for each value.  Do not separate with commas
    • Currently, the only fields that may be repeated are "Brightening-Object-ID" and "Rights-Object-ID".
    • Example:
Brightening-Object-ID: UUID #1
Brightening-Object-ID: UUID #2
Brightening-Object-ID: UUID #3
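One possible reading of the dpn-info.txt rules, as a Python sketch: every listed field must appear, only the fields named above may be blank or repeated, comma-joined lists are rejected, values of fields ending in "Object-ID" must parse as UUIDs, and Bag-Type must be one of the three listed types. The rules mention Previous-Version-Object-ID and Brightening-Object-ID without including them in the field set, so the sketch accepts them but does not require them, and it treats Rights-Object-ID as optional for every bag type (a simplification). `validate_dpn_info` is a hypothetical name.

```python
import uuid

# Field set as listed for dpn-tags/dpn-info.txt.
DPN_INFO_FIELDS = [
    "DPN-Object-ID", "Local-ID", "Ingest-Node-Name", "Ingest-Node-Address",
    "Ingest-Node-Contact-Name", "Ingest-Node-Contact-Email", "Version-Number",
    "First-Version-Object-ID", "Interpretive-Object-ID", "Rights-Object-ID",
    "Bag-Type",
]
MAY_BE_EMPTY = {"Previous-Version-Object-ID", "Brightening-Object-ID", "Rights-Object-ID"}
REPEATABLE = {"Brightening-Object-ID", "Rights-Object-ID"}
BAG_TYPES = {"data", "interpretive", "rights"}


def validate_dpn_info(text: str) -> list[str]:
    """Return a list of rule violations found in dpn-info.txt text."""
    fields: dict[str, list[str]] = {}
    for line in text.splitlines():
        if line.strip():
            label, _, value = line.partition(":")
            fields.setdefault(label.strip(), []).append(value.strip())
    problems = [f"missing field: {n}" for n in DPN_INFO_FIELDS if n not in fields]
    for name, values in fields.items():
        if len(values) > 1 and name not in REPEATABLE:
            problems.append(f"field may not repeat: {name}")
        for value in values:
            if not value:
                if name not in MAY_BE_EMPTY:
                    problems.append(f"empty field: {name}")
                continue
            if name.endswith("Object-ID"):
                if "," in value:
                    problems.append(f"comma-separated values in {name}; repeat the field instead")
                    continue
                try:
                    uuid.UUID(value)  # fields ending in Object-ID hold DPN UUIDs
                except ValueError:
                    problems.append(f"not a UUID in {name}: {value}")
    bag_type = fields.get("Bag-Type", [""])[0]
    if bag_type and bag_type not in BAG_TYPES:
        problems.append(f"unknown Bag-Type: {bag_type}")
    version = fields.get("Version-Number", [""])[0]
    if version and not (version.isdigit() and int(version) > 0):
        problems.append(f"Version-Number is not a positive integer: {version}")
    return problems
```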
optional node tag directory and files
  • Following the BagIt specification for optional tag directories and the DPN optional tag convention, first nodes MAY include optional tags of their own; these are ignored by the DPN Federation as a whole
  • As with the DPN tag convention, we recommend the directory naming convention `<node name>-tags` and the file naming convention `<node name>-<filename>.txt`, following the BagIt specification
  • First nodes that use this directory should strongly consider having a node-specific BagIt profile

 

DPN Bag Transfer Protocol

  1. DPN will transfer valid DPN bags that have been serialized with tar (i.e. serialized bags).
  2. After the transfer of a bag's tar file finishes, the replicating node will compute the SHA256 hash of the serialized file. This hash is sent to the first node and shows that the tarred bag was transferred without errors.
  3. The originating node will calculate the SHA256 hash of the bag's tagmanifest-sha256.txt file, use it as the fixity_value for the bag, and keep it in the registry.
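The three steps can be sketched in Python with the standard `tarfile` and `hashlib` modules. Plain uncompressed tar is an assumption (the protocol only says the bags are tarred), and the function names are hypothetical.

```python
import hashlib
import tarfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large bags need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def serialize_bag(bag_dir: str, tar_path: str) -> None:
    """Step 1: tar an unserialized bag, keeping <DPN-Object-ID>/ as the top-level entry."""
    root = Path(bag_dir)
    # Assumption: plain (uncompressed) tar.
    with tarfile.open(tar_path, "w") as tar:
        tar.add(root, arcname=root.name)


def transfer_receipt(tar_path: str) -> str:
    """Step 2: the replicating node hashes the received tar file."""
    return sha256_of(Path(tar_path))


def bag_fixity_value(bag_dir: str) -> str:
    """Step 3: the originating node hashes tagmanifest-sha256.txt."""
    return sha256_of(Path(bag_dir) / "tagmanifest-sha256.txt")
```

The tar file's bytes depend on archive metadata such as timestamps and ownership, so its hash only shows that this particular file arrived intact. The tagmanifest-based fixity_value does not depend on how the bag was serialized, and since tagmanifest-sha256.txt lists the hash of manifest-sha256.txt, which in turn lists every payload hash, that single value indirectly covers the whole bag.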

 
