New version of this page: BagIt Specification

This page details the specification of DPN Content Packages.  Bagging is to be confirmed or handled by the First Node before sending material to DPN for ingestion.  This document captures the specification used to store items in DPN.

DPN BagIt Bags (Content Packages)

BagIt is a hierarchical file packaging format designed to support disk-based storage and network transfer of arbitrary digital content. A "bag" consists of a "payload" (the arbitrary content) and "tags", which are metadata files intended to document the storage and transfer of the bag. A required tag file contains a manifest listing every file in the payload together with its corresponding secure hash/message digest (checksum).

Specification

  1. DPN packages will conform to the BagIt packaging format (spec)
  2. DPN packages may be either
    1. serialized (e.g. a single tar)
    2. un-serialized (e.g. exploded directory structure)
  3. DPN packages will conform to the TBD BagIt profile
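
To illustrate the serialized form mentioned in item 2, the following is a minimal sketch that writes an un-serialized bag directory into a single uncompressed tar file using Python's standard tarfile module. The function name serialize_bag and the choice of an uncompressed tar are assumptions made for illustration; the specification above only says "e.g. a single tar".

```python
import tarfile
from pathlib import Path


def serialize_bag(bag_root, tar_path):
    """Write the bag directory into a single uncompressed tar file.

    Hypothetical helper: the name and the lack of compression are
    assumptions of this sketch, not part of the DPN specification.
    """
    root = Path(bag_root)
    with tarfile.open(str(tar_path), "w") as archive:
        # arcname keeps the bag's root directory as the top-level tar entry,
        # so the extracted tree matches the un-serialized form.
        archive.add(str(root), arcname=root.name)
    return Path(tar_path)
```

Keeping the root directory as the top-level entry means that extracting the tar reproduces the exploded directory structure unchanged.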

DPN BagIt Structure

This section is an initial suggestion for the DPN BagIt structure. DPN requires packaging content for transport, but does not specify how each replicating node incorporates digital objects into their respective repositories.

As an example, SDR will serialize the content in the /data directory in tar format. This may not be feasible for some repositories, however, because the serialized bag may be too large. We may want to consider bag grouping, i.e. a bag directory that holds many serialized bags, all with the same profile, with a sequence number associated with each component bag. DPN will not support holey bags, i.e. it will not use fetch.txt.

Proposed DPN Bag Structure

<DPN-Object-ID>/
         |   bagit.txt
         |   manifest-sha256.txt
         |   bag-info.txt
         |   tagmanifest-sha256.txt
         \--- data/
               |   [payload files]
         \--- dpn-tags/
               |   dpn-info.txt
               |   dpn-registry.txt
         \--- [optional node tag directories]/
               |   [optional node tag files]
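
The layout above can be checked mechanically. The sketch below reports entries from the proposed structure that are absent from an un-serialized bag, and flags a fetch.txt file, since holey bags are not supported. The names layout_problems, EXPECTED_DIRS and EXPECTED_FILES are hypothetical, and treating every file shown in the tree (including dpn-registry.txt) as mandatory is an assumption of this example.

```python
from pathlib import Path

# Entries copied from the proposed layout. Treating all of them as
# mandatory is an assumption of this sketch.
EXPECTED_DIRS = ("data", "dpn-tags")
EXPECTED_FILES = (
    "bagit.txt",
    "manifest-sha256.txt",
    "bag-info.txt",
    "tagmanifest-sha256.txt",
    "dpn-tags/dpn-info.txt",
    "dpn-tags/dpn-registry.txt",
)


def layout_problems(bag_root):
    """Return human-readable problems found in the bag's directory layout."""
    root = Path(bag_root)
    problems = [
        f"missing directory: {name}/"
        for name in EXPECTED_DIRS
        if not (root / name).is_dir()
    ]
    problems += [
        f"missing file: {name}"
        for name in EXPECTED_FILES
        if not (root / name).is_file()
    ]
    # Holey bags are not supported, so fetch.txt should not be present.
    if (root / "fetch.txt").exists():
        problems.append("unsupported file: fetch.txt")
    return problems
```

An empty list means the directory matches the proposed layout; optional node tag directories are ignored by this check.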

Description

DPN-Object-ID (directory)
  • Name of the root directory of the bag, as required by the BagIt spec
  • The unique DPN UUID of the object, the same as the DPN-Object-ID value in dpn-info.txt
bagit.txt
  • Listed as a required element in the BagIt spec.
BagIt-Version: M.N
Tag-File-Character-Encoding: UTF-8
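
As a rough illustration of how the two declaration lines might be read and checked, the sketch below parses "Label: value" lines and verifies that the version has the M.N form and that the encoding is UTF-8. The function names are hypothetical, the sample version in the usage note is only a placeholder value, and requiring the exact string "UTF-8" is an assumption based on the example above.

```python
def read_bag_declaration(text):
    """Parse the 'Label: value' lines of bagit.txt into a dict.

    Simplified sketch: blank lines are skipped and continuation lines
    are not handled.
    """
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        label, _, value = line.partition(":")
        fields[label.strip()] = value.strip()
    return fields


def declaration_problems(fields):
    """Return a list of problems with the parsed bag declaration."""
    problems = []
    major, _, minor = fields.get("BagIt-Version", "").partition(".")
    if not (major.isdigit() and minor.isdigit()):
        problems.append("BagIt-Version is not of the form M.N")
    # Assumption: the encoding must be exactly the string "UTF-8".
    if fields.get("Tag-File-Character-Encoding") != "UTF-8":
        problems.append("Tag-File-Character-Encoding is not UTF-8")
    return problems
```

For example, declaration_problems(read_bag_declaration("BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n")) returns an empty list.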
manifest-sha256.txt
  • Required element from the BagIt spec. SHA-256 is the DPN standard for fixity checksums (secure hash/message digest)
  • Lists each payload file (content under /data) with its associated checksum (secure hash)
  • Note: As we update our fixity algorithms in the future, we should keep old versions of this manifest file for auditing and historical purposes
    • There is some debate on the retention of legacy manifest-<alg>.txt files
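
The payload manifest described above can be produced along the following lines. This is a minimal sketch using Python's hashlib; the helper names sha256_of and payload_manifest_lines are hypothetical, and the two-space separator and sorted ordering are choices of this example rather than DPN requirements.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def payload_manifest_lines(bag_root):
    """Build one '<checksum>  <path>' line per file under data/.

    Paths are written relative to the bag root with forward slashes.
    """
    root = Path(bag_root)
    files = sorted(p for p in (root / "data").rglob("*") if p.is_file())
    return [
        f"{sha256_of(p)}  {p.relative_to(root).as_posix()}"
        for p in files
    ]
```

Writing the returned lines, one per line, to manifest-sha256.txt in the bag root gives a manifest covering every payload file. Reading in chunks keeps memory use flat for large payload files.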
bag-info.txt
  • BagIt spec, section 2.2.2
  • Used to carry additional information that helps with succession
  • Fields that would duplicate fields in the local dpn-info.txt should be kept in dpn-info.txt only, to avoid confusion
  • DPN requires the presence of the following fields, although they may be 'nil'

...

  • "Bag-Group-Identifier" and "Bag-Count" are currently included, but may be removed if they do not come into actual use
  • Other fields are optional for use by the First Node but are ignored by all common DPN processes.
tagmanifest-sha256.txt
  • BagIt spec, section 2.2.1
  • Contains the secure hash of each tag file
  • This ensures that the metadata we store with the bag is preserved
  • As with manifest-sha256.txt, we can keep old versions as we update our fixity algorithms
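
To show how a tag manifest supports the preservation check described above, the sketch below re-computes the SHA-256 of every file listed in a manifest and reports those that are missing or changed. The function name verify_manifest is hypothetical, and the sketch assumes each manifest line has the form "<checksum> <path>" with the path relative to the bag root.

```python
import hashlib
from pathlib import Path


def verify_manifest(bag_root, manifest_name="tagmanifest-sha256.txt"):
    """Return the listed paths that are missing or whose SHA-256 differs.

    Assumes well-formed '<checksum> <path>' lines; a line with a single
    token raises ValueError.
    """
    root = Path(bag_root)
    failures = []
    text = (root / manifest_name).read_text(encoding="utf-8")
    for line in text.splitlines():
        if not line.strip():
            continue
        expected, relative_path = line.split(None, 1)
        relative_path = relative_path.strip()
        target = root / relative_path
        if not target.is_file():
            failures.append(relative_path)
            continue
        # Tag files are small, so reading each one whole is acceptable here.
        actual = hashlib.sha256(target.read_bytes()).hexdigest()
        if actual != expected.lower():
            failures.append(relative_path)
    return failures
```

An empty result means every listed tag file is present and unchanged. The same function works for manifest-sha256.txt, although large payload files would be better hashed in chunks.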
fetch.txt
  • Not supported by DPN, as DPN does not support holey bags.
data (directory)
  • Required directory for payload items
  • May be encrypted for dark content.
dpn-tags (directory)
  • Directory for DPN-specific tag files (covered under optional tag directories in section 2 of the BagIt spec)
  • All DPN tag files go under this directory with the naming convention ‘dpn-<filename>.txt’ following bagit text tag file specifications
dpn-tags/dpn-info.txt
  • DPN tag file containing the field set below

...

  • Note: By convention, the names of fields that hold DPN UUIDs end with the suffix "Object-ID"
    • Alternative naming conventions that may also be considered include: "OID", "DPN-ID", "DID", "Reference-ID", "Ref-ID", etc.
optional node tag directory and files
  • Following the bagit specification for optional tag directories and using the convention for DPN optional tags, first nodes MAY choose to include optional tags of their own which will be ignored by the DPN Federation as a whole
  • As with the convention used with the DPN optional tags we recommend the directory naming convention of `<node name>-tags` and file naming convention of `<node name>-<filename>.txt` following the bagit specification
  • First nodes that use this directory should strongly consider having a node-specific BagIt profile
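
The directory and file naming conventions for tag directories can be expressed as a pattern. The sketch below checks whether a path relative to the bag root follows the `<node name>-tags/<node name>-<filename>.txt` convention, which also covers the dpn-tags directory. The function name follows_tag_convention is hypothetical, and restricting node names to letters and digits is an assumption; the text above does not define the allowed characters.

```python
import re

# Assumption: node names consist of ASCII letters and digits only.
# The back-reference requires the file prefix to match the directory prefix.
TAG_PATH = re.compile(r"^(?P<node>[A-Za-z0-9]+)-tags/(?P=node)-[^/]+\.txt$")


def follows_tag_convention(relative_path):
    """Return True if the path looks like '<node>-tags/<node>-<name>.txt'."""
    return TAG_PATH.match(relative_path) is not None
```

For instance, "dpn-tags/dpn-info.txt" passes the check, while a file whose prefix differs from its directory's prefix does not.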

DPN BagIt Profile

DPN_example_profile.json