The official DPN BagIt specification has been moved.

Please refer to: https://docs.google.com/document/d/1JqKMFn9KfeIMAAEdOGQr6LZPqNWx8Qubi12uoUXi2QU/edit#

This page details the specification of DPN Content Packages.

DPN BagIt Bags (Content Packages)

BagIt is a hierarchical file packaging format designed to support disk-based storage and network transfer of arbitrary digital content. A "bag" consists of a "payload" (the arbitrary content) and "tags", which are metadata files intended to document the storage and transfer of the bag. A required tag file contains a manifest listing every file in the payload together with its corresponding secure hash (also called a message digest or checksum).

Specification

  1. DPN packages will conform to the BagIt packaging format (spec)
  2. DPN packages may be either:
    1. serialized (e.g. a single tar file), or
    2. un-serialized (e.g. an exploded directory structure).

DPN BagIt Structure

DPN requires content to be packaged for transport, but does not specify how each replicating node incorporates digital objects into its own repository.

DPN Bag Structure

<DPN-Object-ID>/
         |   bagit.txt
         |   manifest-sha256.txt
         |   bag-info.txt
         |   tagmanifest-sha256.txt
         \--- data/
               |   [payload files]
         \--- dpn-tags/
               |   dpn-info.txt
         \--- [optional tag directories]/
               |   [optional node tag files]
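
The layout above can be checked mechanically before a bag is accepted. The following is a minimal sketch, not part of the DPN specification: the function name `missing_bag_entries` and the two constants are hypothetical, and the list of required entries is simply read off the tree shown above.

```python
from pathlib import Path

# Required entries, read off the bag layout shown above.
# These constant names are illustrative, not defined by DPN.
REQUIRED_FILES = (
    "bagit.txt",
    "manifest-sha256.txt",
    "bag-info.txt",
    "tagmanifest-sha256.txt",
    "dpn-tags/dpn-info.txt",
)
REQUIRED_DIRS = ("data", "dpn-tags")


def missing_bag_entries(bag_root):
    """Return the relative paths of required entries absent from bag_root."""
    root = Path(bag_root)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in REQUIRED_DIRS if not (root / name).is_dir()]
    return missing
```

An empty list means every required file and directory is present. The sketch checks only for presence; it does not validate the contents of any file.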

Description

DPN-Object-ID (directory)
  • The root directory of the bag, as required by the BagIt spec.
  • Named with the unique DPN UUID of the object, identical to the DPN-Object-ID value in dpn-info.txt.
bagit.txt
  • Required element, as listed in the BagIt spec.
BagIt-Version: M.N
Tag-File-Character-Encoding: UTF-8
manifest-sha256.txt
bag-info.txt
  • See BagIt spec section 2.2.2.
  • Used to carry additional information that helps with succession.
  • Fields that would duplicate dpn-info.txt fields should be kept in dpn-info.txt only, to avoid confusion.
  • DPN requires the following fields to be present, although their values may be empty. Do not use "null" or "nil" as placeholder values. For an empty field, keep the label and the colon (:) and leave the value blank.
   Source-Organization
   Organization-Address
   Contact-Name
   Contact-Phone
   Contact-Email
   Bagging-Date
   Bag-Size
   Bag-Group-Identifier
   Bag-Count
  • Other fields are optional for use by the Ingest Node but are ignored by all common DPN processes.
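
The bag-info.txt rules above (required labels, empty values allowed, no "null" or "nil") could be checked along these lines. This is a sketch under stated assumptions: `check_bag_info` is a hypothetical helper, placeholder values are compared case-insensitively (the text above does not say whether case matters), and indented lines are treated as continuations of the previous value, as the BagIt tag file format permits.

```python
# Field names required by DPN in bag-info.txt, as listed above.
REQUIRED_BAG_INFO_FIELDS = (
    "Source-Organization",
    "Organization-Address",
    "Contact-Name",
    "Contact-Phone",
    "Contact-Email",
    "Bagging-Date",
    "Bag-Size",
    "Bag-Group-Identifier",
    "Bag-Count",
)


def check_bag_info(text):
    """Return a list of problems found in the text of a bag-info.txt file."""
    problems = []
    fields = {}  # label -> last value seen; enough for a presence check
    last_label = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and last_label is not None:
            # Indented line: continuation of the previous field's value.
            fields[last_label] = (fields[last_label] + " " + line.strip()).strip()
            continue
        label, colon, value = line.partition(":")
        if not colon:
            problems.append(f"line without a colon: {line!r}")
            continue
        last_label = label.strip()
        fields[last_label] = value.strip()
    for name in REQUIRED_BAG_INFO_FIELDS:
        if name not in fields:
            problems.append(f"missing field: {name}")
    for label, value in fields.items():
        # Assumption: "Null", "NIL", etc. are rejected as well.
        if value.lower() in ("null", "nil"):
            problems.append(f"placeholder value in {label}")
    return problems
```

A line such as `Bag-Count:` passes, because the label and colon are present and the value is simply empty.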
tagmanifest-sha256.txt
  • See BagIt spec section 2.2.1.
  • Contains the secure hash of each tag file.
  • This ensures that the metadata stored with the bag is preserved.
  • All objects in the bag, including those in the optional tag directories, must be represented in the tag manifest.
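
One way to check tag manifest coverage is sketched below. It rests on assumptions that the text above does not settle: "all objects" is read as every file outside data/ (payload files being covered by manifest-sha256.txt), the tag manifest does not list itself, and each manifest line has the form `<hex digest> <relative path>`. The function names are hypothetical.

```python
import hashlib
from pathlib import Path

TAG_MANIFEST = "tagmanifest-sha256.txt"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_manifest(path):
    """Map relative path -> digest. Raises ValueError on a malformed line."""
    entries = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line.strip():
            checksum, filepath = line.split(None, 1)
            entries[filepath.strip()] = checksum.lower()
    return entries


def check_tag_manifest(bag_root):
    """Report tag files that are unlisted or whose digest does not match."""
    root = Path(bag_root)
    listed = read_manifest(root / TAG_MANIFEST)
    problems = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root).as_posix()
        # Assumption: payload files and the tag manifest itself are excluded.
        if not path.is_file() or rel == TAG_MANIFEST or rel.startswith("data/"):
            continue
        if rel not in listed:
            problems.append(f"not listed: {rel}")
        elif sha256_of_file(path) != listed[rel]:
            problems.append(f"checksum mismatch: {rel}")
    return problems
```

The sketch does not flag manifest entries that point to files missing from disk; a fuller validator would check that direction too.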
fetch.txt
  • Not supported by DPN, as DPN does not support holey bags.
data (directory)
  • Required directory for payload items
  • May be encrypted for dark content.
dpn-tags (directory)
  • Directory for DPN-specific tag files (covered under the optional tag directories of BagIt spec section 2).
  • All DPN tag files go under this directory, using the naming convention ‘dpn-<filename>.txt’ and following the BagIt text tag file specifications.
dpn-tags/dpn-info.txt
  • DPN tag file containing the fields below:
DPN-Object-ID: Unique ID generated by Ingest Node. 
Local-ID:  Local identifier from originating repository.
Ingest-Node-Name:  Name of the ingest node or source repository
Ingest-Node-Address:
Ingest-Node-Contact-Name:
Ingest-Node-Contact-Email:
Version-Number: Sequential positive integer
First-Version-Object-ID: Object-ID of the first version of the item
Interpretive-Object-ID: DPN UUID of Interpretive bag for this object
Rights-Object-ID: Reference to DPN and repository agreements
Bag-Type: data | interpretive | rights # Bags will be only one of these three types of objects.

...

  • Alternative naming conventions to also be considered include: "OID", "DPN-ID", "DID", "Reference-ID", "Ref-ID", etc

...

  • Currently, the only fields that may be repeated are "Interpretive-Object-ID" and "Rights-Object-ID".
  • Example:
Interpretive-Object-ID: UUID #1
Interpretive-Object-ID: UUID #2
Interpretive-Object-ID: UUID #3
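
Because some labels may repeat, a parser for dpn-info.txt needs to keep every value for a label rather than only the last one. The sketch below is illustrative: `parse_dpn_info` and `check_dpn_info` are hypothetical names, and the checks for Version-Number and Bag-Type assume the values are written exactly as described in the field set (a positive integer, and one of the three lowercase type names).

```python
from collections import defaultdict

# Per the rule above, only these two labels may appear more than once.
REPEATABLE_FIELDS = {"Interpretive-Object-ID", "Rights-Object-ID"}
BAG_TYPES = {"data", "interpretive", "rights"}


def parse_dpn_info(text):
    """Parse 'Label: value' lines into a dict mapping label -> list of values."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        label, colon, value = line.partition(":")
        if not colon:
            raise ValueError(f"line without a colon: {line!r}")
        fields[label.strip()].append(value.strip())
    return dict(fields)


def check_dpn_info(fields):
    """Return a list of problems found in parsed dpn-info.txt fields."""
    problems = []
    for label, values in fields.items():
        if len(values) > 1 and label not in REPEATABLE_FIELDS:
            problems.append(f"{label} may not be repeated")
    for value in fields.get("Version-Number", []):
        if not (value.isascii() and value.isdigit() and int(value) > 0):
            problems.append(f"Version-Number is not a positive integer: {value!r}")
    for value in fields.get("Bag-Type", []):
        # Assumption: Bag-Type values are matched exactly, in lowercase.
        if value not in BAG_TYPES:
            problems.append(f"unknown Bag-Type: {value!r}")
    return problems
```

With the example above, `parse_dpn_info` would return all three Interpretive-Object-ID values in file order under a single key.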
optional node tag directory and files

...

DPN Bag Transfer Protocol

  1. DPN will transfer valid DPN bags that have been serialized with tar.
  2. After the transfer of a bag tar file completes, the replicating node computes the SHA-256 hash of the serialized file. This hash is sent to the first node and shows that the tarred bag was transferred without errors.
  3. The originating node calculates the SHA-256 hash of the bag's tagmanifest-sha256.txt file. This value is used as the fixity_value for the bag and is kept in the registry.
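
The two hashes named in the protocol can be computed as in the sketch below. It is illustrative only: the function names `transfer_digest` and `fixity_value` are hypothetical, and the sketch assumes the tar file contains a single bag whose root directory is the top-level entry, so that the tag manifest sits exactly one level down. How the hashes are exchanged with other nodes or stored in the registry is outside its scope.

```python
import hashlib
import tarfile
from pathlib import PurePosixPath


def sha256_stream(fileobj, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a binary file object, read in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def transfer_digest(tar_path):
    """SHA-256 of the serialized bag, as computed by the replicating node."""
    with open(tar_path, "rb") as handle:
        return sha256_stream(handle)


def fixity_value(tar_path):
    """SHA-256 of <bag root>/tagmanifest-sha256.txt, read from inside the tar."""
    with tarfile.open(tar_path) as tar:
        for member in tar:
            parts = PurePosixPath(member.name).parts
            # Assumption: the manifest is exactly one level below the bag root.
            if member.isfile() and len(parts) == 2 and parts[1] == "tagmanifest-sha256.txt":
                return sha256_stream(tar.extractfile(member))
    raise FileNotFoundError("tagmanifest-sha256.txt not found at the bag root")
```

Reading the tag manifest directly from the archive avoids unpacking the whole bag just to compute the fixity value.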