
This page details the specification of DPN Content Packages.

DPN BagIt Bags (Content Packages)

BagIt is a hierarchical file packaging format designed to support disk-based storage and network transfer of arbitrary digital content. A "bag" consists of a "payload" (the arbitrary content) and "tags", which are metadata files intended to document the storage and transfer of the bag. One required tag file, the manifest, lists every file in the payload together with its secure hash (message digest, or checksum).
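To make the manifest concrete, here is a minimal Python sketch of how a payload manifest (manifest-sha256.txt) could be generated. It assumes an exploded bag whose payload lives under data/ and writes one line per payload file: the hex SHA-256 digest, whitespace, then the file path relative to the bag's base directory. The function names (sha256_of_file, build_payload_manifest) are illustrative only, not part of any DPN tool.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_payload_manifest(bag_dir: Path) -> str:
    """Build manifest-sha256.txt content: one '<checksum>  <path>' line per payload file.

    Paths are relative to the bag's base directory (so they start with 'data/')
    and always use forward slashes.
    """
    bag_dir = Path(bag_dir)
    lines = []
    for path in sorted((bag_dir / "data").rglob("*")):
        if path.is_file():
            rel = path.relative_to(bag_dir).as_posix()
            lines.append(f"{sha256_of_file(path)}  {rel}")
    return "\n".join(lines) + ("\n" if lines else "")
```

Sorting the paths is not required by BagIt; it just makes the generated manifest deterministic, which simplifies comparing manifests between nodes.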

Specification

  1. DPN packages will conform to the BagIt packaging format specification.
  2. DPN packages may be either
    1. serialized (e.g., a single tar file), or
    2. un-serialized (e.g., an exploded directory structure).
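A serialized package is the exploded bag directory wrapped in a single archive. The sketch below uses Python's standard tarfile module and assumes that the archive holds exactly one top-level directory named after the bag, which is how the BagIt specification describes serialization. The helper names serialize_bag and deserialize_bag are hypothetical, and the output directory must lie outside the bag so the tar file does not end up inside itself.

```python
import tarfile
from pathlib import Path


def serialize_bag(bag_dir: Path, out_dir: Path) -> Path:
    """Serialize an exploded bag directory into one uncompressed tar file.

    The archive contains a single top-level directory named after the bag, so
    extracting it recreates <bag-name>/bagit.txt, <bag-name>/data/..., etc.
    """
    bag_dir = Path(bag_dir)
    tar_path = Path(out_dir) / f"{bag_dir.name}.tar"
    with tarfile.open(tar_path, "w") as tar:
        tar.add(bag_dir, arcname=bag_dir.name)  # adds the directory recursively
    return tar_path


def deserialize_bag(tar_path: Path, dest_dir: Path) -> Path:
    """Extract a serialized bag and return the path of its top-level directory."""
    with tarfile.open(tar_path, "r") as tar:
        top_level = {Path(m.name).parts[0] for m in tar.getmembers()}
        if len(top_level) != 1:
            raise ValueError(f"expected one top-level directory, found {sorted(top_level)}")
        # Use the safer "data" extraction filter where this Python version supports it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    return Path(dest_dir) / top_level.pop()
```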

DPN BagIt Structure

DPN requires packaging content for transport, but does not specify how each replicating node incorporates digital objects into its own repository.

As an example, SDR will serialize the content of the /data directory in tar format; however, this may not be feasible for some repositories because the serialized bag may be too large. We may want to consider bag grouping, i.e., a bag directory that holds many serialized bags, all with the same profile and each with a sequence number identifying that component bag. DPN will not support holey bags; that is, DPN bags will not use fetch.txt.
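Bag grouping maps naturally onto two bag-info.txt fields listed under Description: Bag-Group-Identifier and Bag-Count, whose BagIt form is "N of T" (N is the bag's position, T the total in the group). The sketch below greedily splits a list of (file name, size) pairs into component bags under a size limit and emits the grouping tags for each. The size limit, the group identifier, and the names ComponentBag and group_payload are all assumptions; this page does not define how groups are formed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ComponentBag:
    """One member of a bag group: its payload files plus its grouping tags."""
    files: list[str]
    group_id: str
    seq: int
    total: int

    def bag_info_lines(self) -> list[str]:
        # Bag-Count uses the BagIt "N of T" form.
        return [
            f"Bag-Group-Identifier: {self.group_id}",
            f"Bag-Count: {self.seq} of {self.total}",
        ]


def group_payload(files: list[tuple[str, int]], max_bytes: int, group_id: str) -> list[ComponentBag]:
    """Greedily split (name, size) pairs into component bags of at most max_bytes.

    A single file larger than max_bytes still gets a bag of its own; files are never split.
    """
    chunks: list[list[str]] = []
    current: list[str] = []
    current_size = 0
    for name, size in files:
        if current and current_size + size > max_bytes:
            chunks.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        chunks.append(current)
    total = len(chunks)
    return [ComponentBag(c, group_id, i, total) for i, c in enumerate(chunks, start=1)]
```

Greedy packing keeps files in their original order, which makes the sequence numbers easy to reason about, at the cost of not minimizing the number of component bags.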

Proposed DPN Bag Structure

<DPN-Object-ID>/
         |   bagit.txt
         |   manifest-sha256.txt
         |   bag-info.txt
         |   tagmanifest-sha256.txt
         \--- data/
               |   [payload files]
         \--- dpn-tags/
               |   dpn-info.txt
         \--- [optional node tag directories]/
               |   [optional node tag files]
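The proposed layout can be checked mechanically. The hypothetical check_dpn_layout helper below reports required entries from the tree above that are missing and, because DPN does not support holey bags, treats a fetch.txt as an error. Optional node tag directories are ignored. Treating exactly these entries as required is an assumption drawn from the proposed structure.

```python
from pathlib import Path

# Entries assumed required by the proposed DPN bag layout (relative to the bag root).
REQUIRED_FILES = [
    "bagit.txt",
    "manifest-sha256.txt",
    "bag-info.txt",
    "tagmanifest-sha256.txt",
    "dpn-tags/dpn-info.txt",
]
REQUIRED_DIRS = ["data"]


def check_dpn_layout(bag_dir: Path) -> list[str]:
    """Return a list of layout problems; an empty list means the layout looks right."""
    bag_dir = Path(bag_dir)
    problems = [f"missing file: {p}" for p in REQUIRED_FILES if not (bag_dir / p).is_file()]
    problems += [f"missing directory: {d}" for d in REQUIRED_DIRS if not (bag_dir / d).is_dir()]
    # DPN does not support holey bags, so any fetch.txt is itself an error.
    if (bag_dir / "fetch.txt").exists():
        problems.append("fetch.txt present: holey bags are not supported")
    return problems
```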

Description

DPN-Object-ID (directory)
bagit.txt
   BagIt-Version: M.N
   Tag-File-Character-Encoding: UTF-8
manifest-sha256.txt
bag-info.txt
   Source-Organization
   Organization-Address
   Contact-Name
   Contact-Phone
   Contact-Email
   Bagging-Date
   Bag-Size
   Bag-Group-Identifier
   Bag-Count
tagmanifest-sha256.txt
fetch.txt (not used: DPN does not support holey bags)
data (directory)
dpn-tags (directory)
dpn-tags/dpn-info.txt
   DPN-Object-ID: Unique ID generated by the first node
   Local-ID: Local identifier from the originating repository
   Ingest-Node-Name: Name of the ingest node or source repository
   Ingest-Node-Address:
   Ingest-Node-Contact-Name:
   Ingest-Node-Contact-Email:
   Version-Number: Sequential positive integer
   First-Version-Object-ID: Object-ID of the first version of the item
   Interpretive-Object-ID: DPN UUID of the interpretive bag for this object
   Rights-Object-ID: Reference to DPN and repository agreements
   Bag-Type: data | interpretive | rights (each bag is exactly one of these three types)
   Brightening-Object-ID: UUID #1
   Brightening-Object-ID: UUID #2
   Brightening-Object-ID: UUID #3
optional node tag directories and files
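Both bag-info.txt and dpn-tags/dpn-info.txt use "Label: Value" lines, and a label may repeat (as Brightening-Object-ID does). The sketch below parses such a file into a map from each label to a list of values, treating lines that begin with whitespace as continuations of the previous value (the BagIt convention for long values). It then checks the two dpn-info.txt rules this page states: Bag-Type is exactly one of data, interpretive, or rights, and Version-Number is a positive integer. The function names are hypothetical, and any further validation rules would be assumptions.

```python
from collections import defaultdict

VALID_BAG_TYPES = {"data", "interpretive", "rights"}


def parse_tag_file(text: str) -> dict[str, list[str]]:
    """Parse 'Label: Value' lines into {label: [values, ...]}.

    Labels may repeat, so every label maps to a list. A line that starts with
    whitespace continues the previous value.
    """
    tags: dict[str, list[str]] = defaultdict(list)
    last = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and last is not None:
            tags[last][-1] += " " + line.strip()
            continue
        label, sep, value = line.partition(":")  # split at the first colon only
        if not sep:
            raise ValueError(f"malformed tag line: {line!r}")
        last = label.strip()
        tags[last].append(value.strip())
    return dict(tags)


def validate_dpn_info(tags: dict[str, list[str]]) -> list[str]:
    """Check the dpn-info.txt rules stated on this page; return a list of problems."""
    problems = []
    bag_types = tags.get("Bag-Type", [])
    if len(bag_types) != 1 or bag_types[0] not in VALID_BAG_TYPES:
        problems.append("Bag-Type must be exactly one of: data, interpretive, rights")
    versions = tags.get("Version-Number", [])
    if len(versions) != 1 or not versions[0].isdecimal() or int(versions[0]) < 1:
        problems.append("Version-Number must be a single positive integer")
    return problems
```

Splitting only at the first colon keeps values that themselves contain colons, such as URLs, intact.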


DPN Bag Transfer Protocol

  1. DPN will transfer valid DPN bags that have been serialized with tar (i.e., serialized bags).
  2. When the transfer of a serialized (tar) bag finishes, the replicating node computes the SHA-256 hash of the tar file. This hash is sent to the first node, which can use it to confirm that the tarred bag was transferred without errors.
  3. The originating node computes the SHA-256 hash of the bag's tagmanifest-sha256.txt file, uses it as the bag's fixity_value, and stores it in the registry.
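The two hashes in the protocol steps can be sketched in Python as follows. Files are hashed in chunks so that large tar files need not fit in memory. The helper names are illustrative, and transfer_ok (with its case-insensitive comparison of hex digests) is an assumption about how the first node might perform its check.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, streamed in chunks so large tar files are not loaded whole."""
    h = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def transfer_receipt(tar_path: Path) -> str:
    """Step 2: the replicating node hashes the serialized bag it received."""
    return sha256_file(tar_path)


def bag_fixity_value(bag_dir: Path) -> str:
    """Step 3: the bag's fixity_value is the SHA-256 of its tagmanifest-sha256.txt."""
    return sha256_file(Path(bag_dir) / "tagmanifest-sha256.txt")


def transfer_ok(expected_tar_hash: str, received_tar_hash: str) -> bool:
    # Hypothetical first-node check: the receipt must match its own hash of the tar file.
    return expected_tar_hash.lower() == received_tar_hash.lower()
```

Note the design in step 3: hashing the tag manifest rather than the whole bag gives a single fixity value that still covers every payload file, because the tag manifest records the hash of manifest-sha256.txt, which in turn records the hash of each payload file.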