DPN requirements and assumptions

DPN is moving from a conceptual project to a production service in mid-2015.

In developing the system, legal and technical constraints have been imposed that change some of the assumptions as originally conceived.

...

  • DPN is a federated preservation network of independent preservation archives

    • All archives act to preserve content, not just as remote storage.

  • DPN has agreements to allow for Succession of content in the event an archive can no longer perform its function as an archive.

    • This is currently envisioned as a “Quit Claim” framework

    • Based on input from the legal team, drawn from the Node institutions, we will update these recommendations

    • The successor of content will take on the responsibilities of, and become, the Administrative Node for the failed archive

  • DPN has, as much as possible, independent preservation implementations at each node.

  • DPN communications will follow best practices and use mutual authentication over secure channels.

  • DPN provides for replication of content from an Ingest Node to Replicating Nodes

    • DPN provides recovery of lost content at any Node

    • The DPN core service keeps 3 copies of ingested objects

  • DPN provides a service model so that any of the Nodes can ascertain whether their content is valid.

  • DPN enables centralized auditing

    • Process audit at the DPN federation

    • Process audit at the Node

    • Audit trails for events

    • DPN supports Content Fixity audit (random, periodic)

  • DPN provides status reporting: activity, logging, traffic, volume, events, etc.

    • Global (across federation)

    • At the Node level

    • Periodic Reporting

    • Significant event reporting, for example succession events, loss of content, etc.

  • DPN supports:

    • a DPN UUID (globally unique identifier for each bag deposited)

    • a common lightweight wrapper (bag) for content transfer

    • retention of ingested content indefinitely

      • Content may be de-accessioned under extenuating circumstances (e.g., a court order)

    • duplication of critical metadata in the 'registry' and also in the content bags

    • durable and persistent communication methods to support unreliable networks and node failure

    • a distributed model, assuming eventual consistency (CAP Theorem) for replication of content and registries

  • DPN content and services are distributed and federated.

  • DPN supports succession and brightening of content

  • We must test recovery of content and not assume that the vast institutional knowledge of each repository is easily captured or represented in DPN.

  • Implementations are de-coupled and architecturally distinct, as practicable, but the communication methodology is shared, resilient, and redundant.

  • DPN objects are preserved by all Nodes and support DPN preservation functions.

  • DPN has decoupled inter-node communication channels for

    • content transfer

    • process control.

  • Communication between nodes is not dependent upon other nodes.
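The identifier and fixity requirements above (a DPN UUID per deposited bag, plus an initial fixity value that later Content Fixity audits can verify against) can be sketched as follows. This is a minimal illustration under assumed conventions, not DPN's actual implementation; the helper names `make_bag_record` and `fixity_digest` are hypothetical, and sha256 is assumed as the digest algorithm.

```python
import hashlib
import uuid

def fixity_digest(payload: bytes, algorithm: str = "sha256") -> str:
    """Compute a fixity digest for a bag's payload (sha256 assumed here)."""
    return hashlib.new(algorithm, payload).hexdigest()

def make_bag_record(payload: bytes) -> dict:
    """Register a deposited bag: a DPN UUID plus its initial fixity value."""
    return {
        "dpn_uuid": str(uuid.uuid4()),   # globally unique identifier for the bag
        "fixity": fixity_digest(payload),
        "fixity_algorithm": "sha256",
    }

record = make_bag_record(b"example payload")
# A later Content Fixity audit recomputes the digest and compares:
assert fixity_digest(b"example payload") == record["fixity"]
```

Duplicating `record` in both the registry and the bag itself would satisfy the "duplication of critical metadata" requirement above.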

Deliverables

...

- Jan 1, 2016


  • There is no expectation that we’ll have signed SLAs before the July 1 launch

    • DPN will have agreements to allow for Succession of content in the event an archive can no longer perform its function as an archive.

  • SLAs with Ingest Nodes/Administrative Node

  • SLA documents between Ingest Nodes & Depositors shared with legal staff

    • We will require in the SLA that one of the other Administrative Nodes become the Administrative Node for the failed Node.

      • Act as the restorer of content for clients of the former administrative node

      • In the event of a Succession occurrence, all replicating nodes will recognize the new successor and act in accordance with prior agreements held by the former archive

    • Replication and update will take time; we may need to put language in the SLA recognizing this fact

    • The SLA will state that the communication layer is shared, but the repository layer is not

  • Depositors will be able to submit content to us, and we can put it into storage

  • There will be no expectation of global (DPN-level) fixity checking at launch

    • Initial fixity checking will occur at ingest

    • Each Node will check fixity according to its local policy

  • Some cursory reporting available to Depositor

    • Notification of ingest and replication will be provided by the Ingest Node to the depositor

  • Replication of content from the Administrative Node to all Replicating Nodes

    • Make sure we know how many Replicating Nodes will be storing the content

    • Make sure we know when the Ingest Node will store the ingested content and when they won’t

    • Make sure we have a clear idea of what kind of storage is available at each

  • The ability to recover content to the depositor will be supported via the Ingest/Administrative Node

  • Maintain a registry for objects, create transfer records for replicating nodes, update status of an existing object

    • Track stored status of replicated transfer

  • Agreement on the bag size limits that can be replicated across the Nodes

    • 250 GB is the upper limit on bag size for this release

  • Bags will be validated upon receipt by the Ingest Node and the Replicating Node

    • Validation means:

      • We will validate that all of the files are present and that the checksums match the manifest

      • The structure of the bag will be validated according to the DPN specs

  • In-Person Post-Mortem/Planning for Phase II - July 16th & 17th
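Validation as described above — every file in the manifest present, every checksum matching — might look roughly like this sketch. The whitespace-separated `digest path` manifest format (as in BagIt) and the function name `validate_bag` are assumptions for illustration; the actual rules are whatever the DPN bag specs define.

```python
import hashlib
from pathlib import Path

def validate_bag(bag_dir: str, manifest_name: str = "manifest-sha256.txt") -> list:
    """Return a list of problems; an empty list means the bag validated."""
    bag = Path(bag_dir)
    manifest = bag / manifest_name
    if not manifest.is_file():
        return [f"missing manifest {manifest_name}"]
    problems = []
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue
        expected_digest, rel_path = line.split(maxsplit=1)
        payload = bag / rel_path
        if not payload.is_file():          # every manifest entry must be present
            problems.append(f"missing file {rel_path}")
        elif hashlib.sha256(payload.read_bytes()).hexdigest() != expected_digest:
            problems.append(f"checksum mismatch for {rel_path}")
    return problems
```

Under this sketch, a Replicating Node would run the same check after transfer and confirm storage only when the returned list is empty.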

6-Month Post-Launch Roadmap Deliverables (Jun 2016)


  • Clear idea of what kind of storage capacity is available at each Replicating Node with a framework for deciding to which nodes objects are deposited (Internal Documentation)

  • Auditing the consistency of the registry

  • Auditing the local storage inventory against the registry

  • Fixity tracking - when Replicating Nodes perform audits, record which node performed the fixity check (as part of the provenance/history of the bag)

  • Support multiple fixity types across the federation

  • Ongoing fixity checks by each node with reporting out to DPN administration
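The multiple-fixity-type and check-tracking deliverables above could be sketched as an audit-trail event per check, recording the algorithm used and the node that ran it. The event fields, the algorithm set, and the node name here are illustrative assumptions, not DPN's registry schema.

```python
import hashlib
from datetime import datetime, timezone

SUPPORTED_ALGORITHMS = {"md5", "sha256", "sha512"}  # assumed federation-wide set

def record_fixity_check(payload: bytes, algorithm: str, node: str) -> dict:
    """Run one fixity check and produce an audit-trail event for the bag's history."""
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"unsupported fixity type: {algorithm}")
    return {
        "node": node,                      # which node performed the check
        "algorithm": algorithm,
        "digest": hashlib.new(algorithm, payload).hexdigest(),
        "checked_at": datetime.now(timezone.utc).isoformat(),
    }

event = record_fixity_check(b"bag payload", "sha256", "hypothetical-node")
```

Each node could then report such events out to DPN administration as part of its ongoing fixity checks.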

12-Month Post-Launch Roadmap Deliverables (2017)


  • Depositor Dashboard

    • DPN will provide status reporting, activity, logging, traffic, volume, events, etc.

      • Global (across federation)

      • At Node level

      • Periodic Reporting

      • Significant event reporting, for example succession events, loss of content, etc.

    • Bag manifests can be retrieved by the depositor

    • Billing information

  • Bag Discovery & Retrieval Request Mechanism

    • A depositor wants all of their bags that satisfy some criteria

    • Determination and tracking of depositor assets
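A discovery query of that shape — all of a depositor's bags matching some criteria — assuming a simple in-memory registry of bag records, might look like the sketch below. All field names and the `find_bags` helper are hypothetical.

```python
def find_bags(registry: list, depositor: str, **criteria) -> list:
    """Return the depositor's bag records whose fields match all given criteria."""
    return [
        bag for bag in registry
        if bag.get("depositor") == depositor
        and all(bag.get(field) == value for field, value in criteria.items())
    ]

# Tiny illustrative registry (made-up records):
registry = [
    {"dpn_uuid": "uuid-1", "depositor": "example-u", "status": "stored"},
    {"dpn_uuid": "uuid-2", "depositor": "example-u", "status": "pending"},
    {"dpn_uuid": "uuid-3", "depositor": "other-u", "status": "stored"},
]
matches = find_bags(registry, "example-u", status="stored")  # → [uuid-1's record]
```

A retrieval-request mechanism could then take the matched DPN UUIDs and route restore requests through the depositor's Ingest/Administrative Node.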