This page maps out a general process for restoring/repairing content between nodes in Chronopolis, and notes areas where further discussion or development is needed.

We operate based on a few principles:

  • The initial trigger is manual
  • We don't expose our preservation storage to any public-facing service

Basic Flow

  1. Faulty Node notices it has an errant collection in ACE
  2. Faulty Node asks the Ingest Server for a copy of the collection
    1. Could ask for a partial set of the collection 
      1. {..., "files": ["file_1", "file_2", ..., "file_n"]}
    2. Could ask for the whole collection
    3. Could ask for ACE Tokens (maybe part of a different flow)
  3. The Ingest Server picks a node to restore from
    1. Could query ACE to choose based on the most recently audited copy
  4. Restoring Node sees the request to restore from its copy of the content
    1. Can stage content locally or remotely.
  5. Restoring Node notifies the Ingest Server upon completion
    1. If we stage content locally, a URI needs to be sent back so the data can be rsynced (a sketch of this notification follows the list)
  6. The Ingest Server creates a request for the Faulty Node to pick up the data
  7. Faulty Node rsyncs the data from the Ingest Server/Restoring Node
    1. Also triggers an audit of corrupt files in ACE
  8. Faulty Node notifies the Ingest Server of completion
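
As a sketch of the notification in step 5, assuming the Restoring Node reports a status along with an rsync URI when content is staged locally (all field names below are placeholders, not a settled schema):

  {
    "restoration": 42,
    "status": "STAGED",
    "staging": "local",
    "uri": "rsync://node-a.example.org/staging/example-collection"
  }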

Discussion Items

Restore Request
What should the request look like? A json example:
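
A possible shape, extending the fragment from the Basic Flow above; depositor and collection are assumed identifiers, and every field name here is a placeholder rather than a settled schema:

  {
    "depositor": "example-depositor",
    "collection": "example-collection",
    "files": ["file_1", "file_2", "file_n"]
  }

Omitting files could signal a request for the whole collection, per step 2 of the flow.
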
How to Restore

  • Local restore: the Restoring Node stages content itself and the Faulty Node rsyncs directly from it. Pro: no extra copy through the Ingest Server. Con: the Restoring Node's staging area must be reachable by other nodes.
  • Remote (push to ingest) restore: the Restoring Node pushes content to staging on the Ingest Server. Pro: nodes only ever talk to the Ingest Server, in line with our principle of not exposing preservation storage. Con: an extra transfer, and the Ingest Server needs staging space.
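
Either way, step 7 looks the same from the Faulty Node's side; only the rsync source differs. A hypothetical pair (hostnames and paths made up):

  Local:  rsync://node-a.example.org/staging/example-collection
  Remote: rsync://ingest.example.org/staging/example-collection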

Dev Work

While there's currently an API controller for /restorations and a table in the database for them, neither is used in production. We'll likely want to alter the restoration table to suit our needs, as well as update the controller to handle requests for, and updates to, the restore object. Most of this is preliminary, as the data structures and flow still need to be fleshed out.

On a module basis:

ingest-rest
  • Add a database migration for the changes to the restore table
  • Update RestoreController to match our new flow
    • Can be broken down into smaller components when development actually gets started
    • Includes logic for choosing nodes
    • Add query parameters when getting restorations (so we can easily tell whether we are helping a node restore or pulling content ourselves); see the example after this list
  • Add model for the initial request for restoring content
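
For example, the polling done by replication-shell could filter on a node and a role (parameter names here are invented for illustration):

  GET /restorations?node=node-a&role=restoring   (restorations node-a should stage content for)
  GET /restorations?node=node-a&role=faulty      (restorations node-a needs to pull)
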
replication-shell
  • Add a scheduled task for checking for restorations for which we are
    • the restoring node
    • the faulty node
  • Add logic to handle staging of content
    • After we decide whether we will do local or remote restorations; a sketch of the resulting status update follows this list
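
Once content is staged, the task could report back by updating the restoration resource, along the lines of the notification sketched under Basic Flow (the endpoint and body below are placeholders, not an agreed API):

  PUT /restorations/42

  {
    "status": "STAGED",
    "uri": "rsync://node-a.example.org/staging/example-collection"
  }
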
rest-common
  • Make changes to the Restoration class (tbd while we determine what to change)