This page maps out a general process for restoring/repairing content between nodes in Chronopolis, and notes areas where further discussion or development is needed.

We operate based on a few principles:

  • The initial trigger is manual
  • We don't expose our preservation storage to any public-facing service

Basic Flow

  1. Faulty Node notices it has an errant collection in ACE
  2. Faulty Node asks the Ingest Server for a copy of the collection
    1. Could ask for a partial set of the collection 
      1. {..., "files": ["file_1", "file_2", ..., "file_n"]}
    2. Could ask for the whole collection
    3. Could ask for ACE Tokens (maybe part of a different flow)
  3. The Ingest Server picks a node to restore from
    1. Could query ACE to choose based on the most recently audited copy
  4. Restoring Node sees the request to restore from its copy of the content
    1. Can stage content locally or remotely.
  5. Restoring Node notifies the Ingest Server upon completion
    1. If we stage content locally, a URI needs to be sent back so the data can be rsynced (a sketch of this notification follows the list)
  6. The Ingest Server creates a request for the Faulty Node to pick up the data
  7. Faulty Node rsyncs the data from the Ingest Server/Restoring Node
    1. Also triggers an audit of corrupt files in ACE
  8. Faulty Node notifies the Ingest Server of completion
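
As a sketch of the notification in step 5, assuming the Restoring Node reports a status along with an rsync URI when content is staged locally (all field names below are placeholders, not a settled schema):

  {
    "restoration": 42,
    "status": "STAGED",
    "staging": "local",
    "uri": "rsync://node-a.example.org/staging/example-collection"
  }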

Discussion Items

Restore Request
What should the request look like? A json example:
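
A possible shape, extending the fragment from the Basic Flow above; depositor and collection are assumed identifiers, and every field name here is a placeholder rather than a settled schema:

  {
    "depositor": "example-depositor",
    "collection": "example-collection",
    "files": ["file_1", "file_2", "file_n"]
  }

Omitting files could signal a request for the whole collection, per step 2 of the flow.
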
How to Restore

  • Local restore: the Restoring Node stages content itself and the Faulty Node rsyncs directly from it. Pro: no extra copy through the Ingest Server. Con: the Restoring Node's staging area must be reachable by other nodes.
  • Remote (push to ingest) restore: the Restoring Node pushes content to staging on the Ingest Server. Pro: nodes only ever talk to the Ingest Server, in line with our principle of not exposing preservation storage. Con: an extra transfer, and the Ingest Server needs staging space.
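
Either way, step 7 looks the same from the Faulty Node's side; only the rsync source differs. A hypothetical pair (hostnames and paths made up):

  Local:  rsync://node-a.example.org/staging/example-collection
  Remote: rsync://ingest.example.org/staging/example-collection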

Dev Work

While there's currently an API controller for /restorations and a table in the database for them, neither is used in production. We'll likely want to alter the restoration table to suit our needs, as well as update the controller to handle requests for, and updates to, the restore object. Most of this is preliminary, as the data structures and flow still need to be fleshed out.

On a module basis:

ingest-rest
  • Add a database migration for the changes to the restore table
  • Update RestoreController to match our new flow
    • Can be broken down into smaller components when development actually gets started
    • Includes logic for choosing nodes
    • Add query parameters when getting restorations (so we can easily tell whether we are helping a node restore or pulling content ourselves); see the example after this list
  • Add model for the initial request for restoring content
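
For example, the polling done by replication-shell could filter on a node and a role (parameter names here are invented for illustration):

  GET /restorations?node=node-a&role=restoring   (restorations node-a should stage content for)
  GET /restorations?node=node-a&role=faulty      (restorations node-a needs to pull)
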
replication-shell
  • Add a scheduled task for checking for restorations for which we are
    • the restoring node
    • the faulty node
  • Add logic to handle staging of content
    • After we decide whether we will do local or remote restorations; a sketch of the resulting status update follows this list
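
Once content is staged, the task could report back by updating the restoration resource, along the lines of the notification sketched under Basic Flow (the endpoint and body below are placeholders, not an agreed API):

  PUT /restorations/42

  {
    "status": "STAGED",
    "uri": "rsync://node-a.example.org/staging/example-collection"
  }
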
rest-common
  • Make changes to the Restoration class (tbd while we determine what to change)