...

Note that this workflow involves two nodes: CORRUPT, which holds the damaged files, and VALID, which holds a good copy of them

  1. CORRUPT node notices that data in their Audit Manager (ACE AM) is showing File Corrupt, indicating that checksums on disk have changed
    1. Discussion happens internally about which node holds a valid copy and can supply it for the repair
    2. SSH keys are exchanged so that data transfer can occur for the files which are to be repaired
  2. CORRUPT logs on to the Ingest server and selects 'Request Repair' in order to create a 'Repair Request'
    1. Enters ACE AM credentials to query for the corrupt collection
    2. Selects the Collection
    3. Selects the Files to repair and the Node where they will be repaired
  3. VALID logs on to the Ingest server and selects 'Fulfill Repair' in order to stage data for the repair
  4. At this point, both CORRUPT and VALID nodes should start the Repair service
    1. The Repair service running at VALID will stage data and update the Repair
    2. The Repair service running at CORRUPT will
      1. Pull data from VALID into a staging area
      2. Validate that the data transferred completely and matches the checksums in the ACE AM
      3. Overwrite the corrupt files
      4. Audit the files in the ACE AM
      5. Update the Repair with the result of the audit
  5. Once complete, the Repair Service at each node can be stopped
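
The validate-then-overwrite steps performed at CORRUPT can be sketched roughly as follows. This is only an illustration, not the Repair service's actual code: the function names (`file_digest`, `repair_file`), the use of SHA-256, and the assumption that the expected checksum has already been obtained from the ACE AM are all hypothetical.

```python
import hashlib
import os
import shutil


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of the file at `path`, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def repair_file(staged_path, corrupt_path, expected_digest):
    """Overwrite `corrupt_path` with the staged copy, but only if the staged
    copy matches `expected_digest` (assumed to come from the ACE AM).

    Returns True if the file was replaced and re-verified, False otherwise.
    """
    expected = expected_digest.lower()

    # Validate the staged copy before touching the corrupt file.
    if file_digest(staged_path) != expected:
        return False

    # Copy next to the target first, then swap it into place in one step,
    # so an interrupted copy never leaves a half-written file behind.
    tmp_path = corrupt_path + ".repair-tmp"
    shutil.copyfile(staged_path, tmp_path)
    os.replace(tmp_path, corrupt_path)

    # Re-check the repaired file; the real workflow audits it in the ACE AM.
    return file_digest(corrupt_path) == expected
```

Validating the staged copy before the overwrite means a failed transfer leaves the corrupt file untouched, so the repair can be retried without making matters worse.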

...