...
Note that there are two nodes: CORRUPT (the node whose files are damaged) and VALID (a node holding intact copies).
- The CORRUPT node notices that its ACE Audit Manager (ACE AM) reports files as 'File Corrupt', meaning their checksums on disk no longer match the registered values
- The nodes discuss internally which node holds a valid copy of the data and can supply it for the repair
- SSH keys are exchanged so that the files to be repaired can be transferred
- CORRUPT logs on to the Ingest server and selects 'Request Repair' to create a Repair Request:
  - Enter ACE AM credentials to query for the corrupt collection
  - Select the collection
  - Select the files to repair and the node where they will be repaired
- VALID logs on to the Ingest server and selects 'Fulfill Repair' to stage data for the repair
- At this point, both CORRUPT and VALID should start their Repair service:
  - The Repair service running at VALID will stage the data and update the Repair
  - The Repair service running at CORRUPT will:
    - Pull the data from VALID into a staging area
    - Validate that the transfer succeeded and that the data matches the checksums in the ACE AM
    - Overwrite the corrupt files
    - Audit the files in the ACE AM
    - Update the Repair with the result of the audit
- Once the repair is complete, the Repair service at each node can be stopped
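A Repair passes through several hand-offs between CORRUPT, VALID, and the Ingest server. The lifecycle can be modeled as a small state machine that rejects out-of-order updates. The sketch below is a minimal version of that idea. The state names and the transition table are hypothetical and illustrative only. They are not the Ingest server's actual Repair statuses.

```python
from enum import Enum, auto


class RepairStatus(Enum):
    """Hypothetical lifecycle states for a Repair (names are illustrative)."""
    REQUESTED = auto()    # CORRUPT created the Repair Request
    FULFILLING = auto()   # VALID selected 'Fulfill Repair'
    STAGED = auto()       # VALID's Repair service staged the data
    TRANSFERRED = auto()  # CORRUPT pulled the data into its staging area
    VALIDATED = auto()    # staged data matched the ACE AM checksums
    REPAIRED = auto()     # corrupt files were overwritten
    AUDITED = auto()      # ACE AM audit finished and its result was recorded
    FAILED = auto()       # some step did not succeed


# Allowed next states; the comment on each line names the actor driving it.
TRANSITIONS = {
    RepairStatus.REQUESTED: {RepairStatus.FULFILLING, RepairStatus.FAILED},   # VALID
    RepairStatus.FULFILLING: {RepairStatus.STAGED, RepairStatus.FAILED},      # VALID service
    RepairStatus.STAGED: {RepairStatus.TRANSFERRED, RepairStatus.FAILED},     # CORRUPT service
    RepairStatus.TRANSFERRED: {RepairStatus.VALIDATED, RepairStatus.FAILED},  # CORRUPT service
    RepairStatus.VALIDATED: {RepairStatus.REPAIRED, RepairStatus.FAILED},     # CORRUPT service
    RepairStatus.REPAIRED: {RepairStatus.AUDITED, RepairStatus.FAILED},       # CORRUPT service
    RepairStatus.AUDITED: set(),  # terminal
    RepairStatus.FAILED: set(),   # terminal
}


def advance(current: RepairStatus, nxt: RepairStatus) -> RepairStatus:
    """Move a Repair to its next state, raising on an illegal transition."""
    if nxt not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {nxt.name}")
    return nxt


if __name__ == "__main__":
    # Walk the happy path from request to audit.
    status = RepairStatus.REQUESTED
    for step in (RepairStatus.FULFILLING, RepairStatus.STAGED,
                 RepairStatus.TRANSFERRED, RepairStatus.VALIDATED,
                 RepairStatus.REPAIRED, RepairStatus.AUDITED):
        status = advance(status, step)
    print(status.name)
```

Keeping the transitions in one table makes it easy to see that CORRUPT cannot overwrite files before VALID has staged them. It also shows that a failure at any step ends the Repair rather than leaving it half-finished.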
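The CORRUPT-side steps (validate the staged data, then overwrite the corrupt files) can be sketched as follows. The sketch assumes the following:

- The ACE AM digests are SHA-256.
- The staging area mirrors the collection's directory layout.
- An `expected` map from relative path to digest has already been retrieved from the ACE AM. How that map is fetched is not shown.

The function names `sha256_of`, `validate_staged`, and `repair` are hypothetical. The final ACE AM audit remains a separate step.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return its hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_staged(staging: Path, expected: dict[str, str]) -> list[str]:
    """Return relative paths whose staged copy is missing or has the wrong digest."""
    bad = []
    for rel, want in expected.items():
        staged = staging / rel
        if not staged.is_file() or sha256_of(staged) != want.lower():
            bad.append(rel)
    return bad


def repair(staging: Path, collection_root: Path, expected: dict[str, str]) -> list[str]:
    """Overwrite corrupt files only if every staged file validated.

    Returns the list of files that failed validation (empty on success).
    """
    bad = validate_staged(staging, expected)
    if bad:
        return bad  # leave the corrupt originals untouched
    for rel in expected:
        target = collection_root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(staging / rel, target)  # copy contents and metadata
    return []
```

Validating every file before overwriting any of them means a bad or partial transfer never replaces the corrupt originals with different bad data. The returned list is the kind of result that would be reported back when updating the Repair.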
...