...
- Handle different change types separately
- Field updates can be handled specially since we know what they are (not just a triple subtraction/addition)
Separate Data Early
- Our tools already have the concept of a 'Record' from the source, which we can leverage to quickly isolate new data, removed data, and updated data.
- RecordHandler Tool will be used to isolate New/Removed/Shared Records
- Subtract LastHarvestRH from CurrentHarvestRH, what remains are New Records (is in current harvest, but not last one)
- Subtract CurrentHarvestRH from LastHarvestRH, what remains are Removed Records (was in last harvest, but not current one)
- Subtract NewRecordRH and RemoveRecordRH from CurrentHarvestRH, what remains are Shared Records (records that are in both harvests)
- RecordCompare Tool will be used to compare each record in SharedRH with the corresponding record in LastHarvestRH, and will output records containing the changes for records that have been updated
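The subtraction steps above can be sketched with plain set operations. This is only an illustration, not the RecordHandler or RecordCompare tools themselves: it assumes each harvest fits in memory as a dictionary mapping record ID to a dictionary of fields, and the function names split_harvests and changed_fields are hypothetical.

```python
# Illustrative sketch only. Each "record handler" is assumed to be a dict of
# record ID -> {field: value}; the real tools may store records differently.

def split_harvests(last_harvest, current_harvest):
    """Return (new, removed, shared) record dicts for two harvests."""
    # In the current harvest but not the last one: new records.
    new_ids = current_harvest.keys() - last_harvest.keys()
    # In the last harvest but not the current one: removed records.
    removed_ids = last_harvest.keys() - current_harvest.keys()
    # Current minus new and removed: records present in both harvests.
    shared_ids = current_harvest.keys() - new_ids - removed_ids

    new = {rid: current_harvest[rid] for rid in new_ids}
    removed = {rid: last_harvest[rid] for rid in removed_ids}
    shared = {rid: current_harvest[rid] for rid in shared_ids}
    return new, removed, shared


def changed_fields(old_record, new_record):
    """Return {field: (old_value, new_value)} for fields that differ."""
    fields = old_record.keys() | new_record.keys()
    return {
        f: (old_record.get(f), new_record.get(f))
        for f in fields
        if old_record.get(f) != new_record.get(f)
    }


def find_updates(last_harvest, shared):
    """Compare shared records to the last harvest; keep only changed ones."""
    updates = {}
    for rid, record in shared.items():
        diff = changed_fields(last_harvest[rid], record)
        if diff:
            updates[rid] = diff
    return updates
```

Calling split_harvests(last, current) and then find_updates(last, shared) yields the three groups plus a per-record list of changed fields, so each change type can be sent to its own pipeline.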
Handle Change Types Separately
New Records
- Send through an optimized pipeline for records we know are new
Record Removals
- Score/Match records to VIVO and remove the corresponding data from VIVO (or modify it to reflect its historical nature, such as past jobs)
Record Updates
- Update Tool Specification