...

  • Handle different change types separately
    • Field updates can be handled specially since we know what they are (not just a triple subtraction/addition)

      Separate Data Early

  • Our tools already carry the concept of a 'Record' from the source, which we can leverage to quickly isolate new data, removed data, and updated data.
  • RecordHandler Tool will be used to isolate New/Removed/Shared Records
    • Subtract LastHarvestRH from CurrentHarvestRH; what remains are New Records (in the current harvest, but not the last one)
    • Subtract CurrentHarvestRH from LastHarvestRH; what remains are Removed Records (in the last harvest, but not the current one)
    • Subtract NewRecordRH and RemoveRecordRH from CurrentHarvestRH; what remains is SharedRH (records that are in both harvests)
  • RecordCompare Tool will be used to compare each record in SharedRH with the corresponding record in LastHarvestRH, and will output records containing the changes for those that have been updated
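
The subtraction and comparison steps above can be sketched as follows. This is a minimal illustration only, not the RecordHandler or RecordCompare implementation: it assumes each RecordHandler can be modeled as an in-memory mapping from record ID to a dictionary of fields, and the names `subtract`, `partition`, and `compare` are hypothetical.

```python
from typing import Dict, Tuple

# Assumption: a RecordHandler is modeled as {record_id: {field: value}}.
Records = Dict[str, Dict[str, str]]


def subtract(a: Records, b: Records) -> Records:
    """Return the records of `a` whose IDs do not appear in `b`."""
    return {rid: rec for rid, rec in a.items() if rid not in b}


def partition(current: Records, last: Records) -> Tuple[Records, Records, Records]:
    """Split two harvests into new, removed, and shared records."""
    new = subtract(current, last)        # in current harvest, not in last
    removed = subtract(last, current)    # in last harvest, not in current
    # Follows the outline: current minus new minus removed leaves the overlap.
    shared = subtract(subtract(current, new), removed)
    return new, removed, shared


def compare(shared: Records, last: Records) -> Dict[str, Dict[str, tuple]]:
    """For each shared record, collect fields whose value differs from the
    last harvest as {field: (old_value, new_value)}; unchanged records are
    omitted from the result."""
    changes = {}
    for rid, rec in shared.items():
        old = last[rid]
        diff = {
            field: (old.get(field), rec.get(field))
            for field in sorted(set(old) | set(rec))
            if old.get(field) != rec.get(field)
        }
        if diff:
            changes[rid] = diff
    return changes


last_harvest = {
    "a": {"name": "A"},
    "b": {"name": "B"},
    "c": {"name": "C"},
}
current_harvest = {
    "b": {"name": "B"},
    "c": {"name": "C2"},
    "d": {"name": "D"},
}

new, removed, shared = partition(current_harvest, last_harvest)
updates = compare(shared, last_harvest)
```

Because the comparison works on named fields rather than raw triples, an update is reported as an old/new value pair for a known field, which is what allows field updates to be handled specially.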

    Handle Change Types Separately

    New Records

  • Send through an optimized pipeline for records we know are new

    Record Removals

  • Score/Match records to VIVO and remove the corresponding data from VIVO (or modify it to reflect its historical nature, such as past jobs)

    Record Updates

  • Update Tool Specification