The idea is to generate signatures (MD5 hashes), one per comparison "algorithm" (e.g. a hash representing title+author).
If two items have identical signatures, they are flagged as possible duplicates.
Signatures would be stored in Solr and (re)generated each time an Item is indexed.
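A minimal sketch of how such a title+author signature might be computed, written in Python for brevity. The normalization rule (lowercasing, dropping everything except ASCII letters and digits) and the function names are assumptions made for illustration; the notes do not specify them, and storing the result in Solr is not modeled here.

```python
import hashlib
import re


def normalize(value: str) -> str:
    """Lowercase and keep only ASCII letters and digits (an assumed rule)."""
    return re.sub(r"[^a-z0-9]", "", value.lower())


def title_author_signature(title: str, author: str) -> str:
    """MD5 hex digest over the normalized title and author (hypothetical algorithm)."""
    # The separator keeps ("ab", "c") distinct from ("a", "bc").
    key = normalize(title) + "|" + normalize(author)
    return hashlib.md5(key.encode("utf-8")).hexdigest()


# Made-up sample metadata: a and b differ only in case, spacing and punctuation.
a = title_author_signature("Deduplication in DSpace", "Smith, J.")
b = title_author_signature("deduplication in  dspace!", "smith j")
c = title_author_signature("A different title", "Smith, J.")

print(a == b)  # the two variants collapse to the same signature
print(a == c)  # a different title yields a different signature
```

How aggressive the normalization is decides how many near-identical records collapse into one signature, so it would likely need tuning per algorithm.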
We could configure multiple deduplication algorithms in DSpace
e.g. if there are two algorithms in use, then each Item would have two signatures (one per algorithm). The "Title" and "Identifier" tabs in the mockup show an example of this, as those are two algorithms.
Items would then be possible duplicates if they have at least one matching signature.
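The multi-algorithm matching described above could be sketched as follows, again in Python. The algorithm registry, the item dictionaries and the sample values are all hypothetical; this only illustrates the "one signature per algorithm, any single match flags the pair" rule, not how DSpace or Solr would implement it.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def md5_of(text: str) -> str:
    """MD5 hex digest of the trimmed, lowercased text (assumed normalization)."""
    return hashlib.md5(text.strip().lower().encode("utf-8")).hexdigest()


# Hypothetical registry: algorithm name -> function extracting the text to hash.
ALGORITHMS = {
    "title": lambda item: item["title"],
    "identifier": lambda item: item["identifier"],
}


def signatures(item: dict) -> dict:
    """One signature per configured algorithm."""
    return {name: md5_of(extract(item)) for name, extract in ALGORITHMS.items()}


def possible_duplicates(items: list) -> set:
    """Pairs of item ids that share a signature under at least one algorithm."""
    by_signature = defaultdict(list)
    for item in items:
        for name, sig in signatures(item).items():
            by_signature[(name, sig)].append(item["id"])
    pairs = set()
    for ids in by_signature.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Made-up sample items for illustration only.
items = [
    {"id": 1, "title": "Open Repositories", "identifier": "example-id-a"},
    {"id": 2, "title": "open repositories ", "identifier": "example-id-b"},
    {"id": 3, "title": "Something else", "identifier": "example-id-a"},
]
pairs = possible_duplicates(items)
print(sorted(pairs))
```

Keying the lookup on the (algorithm, signature) pair keeps a title signature from ever being compared against an identifier signature.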
The submitter will have the opportunity to determine whether a possible duplicate is in fact a duplicate.
They may still wish to submit a duplicate if the original's metadata is incorrect. The submitter can add a comment asking that the two entries be merged.
Andrea agrees with feedback from last week that the "Possible duplicate" Popup is not ideal in the Submission UI
The Tabbed duplicate merger screens are Administrative screens. May need more feedback.
Is this work "in scope", or are we starting to expand the scope too much?
Andrea feels this feature is a good example of the extensibility of the Submission UI, i.e. creating a custom "step" (as well as of Admin UI functions).
We should consider treating this as an "add-on" for now. Still give feedback on the work, help with the designs, etc. However, based on how other work progresses, we may need to determine whether this will be "in scope" for DSpace 7, or whether it is initially shipped as an "add-on" and brought into DSpace later.
Next meeting is Thurs, Oct 5 at 15:00 UTC via Google Hangouts
Give updates on latest status. (Note: Andrea will miss this meeting as he'll be at a conference)