Diff takes two data models (called a "minuend" model and a "subtrahend" model), compares them, and returns all triples in the minuend model that are not in the subtrahend model.
For example, suppose model M1 contains records (A, B, C, D, E) and model M2 contains (C, D, E, F, G). A Diff with M1 as the minuend and M2 as the subtrahend would return the set (A, B). Reversing the arguments would return (F, G).
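The example above can be sketched with plain Python sets. This is only an illustration of the set arithmetic, not the Harvester's implementation (which operates on Jena models); the function name `diff` and the variable names are assumptions made for this sketch.

```python
def diff(minuend, subtrahend):
    """Return the items in the minuend that are not in the subtrahend."""
    return set(minuend) - set(subtrahend)


# Records from the example in the text.
m1 = {"A", "B", "C", "D", "E"}
m2 = {"C", "D", "E", "F", "G"}

only_in_m1 = diff(m1, m2)  # M1 is the minuend, M2 the subtrahend
only_in_m2 = diff(m2, m1)  # arguments reversed
```

Here `only_in_m1` holds A and B, and `only_in_m2` holds F and G. The shared records C, D, and E appear in neither result.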
This is commonly used in Harvester scripts immediately prior to entering newly harvested data into VIVO. When the results of the previous run of the harvest are used as the minuend, and the new input is the subtrahend, then the output of Diff is items which are already in VIVO due to the previous harvest but which are not present in the new harvest. These are assumed to have been removed and/or replaced, and so this result (called the "subtraction file") is used to remove data from VIVO. When the arguments are reversed, the result is items in the new harvest which were never previously harvested. This result (called the "addition file") is used to add data to VIVO.
The subtraction file and the addition file are then applied to VIVO (the -m switch for Transfer is used for the subtractions), and also to the "previous harvest model" so that it stays up to date, allowing Diff to work properly on future runs.
Data in the intersection of the two sets are not output by either execution of Diff: because they are identical in the input and in VIVO, there is no need to handle them. In this way Diff prevents unnecessary overhead on the production VIVO database.
When diffing to create subtraction models, some processes may wish to preserve entities in the minuend (previous harvest) that are entirely absent from the subtrahend (new input). The selective diff mode generates a diff model as per the normal process, but prunes the resulting model so that core elements completely absent from the subtrahend are not included. This allows changes to properties to pass through, but prevents the removal of large swaths of data when the minuend contains a large number of entries and the subtrahend only includes updates to a few of them.
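The pruning step of selective diff can be sketched as follows. This is a minimal illustration under stated assumptions, not the Harvester's code: triples are modelled as `(subject, predicate, object)` tuples, a "core element" is taken to mean a triple's subject, and the function name `selective_diff` and the sample data are hypothetical.

```python
def selective_diff(minuend, subtrahend):
    """Plain diff, then drop triples whose subject never appears in the subtrahend.

    Assumption: a "core element" is identified by the subject of a triple.
    """
    plain = set(minuend) - set(subtrahend)
    subjects_in_subtrahend = {subject for (subject, _pred, _obj) in subtrahend}
    return {triple for triple in plain if triple[0] in subjects_in_subtrahend}


# Previous harvest (minuend): two people.
previous = {
    ("person1", "label", "Smith, J."),
    ("person1", "title", "Lecturer"),
    ("person2", "label", "Doe, A."),
}
# New input (subtrahend): an update to person1 only.
new_input = {
    ("person1", "label", "Smith, J."),
    ("person1", "title", "Professor"),
}

plain_subtractions = previous - new_input
selective_subtractions = selective_diff(previous, new_input)
```

With a plain diff, `plain_subtractions` would also contain the `person2` triple, so person2 would be removed from VIVO. With the selective sketch, only the outdated `person1` title remains in the subtraction set, and person2 is preserved.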
Short Option | Long Option | Parameter Value Map | Description | Required |
---|---|---|---|---|
m | minuend | CONFIG_FILE | config file for input jena model | true |
M | minuendOverride | | override the JENA_PARAM of input jena model config using VALUE | false |
s | subtrahend | CONFIG_FILE | config file for input jena model | true |
S | subtrahendOverride | | override the JENA_PARAM of input jena model config using VALUE | false |
o | output | CONFIG_FILE | config file for output jena model | true |
O | outputOverride | | override the JENA_PARAM of output jena model config using VALUE | false |
d | dumptoFile | FILENAME | filename for output | true |
e | selective-diff | | use selective diff | false |
U | update-types | | type to be updated with selective diff | false |


Name | Type | Visibility | Description |
---|---|---|---|
log | Logger | private | SLF4J Logger statically set at the top of the file |
minuendJC | JenaConnect | private | |
subtrahendJC | JenaConnect | private | |
output | JenaConnect | private | |
dumpFile | String | private | String of the file name as set by the user |
updateTypes | List<String> | private | List of types to be updated in selectiveDiff |
bUsingSelectiveDiff | boolean | private | Whether or not we are using selectiveDiff. Implicitly true if updateTypes specified. |
bHasUpdateTypes | boolean | private | Whether or not types are specified for use by selectiveDiff |

-m config/jenaModels/h2.xml -M dbUrl="jdbc:h2:XMLVault/h2Pubmed/score/store" -M modelName=pubmedScore -s config/jenaModels/VIVO.xml -S modelName="http://vivoweb.org/ingest/pubmed" -d XMLVault/update_Additions.rdf.xml
How Diff is used to do "graph math" updating
V = VIVO model
H = new harvested model
P = previous harvest model
A = additions
S = subtractions

Operation | Effect |
---|---|
A = Diff (H → P) | The additions are the triples that are in the new harvest when the old harvest triples are removed. |
S = Diff (P → H) | The subtractions are the triples that are in the old harvest when the new harvest triples are removed. |
P = P – S, then P = P + A | The application of these to the old harvest will make the old harvest equal to the new harvest. |
V = V – S, then V = V + A | The application of these to the VIVO model will make the information in the VIVO model agree with the harvest without tampering with the other data. |
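The "graph math" can be checked with a small set-based sketch. It assumes that applying subtractions and additions to the previous harvest model means P – S followed by + A, and it models triples as opaque strings; the sample values (`t1`, `manual1`, and so on) are hypothetical and the code is not taken from the Harvester.

```python
def diff(minuend, subtrahend):
    """Triples in the minuend that are not in the subtrahend."""
    return set(minuend) - set(subtrahend)


P = {"t1", "t2", "t3"}   # previous harvest model
H = {"t2", "t3", "t4"}   # new harvested model
V = P | {"manual1"}      # VIVO model: previous harvest plus data from another source

A = diff(H, P)           # additions: in the new harvest, not in the old one
S = diff(P, H)           # subtractions: in the old harvest, not in the new one

# Apply subtractions first, then additions, to both models.
P = (P - S) | A
V = (V - S) | A
```

After the update, `P` equals `H`, and `V` contains every triple of `H` plus the untouched `manual1` triple. Triples that never came from the harvest are neither added nor removed.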