Overview
Excerpt |
---|
The Match tool compares the scores generated by Score against a threshold value. Input entities whose score meets or exceeds the threshold are renamed to the URI of the matching person in VIVO, so that when the data is finally loaded into VIVO the new data is linked to existing data. In this way you can fetch publications for your existing researchers. |
...
Match.java takes a model generated by Score and renames matches, creates links, or removes literals based on the associated scores.
...
Short Option | Long Option | Parameter Value Map | Description | Required |
---|---|---|---|---|
i | inputJena-config | CONFIG_FILE | inputJena JENA configuration filename | true |
I | inputOverride | | override the JENA_PARAM of inputJena jena model config using VALUE | false |
o | output-config | CONFIG_FILE | outputConfig JENA configuration filename | true |
V | vivoOverride | | override the JENA_PARAM of vivoJena jena model config using VALUE | false |
s | score-config | CONFIG_FILE | score data JENA configuration filename | true |
S | scoreOverride | | override the JENA_PARAM of score jena model config using VALUE | false |
t | threshold | THRESHOLD | match records with a score at or above THRESHOLD | true |
l | link | | link the two matched entities together using INPUT_TO_VIVO_PREDICATE and INPUT_TO_VIVO_PREDICATE | false |
r | rename | | rename or remove the matched entity from scoring | false |
c | clear-type-and-literals | | clear all rdf:type and literal values out of the nodes matched | false |
Usage
No Format |
---|
# from the env file
Match="java $OPTS -Dprocess-task=Match org.vivoweb.harvester.score.Match"
# from the script file
SCOREINPUT="-i $H2MODEL -ImodelName=$MODELNAME -IdbUrl=$MODELDBURL -IcheckEmpty=$CHECKEMPTY"
SCOREDATA="-s $H2MODEL -SmodelName=$SCOREDATANAME -SdbUrl=$SCOREDATADBURL -ScheckEmpty=$CHECKEMPTY"
MATCHOUTPUT="-o $H2MODEL -OmodelName=$MATCHEDNAME -OdbUrl=$MATCHEDDBURL -OcheckEmpty=$CHECKEMPTY"
MATCHTHRESHOLD=1.0
# rename matches (-r) and clear their rdf:type and literal values (-c)
$Match $SCOREINPUT $SCOREDATA $MATCHOUTPUT -t $MATCHTHRESHOLD -r -c
|
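The override flags in the script above all follow one pattern: an upper-case option letter immediately followed by a `JENA_PARAM=VALUE` pair. A minimal sketch of a helper that assembles such flags is given here. The function name `jena_overrides`, the file name `model.xml`, and the sample values are hypothetical placeholders and are not part of the Harvester scripts.

```shell
# Hypothetical helper (not part of VIVO Harvester): prefix each
# JENA_PARAM=VALUE pair with an override option letter such as I, S or O.
jena_overrides() {
    jo_letter="$1"
    shift
    jo_out=""
    for jo_pair in "$@"; do
        jo_out="$jo_out -$jo_letter$jo_pair"
    done
    # Drop the leading space before printing.
    printf '%s\n' "${jo_out# }"
}

# Placeholder values for illustration only.
SCOREINPUT="-i model.xml $(jena_overrides I modelName=input checkEmpty=true)"
echo "$SCOREINPUT"
```

Building the flags in one place keeps the option letter and its parameters together, which may help when the same model config is reused with different overrides.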
...
The Match class runs a SPARQL query against the score data. The same query can be reused to access the score data for other purposes.
No Format |
---|
# ?sInput = Input URI
# ?sVivo = Vivo URI
PREFIX scoreValue: <http://vivoweb.org/harvester/scoreValue/>
SELECT DISTINCT ?sVivo ?sInput (sum(?weightValue) AS ?sum)
WHERE {
  ?s scoreValue:InputRes ?sInput .
  ?s scoreValue:VivoRes ?sVivo .
  ?s scoreValue:hasScoreValue ?value .
  ?value scoreValue:WeightedScore ?weightValue .
}
GROUP BY ?sVivo ?sInput
HAVING (?sum >= threshold)
ORDER BY ?sInput
|
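To make the query's logic concrete, the sketch below reproduces the same sum-and-filter step on plain text instead of RDF. It assumes a hypothetical whitespace-separated input with one `VIVO-URI INPUT-URI weighted-score` triple per line. That format, the sample URIs, and the function name `matches_over_threshold` are illustrative only and are not produced by the Harvester.

```shell
# Sum the weighted scores per (Vivo URI, Input URI) pair and print the pairs
# whose total meets or exceeds the threshold, ordered by Input URI.
matches_over_threshold() {
    mot_threshold="$1"
    awk -v t="$mot_threshold" '
        { sum[$1 " " $2] += $3 }
        END { for (k in sum) if (sum[k] >= t) print k, sum[k] }
    ' | sort -k2,2
}

# Illustrative data: two weighted scores for one pair, one for another.
printf '%s\n' \
    'vivo:n1 input:a 0.5' \
    'vivo:n1 input:a 0.5' \
    'vivo:n2 input:b 0.4' |
    matches_over_threshold 1.0
```

With a threshold of 1.0, only the first pair qualifies, because its two weighted scores add up to the threshold while the second pair stays below it.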