
Overview

 


The Match tool compares the scores generated by Score against a threshold value. Each input entity whose score against a VIVO entity meets or exceeds the threshold has its identity (its URI) changed to the URI of the matching person in VIVO. When the data is finally loaded into VIVO, the new data is therefore linked to existing data. This is how you can fetch publications for your existing researchers.
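The threshold test can be sketched in Java as follows. This is a minimal sketch, not Harvester code. It assumes each score is available as an (input URI, VIVO URI, weighted score) record. Like the SPARQL query Match runs on the score data, it sums the weighted scores of each pair before comparing the total to the threshold. The WeightedScore record, the ThresholdMatchSketch class and the example URIs are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record for one weighted score that Score produced for an
// (input entity, VIVO entity) pair; not part of the Harvester API.
record WeightedScore(String inputUri, String vivoUri, double weightedScore) {}

class ThresholdMatchSketch {
    // Sums the weighted scores of each (input, VIVO) pair and returns a map from
    // input URI to VIVO URI for every pair whose total meets or exceeds the threshold.
    static Map<String, String> matches(List<WeightedScore> scores, double threshold) {
        Map<List<String>, Double> totals = new LinkedHashMap<>();
        for (WeightedScore s : scores) {
            // The pair (input URI, VIVO URI) acts as the grouping key.
            totals.merge(List.of(s.inputUri(), s.vivoUri()), s.weightedScore(), Double::sum);
        }
        Map<String, String> renames = new LinkedHashMap<>();
        for (Map.Entry<List<String>, Double> e : totals.entrySet()) {
            if (e.getValue() >= threshold) {
                // If one input entity passes for several VIVO entities, this sketch
                // keeps the first pair seen; that tie-breaking rule is an assumption.
                renames.putIfAbsent(e.getKey().get(0), e.getKey().get(1));
            }
        }
        return renames;
    }
}
```

Because the comparison is `>=`, a pair whose scores add up to exactly the threshold (for example 0.75 + 0.25 with `-t 1.0`) counts as a match, while a pair totalling 0.5 does not.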

...

Match.java takes a model generated by Score and, based on the associated scores, renames matched entities, creates links between them, or removes their type and literal values.
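The three actions can be illustrated on a simplified in-memory list of triples. This is a hypothetical sketch: the Triple record, the MatchActionsSketch class and the predicate URIs passed to link are placeholders. It also assumes that clearing is applied to the nodes that were matched (here, after renaming, the VIVO URIs). Match itself operates on Jena models configured through the options below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified, hypothetical triple: the object is either a URI or a literal value.
record Triple(String subject, String predicate, String object, boolean objectIsLiteral) {}

class MatchActionsSketch {
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    // rename: replace each matched input URI with its VIVO URI wherever it appears
    // as a subject or as a URI-valued object; literal objects are left untouched.
    static List<Triple> rename(List<Triple> triples, Map<String, String> inputToVivo) {
        List<Triple> out = new ArrayList<>();
        for (Triple t : triples) {
            String s = inputToVivo.getOrDefault(t.subject(), t.subject());
            String o = t.objectIsLiteral()
                    ? t.object()
                    : inputToVivo.getOrDefault(t.object(), t.object());
            out.add(new Triple(s, t.predicate(), o, t.objectIsLiteral()));
        }
        return out;
    }

    // link: keep the input URIs and add one statement in each direction between
    // every matched pair; the two predicate URIs are supplied by the caller.
    static List<Triple> link(List<Triple> triples, Map<String, String> inputToVivo,
                             String inputToVivoPredicate, String vivoToInputPredicate) {
        List<Triple> out = new ArrayList<>(triples);
        for (Map.Entry<String, String> m : inputToVivo.entrySet()) {
            out.add(new Triple(m.getKey(), inputToVivoPredicate, m.getValue(), false));
            out.add(new Triple(m.getValue(), vivoToInputPredicate, m.getKey(), false));
        }
        return out;
    }

    // clear-type-and-literals: drop rdf:type statements and literal values whose
    // subject is one of the matched nodes; all other triples are kept.
    static List<Triple> clearTypeAndLiterals(List<Triple> triples, Set<String> matchedNodes) {
        List<Triple> out = new ArrayList<>();
        for (Triple t : triples) {
            boolean typeOrLiteral = t.predicate().equals(RDF_TYPE) || t.objectIsLiteral();
            if (!(matchedNodes.contains(t.subject()) && typeOrLiteral)) {
                out.add(t);
            }
        }
        return out;
    }
}
```

The usage example combines `-r` and `-c`, which in this sketch corresponds to calling `rename` and then `clearTypeAndLiterals` on the VIVO URIs of the matched pairs.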

...

Short Option | Long Option | Parameter Value Map | Description | Required
i | inputJena-config | CONFIG_FILE | JENA configuration filename for the input model (inputJena) | true
I | inputOverride | JENA_PARAM=VALUE | override JENA_PARAM of the inputJena model config with VALUE | false
o | output-config | CONFIG_FILE | JENA configuration filename for the output model | true
V | vivoOverride | JENA_PARAM=VALUE | override JENA_PARAM of the vivoJena model config with VALUE | false
s | score-config | CONFIG_FILE | JENA configuration filename for the score data model | true
S | scoreOverride | JENA_PARAM=VALUE | override JENA_PARAM of the score model config with VALUE | false
t | threshold | THRESHOLD | match records whose summed score is greater than or equal to THRESHOLD | true
l | link | | link the two matched entities together using INPUT_TO_VIVO_PREDICATE and VIVO_TO_INPUT_PREDICATE | false
r | rename | | rename the matched input entity to its VIVO URI, or remove it from scoring | false
c | clear-type-and-literals | | clear all rdf:type and literal values from the matched nodes | false
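The override options can be read as key/value replacements applied on top of a model configuration file. The sketch below assumes that each override has the form JENA_PARAM=VALUE, as in the usage example's `-ImodelName=$MODELNAME`, and that an override takes precedence over the value read from CONFIG_FILE. The OverrideSketch class and the in-memory config map are hypothetical stand-ins for however the Harvester actually loads its configuration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class OverrideSketch {
    // Returns a copy of the config-file parameters with each "JENA_PARAM=VALUE"
    // override applied; the original map is not modified.
    static Map<String, String> applyOverrides(Map<String, String> fromConfigFile,
                                              List<String> overrides) {
        Map<String, String> merged = new LinkedHashMap<>(fromConfigFile);
        for (String o : overrides) {
            // Split at the first '=' only, so values such as JDBC URLs may contain '='.
            int eq = o.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("expected JENA_PARAM=VALUE, got: " + o);
            }
            merged.put(o.substring(0, eq), o.substring(eq + 1));
        }
        return merged;
    }
}
```

Splitting only at the first `=` keeps values that themselves contain `=` intact.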

Usage


# from the env file
Match="java $OPTS -Dprocess-task=Match org.vivoweb.harvester.score.Match"

# from the script file
SCOREINPUT="-i $H2MODEL -ImodelName=$MODELNAME -IdbUrl=$MODELDBURL -IcheckEmpty=$CHECKEMPTY"
SCOREDATA="-s $H2MODEL -SmodelName=$SCOREDATANAME -SdbUrl=$SCOREDATADBURL -ScheckEmpty=$CHECKEMPTY"
MATCHOUTPUT="-o $H2MODEL -OmodelName=$MATCHEDNAME -OdbUrl=$MATCHEDDBURL -OcheckEmpty=$CHECKEMPTY"
MATCHTHRESHOLD=1.0

# -t: minimum summed score for a match; -r: rename matched input entities;
# -c: clear rdf:type and literal values from the matched nodes
$Match $SCOREINPUT $SCOREDATA $MATCHOUTPUT -t $MATCHTHRESHOLD -r -c

...

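The Match class runs the following SPARQL query on the score data. It sums the weighted scores for each input/VIVO pair and keeps the pairs whose total meets or exceeds the threshold. The same query can be used to access the score data for other purposes.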


# ?sInput = input URI
# ?sVivo  = VIVO URI
# 1.0 in the HAVING clause stands for the -t THRESHOLD value

PREFIX scoreValue: <http://vivoweb.org/harvester/scoreValue/>
SELECT DISTINCT ?sVivo ?sInput (sum(?weightValue) AS ?sum)
WHERE {
  ?s scoreValue:InputRes ?sInput .
  ?s scoreValue:VivoRes ?sVivo .
  ?s scoreValue:hasScoreValue ?value .
  ?value scoreValue:WeightedScore ?weightValue .
}
GROUP BY ?sVivo ?sInput
HAVING (?sum >= 1.0)
ORDER BY ?sInput
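In code, the threshold has to be inlined into the HAVING clause before the query is run. A minimal Java sketch, assuming the threshold arrives as a double; the MatchQuerySketch class is hypothetical. The resulting string could then be executed against the score model with a SPARQL engine such as Apache Jena ARQ.

```java
class MatchQuerySketch {
    // Builds the score query with the -t threshold written into the HAVING clause.
    static String scoreQuery(double threshold) {
        // NaN or Infinity would print as non-SPARQL tokens, so reject them.
        if (!Double.isFinite(threshold)) {
            throw new IllegalArgumentException("threshold must be a finite number");
        }
        return String.join("\n",
                "PREFIX scoreValue: <http://vivoweb.org/harvester/scoreValue/>",
                "SELECT DISTINCT ?sVivo ?sInput (sum(?weightValue) AS ?sum)",
                "WHERE {",
                "  ?s scoreValue:InputRes ?sInput .",
                "  ?s scoreValue:VivoRes ?sVivo .",
                "  ?s scoreValue:hasScoreValue ?value .",
                "  ?value scoreValue:WeightedScore ?weightValue .",
                "}",
                "GROUP BY ?sVivo ?sInput",
                "HAVING (?sum >= " + threshold + ")",
                "ORDER BY ?sInput");
    }
}
```

Under a strict reading of SPARQL 1.1, a SELECT alias such as `?sum` is not yet bound when HAVING is evaluated. If a processor rejects or ignores the condition, repeating the aggregate, `HAVING (SUM(?weightValue) >= 1.0)`, is the more portable form.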