Depending on your data, the next step may be to match incoming data with data already in VIVO. For example, if you have just pulled in publication information from PubMed, you might want to compare the author names with the people in your VIVO so that you can link the publications to their authors. This comparison is done with the Score tool, which compares any values you choose between VIVO and the input data and assigns a number to each comparison.
Score.java provides a method for scoring incoming data. The data is assumed to be in a VIVO-like ontology and stored in a Jena model. The method can call any combination of scoring algorithms, and the scoring function attempts to match the data to individuals in VIVO. Several algorithms can be used to determine when a match will be inserted into VIVO. The data produced by this method is stored in a separate scoring model, which Match then requires in order to make the data changes.
A Score run involves several concepts:
The URI on which to compare an individual in the input model to an individual in VIVO. For example,
http://xmlns.com/foaf/0.1/firstName
to compare authors by their first names.
A Score run can contain multiple sets of URI, algorithm, and weight, linked together by a common, arbitrary parameter suffix. The total Score value of an individual is the sum, over all sets, of the algorithm output multiplied by the weight. For example, a Score run intended to exactly match a person's full name might be passed a URI of first name, an algorithm of equality test, and a weight of 0.3, plus a URI of last name, an algorithm of equality test, and a weight of 0.5. If both first and last name match, the total Score value is (1.0 * 0.3) + (1.0 * 0.5) = 0.8. If only the last name matches, it is (0.0 * 0.3) + (1.0 * 0.5) = 0.5.
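The arithmetic above can be checked with a small shell sketch. It is only an illustration of the sum of products, not the code Score itself runs; the `total_score` function and its `output:weight` argument format are made up for this example.

```shell
#!/bin/sh
# Illustration only: total = sum of (algorithm output * weight) over all sets.
# Each argument is a hypothetical "output:weight" pair, one per set.
total_score() {
    printf '%s\n' "$@" | awk -F: '{ sum += $1 * $2 } END { printf "%.1f\n", sum }'
}

total_score 1.0:0.3 1.0:0.5   # first and last name both match
total_score 0.0:0.3 1.0:0.5   # only the last name matches
```

An equality test contributes either 1.0 or 0.0, so with these weights the total tells you which of the two names matched.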
At this point Score is finished. All it does is generate these values. It is Match that determines what to do with them.
Short Option | Long Option | Parameter Value Map | Description | Required |
---|---|---|---|---|
i | inputJena-config | CONFIG_FILE | inputJena JENA configuration filename | true |
I | inputOverride | JENA_PARAM=VALUE | override the JENA_PARAM of inputJena jena model config using VALUE | false |
v | vivoJena-config | CONFIG_FILE | vivoJena JENA configuration filename | true |
V | vivoOverride | JENA_PARAM=VALUE | override the JENA_PARAM of vivoJena jena model config using VALUE | false |
s | score-config | CONFIG_FILE | score data JENA configuration filename | true |
S | scoreOverride | JENA_PARAM=VALUE | override the JENA_PARAM of score jena model config using VALUE | false |
t | tempJenaDir | DIRECTORY_PATH | directory to store temp jena model | false |
A | algorithms | RUN_NAME=CLASS_NAME | for RUN_NAME, use this CLASS_NAME (must implement Algorithm) to evaluate matches | true |
W | weights | RUN_NAME=WEIGHT | for RUN_NAME, assign this weight (0,1) to the scores | true |
F | inputJena-predicates | RUN_NAME=PREDICATE | for RUN_NAME, match on this predicate in inputJena | true |
P | vivoJena-predicates | RUN_NAME=PREDICATE | for RUN_NAME, match on this predicate in vivoJena | true |
n | namespace | SCORE_NAMESPACE | limit match Algorithm to only match rdf nodes in inputJena whose URI begin with SCORE_NAMESPACE | false |
# Execute Score for Departments
$Score $SCOREMODELS -n ${BASEURI}org/ -AdeptId=$EQTEST -WdeptId=1.0 -FdeptId=$UFDEPTID -PdeptId=$UFDEPTID
Here $SCOREMODELS holds the options that identify the models being scored against each other:
SCOREINPUT="-i $H2MODEL -ImodelName=$MODELNAME -IdbUrl=$MODELDBURL -IcheckEmpty=$CHECKEMPTY"
SCOREDATA="-s $H2MODEL -SmodelName=$SCOREDATANAME -SdbUrl=$SCOREDATADBURL -ScheckEmpty=$CHECKEMPTY"
SCOREMODELS="$SCOREINPUT -v $VIVOCONFIG -VcheckEmpty=$CHECKEMPTY $SCOREDATA -t $TEMPCOPYDIR -b $SCOREBATCHSIZE"
$VIVOCONFIG refers to the configuration in vivo.xml.
$SCOREINPUT is the current harvested data model.
$SCOREDATA is the model that holds the data generated by the scoring process; Match uses it to make the needed changes.
$UFDEPTID contains the predicate being scored on:
UFDEPTID="http://vivo.ufl.edu/ontology/vivo-ufl/deptID"
The -n ${BASEURI}org/ option restricts matching to input nodes whose URIs begin with that namespace. Your namespace will be something like
http://vivo.myDomain.edu/category
as long as all your department URIs begin with that string. This is important if your predicate is used by many different kinds of resources.
$EQTEST requires the match to be 100% equal (the A flag stands for algorithm).
The F and P flags set the predicates that are matched on within the input and VIVO models, respectively.
The deptId= part is the run name. It must be the same across the A, W, F, and P options, since one Score statement can score in multiple ways. For another run you might choose a name such as label=.
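As a sketch of how one Score statement can score in multiple ways, the options for two run names can be assembled as below. This mirrors the first name and last name weights used earlier on this page; the run names `fname` and `lname`, the `SCOREARGS` variable, the placeholder algorithm class, and the choice of `foaf:lastName` as the second predicate are all assumptions made for illustration.

```shell
#!/bin/sh
# Placeholders: substitute the values from your own harvest script.
EQTEST="${EQTEST:-your.equality.AlgorithmClass}"   # hypothetical class name
FNAME="http://xmlns.com/foaf/0.1/firstName"
LNAME="http://xmlns.com/foaf/0.1/lastName"         # assumed predicate

# Each run name (fname, lname) ties together one -A, -W, -F and -P entry.
SCOREARGS="-Afname=$EQTEST -Wfname=0.3 -Ffname=$FNAME -Pfname=$FNAME"
SCOREARGS="$SCOREARGS -Alname=$EQTEST -Wlname=0.5 -Flname=$LNAME -Plname=$LNAME"

# Print the command instead of running it, so the sketch has no side effects.
printf '%s\n' "\$Score \$SCOREMODELS $SCOREARGS"
```

Once the printed line looks right, the same options can be appended to the real $Score call in place of the deptId ones.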
Initialize the variables
Verify that each map contains the same keys
Load a batch of data into the score model
Build the select query
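The key check in the steps above can be approximated before Score is even called. The following is a minimal pre-flight sketch in shell, not Score's own implementation; the helper names `keys_of` and `same_keys` are hypothetical. It succeeds only when the -A, -W, -F, and -P options all name the same set of runs.

```shell
#!/bin/sh
# keys_of FLAG ARGS...: print the sorted run names given to one flag,
# e.g. "-AdeptId=x" contributes "deptId" for flag A.
keys_of() {
    prefix=$1; shift
    for arg in "$@"; do
        case "$arg" in
            -"$prefix"*=*)
                key=${arg#-"$prefix"}          # drop the leading "-A"
                printf '%s\n' "${key%%=*}"     # keep the text before "="
                ;;
        esac
    done | sort
}

# same_keys ARGS...: succeed only if -A, -W, -F and -P name the same runs.
same_keys() {
    expected=$(keys_of A "$@")
    for flag in W F P; do
        [ "$(keys_of "$flag" "$@")" = "$expected" ] || return 1
    done
}

if same_keys -AdeptId=x -WdeptId=1.0 -FdeptId=u -PdeptId=u; then
    echo "run names are consistent"
fi
```

A mismatch such as -AdeptId paired with -Wlabel makes `same_keys` return a non-zero status, which a harvest script can use to stop early.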