Overview

 

Depending on your data, the next step may be to match incoming data with data already in VIVO. For example, if you have just pulled in publication information from PubMed, you might want to compare the author names with the people in your VIVO so that you can link the publications to their authors. This comparison is done by the Score tool, which compares any values you choose between VIVO and the input data and assigns a number to each comparison.

Score.java provides a method for scoring incoming data. The data is assumed to use a VIVO-like ontology and to be stored in a Jena model. The method can call any combination of scoring algorithms, and the scoring function attempts to match the data to individuals in VIVO. Several algorithms can be combined to determine when a match will be inserted into VIVO. The data produced by this method is stored in a separate score model, which the Match tool then requires in order to make the data changes.
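
As a rough illustration of the comparison described above, the following Bash sketch compares every incoming value with every value already in VIVO using an exact-equality test and prints one score per pair. It is not the Score tool itself: the function names `equality_test` and `score_pairs` and the sample names are hypothetical, and the real tool works on Jena models rather than shell strings.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: score every (incoming, VIVO) pair with an equality test.

# Equality test: 1.0 when the two strings are exactly identical, 0.0 otherwise.
equality_test() {
  if [[ "$1" == "$2" ]]; then
    echo "1.0"
  else
    echo "0.0"
  fi
}

# Usage: score_pairs INPUT_VALUE... -- VIVO_VALUE...
# Prints "input<TAB>vivo<TAB>score" for every pair of values.
score_pairs() {
  local inputs=() vivo=() seen_sep=0 arg
  for arg in "$@"; do
    if [[ $seen_sep -eq 0 && "$arg" == "--" ]]; then
      seen_sep=1              # everything after "--" is a VIVO value
    elif [[ $seen_sep -eq 0 ]]; then
      inputs+=("$arg")
    else
      vivo+=("$arg")
    fi
  done
  local src dst
  for src in "${inputs[@]}"; do
    for dst in "${vivo[@]}"; do
      printf '%s\t%s\t%s\n' "$src" "$dst" "$(equality_test "$src" "$dst")"
    done
  done
}

# Hypothetical sample data: two incoming author names, two people in VIVO.
score_pairs "Smith, John" "Doe, Jane" -- "Doe, Jane" "Roe, Richard"
```

Only the pair "Doe, Jane" / "Doe, Jane" scores 1.0 here; every other pair scores 0.0.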

...

  • The namespace in the input model of the data to score. This allows separate Score runs for different types of data, for example to score authors, publications, and journals separately.
  • The predicate URI on which to compare an individual in the input model to an individual in VIVO. For example,

    http://xmlns.com/foaf/0.1/firstName

    to compare authors by their first names.

  • The algorithm with which to run the comparison. An algorithm takes two strings and returns a floating-point number between 0.0 and 1.0. A score of 0.0 indicates complete rejection, while 1.0 indicates a complete match. For example, the equality test algorithm checks whether the two strings are exactly the same; if so, it returns 1.0, and if not, it returns 0.0. Other algorithms, such as Levenshtein difference, perform a more thorough comparison of the strings and can return any value from 0.0 to 1.0 inclusive.
  • The weight of the particular comparison. This is typically a number between 0.0 and 1.0 and is multiplied by the output of the algorithm to get the score value for that pair of items and that URI. A lower weight means that this particular comparison is less important than others for this run.
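
The sketch below shows, in Bash, a Levenshtein-based algorithm of the kind mentioned above and how a weight scales its result. Normalizing the edit distance by the length of the longer string is an assumption made for this illustration and may not match the Harvester's own Levenshtein difference algorithm; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a Levenshtein-based score and a weighted result.

# Levenshtein edit distance between $1 and $2 (dynamic programming,
# keeping only the previous row of the distance table).
levenshtein() {
  local a="$1" b="$2"
  local la=${#a} lb=${#b}
  local -a prev curr
  local i j cost del ins sub
  for (( j = 0; j <= lb; j++ )); do prev[j]=$j; done
  for (( i = 1; i <= la; i++ )); do
    curr=()
    curr[0]=$i
    for (( j = 1; j <= lb; j++ )); do
      if [[ "${a:i-1:1}" == "${b:j-1:1}" ]]; then cost=0; else cost=1; fi
      del=$(( prev[j] + 1 ))        # delete a character from $a
      ins=$(( curr[j-1] + 1 ))      # insert a character into $a
      sub=$(( prev[j-1] + cost ))   # substitute (or keep) a character
      curr[j]=$del
      (( ins < curr[j] )) && curr[j]=$ins
      (( sub < curr[j] )) && curr[j]=$sub
    done
    prev=("${curr[@]}")
  done
  echo "${prev[lb]}"
}

# Similarity in [0, 1]: 1 - distance / length of the longer string
# (assumed normalization; two empty strings count as a complete match).
normalized_levenshtein() {
  local a="$1" b="$2"
  local max=${#a}
  (( ${#b} > max )) && max=${#b}
  if (( max == 0 )); then echo "1.0000"; return; fi
  local d
  d=$(levenshtein "$a" "$b")
  LC_ALL=C awk -v d="$d" -v m="$max" 'BEGIN { printf "%.4f\n", 1 - d / m }'
}

# Weighted score: weight ($1) multiplied by the algorithm's output ($2).
weighted_score() {
  LC_ALL=C awk -v w="$1" -v s="$2" 'BEGIN { printf "%.4f\n", w * s }'
}

sim=$(normalized_levenshtein "Jon" "John")
weighted_score 0.5 "$sim"   # prints 0.3750
```

"Jon" and "John" differ by one inserted character, so the similarity is 0.75, and a weight of 0.5 halves it.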

...

Usage

Explanation

# Execute Score for Departments
$Score $SCOREMODELS -n ${BASEURI}org/ -AdeptId=$EQTEST -WdeptId=1.0 -FdeptId=$UFDEPTID -PdeptId=$UFDEPTID

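Here $SCOREMODELS specifies the models involved in scoring: the input model (-i), the VIVO model (-v), and the score data model (-s), along with a temporary copy directory (-t) and a batch size (-b).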

SCOREINPUT="-i $H2MODEL -ImodelName=$MODELNAME -IdbUrl=$MODELDBURL -IcheckEmpty=$CHECKEMPTY"
SCOREDATA="-s $H2MODEL -SmodelName=$SCOREDATANAME -SdbUrl=$SCOREDATADBURL -ScheckEmpty=$CHECKEMPTY"
SCOREMODELS="$SCOREINPUT -v $VIVOCONFIG -VcheckEmpty=$CHECKEMPTY $SCOREDATA -t $TEMPCOPYDIR -b $SCOREBATCHSIZE"
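
Because every variable in these definitions must be set before Score runs, a small check can catch a missing value early. This is a hypothetical helper, not part of the Harvester; it only lists the variables used in the three definitions above that are unset or empty.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each variable used by SCOREINPUT, SCOREDATA and
# SCOREMODELS that is unset or empty.
missing_score_vars() {
  local name
  for name in H2MODEL MODELNAME MODELDBURL CHECKEMPTY SCOREDATANAME \
              SCOREDATADBURL VIVOCONFIG TEMPCOPYDIR SCOREBATCHSIZE; do
    [[ -z "${!name}" ]] && printf '%s\n' "$name"   # indirect expansion
  done
  return 0
}

missing=$(missing_score_vars)
if [[ -n "$missing" ]]; then
  printf 'Set these before running Score:\n%s\n' "$missing" >&2
fi
```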

...

$UFDEPTID holds the URI of the predicate being scored on. In the command above it is passed to both -FdeptId and -PdeptId.

UFDEPTID="http://vivo.ufl.edu/ontology/vivo-ufl/deptID"
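
The four options in the Score command share the comparison name `deptId`, which ties an algorithm (-A), a weight (-W), and two predicates (-F, -P) together. The hypothetical helper below builds those options for any comparison name. Reading -F as the predicate in the input model and -P as the predicate in VIVO is an assumption, and `EQTEST` is given a placeholder value here because its real value is set elsewhere in your scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emit the -A/-W/-F/-P options for one named comparison,
# one option per line.
score_comparison_args() {
  local name="$1" algorithm="$2" weight="$3" input_predicate="$4" vivo_predicate="$5"
  printf '%s\n' \
    "-A${name}=${algorithm}" \
    "-W${name}=${weight}" \
    "-F${name}=${input_predicate}" \
    "-P${name}=${vivo_predicate}"
}

UFDEPTID="http://vivo.ufl.edu/ontology/vivo-ufl/deptID"
EQTEST="EQTEST_PLACEHOLDER"   # placeholder: use the algorithm value your scripts define

# Collect the options into an array so each one stays a single argument.
args=()
while IFS= read -r opt; do
  args+=("$opt")
done < <(score_comparison_args deptId "$EQTEST" 1.0 "$UFDEPTID" "$UFDEPTID")

printf '%s\n' "${args[@]}"
# A real run would then pass them along, for example:
#   $Score $SCOREMODELS -n ${BASEURI}org/ "${args[@]}"
```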

...

The namespace you pass with -n will be something like:

http://vivo.myDomain.edu/category
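
If the namespace acts as a URI prefix, which is an assumption for this sketch, restricting a run to one type of data amounts to keeping only the URIs that start with it. Including the trailing slash in the namespace avoids also matching siblings such as a "category2" path. The helper name and the sample URIs below are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the URIs (arguments 2..n) that start with
# the namespace given as the first argument.
in_namespace() {
  local namespace="$1"
  shift
  local uri
  for uri in "$@"; do
    [[ "$uri" == "$namespace"* ]] && printf '%s\n' "$uri"
  done
  return 0
}

# Hypothetical sample URIs: only the first lies inside the org/ namespace.
in_namespace "http://vivo.myDomain.edu/org/" \
  "http://vivo.myDomain.edu/org/n123" \
  "http://vivo.myDomain.edu/person/n456"
```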

...