
The PeopleSoft script can be found in the harvester scripts directory as run-peoplesoft.sh.

Header

This is a header comment at the top of the script that states the BSD license and lists the contributors to the file.

#!/bin/bash

#Copyright (c) 2010-2011 VIVO Harvester Team. For full list of contributors, please see the AUTHORS file provided.
#All rights reserved.
#This program and the accompanying materials are made available under the terms of the new BSD license which accompanies this distribution, and is available at http://www.opensource.org/licenses/bsd-license.html
# 
# Contributors:
#     Christopher Haines, Dale Scheppler, Nicholas Skaggs, Stephen V. Williams - initial API and implementation

Setup

# set to the directory where the harvester was installed or unpacked
# HARVESTER_INSTALL_DIR is set to the location of the installed harvester
#	If the deb package was used to install the harvester, the
#	directory should be set to /usr/share/vivo/harvester, which is the
#	current location used by the deb installation.
#	If the harvester was instead installed by uncompressing the tar.gz,
#	change this setting to agree with the actual installation location.
HARVESTER_INSTALL_DIR=/usr/share/vivo/harvester
export HARVEST_NAME=peoplesoft
export DATE=`date +%Y-%m-%d'T'%T`
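As an illustration (not part of the script itself), the DATE expression produces an ISO-like timestamp, e.g. 2011-06-01T14:30:00, which is used below to name each run's log file:

```shell
# Sketch only: same format string as the script, using $() instead of backticks.
DATE=$(date +%Y-%m-%d'T'%T)
echo "$DATE"
```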

# Add harvester binaries to path for execution
# The tools within this script refer to binaries supplied with the harvester.
#	Since they can be located in another directory, their path should be
#	included in the CLASSPATH and PATH environment variables.
export PATH=$PATH:$HARVESTER_INSTALL_DIR/bin
export CLASSPATH=$CLASSPATH:$HARVESTER_INSTALL_DIR/bin/harvester.jar:$HARVESTER_INSTALL_DIR/bin/dependency/*
export CLASSPATH=$CLASSPATH:$HARVESTER_INSTALL_DIR/build/harvester.jar:$HARVESTER_INSTALL_DIR/build/dependency/*

# Exit on first error
# The -e flag stops the script as soon as any tool fails.
#	Continuing after a tool failure is undesirable since the harvested
#	data could be left corrupted and incompatible.
set -e
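A minimal sketch of the set -e behavior described above (demo only, not part of the harvest script): the subshell exits at the first failing command, so the final echo never runs.

```shell
# Run a failing command sequence under set -e in a subshell and capture the result.
out=$(bash -c 'set -e; false; echo "never reached"') || status=$?
echo "exit status: ${status:-0}"   # the subshell exits with the failing command's status, 1
echo "output: '$out'"              # empty: the echo was never reached
```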

# Supply the location of the detailed log file which is generated during the script.
#	If there is an issue with a harvest, this file proves invaluable in finding
#	a solution to the problem. It has become common practice in addressing a problem
#	to request this file. The passwords and usernames are filtered out of this file
#	to prevent these logs from containing sensitive information.
echo "Full Logging in $HARVEST_NAME.$DATE.log"
if [ ! -d logs ]; then
  mkdir logs
fi
cd logs
touch $HARVEST_NAME.$DATE.log
ln -sf $HARVEST_NAME.$DATE.log $HARVEST_NAME.latest.log
cd ..
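The symlink created above means latest.log always points at the most recent run's log. A standalone sketch of the same trick (the dated file name here is made up):

```shell
# Create a dated log file and point a stable "latest" symlink at it.
mkdir -p logs
touch logs/peoplesoft.2011-06-01T00:00:00.log
ln -sf peoplesoft.2011-06-01T00:00:00.log logs/peoplesoft.latest.log
readlink logs/peoplesoft.latest.log   # -> peoplesoft.2011-06-01T00:00:00.log
```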

# clear old data
# For a fresh harvest, removing the previous data maintains data integrity.
#	If you are continuing a partial run or wish to reuse already retrieved
#	data, comment out the following line so that the required harvest data
#	is not deleted.
rm -rf data

# clone db
# DatabaseClone is a tool used to make a local copy of the database. One reason for this
#	is that constantly querying a database could put undue load on a repository. Cloning
#	allows intensive queries to run against a local copy, tying up only the
#	resources of the local machine.
harvester-databaseclone -X databaseclone.config.xml

Fetch

The information is pulled into the system. Since the source is a standard relational database, JDBCFetch is used.

# Execute Fetch
# This stage of the script is where the information is gathered together into one local
#	place to facilitate the further steps of the harvest. The data is stored locally
#	in a format based on the source. The format is a form of RDF, but its ontology is
#	too simple to be put into a model and be useful.
# The JDBCFetch tool in particular takes the data from the source described in its
#	configuration XML file and places it into a record set of flat RDF directly
#	related to the rows, columns and tables of the target database.
harvester-jdbcfetch -X jdbcfetch.config.xml
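As a toy illustration of the idea only (made-up table and column names, not JDBCFetch's actual record format), a single database row can be flattened into a simple RDF-like record whose fields mirror the source columns:

```shell
# Hypothetical row from a "person" table, columns: id|first|last
row='7|Ada|Lovelace'
IFS='|' read -r id first last <<< "$row"
# Emit one flat record; the element names simply echo the column names.
cat <<EOF
<rdf:Description rdf:about="person/$id">
  <person:first>$first</person:first>
  <person:last>$last</person:last>
</rdf:Description>
EOF
```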

Translate

Now that we have our data, we need to translate it into the VIVO ontology in RDF/XML format.

# Execute Translate
# This is the part of the script where the outside data, in its flat RDF form, is used to
#	create the more linked and descriptive form related to the ontological constructs.
#	The traditional XSL language is used to achieve this part of the workflow.
harvester-xsltranslator -X xsltranslator.config.xml

Transfer

We now have to push all the translated records into a single Jena model.

# Execute Transfer to import from record handler into local temp model
# From this stage on the script places the data into a Jena model. A model is a
#	data storage structure similar to a database, but is in RDF.
# The harvester tool Transfer is used to move/add/remove/dump data in models.
# For this call on the transfer tool:
# -s refers to the source translated records file, which was just produced by the translator step
# -o refers to the destination model for harvested data
# -d means that this call will also produce a text dump file in the specified location
harvester-transfer -s translated-records.config.xml -o harvested-data.model.xml -d data/harvested-data/imported-records.rdf.xml

Scoring and Matching

The various namespaces determined during the translation are scored against specific data in the VIVO model.

The scoring process results in a model which contains information about the score results to be used in the matches.

The matching process changes the URIs of the matched data to the URIs present in VIVO.

# Execute Score
# In the scoring phase the data in the harvest is compared to the data within VIVO and a new model
# 	is created with the values / scores of the data comparisons.

# Execute Score for People
harvester-score -X score-people.config.xml

# Execute Score for Departments
harvester-score -X score-departments.config.xml

# Find matches using scores and rename nodes to the matching URI
# Using the data model created by the score phase, the match process changes the harvested URIs
# 	whose comparison values are above the threshold chosen in the XML configuration file.
# Execute Match for People and Departments
harvester-match -X match-people-departments.config.xml
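Conceptually, a match renames a harvested node by rewriting its URI to the URI of the matching individual already in VIVO. A toy sed sketch with made-up URIs (the real Match tool operates on Jena models, not text files):

```shell
# Hypothetical URIs: the harvested node matched an existing VIVO node.
harvested='http://harvest.example.edu/peoplesoft/person123'
vivo='http://vivo.example.edu/individual/n456'
echo "<rdf:Description rdf:about=\"$harvested\"/>" > harvested.rdf.xml
# "Rename" the node by rewriting its URI everywhere it appears.
sed "s|$harvested|$vivo|g" harvested.rdf.xml > matched.rdf.xml
cat matched.rdf.xml
```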

# Truncate score data model
# Since we are finished with the scoring data for people and departments,
#   we need to clear out that old data before we add more.
harvester-jenaconnect -j score-data.model.xml -t

# Execute Score for Positions
harvester-score -X score-positions.config.xml

# Execute Match for Positions
harvester-match -X match-positions.config.xml

Changing Namespaces

Entries for which no match was found are given URIs within VIVO's namespace.

# Execute ChangeNamespace to move unmatched entries into the current namespace
# This is where the new people, departments, and positions from the harvest are given URIs within the namespace of VIVO.
# 	If there is an issue with URIs being in another namespace, this is the phase
#	which should shed some light on the problem.
# Execute ChangeNamespace for People
harvester-changenamespace -X changenamespace-people.config.xml

# Execute ChangeNamespace for Departments
harvester-changenamespace -X changenamespace-departments.config.xml

# Execute ChangeNamespace for Positions
harvester-changenamespace -X changenamespace-positions.config.xml

Updating

The subtractions and additions are found by comparing the current harvest to the previous harvest model.

*Note: The previous model should be equivalent to the actual data in VIVO.*

If the previously harvested data is edited, then that edit should also be applied to the previous model.

# Find Subtractions
# When making the previous harvest model agree with the current harvest, the entries that exist in
#	the previous harvest but not in the current harvest need to be identified for removal.
harvester-diff -X diff-subtractions.config.xml

# Find Additions
# When making the previous harvest model agree with the current harvest, the entries that exist in
#	the current harvest but not in the previous harvest need to be identified for addition.
harvester-diff -X diff-additions.config.xml
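The two Diff calls above compute set differences between the previous and current harvest models. The idea can be sketched with sorted text files and comm (toy data, not harvester output):

```shell
# Previous harvest holds a,b,c; current harvest holds b,c,d.
printf 'a\nb\nc\n' > previous.txt
printf 'b\nc\nd\n' > current.txt
# Subtractions: lines only in the previous harvest.
comm -23 previous.txt current.txt   # -> a
# Additions: lines only in the current harvest.
comm -13 previous.txt current.txt   # -> d
```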

Applying updates

The updates are applied first to the previous model and then to VIVO. Afterwards the previous model should be equal to the data just harvested, and the harvest-derived data in VIVO should match the new harvest's data as well.

# Remove Subtractions from Previous Harvest model
harvester-transfer -o previous-harvest.model.xml -r data/vivo-subtractions.rdf.xml -m
# Add Additions to Previous Harvest model
harvester-transfer -o previous-harvest.model.xml -r data/vivo-additions.rdf.xml
# Remove Subtractions from VIVO for pre-1.2 versions
harvester-transfer -o vivo.model.xml -r data/vivo-subtractions.rdf.xml -m
# Add Additions to VIVO for pre-1.2 versions
harvester-transfer -o vivo.model.xml -r data/vivo-additions.rdf.xml

See also

Peoplesoft Example Script 1.2