...

Presenter3 - Michael Barbieri

Harvester Script

Intro and Setup

Intro video webcast

Presenter1: Welcome to the webinar on setting up and using the VIVO Harvester tools. Though the project has specific uses for these tools, it is important to remember that they are designed to be usable separately as well as in scripts and as Java libraries. For this presentation, we are assuming that you have a fresh working install of VIVO-1.2. If you already have a copy of the Harvester, you can treat this next part as a review. If you have the current virtual appliance, you already have a copy, and it is under /usr/share/vivo/harvester.

Go to SourceForge

Presenter goes to Sourceforge page. < http://vivo.sourceforge.net/ >

...

does a listing of the files

configure harvester

Presenter1: Within the “Harvester directory” you can see several folders. The ones of interest to us are config and scripts. For the Harvester to function, it has to be configured to work with the VIVO installed on your system, so to continue with the example database fetch you will have to configure some files. The first is the “vivo.xml” file, found at “Harvester/config/models/vivo.xml”. It needs to comply with the deploy.properties you used to set up your current version of VIVO.
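As a rough aid for this step, the sketch below reads a deploy.properties-style file and prints the settings you would then copy into vivo.xml by hand. The property names and the sample values are assumptions made for illustration; check them against your own deploy.properties. The sketch does not touch vivo.xml, whose layout is not shown here.

```python
def parse_properties(text):
    """Parse simple key=value lines; '#' and '!' lines are comments.

    This is a minimal reader: it does not handle ':' separators,
    line continuations, or escapes that full Java properties allow.
    """
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


# Hypothetical sample content; in practice read your real file instead.
SAMPLE = """
# VIVO deployment settings (example values only)
Vitro.defaultNamespace = http://vivo.example.edu/individual/
VitroConnection.DataSource.url = jdbc:mysql://localhost/vivo
VitroConnection.DataSource.username = vivoUser
VitroConnection.DataSource.password = vivoPass
"""

# Assumed names of the settings the Harvester's VIVO model must agree with.
KEYS = [
    "Vitro.defaultNamespace",
    "VitroConnection.DataSource.url",
    "VitroConnection.DataSource.username",
    "VitroConnection.DataSource.password",
]

props = parse_properties(SAMPLE)
settings = {key: props.get(key) for key in KEYS}

for key, value in settings.items():
    print(f"{key}: {value if value is not None else '(not set)'}")
```

To use it on a real install, replace SAMPLE with the contents of your own file, for example `open("deploy.properties").read()`.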

...

Edits files to make them comply.

Tools

Presenter1: Looking through a script, you will see the array of tools that make up the Harvester. I will now summarize them one at a time, in the order they are used.

...

Presenter1: We do the updating based on graph math. The process is as follows:

Graph Math Updating

(On the Diff page)

[V] = vivo model

[H] = new harvested model
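To make "graph math" concrete, here is a minimal sketch that treats each model as a set of triples and updates it with set difference and union. It is an illustration under assumptions, not the Harvester's own code: the previous-harvest model P, the function name, and the sample triples are all hypothetical and are not taken from this script.

```python
# Each model is a set of (subject, predicate, object) triples.
def graph_math_update(vivo, previous, harvested):
    """Return (updated_vivo, additions, subtractions) using set arithmetic."""
    additions = harvested - previous      # new since the last harvest
    subtractions = previous - harvested   # gone since the last harvest
    updated = (vivo - subtractions) | additions
    return updated, additions, subtractions


# [V]: the vivo model, including one triple that never came from a harvest.
V = {
    ("ex:p1", "rdfs:label", "Smith, Jane"),
    ("ex:p1", "ex:title", "Assistant Professor"),
    ("ex:p2", "rdfs:label", "Curated By Hand"),
}
# P: hypothetical record of what the previous harvest contributed.
P = {
    ("ex:p1", "rdfs:label", "Smith, Jane"),
    ("ex:p1", "ex:title", "Assistant Professor"),
}
# [H]: the new harvested model, where the title has changed.
H = {
    ("ex:p1", "rdfs:label", "Smith, Jane"),
    ("ex:p1", "ex:title", "Associate Professor"),
}

updated, additions, subtractions = graph_math_update(V, P, H)
print("additions:", sorted(additions))
print("subtractions:", sorted(subtractions))
print("updated:", sorted(updated))
```

Comparing against the previous harvest rather than against [V] directly means that data entered by hand in VIVO is left alone; only triples that an earlier harvest supplied can be removed.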

...

--Presenter2 takeover --docs.

JDBCFetch

JDBC video webcast

Presenter2: One of our tools, JDBCFetch, is designed to fetch data out of a standard relational database and store that data in a simple RDF/XML format that can later be translated to the VIVO ontology.
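The idea of turning database rows into simple RDF/XML can be sketched in a few lines. This is only an illustration of the concept, not JDBCFetch itself: it uses an in-memory SQLite database instead of JDBC, and the table, the column names, and the example.org namespaces are assumptions invented for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BASE = "http://example.org/harvest/"  # hypothetical namespace for this sketch


def rows_to_rdfxml(conn, table, id_column):
    """Emit one rdf:Description per row and one property element per column.

    The table and column names are interpolated into SQL and XML names,
    so pass trusted identifiers only.
    """
    field_ns = f"{BASE}fields/{table}/"
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("db", field_ns)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    cursor = conn.execute(f"SELECT * FROM {table} ORDER BY {id_column}")
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        record = dict(zip(columns, row))
        about = f"{BASE}{table}/id-{record[id_column]}"
        node = ET.SubElement(
            root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": about}
        )
        for column, value in record.items():
            if value is not None:  # skip NULL columns
                ET.SubElement(node, f"{{{field_ns}}}{column}").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Tiny stand-in for an HR database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, "Jane Smith", "jsmith@example.edu"), (2, "John Doe", None)],
)

rdf_xml = rows_to_rdfxml(conn, "person", "person_id")
print(rdf_xml)
```

The output stays close to the shape of the source table on purpose. Mapping it onto an ontology is left to a later translation step.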

Presenter2: To demonstrate this tool, we have created an example database, containing HR data for a fictional university, and the example translation file and complete script.

Setup demo Database

Presenter2: We do have to do a bit of setup before we can run this script, however. First, we will need to set up a MySQL database where we can store the sample HR data.

...

Presenter2: Now that we have a database set up, we need to populate it with our sample HR data.

Populate database

Presenter2: There is a file DemoDB.mysqlDump.sql at “https://sourceforge.net/projects/vivo/files/VIVO%20Harvester/Example%20Files/” which we will load into our database to show a simple JDBC harvest run.

...

– Presenter2 switches back to the console –

Demo Harvest

Presenter2: By now we should have finished pushing this data into our database, so we can begin the actual fun of fetching it. Remember, everything so far has been setup, but don't worry: the rest is mostly done for you.

...

Presenter2: In production situations, we have found it better to set up a clone of the data we want to query so that we can run queries against it without putting any load on production systems.

Clone data

Presenter2: We see the tool $DatabaseClone is called, loading its parameters from the configuration file named example.databaseclone.xml, but we are explicitly setting the parameter named --outputConnection to be that of the $CLONEDBURL.

...

– Presenter2 closes the script file –

Run Harvest

Presenter2: To run the script, while in the base Harvester/ folder, we simply type:

...

Presenter should attempt to answer questions, referring to the wiki pages as specifically as possible.

-Presenter 3 takeover-

Publication Harvest

PubMed video webcast

Presenter3: I'm going to show you a couple of examples of the Harvester's ability to gather publication data from various sources. First I will demonstrate a harvest from an online source, in this case PubMed. Next I will show you an example of a harvest from an existing file containing exported RefWorks data.

Presenter3: For PubMed, what I want to demonstrate is the scoring and matching in action. What I'm going to do is create an account for a known genetics researcher at the University of Florida, Barry Byrne, and then I am going to fetch publications from the PubMed database for which Mr. Byrne is an author. If all goes according to plan, what we will see in VIVO at the end of the harvest is that these publications have been automatically linked to Mr. Byrne's profile.

Create example entry

Browse to VIVO, Admin. Create Barry Byrne, give bbyrne@ufl.edu email address.

Presenter3: Included with Harvester are several sample scripts to help get you started. The one we will base this harvest on is run-pubmed.sh.

Edit scripts

nano scripts/run-pubmed.sh

...

Presenter3: The first thing we will need to mess with is the fetch configuration. In this case, it's pointing to a configuration file example.pubmedfetch.xml. Let's take a look at that.

configure task

nano config/tasks/example.pubmedfetch.xml

...

Presenter3: There's one final change we're going to make and then we are ready to run our fetch. Presently, in VIVO 1.2, for all the publications pulled in by the fetch, any co-authors of Barry Byrne would be listed in that publication as a missing author. If we would like instead to pull in these authors and make stubs for them with just their name, we can uncomment this ChangeNamespace line and comment the Qualify line.

switch to create stubs

uncomment ChangeNamespace and comment Qualify

...

MODS (todo: bibutils download):

Mods

mods video webcast

Presenter3: Now we're going to look at a harvest from an existing document. Most of it is similar, so I won't repeat too much, but there are some important things to mention. We are starting with a file from RefWorks in the BibTeX format. Now, it is certainly possible to write an XSLT translation from BibTeX to VIVO, but bundled with Harvester is a third-party tool called Bibutils that converts several different formats, including BibTeX, into an intermediate format called MODS. Also included with Harvester is a sample XSLT translation file to convert a MODS file to VIVO. So let's take a look at run-mods.sh.

...