History

The Harvester began life as a specialized ETL tool meant to ease the process of data ingest into VIVO. It has transformed into a general semantic ETL tool.

Introduction

The VIVO Harvester is a library of tools designed to read and transform data from external data sources and ingest it into VIVO or, potentially, any other triplestore or semantic platform. The library was originally developed by the Harvester Team at the University of Florida during the 2009-2011 NIH grant. Development of the Harvester follows a monthly release cycle: new features are built in the first two to three weeks of the cycle, and testing and release take place during the third and fourth weeks. Use the links below to learn more about the individual tools that make up the Harvester.

The VIVO Harvester is currently maintained on GitHub by John Fereira from Cornell as part of VIVO-related projects including AgriVIVO and USDA VIVO. Other contributions to ongoing Harvester enhancements have been made by Alex Viggio through Symplectic, Ltd.

Source

Recommended Harvester branch to check out or download from Git

Architecture and flow

The VIVO Harvester is a collection of small Java tools which are meant to be strung together in various ways to create a harvest process that is custom-tailored to your needs and (importantly) repeatable. This architecture makes the Harvester extremely versatile, but at the same time presents a bit of a learning curve.

We highly recommend that you become familiar with the basics of semantic technologies, including RDF and ontologies, and that you download and install the VIVO software before embarking on a data ingest process. Try entering sample data ranging from people and their affiliations to publications, grants, or awards and honors; then export the RDF from VIVO to see what it looks like. For many people, having an example is much more intuitive than interpreting ontology diagrams or writing RDF directly.

The following vignettes follow the steps of a "typical" harvest, focusing primarily on functionality rather than configuration or execution.

Fetch

(Excerpt included from the Fetch page.)

Translate

(Excerpt included from the Translate page.)

Score

(Excerpt included from the Score page.)

Match

(Excerpt included from the Match page.)

Change Namespace

(Excerpt included from the ChangeNamespace page.)

Update

This step allows successive Harvester runs to recognize data that has been modified since the previous run and to update VIVO accordingly. A "previous harvest model" is created, which on the first run contains all the data imported on that run. On subsequent runs, this model is compared with the new data to determine which triples have been removed or added since the last run. The comparison is made by the Diff tool, and its output is an "Additions file" and a "Subtractions file", containing RDF/XML data that should be added to and removed from VIVO, respectively.
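
The comparison described above amounts to two set differences. The following is a minimal sketch of that idea only, not the Diff tool's actual implementation: it assumes each triple can be represented as a plain string, and the class name, method names, and sample triples are all hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration: triples are modeled as plain strings,
// which is an assumption made for brevity, not how the Harvester stores RDF.
public class HarvestDiffSketch {

    /** Triples present in the new harvest but absent from the previous harvest model. */
    static Set<String> additions(Set<String> previous, Set<String> current) {
        Set<String> result = new TreeSet<>(current);
        result.removeAll(previous);
        return result;
    }

    /** Triples present in the previous harvest model but absent from the new harvest. */
    static Set<String> subtractions(Set<String> previous, Set<String> current) {
        Set<String> result = new TreeSet<>(previous);
        result.removeAll(current);
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data: one person whose title changed between runs.
        Set<String> previous = Set.of(
            "<ex:p1> <ex:name> \"Ann\"",
            "<ex:p1> <ex:title> \"Lecturer\"");
        Set<String> current = Set.of(
            "<ex:p1> <ex:name> \"Ann\"",
            "<ex:p1> <ex:title> \"Professor\"");

        System.out.println("Additions: " + additions(previous, current));
        System.out.println("Subtractions: " + subtractions(previous, current));
    }
}
```

A changed value shows up as one subtraction (the old triple) plus one addition (the new triple), while unchanged triples appear in neither set.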

Transfer

The data from the Additions file is added to both VIVO and the previous harvest model in two separate calls of the Transfer tool. Then the data from the Subtractions file is removed from both VIVO and the previous harvest model in two more Transfer calls.
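
The four Transfer calls can be pictured as follows. This is a sketch under stated assumptions rather than the Transfer tool's real interface: models are represented as in-memory sets of triple strings, and every class, method, and variable name here is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of the four transfers; the real Transfer tool
// works on RDF models, not on in-memory sets of strings.
public class TransferSketch {

    /** Adds every triple in the additions set to the target model. */
    static void transferAdd(Set<String> target, Set<String> additions) {
        target.addAll(additions);
    }

    /** Removes every triple in the subtractions set from the target model. */
    static void transferRemove(Set<String> target, Set<String> subtractions) {
        target.removeAll(subtractions);
    }

    /** Applies additions, then subtractions, to both VIVO and the previous harvest model. */
    static void applyHarvest(Set<String> vivo, Set<String> previousHarvest,
                             Set<String> additions, Set<String> subtractions) {
        transferAdd(vivo, additions);
        transferAdd(previousHarvest, additions);
        transferRemove(vivo, subtractions);
        transferRemove(previousHarvest, subtractions);
    }

    public static void main(String[] args) {
        // VIVO holds one triple that did not come from a harvest, plus two harvested ones.
        Set<String> vivo = new HashSet<>(Set.of("manual", "harvested-A", "harvested-B"));
        Set<String> previousHarvest = new HashSet<>(Set.of("harvested-A", "harvested-B"));

        applyHarvest(vivo, previousHarvest, Set.of("harvested-C"), Set.of("harvested-B"));

        System.out.println("VIVO: " + vivo);
        System.out.println("Previous harvest model: " + previousHarvest);
    }
}
```

In this sketch the triple that did not originate from a harvest is left untouched in VIVO, because only triples listed in the Additions or Subtractions sets are ever transferred, and the previous harvest model ends up matching the newly harvested data.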

At this point a harvest is complete.

Next Steps

The Harvester's scripts/ directory includes several sample scripts that have been tested and perform different types of harvests. One of the best ways to get started is to find one that is close to your needs, run it on a test server or virtual machine, and then adjust it until it fits.

Read the Harvester User Guide to learn more about using the Harvester.

Typical harvest

Harvester Instructions

Harvester User Guide

Pubmed Example Script

IP Example Script

Deployment

Video Walkthrus

Screencasts of example harvester runs: https://sourceforge.net/projects/vivo/files/VIVO%20Harvester/Demonstration/
