The VIVO Harvester is a collection of small Java tools that are meant to be strung together in various ways to create a harvest tailored to your needs. This architecture makes the Harvester extremely versatile, but it also presents a steep learning curve. Harvester's scripts/ directory includes several tested sample scripts, each of which performs a different type of harvest. One of the best ways to get started is to find a script that is close to what you want, run it on a test server or virtual machine, and then tweak it until it does what you need. This page attempts to follow the steps of a "typical" harvest.

 

Fetch

...

The first step of a typical harvest is to get your data from your target source. We call this the Fetch. For example, suppose we have a VIVO installation containing researchers at our university, and we want to harvest from PubMed information on publications written by those researchers. In this case we would use Harvester's PubmedFetch tool to send a query to PubMed, which returns the results of that query in its own XML format. Harvester's fetch package (org.vivoweb.harvester.fetch) contains various tools for retrieving data from external data sources.
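To make the idea of "sending a query off to Pubmed" more concrete, the sketch below builds the kind of search URL that could be sent to PubMed through NCBI's public E-utilities service. It is not Harvester's PubmedFetch implementation and does not use any Harvester API. The class name PubmedQuerySketch, the method buildSearchUrl, and the example affiliation are hypothetical names chosen for illustration. It only constructs the URL (no network call) and assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Illustrative only: a hypothetical helper, not part of VIVO Harvester.
public class PubmedQuerySketch {

    // NCBI E-utilities search endpoint (public PubMed API, independent of Harvester).
    static final String ESEARCH =
        "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi";

    // Builds a search URL for records whose author affiliation matches the given text.
    // "[Affiliation]" is PubMed's search field tag for author affiliation.
    static String buildSearchUrl(String affiliation, int maxRecords) {
        String term = affiliation + "[Affiliation]";
        return ESEARCH
            + "?db=pubmed"
            + "&retmax=" + maxRecords
            + "&term=" + URLEncoder.encode(term, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Example affiliation is an assumption; substitute your own institution.
        System.out.println(buildSearchUrl("University of Florida", 20));
    }
}
```

A request to a URL like this returns XML listing matching record IDs; retrieving the full records is a separate request. In an actual harvest, the PubmedFetch tool handles the query for you, so a sketch like this is useful mainly for understanding or testing a search term before putting it into a harvest script.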

...