PubmedFetch is used to ingest data from the PubMed SOAP interface. It brings in data as XML, selected either by query or by record range, and returns a stream of raw RDF/XML. It can call a variety of fetch methods that select records by attributes such as date added, date modified, record number range, or affiliation.

Usage

To successfully harvest from PubMed:

Model - vivo.xml should be configured to point to your chosen VIVO instance.
Task - create a pubmedfetch.xml task file (two examples are provided; see the help for the search term).
Datamap - pubmed-to-vivo.xsl currently maps the data to the UF implementation and will have to be adjusted for other implementations.

Methods

serializeFetchRequest

Runs, sanitizes, and outputs the results of an EFetch request to the xmlWriter.

create a buffer
connect to PubMed
run the efetch request
get the article set
create XML writer
output to buffer
dump buffer to string
use sanitizeXML on string
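
The buffering steps above can be sketched roughly as follows. This is a minimal illustration, not the Harvester's actual code: the class name `SerializeSketch`, the `serialize` method, and the `sanitizer` parameter are hypothetical, and the connection and EFetch request are left out, so the article set is passed in as an already retrieved string.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the buffer -> string -> sanitize flow.
final class SerializeSketch {

    // articleSetXml stands in for the article set returned by the EFetch request.
    // sanitizer stands in for the sanitizeXML step.
    static String serialize(String articleSetXml, UnaryOperator<String> sanitizer) {
        // create a buffer
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // create an XML writer over the buffer and output to it
        try (Writer xmlWriter = new OutputStreamWriter(buffer, StandardCharsets.UTF_8)) {
            xmlWriter.write(articleSetXml);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // dump the buffer to a string
        String raw = new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        // run the sanitizing step on the string
        return sanitizer.apply(raw);
    }

    public static void main(String[] args) {
        // Identity sanitizer, used only to show the call shape.
        System.out.println(serialize("<PubmedArticleSet/>", s -> s));
    }
}
```

Passing the sanitizer in as a function keeps the sketch independent of how sanitizeXML is actually implemented.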

sanitizeXML

Sanitizes the XML in preparation for writing it to the output stream.

replaces the input characters
writes to the output stream
Note: the OsWriter is provided in the superclass NIHFetch.
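
As a rough illustration of the two steps, the sketch below replaces characters and then writes the result to a stream. The page does not say which characters sanitizeXML replaces, so this example assumes one plausible rule: any code point outside the XML 1.0 `Char` range is replaced with a space. The class and method names are hypothetical, and a plain `OutputStream` stands in for the OsWriter.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; the replacement rule is an assumption, not the Harvester's.
final class SanitizeSketch {

    // True if the code point is allowed by the XML 1.0 Char production.
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Replaces every illegal code point with a single space.
    static String sanitize(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        raw.codePoints()
           .map(cp -> isLegalXmlChar(cp) ? cp : ' ')
           .forEach(out::appendCodePoint);
        return out.toString();
    }

    // Sanitizes the input and writes it to the given stream as UTF-8.
    static void sanitizeTo(String raw, OutputStream os) {
        try {
            os.write(sanitize(raw).getBytes(StandardCharsets.UTF_8));
            os.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```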

Configuration file example

<?xml version="1.0" encoding="UTF-8"?>
<Task type="org.vivoweb.harvester.fetch.PubmedSOAPFetch">
	<Param name="email">swilliams@ichp.ufl.edu</Param>
	<Param name="output">config/recordHandlers/Pubmed-XML-h2RH.xml</Param>
	<Param name="termSearch">ufl AND edu[ad]</Param>
	<Param name="numRecords">100</Param>
	<Param name="batchSize">1000</Param>
</Task>
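
To make the structure of the task file concrete, here is a minimal sketch that reads the `Param` entries into a map using the JDK's standard DOM parser. It is only an illustration of the file layout: the Harvester has its own configuration loading, and `TaskConfigSketch` and `readParams` are hypothetical names.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper that lists the name/value pairs in a Task file.
final class TaskConfigSketch {

    static Map<String, String> readParams(String taskXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(taskXml)));
            NodeList nodes = doc.getElementsByTagName("Param");
            // LinkedHashMap keeps the parameters in file order.
            Map<String, String> params = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element param = (Element) nodes.item(i);
                params.put(param.getAttribute("name"), param.getTextContent().trim());
            }
            return params;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse task file", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Task type=\"org.vivoweb.harvester.fetch.PubmedSOAPFetch\">"
            + "<Param name=\"termSearch\">ufl AND edu[ad]</Param>"
            + "<Param name=\"numRecords\">100</Param>"
            + "<Param name=\"batchSize\">1000</Param>"
            + "</Task>";
        System.out.println(readParams(xml));
    }
}
```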

Flowchart
