Overview
This tool is used to ingest data from the National Library of Medicine's catalog using their EUtils web service. PubMed allows queries to be designed to return very specific record sets based on a range of different attributes such as date added, date modified, number range, affiliation, etc. See their help for using the NLM Catalog search tool and also the advanced search tool, which helps you build your query.
The catalog records that match the query given to the tool will be fetched as XML data and loaded into the output RecordHandler.
Setup
Command Line Arguments
NLMJournalFetch extends NIHFetch, so it uses the same arguments.
Short Option |
Long Option |
Parameter Value Map |
Description |
Required |
---|---|---|---|---|
b |
batchSize |
NUMBER |
number of records to fetch per batch |
true |
m |
EMAIL_ADDRESS |
your contact email address |
true |
|
n |
numRecords |
NUMBER |
maximum records to return |
true |
o |
output |
CONFIG_FILE |
RecordHandler config file path |
true |
O |
outputOverride |
VALUE |
override the RH_PARAM of output recordhandler using VALUE |
false |
t |
termSearch |
SEARCH_STRING |
term to search against nlmcatalog |
true |
Configuration File
As with all the Harvester command-line tools, you can provide all the arguments as parameters is a configuration file (X/-config). Here is a sample configuration for NLMJournalFetch.
<?xml version="1.0" encoding="UTF-8"?> <Task> <Param name="email">sample.email@mydomain.tld</Param> <Param name="termSearch">sample AND edu[ad]</Param> <!-- these are set inside the example scripts, so are not needed <Param name="output">config/recordhandlers/tfrh.xml</Param> <Param name="outputOverride">fileDir=harvested-data/examples/nlmcatalog</Param> --> <Param name="numRecords">ALL</Param> <Param name="batchSize">1000</Param> </Task>
Execution
To execute the NLMJournalFetch tool from the commandline, there is a convenient environment config file that, when loaded in a bash shell, will allow you to execute NLMJournalFetch with a simple $NLMJournalFetch args
. For information about that, see Environment Config File.
Or you can simply call (paths relative to base harvester folder):
java -cp bin/harvester-<version>.jar:bin/dependency/* -Dprocess-task=PubmedFetch org.vivoweb.harvester.fetch.NLMJournalFetch
Design
See Design of NLMJournalFetch and its javadoc page