This is the detailed design document for the OAI (Open Archives Initiative) fetch portion of the VIVO harvester. This document is neither feature-complete nor comprehensive and will evolve as new requirements or optimizations are recognized.
OAIFetch is used to ingest data from OAI data sources. It retrieves records as XML, selected by date range, and returns the raw XML, which is then stored in a file determined by the configuration file.
Short Option | Long Option | Parameter Value Map | Description | Required |
---|---|---|---|---|
u | url | URL | repository URL without http:// | true |
s | start | DATE | beginning date of date range (YYYY-MM-DD) | true |
e | end | DATE | ending date of date range (YYYY-MM-DD) | true |
o | output | CONFIG_FILE | RecordHandler config file path | true |
O | outputOverride | VALUE | override the RH_PARAM of output record handler using VALUE | false |
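The option table above can be mirrored in a small command-line parser. The following is a minimal Python sketch for illustration only, not the harvester's own argument handling; the `build_parser` name and the use of `argparse` are assumptions, and only the option names, value placeholders, and required flags come from the table.

```python
import argparse


def build_parser():
    """Build a parser mirroring the option table (illustrative, hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="OAIFetch")
    parser.add_argument("-u", "--url", metavar="URL", required=True,
                        help="repository URL without http://")
    parser.add_argument("-s", "--start", metavar="DATE", required=True,
                        help="beginning date of date range (YYYY-MM-DD)")
    parser.add_argument("-e", "--end", metavar="DATE", required=True,
                        help="ending date of date range (YYYY-MM-DD)")
    parser.add_argument("-o", "--output", metavar="CONFIG_FILE", required=True,
                        help="RecordHandler config file path")
    # Optional: the table marks outputOverride as not required.
    parser.add_argument("-O", "--outputOverride", metavar="VALUE", default=None,
                        help="override the RH_PARAM of output record handler using VALUE")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.url, args.start, args.end, args.output, args.outputOverride)
```

Omitting any of the four required options makes `argparse` exit with a usage error, which matches the Required column.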
Parameters are defined in a delimited text file.
An XML stream of records is written to a file.
Log - Static, logfactory.
arrRequiredParamaters - An array listing the parameters required to run the OAI Fetch: "address", "startDate", "endDate", and "filename".
strAddress - The website address of the OAI repository, without the protocol prefix. (No http://)
strStartDate - The start date for the range of records to pull, in the format YYYY-MM-DD. If a time is required, the format is YYYY-MM-DDTHH:MM:SS:MSZ. Some repositories do not support millisecond resolution. Example: 2010-01-15T13:45:12:50Z
strEndDate - The end date for the range of records to pull, in the format YYYY-MM-DD. If a time is required, the format is YYYY-MM-DDTHH:MM:SS:MSZ. Some repositories do not support millisecond resolution. Example: 2010-01-15T13:45:12:50Z
strFileName - The name of the file to which the XML data is written; it should be XMLVault/OAI/outputfilenamegoeshere.xml
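The required-parameter list and the date formats described above lend themselves to a simple pre-flight check. The sketch below is an assumption about how such a check could look, not the harvester's code; `missing_parameters`, `is_valid_date`, and the regular expression are hypothetical. The pattern only checks the shape of the value (it does not reject impossible calendar dates) and treats the millisecond field as optional, since some repositories do not support it.

```python
import re

# The four parameter names listed as required in this document.
REQUIRED_PARAMETERS = ("address", "startDate", "endDate", "filename")

# Date only (YYYY-MM-DD), or the timestamp form YYYY-MM-DDTHH:MM:SS[:MS]Z.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}(:\d{1,3})?Z)?")


def missing_parameters(params):
    """Return the required parameter names that are absent or empty."""
    return [name for name in REQUIRED_PARAMETERS if not params.get(name)]


def is_valid_date(value):
    """Return True if value has one of the two documented shapes."""
    return DATE_PATTERN.fullmatch(value) is not None
```

A caller could refuse to start the fetch when `missing_parameters` returns a non-empty list or either date fails `is_valid_date`.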
Executes the OAI Fetch using the parameters defined in the configuration file.
strAddress - The website address of the OAI repository, without the protocol prefix. (No http://)
strStartDate - The start date for the range of records to pull, in the format YYYY-MM-DD. If a time is required, the format is YYYY-MM-DDTHH:MM:SS:MSZ. Some repositories do not support millisecond resolution. Example: 2010-01-15T13:45:12:50Z
strEndDate - The end date for the range of records to pull, in the format YYYY-MM-DD. If a time is required, the format is YYYY-MM-DDTHH:MM:SS:MSZ. Some repositories do not support millisecond resolution. Example: 2010-01-15T13:45:12:50Z
strFileName - The name of the file to which the XML data is written; it should be XMLVault/OAI/outputfilenamegoeshere.xml
Returns nothing; the XML is written to the output stream during execution.
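To make the execute step concrete, the following Python sketch shows how the four parameters could map onto an OAI-PMH `ListRecords` request selected by date range. It is an illustration under stated assumptions, not the harvester's implementation: the function names are hypothetical, the `oai_dc` metadata prefix is assumed (it is the format every OAI-PMH repository must support), and follow-up requests for large result sets (resumption tokens) are not handled.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_list_records_url(address, start_date, end_date, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL; address is given without http://."""
    query = urlencode({
        "verb": "ListRecords",
        "from": start_date,
        "until": end_date,
        "metadataPrefix": metadata_prefix,  # assumed default, not from the design
    })
    return f"http://{address}?{query}"


def fetch_to_file(address, start_date, end_date, filename):
    """Fetch one response and write the raw XML to filename; returns nothing."""
    url = build_list_records_url(address, start_date, end_date)
    with urlopen(url) as response, open(filename, "wb") as out:
        out.write(response.read())
```

For example, `build_list_records_url("cs1.ist.psu.edu/cgi-bin/oai.cgi", "2005-01-01", "2010-01-01")` produces a URL with `verb=ListRecords`, `from=2005-01-01`, and `until=2010-01-01` in the query string.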
Documentation needed.
Documentation needed.
address:www.twmuseums.org.uk/pnds/memorynet/ startDate:1000-01-01 endDate:2010-01-01 filename:XMLVault/OAI/MemoryNet.xml
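A parameter line like the one above can be read with a few lines of code. The sketch below assumes the entries are whitespace-separated `key:value` pairs, which is inferred from the example rather than stated in this document; `parse_parameter_line` is a hypothetical helper. Each entry is split on the first colon only, so values that themselves contain colons (such as timestamps) stay intact.

```python
def parse_parameter_line(line):
    """Parse whitespace-separated key:value pairs into a dict (assumed format)."""
    params = {}
    for token in line.split():
        key, separator, value = token.partition(":")  # split on the first colon only
        if not separator or not key:
            raise ValueError(f"expected key:value, got {token!r}")
        params[key] = value
    return params


example = ("address:www.twmuseums.org.uk/pnds/memorynet/ startDate:1000-01-01 "
           "endDate:2010-01-01 filename:XMLVault/OAI/MemoryNet.xml")
print(parse_parameter_line(example)["filename"])
```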
The following OAI data sources conform to the OAI specification and work with OAIFetch. The list is not comprehensive; other compatible repositories should be added as they are found.
http://cs1.ist.psu.edu/cgi-bin/oai.cgi
address:cs1.ist.psu.edu/cgi-bin/oai.cgi startDate:2005-01-01 endDate:2010-01-01 filename:XMLVault/OAI/CiteSeer.xml