...
OAI-PMH Server Maintenance
After activating the OAI-PMH server, you also need to ensure that its index is updated on a regular basis. Currently, this does not happen automatically within DSpace. Instead, you must schedule the [dspace.dir]/bin/dspace oai import command-line tool to run regularly (usually at least nightly, though you can schedule it more frequently).
Here is an example cron entry that schedules a nightly OAI-PMH reindex (for a full list of recommended DSpace cron tasks, see Scheduled Tasks via Cron):
```
# Update the OAI-PMH index with the newest content (and re-optimize that index) at midnight every day
# NOTE: ONLY NECESSARY IF YOU ARE RUNNING OAI-PMH
# (This ensures new content is available via OAI-PMH and ensures the OAI-PMH index is optimized for better performance)
0 0 * * * [dspace.dir]/bin/dspace oai import -o > /dev/null
```
More information about the dspace oai command-line tool can be found in the OAI Manager documentation.
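The cron entry above discards all output by redirecting it to /dev/null. If you would rather keep a record of each run, here is a minimal sketch of a wrapper that calls the same dspace oai import -o command and appends its output to a log file. The DSPACE_DIR and LOG_FILE variables, the run_oai_import function name and the /dspace default are hypothetical placeholders, not DSpace conventions; adjust them to your own [dspace.dir] and a writable log path.

```shell
#!/bin/sh
# Hypothetical cron wrapper for the OAI-PMH reindex that logs instead of discarding output.
# DSPACE_DIR and LOG_FILE are assumptions: point them at your [dspace.dir] and a writable log.

run_oai_import() {
  dspace_dir="${DSPACE_DIR:-/dspace}"                  # placeholder default only
  log_file="${LOG_FILE:-$dspace_dir/log/oai-import.log}"
  status=1                                             # stays 1 if the log file cannot be opened

  {
    echo "OAI-PMH import started: $(date -u '+%Y-%m-%dT%H:%M:%SZ')"
    "$dspace_dir/bin/dspace" oai import -o             # same command as in the cron example
    status=$?
    echo "OAI-PMH import finished with exit status $status"
  } >> "$log_file" 2>&1

  return "$status"
}

# When installed as a standalone script, run it and pass the exit status back to cron:
# run_oai_import; exit $?
```

To use it, save the function in a script that ends with run_oai_import; exit $?, then point the cron entry at that script instead of calling [dspace.dir]/bin/dspace directly. Returning the command's exit status lets cron (or any monitoring you wrap around it) see failures even though the output goes to a file.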
OAI-PMH / OAI-ORE Harvester (Client)
...
There are many possible configuration options for the OAI harvester. Most of them are technical and are therefore omitted from the dspace.cfg file itself; hard-coded defaults are used instead. However, should you wish to modify those values, adding them to oai.cfg will override the system defaults. The harvester settings are contained in the [dspace]/config/modules/oai.cfg file (unless otherwise noted below). They may be updated there or overridden in your local.cfg config file (see Configuration Reference).
| Configuration File: | |
|---|---|
| Property: | |
| Example Value: | |
| Informational Note: | The EPerson under whose authorization automatic harvesting will be performed. This field does not have a default value and must be specified in order to use the harvest scheduling system. This will most likely be the DSpace admin account created during installation. |
| Property: | |
| Example Value: | |
| Informational Note: | The base URL of the OAI-PMH disseminator webapp (i.e. do not include /request at the end). This is necessary in order to mint URIs for ORE Resource Maps. The default value of |
| Property: | |
| Example Value: | |
| Informational Note: | The webapp responsible for minting the URIs for ORE Resource Maps. If using oai, the |
| Property: | |
| Example Value: | |
| Informational Note: | Determines whether the harvest scheduler process starts up automatically when the XMLUI webapp is redeployed. |
| Property: | |
| Example Value: | |
| Informational Note: | This field can be repeated and serves as a link between the metadata formats supported by the local repository and those supported by the remote OAI-PMH provider. It follows the form |
| Property: | |
| Example Value: | |
| Informational Note: | This field works in much the same way as |
| Property: | |
| Example Value: | |
| Informational Note: | Amount of time subtracted from the "from" argument of the OAI-PMH request to account for the time taken to negotiate a connection. Measured in seconds. Default value is 120. |
| Property: | |
| Example Value: | |
| Informational Note: | How frequently the harvest scheduler checks the remote provider for updates. Should always be longer than timePadding. Measured in minutes. Default value is 720. |
| Property: | |
| Example Value: | |
| Informational Note: | The heartbeat is the frequency at which the harvest scheduler queries the local database to determine if any collections are due for a harvest cycle (based on the harvestFrequency value). The scheduler is optimized to then sleep until the next collection is actually ready to be harvested. The minHeartbeat and maxHeartbeat are the lower and upper bounds on this timeframe. Measured in seconds. Default value is 30. |
| Property: | |
| Example Value: | |
| Informational Note: | The heartbeat is the frequency at which the harvest scheduler queries the local database to determine if any collections are due for a harvest cycle (based on the harvestFrequency value). The scheduler is optimized to then sleep until the next collection is actually ready to be harvested. The minHeartbeat and maxHeartbeat are the lower and upper bounds on this timeframe. Measured in seconds. Default value is 3600 (1 hour). |
| Property: | |
| Example Value: | |
| Informational Note: | How many harvest process threads the scheduler can spool up at once. Default value is 3. |
| Property: | |
| Example Value: | |
| Informational Note: | How much time passes before a harvest thread is terminated. The termination process waits for the current item to complete ingest and saves progress made up to that point. Measured in hours. Default value is 24. |
| Property: | |
| Example Value: | |
| Informational Note: | When a harvest process completes for a single item and it has been passed through ingestion crosswalks for ORE and its chosen descriptive metadata format, it might end up with DIM values that have not been defined in the local repository. This setting determines what should be done when those DIM values belong to an already declared schema. There are three (3) choices. Fail will terminate the harvesting task and generate an error. Ignore will quietly omit the unknown fields. Add will add the missing field to the local repository's metadata registry. Default value: fail. |
| Property: | |
| Example Value: | |
| Informational Note: | When a harvest process completes for a single item and it has been passed through ingestion crosswalks for ORE and its chosen descriptive metadata format, it might end up with DIM values that have not been defined in the local repository. This setting determines what should be done when those DIM values belong to an unknown schema. Fail will terminate the harvesting task and generate an error. Ignore will quietly omit the unknown fields. Add will add the missing schema to the local repository's metadata registry, using the schema name as the prefix and "unknown" as the namespace. Default value: fail. |
| Property: | |
| Example Value: | |
| Informational Note: | A harvest process will attempt to scan the metadata of the incoming items (identifier.uri field, to be exact) to see if it looks like a handle. If so, it matches the pattern against the values of this parameter. If there is a match, the new item is assigned the handle from the metadata value instead of minting a new one. Default value: hdl.handle.net. |
| Property: | |
| Example Value: | |
| Informational Note: | Pattern to reject as an invalid handle prefix (a known test string, for example) when attempting to find the handle of harvested items. If there is a match with this config parameter, a new handle will be minted instead. Default value: 123456789. |
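The timePadding, harvestFrequency, minHeartbeat and maxHeartbeat notes in the table describe simple timing arithmetic. The following is a minimal sketch of that arithmetic only, not DSpace's implementation. It assumes times are Unix epoch seconds and that the unpadded "from" value is the start of the previous harvest; the shell variable and function names are hypothetical, and the values are the documented defaults.

```shell
#!/bin/sh
# Sketch of the harvest-scheduler timing rules (not DSpace code).
# Assumptions: times are Unix epoch seconds; the unpadded "from" value is the
# start time of the previous harvest. Values below are the documented defaults.

TIME_PADDING=120        # seconds
HARVEST_FREQUENCY=720   # minutes
MIN_HEARTBEAT=30        # seconds
MAX_HEARTBEAT=3600      # seconds

# Move the OAI-PMH "from" argument back by the padding, so records changed while
# the previous connection was being negotiated are not missed.
padded_from() {
  echo $(( $1 - TIME_PADDING ))
}

# The harvest frequency (minutes) must be longer than the padding (seconds).
frequency_is_valid() {
  [ $(( HARVEST_FREQUENCY * 60 )) -gt "$TIME_PADDING" ]
}

# Seconds to sleep before the next database check: the time until the next
# collection is due, clamped to the range [MIN_HEARTBEAT, MAX_HEARTBEAT].
heartbeat_sleep() {
  delay=$1
  if [ "$delay" -lt "$MIN_HEARTBEAT" ]; then
    delay=$MIN_HEARTBEAT
  elif [ "$delay" -gt "$MAX_HEARTBEAT" ]; then
    delay=$MAX_HEARTBEAT
  fi
  echo "$delay"
}
```

For example, padded_from 1704067200 (2024-01-01T00:00:00Z) prints 1704067080, two minutes earlier. A collection due in 5 seconds still waits the 30-second minimum, and one due in a day is rechecked after at most 3600 seconds.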
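The last two notes in the table describe when a harvested item keeps the handle found in its identifier.uri value and when a new one is minted. Here is a minimal sketch of that decision, assuming the URI has the form http(s)://SERVER/PREFIX/SUFFIX and that "matching" is a plain substring or equality test; DSpace's own matching logic may differ. The variable and function names are hypothetical, and the values are the documented defaults.

```shell
#!/bin/sh
# Sketch of the handle-reuse decision for a harvested item's identifier.uri value.
# Assumptions (not DSpace code): URIs look like http(s)://SERVER/PREFIX/SUFFIX,
# and matching is a simple substring/equality test.

ACCEPTED_HANDLE_SERVER="hdl.handle.net"   # documented default
REJECTED_HANDLE_PREFIX="123456789"        # documented default

# Prints the handle to reuse, or "mint" if a new handle should be minted.
handle_for_uri() {
  uri=$1
  case "$uri" in
    *"$ACCEPTED_HANDLE_SERVER"/*) ;;      # looks like a handle on the accepted server
    *) echo "mint"; return ;;             # anything else: mint a new handle
  esac
  handle=${uri#*"$ACCEPTED_HANDLE_SERVER"/}   # strip scheme and server
  prefix=${handle%%/*}                        # part before the first slash
  if [ "$prefix" = "$REJECTED_HANDLE_PREFIX" ]; then
    echo "mint"                           # known test prefix: do not reuse it
  else
    echo "$handle"
  fi
}
```

With the defaults, handle_for_uri http://hdl.handle.net/10673/5 prints 10673/5, while http://hdl.handle.net/123456789/42 (the test prefix) and a non-handle URI such as https://example.org/items/7 both print mint.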