Old Release
This documentation relates to an old version of DSpace, version 3.x.
This DSpace release is end-of-life and is no longer supported.
Introduction
The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a low-barrier mechanism for repository interoperability. Data Providers are repositories that expose structured metadata via OAI-PMH. Service Providers then make OAI-PMH service requests to harvest that metadata. OAI-PMH is a set of six verbs or services that are invoked within HTTP.
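To make the "six verbs invoked within HTTP" concrete, the following minimal sketch builds OAI-PMH request URLs. The verb names come from the OAI-PMH 2.0 specification; the helper name `build_request_url` and the example base URL are assumptions made for illustration and are not part of DSpace.

```python
from urllib.parse import urlencode

# The six verbs defined by the OAI-PMH 2.0 specification.
OAI_PMH_VERBS = (
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
)


def build_request_url(base_url, verb, **arguments):
    """Return an OAI-PMH GET request URL for the given verb and arguments."""
    if verb not in OAI_PMH_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Hypothetical base URL; replace it with your repository's OAI endpoint.
url = build_request_url(
    "http://www.example.com/oai/request", "ListRecords", metadataPrefix="oai_dc"
)
print(url)
```

A harvester would issue an HTTP GET against the resulting URL and parse the XML response.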
What is OAI 2.0?
OAI 2.0 is a Java implementation of an OAI-PMH data provider interface developed by Lyncode that uses XOAI, an OAI-PMH Java Library.
Why OAI 2.0?
Projects such as OpenAIRE and Driver impose specific metadata requirements on content published through the OAI-PMH interface. Because the OAI-PMH protocol does not provide a framework for such project-specific requirements, OAI 2.0 can expose more than one instance of an OAI interface (a feature provided by the XOAI core library), so one can define an interface for each project. That is its main purpose, although OAI 2.0 allows much more than that.
Concepts (XOAI Core Library)
To understand how XOAI works, one must understand the concepts of Filter, Transformer and Context. A Filter selects information from the data source. A Transformer makes changes to the metadata before it is shown in the OAI interface. XOAI also adds a new concept to the basic OAI-PMH specification: the context. A context is identified in the URL:
http://www.example.com/oai/<context>
Contexts could be seen as virtual distinct OAI interfaces, so with this one could have things like:
- http://www.example.com/oai/request
- http://www.example.com/oai/driver
- http://www.example.com/oai/openaire
With these ingredients it is possible to build a robust solution that fulfills all requirements of Driver, OpenAIRE and other project-specific requirements. As shown in Figure 1, with contexts one can select a subset of all available items in the data source. So when entering the OpenAIRE context, all OAI-PMH requests will be restricted to that subset of items.
At this stage, contexts could be seen as sets (also defined in the basic OAI-PMH protocol). The magic of XOAI happens when one needs a specific metadata format to be shown in each context. The metadata requirements of Driver differ slightly from those of OpenAIRE, so for each context one must define its specific transformer. Contexts can therefore be seen as an extension of the concept of sets.
To implement an OAI interface from the XOAI core library, one just needs to implement the datasource interface.
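The interplay of Filter, Transformer and Context described above can be pictured with a small conceptual model. This is only an illustrative sketch: the `Context` class, the item fields and the filter and transformer rules are all hypothetical and do not reflect the actual XOAI Java API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Item = Dict[str, str]


@dataclass
class Context:
    """A virtual OAI interface: a filter selects items, a transformer adapts them."""

    name: str
    item_filter: Callable[[Item], bool]
    transformer: Callable[[Item], Item]

    def records(self, items: List[Item]) -> List[Item]:
        # Restrict to the context's subset, then adapt each item's metadata.
        return [self.transformer(item) for item in items if self.item_filter(item)]


# Hypothetical data source and rules, for illustration only.
items = [
    {"title": "Thesis A", "access": "open"},
    {"title": "Report B", "access": "restricted"},
]

openaire = Context(
    name="openaire",
    item_filter=lambda item: item["access"] == "open",
    transformer=lambda item: {**item, "rights": "openAccess"},
)

print(openaire.records(items))
```

In this model a context behaves like a set (the filter) that additionally rewrites metadata (the transformer), which is the sense in which contexts extend sets.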
OAI 2.0
OAI 2.0 is a separate webapp which is a complete substitute for the old "oai" webapp. OAI 2.0 has a configurable data source. By default, it does not query the DSpace SQL database at the time of the OAI-PMH request. Instead, it keeps the required metadata in its Solr index (currently in a separate "oai" Solr core) and serves it from there. It is also possible to set OAI 2.0 to use only the database for querying if necessary, but this decreases performance significantly. Furthermore, it caches requests, so repeating the same query is very fast. In addition, it compiles DSpace items to make uncached responses much faster.
Details about OAI 2.0 internals can be found here.
Using Solr
OAI 2.0 uses the Solr data source by default.
The Solr index can be updated at your convenience, depending on how fresh you need the information to be. Typically, the administrator sets up a nightly cron job to update the Solr index from the SQL database.
OAI Manager (Solr Data Source)
OAI manager is a utility that allows one to do certain administrative operations with OAI.
Syntax
[dspace]/bin/dspace oai <action> [parameters]
Actions
- import Imports DSpace items into OAI Solr index (also cleans OAI cache)
- clean-cache Cleans the OAI cache
Parameters
- -o Optimize index after indexing
- -c Clears the Solr index before indexing (it will import all items again)
- -v Verbose output
- -h Shows a help text
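If the OAI manager is driven from a script rather than typed by hand, the syntax, actions and parameters above can be assembled as follows. This is a sketch under assumptions: the helper `oai_command`, the option names and the `/dspace` install directory are hypothetical, and it assumes the listed flags may be combined in one invocation.

```python
import shlex

# Actions and flags listed above for the Solr data source.
ACTIONS = {"import", "clean-cache"}
FLAGS = {"optimize": "-o", "clear": "-c", "verbose": "-v"}


def oai_command(dspace_dir, action, **options):
    """Build the argument list for `[dspace]/bin/dspace oai <action> [parameters]`."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    argv = [f"{dspace_dir}/bin/dspace", "oai", action]
    for name, flag in FLAGS.items():
        # Append a flag only when the matching option was passed as true.
        if options.get(name):
            argv.append(flag)
    return argv


# "/dspace" is a hypothetical install directory (the dspace.dir value).
argv = oai_command("/dspace", "import", clear=True, optimize=True)
print(shlex.join(argv))
```

The resulting list can be handed to `subprocess.run(argv, check=True)` on a machine where DSpace is installed.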
Scheduled Tasks
In order to refresh the OAI Solr index, it is required to run the [dspace]/bin/dspace oai import command periodically. You can add the following task to your crontab:
0 3 * * * [dspace]/bin/dspace oai import
Note that [dspace] should be replaced by the correct value, that is, the value defined in the dspace.cfg parameter dspace.dir.
Using Database
OAI 2.0 can also use the database for querying. To configure this, edit the [dspace]/config/modules/xoai.cfg file and set the 'storage' parameter to database. This decreases performance significantly and likely has no benefit other than removing Solr as a dependency.
OAI Manager (Database Data Source)
OAI manager is a utility that allows one to do some administrative operations with OAI.
Syntax
[dspace]/bin/dspace oai <action> [parameters]
Actions
- clean-cache Cleans the OAI cache
- compile-items Compiles DSpace items
- erase-compiled-items Erases all DSpace compiled items
Parameters
- -v Verbose output
- -h Shows a help text
Scheduled Tasks
In order to refresh the OAI cache and compile DSpace items (for fast responses), it is required to run the [dspace]/bin/dspace oai compile-items command periodically. You can add the following task to your crontab:
0 3 * * * [dspace]/bin/dspace oai compile-items
Note that [dspace] should be replaced by the correct value, that is, the value defined in the dspace.cfg parameter dspace.dir.
Client-side stylesheet
The OAI-PMH response is an XML file. While OAI-PMH is primarily used by harvesting tools and usually not directly by humans, it can sometimes be useful to look at the OAI-PMH responses directly, usually when setting the interface up for the first time or to verify any changes you make. For these cases, XOAI provides an XSLT stylesheet that transforms the response XML into nice-looking, human-readable and interactive HTML. The stylesheet is linked from the XML response and the transformation takes place in the user's browser (this requires a recent browser; older browsers will only display the XML directly). Most automated tools are interested only in the XML file itself and will not perform the transformation. If you want, you can change which stylesheet is used by placing it into the [dspace]/webapps/oai/static directory (or into the [dspace-src]/dspace-xoai/dspace-xoai-webapp/src/main/webapp/static directory, after which you have to rebuild DSpace), modifying the "stylesheet" attribute of the "Configuration" element in [dspace]/config/crosswalks/oai/xoai.xml and restarting your servlet container.
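A stylesheet is typically linked from an XML document through an `xml-stylesheet` processing instruction, which automated tools simply ignore. The sketch below shows one way to check which stylesheet a response points at. The sample response is hand-written and abbreviated, and the `static/style.xsl` href is an assumption; the actual value depends on your configuration.

```python
from xml.dom import minidom

# Hypothetical, abbreviated response; the stylesheet href is an assumption.
SAMPLE_RESPONSE = """<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="static/style.xsl"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2013-01-01T00:00:00Z</responseDate>
</OAI-PMH>
"""


def stylesheet_instructions(xml_text):
    """Return the data of every top-level xml-stylesheet processing instruction."""
    document = minidom.parseString(xml_text.encode("utf-8"))
    return [
        node.data
        for node in document.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
        and node.target == "xml-stylesheet"
    ]


print(stylesheet_instructions(SAMPLE_RESPONSE))
```

After changing the "stylesheet" attribute and restarting the servlet container, running such a check against a live response is a quick way to confirm the new stylesheet is being referenced.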
Metadata Formats
By default OAI 2.0 provides 12 metadata formats within the /request context:
- OAI_DC
- DIDL
- DIM
- ETDMS
- METS
- MODS
- OAI-ORE
- QDC
- RDF
- MARC
- UKETD_DC
- XOAI
At the /driver context it provides:
- OAI_DC
- DIDL
- METS
And at /openaire context it provides:
- OAI_DC
- METS
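To verify which formats a given context actually exposes, one can send a ListMetadataFormats request to that context and read the metadataPrefix values from the response. The sketch below parses such a response; the sample XML is hand-written and abbreviated for illustration (a real response carries more elements per format), and the helper name `metadata_prefixes` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Abbreviated, hand-written response for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat><metadataPrefix>oai_dc</metadataPrefix></metadataFormat>
    <metadataFormat><metadataPrefix>mets</metadataPrefix></metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""


def metadata_prefixes(xml_text):
    """Return the metadataPrefix values found in a ListMetadataFormats response."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.findall(".//oai:metadataPrefix", OAI_NS)]


print(metadata_prefixes(SAMPLE))
```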
Configuration
Basic Configuration
Configuration File: | |
---|---|
Property: | |
Example Value: | |
Informational Note: | Selects the OAI data source: solr or database. |
Property: | |
Example Value: | |
Informational Note: | Solr server location. |
Property: | |
Example Value: | |
Informational Note: | OAI persistent identifier prefix. Format: oai:PREFIX:HANDLE |
Property: | |
Example Value: | |
Informational Note: | Configuration directory used by XOAI (core library). Contains xoai.xml, metadata format XSLTs and transformer XSLTs. |
Property: | |
Example Value: | |
Informational Note: | Directory to store runtime-generated files (for caching purposes). |
Advanced Configuration
OAI 2.0 provides an advanced configuration allowing you to configure:
- Contexts
- Transformers
- Metadata Formats
- Filters
- Sets
It's an XML file commonly located at: [dspace]/config/crosswalks/oai/xoai.xml
Add/Remove Metadata Formats
Each context can have its own metadata formats. To add or remove a metadata format, add or remove its reference within xoai.xml. For example, to remove the XOAI schema from:
<Context baseurl="request">
    <Format refid="oaidc" />
    <Format refid="mets" />
    <Format refid="xoai" />
    <Format refid="didl" />
    <Format refid="dim" />
    <Format refid="ore" />
    <Format refid="rdf" />
    <Format refid="etdms" />
    <Format refid="mods" />
    <Format refid="qdc" />
    <Format refid="marc" />
    <Format refid="uketd_dc" />
</Context>
Then one would have:
<Context baseurl="request">
    <Format refid="oaidc" />
    <Format refid="mets" />
    <Format refid="didl" />
    <Format refid="dim" />
    <Format refid="ore" />
    <Format refid="rdf" />
    <Format refid="etdms" />
    <Format refid="mods" />
    <Format refid="qdc" />
    <Format refid="marc" />
    <Format refid="uketd_dc" />
</Context>
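The same removal can be checked or scripted with a few lines of XML handling. The sketch below works on a shortened copy of the Context element shown above. It assumes the elements carry no XML namespace, as in that snippet; a real xoai.xml may declare one, in which case the lookups would need to include it. The helper name `remaining_formats` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the Context element above, for illustration.
CONTEXT_XML = """
<Context baseurl="request">
    <Format refid="oaidc" />
    <Format refid="mets" />
    <Format refid="xoai" />
</Context>
"""


def remaining_formats(context_xml, refid_to_remove):
    """Remove one Format reference and return the refids that are left."""
    context = ET.fromstring(context_xml)
    # findall returns a list, so removing while looping over it is safe.
    for fmt in context.findall("Format"):
        if fmt.get("refid") == refid_to_remove:
            context.remove(fmt)
    return [fmt.get("refid") for fmt in context.findall("Format")]


print(remaining_formats(CONTEXT_XML, "xoai"))
```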
It is also possible to create a new metadata format by writing a specific XSLT for it. All XSLTs already defined for DSpace can be found in the [dspace]/config/crosswalks/oai/metadataFormats directory. After producing a new one, add the following information (replacing the placeholders marked with brackets) inside the <Formats> element in [dspace]/config/crosswalks/oai/xoai.xml:
<Format id="[IDENTIFIER]">
    <Prefix>[PREFIX]</Prefix>
    <XSLT>metadataFormats/[XSLT]</XSLT>
    <Namespace>[NAMESPACE]</Namespace>
    <SchemaLocation>[SCHEMA_LOCATION]</SchemaLocation>
</Format>
where:
Parameter | Description |
---|---|
IDENTIFIER | The identifier used within context configurations to reference this specific format, must be unique within all Metadata Formats available. |
PREFIX | The prefix used in OAI interface (metadataPrefix=PREFIX). |
XSLT | The name of the XSLT file within [dspace]/config/crosswalks/oai/metadataFormats directory |
NAMESPACE | XML Default Namespace of the created Schema |
SCHEMA_LOCATION | URI Location of the XSD of the created Schema |
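As an illustration of how the five parameters map onto the Format element, the sketch below fills in the template programmatically. Every value passed in (`myformat`, `my_format`, `myformat.xsl` and the example.com URIs) is a made-up placeholder, and the helper `build_format` is hypothetical; it is not part of DSpace or XOAI.

```python
import xml.etree.ElementTree as ET


def build_format(identifier, prefix, xslt, namespace, schema_location):
    """Build a Format element following the template described above."""
    fmt = ET.Element("Format", {"id": identifier})
    ET.SubElement(fmt, "Prefix").text = prefix
    # The XSLT path is relative to the crosswalks/oai directory.
    ET.SubElement(fmt, "XSLT").text = f"metadataFormats/{xslt}"
    ET.SubElement(fmt, "Namespace").text = namespace
    ET.SubElement(fmt, "SchemaLocation").text = schema_location
    return fmt


# All values below are made-up placeholders.
fmt = build_format(
    identifier="myformat",
    prefix="my_format",
    xslt="myformat.xsl",
    namespace="http://www.example.com/ns/myformat/",
    schema_location="http://www.example.com/ns/myformat.xsd",
)
print(ET.tostring(fmt, encoding="unicode"))
```

With these placeholder values, harvesters would request the format with metadataPrefix=my_format, while context configurations would reference it with refid="myformat".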
NOTE: Changes in [dspace]/config/crosswalks/oai/xoai.xml require reloading/restarting the servlet container.
Relevant Links
- Download & Install OAI 2.0 for DSpace 1.8.x: http://www.lyncode.com/dspace/addons/xoai/