
Introduction

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a low-barrier mechanism for repository interoperability. Data Providers are repositories that expose structured metadata via OAI-PMH. Service Providers then make OAI-PMH service requests to harvest that metadata. OAI-PMH defines a set of six verbs (services) that are invoked over HTTP.
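
To make the request model concrete, the following minimal Python sketch builds the HTTP GET URL for an OAI-PMH request. The six verb names and the metadataPrefix argument come from the OAI-PMH 2.0 specification; the function name build_request_url and the base URL are assumptions made for illustration only and should be replaced with the address of your own data provider.

```python
from urllib.parse import urlencode

# The six verbs defined by OAI-PMH 2.0.
OAI_PMH_VERBS = (
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
)


def build_request_url(base_url: str, verb: str, **arguments: str) -> str:
    """Return the HTTP GET URL for a single OAI-PMH request."""
    if verb not in OAI_PMH_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    # The verb always travels as a query parameter next to its arguments.
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Hypothetical base URL, not taken from this documentation.
BASE_URL = "http://localhost:8080/oai/request"

print(build_request_url(BASE_URL, "Identify"))
print(build_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc"))
```

A Service Provider would fetch these URLs and parse the XML that the Data Provider returns.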

What is OAI 2.0?

OAI 2.0 is a Java implementation of an OAI-PMH data provider interface developed by Lyncode that uses XOAI, an OAI-PMH Java Library.

Why OAI 2.0?

Projects such as OpenAIRE and Driver have specific requirements for the metadata published through the OAI-PMH interface. Because the OAI-PMH protocol does not provide any framework for these specifics, OAI 2.0 can expose more than one instance of an OAI interface (a feature provided by the XOAI core library), so one can define an interface for each project. That is its main purpose, although OAI 2.0 allows much more than that.

Concepts (XOAI Core Library)

To understand how XOAI works, one must understand the concepts of Filter, Transformer and Context. A Filter selects information from the data source. A Transformer makes changes to the metadata before it is shown in the OAI interface. XOAI also adds a new concept to the basic OAI-PMH specification: the Context. A context is identified in the URL:

...
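
The relationship between the three concepts can be pictured with the following conceptual Python sketch. It is not the XOAI Java API: the Context class, the helper functions, the sample items and the filtering rule are all invented for illustration. Only the context names "request" and "openaire" appear elsewhere in this documentation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Item = Dict[str, str]


@dataclass
class Context:
    """A named OAI interface: which items it exposes and how it presents them."""
    name: str                             # the URL part that selects this context
    item_filter: Callable[[Item], bool]   # Filter: selects items from the data source
    transformer: Callable[[Item], Item]   # Transformer: adjusts metadata before output

    def expose(self, items: Iterable[Item]) -> List[Item]:
        # Apply the filter first, then transform what is left.
        return [self.transformer(item) for item in items if self.item_filter(item)]


def keep_all(item: Item) -> bool:
    return True


def unchanged(item: Item) -> Item:
    return dict(item)


def only_open(item: Item) -> bool:
    return item["rights"] == "open"


def capitalize_type(item: Item) -> Item:
    return {**item, "type": item["type"].capitalize()}


# Made-up items standing in for the data source.
ITEMS = [
    {"title": "Thesis A", "type": "thesis", "rights": "open"},
    {"title": "Report B", "type": "report", "rights": "closed"},
]

request_context = Context("request", keep_all, unchanged)
openaire_context = Context("openaire", only_open, capitalize_type)

print(request_context.expose(ITEMS))
print(openaire_context.expose(ITEMS))
```

Each context sees the same data source but applies its own filter and transformer, which is how one installation can serve several projects with different requirements.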

To implement an OAI interface with the XOAI core library, one only needs to implement the data source interface.

...

OAI 2.0

OAI 2.0 is a separate webapp that completely replaces the old "oai" webapp. Its data source is configurable; by default it does not query the DSpace SQL database at the time of the OAI-PMH request. Instead, it keeps the required metadata in its Solr index (currently in a separate "oai" Solr core) and serves it from there. It is also possible to configure OAI 2.0 to use only the database for querying if necessary, but this decreases performance significantly. Furthermore, it caches requests, so repeating the same query is very fast. In addition, it compiles DSpace items to make uncached responses much faster.

Details about OAI 2.0 internals can be found here.

Using Solr

OAI 2.0 uses the Solr data source by default.

The Solr index can be updated at your convenience, depending on how fresh you need the information to be. Typically, the administrator sets up a nightly cron job to update the Solr index from the SQL database.

OAI Manager (Solr Data Source)

OAI manager is a utility that allows one to do certain administrative operations with OAI. 

...

  • -o Optimize index after indexing
  • -c Clears the Solr index before indexing (it will import all items again)
  • -v Verbose output
  • -h Shows a help text

Scheduled Tasks

To refresh the OAI Solr index, you must run the [dspace]/bin/dspace oai import command periodically. You can add the following task to your crontab:

...

Note that [dspace] should be replaced by the correct value, that is, the value defined in dspace.cfg parameter dspace.dir.
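
If the refresh is driven from a script rather than by calling the launcher directly, the command line can be assembled as in the following minimal Python sketch. It assumes that the -c, -o and -v options listed for the OAI manager apply to the import action. The function names and the "/dspace" installation directory are placeholders, not part of DSpace.

```python
import subprocess
from typing import List


def build_import_command(dspace_dir: str, clear: bool = False,
                         optimize: bool = False, verbose: bool = False) -> List[str]:
    """Build the argument list for '[dspace]/bin/dspace oai import'."""
    command = [f"{dspace_dir}/bin/dspace", "oai", "import"]
    if clear:
        command.append("-c")   # clear the Solr index before indexing
    if optimize:
        command.append("-o")   # optimize the index after indexing
    if verbose:
        command.append("-v")   # verbose output
    return command


def run_import(dspace_dir: str, **options: bool) -> int:
    """Run the import and return the exit status of the launcher."""
    return subprocess.run(build_import_command(dspace_dir, **options)).returncode


# "/dspace" is a placeholder for the value of dspace.dir in dspace.cfg.
print(" ".join(build_import_command("/dspace", optimize=True)))
```

Passing the arguments as a list avoids shell quoting problems when the installation path contains spaces.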

Using Database

OAI 2.0 can also use the database for querying. To configure this, edit the [dspace]/config/modules/oai.cfg file and set the 'storage' parameter to database. This decreases performance significantly and likely has no benefit other than removing Solr as a dependency.

OAI Manager (Database Data Source)

OAI manager is a utility that allows one to do some administrative operations with OAI. 

...

  • -v Verbose output
  • -h Shows a help text

Scheduled Tasks

To refresh the OAI cache and compile DSpace items (for fast responses), you must run the [dspace]/bin/dspace xoai compile-items command periodically. You can add the following task to your crontab:

...

Note that [dspace] should be replaced by the correct value, that is, the value defined in dspace.cfg parameter dspace.dir.

Client-side stylesheet

The OAI-PMH response is an XML file. While OAI-PMH is primarily used by harvesting tools and rarely by humans directly, it can sometimes be useful to look at the OAI-PMH responses directly, usually when setting the interface up for the first time or to verify changes you make. For these cases, XOAI provides an XSLT stylesheet that transforms the response XML into human-readable, interactive HTML. The stylesheet is linked from the XML response and the transformation takes place in the user's browser (this requires a recent browser; older browsers will only display the XML directly). Most automated tools are interested only in the XML itself and will not perform the transformation.

To change which stylesheet is used, place your stylesheet into the [dspace]/webapps/xoai/static directory (or into [dspace-src]/dspace-xoai/dspace-xoai-webapp/src/main/webapp/static, after which you have to rebuild DSpace), modify the "stylesheet" attribute of the "Configuration" element in [dspace]/config/modules/oai/xoai.xml, and restart your servlet container.

Metadata Formats

By default OAI 2.0 provides 12 metadata formats within the /request context:

...

In the /openaire context it provides:

  1. OAI_DC
  2. METS
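
To check which formats a given context actually offers, a harvester can issue a ListMetadataFormats request and read the metadataPrefix elements from the response. The following minimal Python sketch parses such a response. The XML namespace and element names come from the OAI-PMH 2.0 specification; the sample response is hand-written and abbreviated for illustration, not captured from a real DSpace instance, and the function name is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import List

# XML namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written, abbreviated sample; a real response carries more elements.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>mets</metadataPrefix>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>
"""


def metadata_prefixes(response_xml: str) -> List[str]:
    """Return the metadataPrefix values of a ListMetadataFormats response."""
    root = ET.fromstring(response_xml)
    path = "oai:ListMetadataFormats/oai:metadataFormat/oai:metadataPrefix"
    return [element.text for element in root.findall(path, OAI_NS)]


print(metadata_prefixes(SAMPLE_RESPONSE))
```

Running the same check against each context shows at a glance whether a format was added to or removed from the right one.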

Configuration

Basic Configuration

Configuration File:

[dspace]/config/modules/oai.cfg

Property:

storage

Example Value:

storage = solr

Informational Note:

Selects the OAI data source: solr or database.

Property:

solr.url

Example Value:

solr.url = ${default.solr.server}/oai

Informational Note:

Solr Server location

Property:

identifier.prefix

Example Value:

identifier.prefix = ${dspace.hostname}

Informational Note:

OAI persistent identifier prefix. Format - oai:PREFIX:HANDLE

Property:

config.dir

Example Value:

config.dir = ${dspace.dir}/config/modules/oai

Informational Note:

Configuration directory, used by XOAI (core library). Contains xoai.xml, metadata format XSLTs and transformer XSLTs.

Property:

cache.dir

Example Value:

cache.dir = ${dspace.dir}/var/oai

Informational Note:

Directory to store runtime generated files (for caching purposes).
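
The following minimal Python sketch illustrates how the example values above fit together: it expands ${...} references and builds an identifier in the oai:PREFIX:HANDLE format described for identifier.prefix. The helper names, the host name, the installation directory and the handle are made up for illustration, and DSpace performs this substitution itself, so the code only mirrors the idea.

```python
import re
from typing import Dict


def resolve(value: str, properties: Dict[str, str]) -> str:
    """Replace each ${name} reference in value with properties[name]."""
    return re.sub(r"\$\{([^}]+)\}", lambda match: properties[match.group(1)], value)


def oai_identifier(prefix: str, handle: str) -> str:
    """Build an identifier in the oai:PREFIX:HANDLE format."""
    return f"oai:{prefix}:{handle}"


# Hypothetical values standing in for settings from dspace.cfg.
DSPACE_PROPERTIES = {
    "dspace.hostname": "repository.example.org",
    "dspace.dir": "/dspace",
}

prefix = resolve("${dspace.hostname}", DSPACE_PROPERTIES)
print(oai_identifier(prefix, "123456789/42"))               # oai:repository.example.org:123456789/42
print(resolve("${dspace.dir}/var/oai", DSPACE_PROPERTIES))  # /dspace/var/oai
```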

Advanced Configuration

OAI 2.0 provides an advanced configuration allowing you to configure:

...

It's an XML file commonly located at: [dspace]/config/modules/oai/xoai.xml

Add/Remove Metadata Formats

Each context can have its own metadata formats. To add or remove a metadata format for a context, add or remove its reference within xoai.xml. For example, suppose one needs to remove the XOAI schema from:

...