Code can be found in our DSpace repository on GitHub, in the DOI branch: https://github.com/tuub/DSpace/tree/DOI.

...

We (at Technische Universität Berlin) want to use DOIs for Items within DSpace. We are thinking about using DOIs for Communities and Collections, but at first we'll concentrate on Items. A DOI is a well-known persistent identifier. With the external identifier support that @mire introduced in DSpace 3.0 as part of the item versioning feature, it should be possible to add support to mint, register and delete DOIs using DSpace.

...

To register a DOI one has to enter into a contract with a DOI registration agency. Several such agencies exist. Different DOI registration agencies have different policies. Some of them offer DOI registration specially or only for academic environments, others only for publishing companies. Most of the registration agencies charge fees for registering DOIs, and all of them have different rules describing for what kind of item a DOI can be registered. To implement DOI support for DataCite we have to be mindful of the fact that every registration agency has its own API (see below).

DataCite is an organization that aims to support the access to, the acceptance of and the archiving of research data. One of the services offered by DataCite members is DOI registration. DataCite has several members that act as DOI registration agencies. Some of the members tell their customers to use the DataCite API directly, others offer their own APIs. So registering a DOI with a DataCite member does not automatically mean using the DataCite API directly.

We will register our DOIs using the service of TIB Hannover, a German member of DataCite. We will use the DataCite API directly. EZID is a DOI registration agency in the USA that is also part of DataCite. EZID offers their own API, so EZID customers cannot profit directly from our development.

...

Knowing this situation we developed a DOIIdentifierProvider that should perform everything that is necessary to support DOIs on the DSpace side. For example, after minting and registering a DOI it saves the DOI as a metadata value of an Item. To be able to extend our DOIIdentifierProvider, we put a DOIConnector between our DOIIdentifierProvider and the registration agency API. The DOIConnector has to implement seven methods and should be quite easy to implement for any API of a DOI registration agency. The seven methods are:

  • a method to check if a DOI is already reserved,
  • a method to check if a DOI is reserved for a given DSO,
  • a method to check if a DOI is already registered,
  • a method to check if a DOI is registered for a given DSO,
  • a method to reserve a DOI for a given DSO,
  • a method to register a DOI for a given DSO,
  • a method to delete a DOI for a given DSO.

We already developed a DataCiteConnector that implements these methods for everyone who uses the DataCite API directly. As mentioned above, EZID has their own API, but it should be quite simple to implement a DOIConnector providing these seven methods with the EZID API.
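The seven operations listed above could be expressed roughly as the following Java interface. This is only a sketch to illustrate the idea: the type and method names (DoiConnectorSketch, Dso, InMemoryDoiConnector and so on) are hypothetical and are not taken from the DOI branch, the Dso class is a stand-in for a real DSpace object, and the in-memory implementation merely replaces a remote registration agency API. The rule that a DOI must be reserved before it can be registered is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a DSpace object (DSO), identified by its handle.
final class Dso {
    final String handle;
    Dso(String handle) { this.handle = handle; }
}

// Assumed shape of a connector: one method per operation in the list above.
// All names are illustrative, not the actual DOIConnector interface.
interface DoiConnectorSketch {
    boolean isDoiReserved(String doi);
    boolean isDoiReserved(Dso dso, String doi);
    boolean isDoiRegistered(String doi);
    boolean isDoiRegistered(Dso dso, String doi);
    void reserveDoi(Dso dso, String doi);
    void registerDoi(Dso dso, String doi);
    void deleteDoi(Dso dso, String doi);
}

// In-memory implementation that stands in for a registration agency API.
final class InMemoryDoiConnector implements DoiConnectorSketch {
    private final Map<String, String> reserved = new HashMap<>();   // DOI -> handle
    private final Map<String, String> registered = new HashMap<>(); // DOI -> handle

    public boolean isDoiReserved(String doi) { return reserved.containsKey(doi); }
    public boolean isDoiReserved(Dso dso, String doi) { return dso.handle.equals(reserved.get(doi)); }
    public boolean isDoiRegistered(String doi) { return registered.containsKey(doi); }
    public boolean isDoiRegistered(Dso dso, String doi) { return dso.handle.equals(registered.get(doi)); }

    public void reserveDoi(Dso dso, String doi) {
        String owner = reserved.get(doi);
        if (owner != null && !owner.equals(dso.handle)) {
            throw new IllegalStateException("DOI is already reserved for another object");
        }
        reserved.put(doi, dso.handle);
    }

    public void registerDoi(Dso dso, String doi) {
        // Assumption of this sketch: registration requires a prior reservation.
        if (!isDoiReserved(dso, doi)) {
            throw new IllegalStateException("DOI must be reserved for this object first");
        }
        registered.put(doi, dso.handle);
    }

    public void deleteDoi(Dso dso, String doi) {
        // Only the object that holds the DOI may delete it.
        if (isDoiReserved(dso, doi)) {
            reserved.remove(doi);
            registered.remove(doi);
        }
    }
}

final class DoiConnectorDemo {
    public static void main(String[] args) {
        DoiConnectorSketch connector = new InMemoryDoiConnector();
        Dso item = new Dso("123456789/1");   // hypothetical handle
        String doi = "10.0128/example-1";    // hypothetical DOI under the test prefix
        connector.reserveDoi(item, doi);
        connector.registerDoi(item, doi);
        System.out.println("registered: " + connector.isDoiRegistered(item, doi));
        connector.deleteDoi(item, doi);
        System.out.println("still reserved: " + connector.isDoiReserved(doi));
    }
}
```

A connector for another agency such as EZID would implement the same interface and translate each call into requests against that agency's API.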

...

DataCite wants to get metadata of the objects the DOIs address. The DataCite Schema (http://schema.datacite.org) defines an XML structure to describe the metadata of an object. We developed a DIM2DataCite crosswalk that takes the metadata of a DSpace Item and transforms it into XML using DataCite Schema 2.2. As far as I know, EZID does not use this XML, so another crosswalk is probably needed. It should be discussed (see below or in the JIRA ticket) how we want to deal with metadata updates, as the API for external identifiers does not yet define a mechanism to update metadata for an external identifier.

How to Test

To test our code, you will need a login to be able to use the DataCite Test API. The test system can be found here: https://test.datacite.org. This URL is currently used for the API in dspace/config/spring/api/identifier-service.xml. In this file you have to remove the comments around the bean for the DOIIdentifier and around the bean for the DataCiteConnector. In dspace.cfg you'll have to configure the properties identifier.doi.user, identifier.doi.password, identifier.doi.prefix and identifier.doi.namespaceseparator. You can use the "TIB.DSPACE" user, the "duraspace" password, the 10.0128 prefix and as namespace separator a string that you don't expect anyone else would be using. DOIs that get registered with these settings will be deleted regularly. DOIs that get registered with these properties must address an item using example.org as domain (or any subdomain below it). So you have to configure dspace.url to include example.org as domain (sorry, that is a rule of our DOI registration agency)!
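Before running against the test API it may help to check that the settings satisfy the example.org rule. The following is a minimal sketch, assuming the dspace.cfg entries can be read with java.util.Properties. The class and method names are hypothetical, and the dspace.url value and the namespace separator are purely illustrative; only the user, password and prefix come from the text above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.URI;
import java.util.Locale;
import java.util.Properties;

final class TestConfigCheck {
    // True if the host of dspace.url is example.org or a subdomain of it.
    static boolean usesExampleOrg(Properties cfg) {
        String url = cfg.getProperty("dspace.url");
        if (url == null) {
            return false;
        }
        String host = URI.create(url.trim()).getHost();
        if (host == null) {
            return false;
        }
        host = host.toLowerCase(Locale.ROOT);
        return host.equals("example.org") || host.endsWith(".example.org");
    }

    public static void main(String[] args) throws IOException {
        // Fragment in dspace.cfg syntax. The URL and the namespace separator
        // are illustrative values chosen for this sketch.
        String fragment =
              "dspace.url = http://dspace.example.org/xmlui\n"
            + "identifier.doi.user = TIB.DSPACE\n"
            + "identifier.doi.password = duraspace\n"
            + "identifier.doi.prefix = 10.0128\n"
            + "identifier.doi.namespaceseparator = my-institution-test-\n";

        Properties cfg = new Properties();
        cfg.load(new StringReader(fragment));
        System.out.println("dspace.url uses example.org: " + usesExampleOrg(cfg));
    }
}
```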

Status

The DSpace wiki recommends that we get in touch with the developer community early. A first version of a DOIIdentifierProvider is now complete. An interface for a DOIConnector is defined. We were able to reserve, register and delete DOIs at the test API of DataCite. All the code can be found in our DSpace repository on GitHub, in the DOI branch: https://github.com/tuub/DSpace/tree/DOI.

What's still to be done?

Of course, documentation for the DSpace manual would be necessary if this contribution gets accepted. The JavaDoc documentation could also be enhanced. I should (but did not yet) write something here about several design decisions I made while implementing the DOIIdentifierProvider.

A lot of testing and possibly some debugging. It would be great if someone could implement a DOIConnector for EZID. We did not write any test classes yet (knowing that this is not good and should be changed). We currently break some tests because DataCite defines five mandatory metadata fields (the DOI, a creator, a title, a publisher and the publication year) and not all TestItems contain all of these.
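To make the mandatory-field problem concrete, a check along the following lines could report which of the five fields a test item lacks. This is only a sketch: the class and method names are hypothetical, the metadata is modelled as a plain map rather than a DSpace Item, and the keys simply follow the DataCite element names instead of any DSpace metadata field mapping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MandatoryFieldCheck {
    // The five mandatory fields named above, keyed by DataCite element name.
    static final List<String> MANDATORY = Arrays.asList(
            "identifier", "creator", "title", "publisher", "publicationYear");

    // Returns the mandatory fields that are absent or blank, in the order above.
    static List<String> missingFields(Map<String, String> metadata) {
        List<String> missing = new ArrayList<>();
        for (String field : MANDATORY) {
            String value = metadata.get(field);
            if (value == null || value.trim().isEmpty()) {
                missing.add(field);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // A sparse test item with only a DOI and a title (illustrative values).
        Map<String, String> testItem = new HashMap<>();
        testItem.put("identifier", "10.0128/example-1");
        testItem.put("title", "A test item");
        System.out.println("missing: " + missingFields(testItem));
    }
}
```

An item for which this list is non-empty could not be described in a complete DataCite record, which is the situation the failing tests run into.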

We have not looked at the frontends yet. Currently we can register DOIs, but we have not checked whether they are presented in any UI.

...

  • Currently the API for external identifiers does not inform an IdentifierProvider about updated metadata. Should this API be extended to handle metadata updates, or does another mechanism already exist in DSpace?
  • The IdentifierProvider API allows most of the methods to throw an IdentifierException. But in our first tests it seems that a thrown IdentifierException is not caught. As a result, if the API of the DOI registration agency is down or produces errors, it is impossible to publish any Items in DSpace.

...

I want to thank Mark H. Wood, who made the first steps for a DOIIdentifier. His code helped me to understand how DSpace handles metadata and what should be done to support DOIs within DSpace. He started to implement a DOIIdentifierProvider using EZID and also wrote a test class for it. I took the liberty to use some of his code.

The other person helping me was Fabian Fürste, a colleague here at TU Berlin. Thanks go to him as well.

...

Besides the things above, I should mention that my code adds a table to the DSpace database schema. It adds a table called "doi" in which I save information about the DOIs that get minted and registered with DSpace.