Info: ORCID API version compatibility

Please note that ORCID API 1.2 was turned off on August 1, 2018. To use the current ORCID API 2.x, you will need DSpace 5.9 or DSpace 6.3 or newer. Details: DS-3447.

Status: Work in progress

...

Introduction

The ORCID integration adds ORCID compatibility to the existing solutions for Authority control in DSpace. Author names are still stored as strings in DSpace metadata. The authority key field is used to store a uniquely generated internal ID that links the author to more extended metadata, including the ORCID ID and alternative author names.

This extended metadata is stored and managed in a dedicated SOLR index, the DSpace authority cache.

Info: Timeline

This functionality is still under development and is scheduled to be contributed as part of the DSpace 5 release. See PR#612.


Use case and high level benefits

The vision behind this project consists of the following two aspects:

Lowering the threshold to adopt ORCID for the members of the DSpace community

ORCID’s API has enabled developers across the globe to build points of integration between ORCID and third party applications. Until now, members of the DSpace community still had to implement front-end and back-end modifications to the DSpace source code in order to leverage these APIs. As DSpace aims to provide turnkey Institutional Repository functionality, the platform is expected to provide more functionality out of the box. Only a small number of DSpace community members have software development resources readily available to implement this kind of functionality. By contributing a solution directly to the core DSpace codebase, the threshold to adopt ORCID functionality in DSpace repositories is effectively lowered. The ultimate goal is to allow easy adoption of ORCID without customization of the DSpace software, by allowing repository administrators to enable or disable functionality by means of user-friendly configuration.

Address generic use cases with appealing end user functionality

This proposal aims to provide user-friendly features for both repository administrators and non-technical end users of the system. The addition of ORCID functionality to DSpace should not come at the cost of making the system more difficult for administrators and end users to use.

Scope

With this vision in mind, the project partners wanted to tackle the first phases for repository managers of existing DSpace repositories: ensuring that ORCIDs are properly associated with new works entering the system, and providing functionality to efficiently batch-update content already in the system with unambiguous author identity information.

Enabling the ORCID authority control

Warning: JSPUI Support

In DSpace 5.0, this feature provides user interface functionality only for the DSpace XML User Interface (XMLUI). JSPUI is not supported.

Warning: XMLUI Theme Support

In DSpace 5.0 the functionality only adds support for the XMLUI Mirage and Mirage 2 themes. Older XMLUI themes including Kubrick, Reference and Classic are currently unsupported.

To enable this feature, some changes are required in the dspace.cfg file. The first step is to activate the authority as a valid option for authority control. This is done by adding an additional plugin to the plugin.named.org.dspace.content.authority.ChoiceAuthority property, as in the example below.

Code Block
plugin.named.org.dspace.content.authority.ChoiceAuthority = \
    org.dspace.content.authority.SolrAuthority = SolrAuthorAuthority

The feature relies on the following configuration parameters in dspace.cfg. To activate the default settings, it suffices to remove the comment hashes ("#") from the following lines. See the section at the bottom of this page for what these parameters mean exactly and how you can tweak the configuration.

Code Block
solr.authority.server=${solr.server}/authority
choices.plugin.dc.contributor.author = SolrAuthorAuthority
choices.presentation.dc.contributor.author = authorLookup
authority.controlled.dc.contributor.author = true
authority.author.indexer.field.1=dc.contributor.author

The final part of the configuration is to add the authority consumer to the list of event consumers. Add "authority" at the front of the list, as shown below.

Code Block
event.dispatcher.default.consumers = authority, versioning, discovery, eperson, harvester

Importing existing authors & keeping the index up to date

When first enabled, the authority index will be empty. To populate it, run the following script:

Code Block
[dspace]/bin/dspace index-authority

This will iterate over every metadata value under authority control and create a record for it in the authority index. Metadata values without an authority key will each be updated with an auto-generated authority key. These will not be matched in any way with other existing records. Metadata values with an authority key that does not yet exist in the index will be indexed with that authority key. Metadata values with an authority key that already exists in the index will be re-indexed the same way; these records remain unchanged.

Use cases for the index-authority script

Metadata value WITHOUT authority key in metadata

“Luyten, Bram” is present in the metadata without any authority key.
GOAL: “Luyten, Bram” gets added in the cache ONCE

All occurrences of “Luyten, Bram” in the DSpace item metadata will become linked with the same generated uid.

Metadata that already has an authority key from an external source (NOT auto-generated by DSpace)

“Snyers, Antoine” is present with authority key “u12345”

The old authority key needs to be preserved in the item metadata and duplicated in the cache.
“u12345” will be copied to the authority cache and used as the authority key there.

Metadata that already has a new DSpace-generated uid authority key

Item metadata already contains an author with the name “Haak, Danielle” and a uid in the authority field: 3dda2571-6be8-4102-a47b-5748531ae286

This uid is preserved and no new record is being created in the authority index.
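The three cases above can be summarized in a short sketch. This is not the DSpace implementation: the function name resolve_authority_key, the dictionary standing in for the Solr index, and the per-run generated map are all hypothetical, and the sketch assumes that values without a key are grouped by exact name match within a single run.

```python
import uuid


def resolve_authority_key(value, key, index, generated):
    """Return the authority key to store for one metadata value.

    index:     dict mapping authority key -> author name (stand-in for the cache)
    generated: dict mapping author name -> key generated earlier in this run
    """
    if key is None:
        # Case 1: no key yet. Reuse the key generated for the same name so the
        # author is added to the cache only once.
        if value not in generated:
            generated[value] = str(uuid.uuid4())
        key = generated[value]
    # Cases 2 and 3: an existing key (external or DSpace-generated) is kept as-is.
    # Writing the same key again is idempotent, so re-runs leave records unchanged.
    index[key] = value
    return key
```

Calling the function twice for “Luyten, Bram” with no key returns the same generated uid, while “Snyers, Antoine” with key “u12345” keeps that key in both the metadata and the cache.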

Processing on records in the authority cache

Running this script again will update the index and keep it clean. For example, if an author occurs in a single item and that item is deleted, the script needs to be run again to remove the author from the index. When run again, it removes all records that no longer have a link to existing authors in the database.
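The clean-up step can be pictured as follows. This is a minimal sketch under stated assumptions, not the script's actual code: prune_orphans is a hypothetical name, the cache is modelled as a plain dictionary, and keys_in_use stands for the set of authority keys still referenced by item metadata in the database.

```python
def prune_orphans(index, keys_in_use):
    """Remove cache records whose key is no longer referenced by any item.

    Returns the list of removed keys so the caller can report them.
    """
    # Collect first, then delete, to avoid mutating the dict while iterating.
    orphaned = [key for key in index if key not in keys_in_use]
    for key in orphaned:
        del index[key]
    return orphaned
```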

Usage in DSpace


Submission of new DSpace items - Author lookup

The submission forms have not changed much. The only visible difference is an extra button next to the input fields for the author names. Next to the Add button, which is common for all repeatable fields, there is the Lookup & Add button.

Clicking that button opens the Lookup User Interface. If an author name was filled in but not added yet, the Lookup User Interface will immediately perform a search for that name. Otherwise the search field remains empty and a list of known authors is displayed. The list of authors is updated as you type in the search box.

...

Clicking the Lookup button brings back the Lookup User Interface. This works just the same way as in the submission forms.

Editing existing items using Batch CSV Editing

...

For each of the ORCID authors a lookup will be done and their names will be added to the metadata. All the non-ORCID authors will be added as well. The authority keys and solr records are added when the reported changes are applied.

 


Storage of related metadata

ORCID authorities do more than link a digital identifier to a name. They group a large amount of metadata, ranging from alternative names and email addresses to keywords about the author's works and much more. The metadata is obtained by querying the ORCID web services. To avoid querying the ORCID web services every time this information is needed, all of this related metadata is gathered in a "metadata authority cache" that DSpace can access directly.

In practice, the cache is provided by an Apache Solr server. When a look-up is made and an author is chosen that is not yet in the cache, a record is created from the ORCID profile and added to the cache with the list of related metadata. The value of the Dublin Core metadata is based on the first and last name as they are set in the ORCID profile. The authority key for this value links directly to the Solr document's id. DSpace does not provide a way to edit these records manually.
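The look-up behaviour described here follows a "get or create" pattern, sketched below. Everything in the sketch is an assumption made for illustration: the function name get_or_create_record, the dictionary used in place of Solr, the shape of the profile, and in particular the "Last, First" format of the metadata value, which is inferred from the author names used elsewhere on this page.

```python
import uuid


def get_or_create_record(cache, field, profile):
    """Return (record_id, record) for an ORCID profile, creating it if absent.

    cache:   dict mapping record id -> record dict (stand-in for the Solr cache)
    field:   the metadata field the record belongs to
    profile: dict with "orcid_id", "first_name" and "last_name"
    """
    for record_id, record in cache.items():
        if record["field"] == field and record["orcid_id"] == profile["orcid_id"]:
            return record_id, record
    record = {
        "field": field,
        "orcid_id": profile["orcid_id"],
        "first_name": profile["first_name"],
        "last_name": profile["last_name"],
        # Assumed "Last, First" format for the Dublin Core value.
        "value": f'{profile["last_name"]}, {profile["first_name"]}',
    }
    # The record id doubles as the authority key stored in the item metadata.
    record_id = str(uuid.uuid4())
    cache[record_id] = record
    return record_id, record
```

Choosing the same profile a second time for the same field returns the existing record instead of creating a new one.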

The information in the authority cache can be updated by running the following command line operation:

Command used: [dspace]/bin/dspace dsrun org.dspace.authority.UpdateAuthorities

||Argument||Description||
|-i|update specific solr records with the given internal ids (comma-separated)|
|-h|prints this help message|

This will iterate over every solr record currently in use (unless the -i argument is provided), query the ORCID web service for the latest data and update the information in the cache. If configured, the script will also update the metadata of the items in the repository where applicable.

The configuration property can be set in config/modules/solrauthority.cfg:

Code Block
auto-update-items = false | true

...

The Enabling the ORCID authority control section instructed you to add this block of configuration.

Code Block
solr.authority.server=${solr.server}/authority
choices.plugin.dc.contributor.author = SolrAuthorAuthority
choices.presentation.dc.contributor.author = authorLookup
authority.controlled.dc.contributor.author = true
authority.author.indexer.field.1=dc.contributor.author

...

  • The authority.controlled property is set for every metadata field that needs to be authority controlled. This applies to every type of authority control, not only the fields for ORCID integration.
  • The choices.plugin property should be configured for each metadata field under authority control. Setting the value to SolrAuthorAuthority tells DSpace to use the Solr authority cache for this metadata field; see Storage of related metadata.
  • The choices.presentation property should be configured for each metadata field as well. The traditional values for this property are select|suggest|lookup. A new value, authorLookup, has been added to be used in combination with the SolrAuthorAuthority choices plugin. While the other values can still be used, authorLookup provides a richer user interface in the form of a popup on the submission page.
  • The browse indexes need to point to the new authority-controlled index: webui.browse.index.2 = author:metadata:dc.contributor.*,dc.creator:text should become webui.browse.index.2 = author:metadataAuthority:dc.contributor.author:authority
  • More existing configuration properties are available, but their values are independent of this feature and their default values are usually fine: choices.closed, authority.required, authority.minconfidence.
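A quick way to catch a misspelled or missing per-field property is to check a configuration excerpt against the three settings listed above. The sketch below is a hypothetical helper, not part of DSpace. It assumes simple "key = value" lines and does not handle backslash line continuations or variable substitution such as ${solr.server}.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def missing_settings(props, field):
    """List the per-field properties that are absent or differ from the values above."""
    expected = {
        f"choices.plugin.{field}": "SolrAuthorAuthority",
        f"choices.presentation.{field}": "authorLookup",
        f"authority.controlled.{field}": "true",
    }
    return [key for key, value in expected.items() if props.get(key) != value]
```

An empty list from missing_settings(props, "dc.contributor.author") means the three per-field properties are set as described. Other presentation values such as select, suggest or lookup remain valid in DSpace but would be reported by this sketch.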

For the cache update script, one property can be set in config/modules/solrauthority.cfg:

Code Block
auto-update-items = false | true

...

The final part of the configuration is to add the authority consumer to the list of event consumers. Add "authority" at the front of the list, as shown below.

Code Block
event.dispatcher.default.consumers = authority, versioning, discovery, eperson, harvester

...

Adding additional fields under ORCID

Other metadata fields besides "dc.contributor.author" can benefit from the ORCID authority control at the same time. Here is an example of how to get the same ORCID functionality for the "dc.contributor.editor" metadata field assuming that "dc.contributor.author" is already configured correctly. It can be achieved by modifying configuration files only.

...

To fix this, open the file at config/spring/api/orcid-authority-services.xml and find this spring bean:

Code Block
<bean name="AuthorityTypes" class="org.dspace.authority.AuthorityTypes">
    <property name="types">
        <list>
            <bean class="org.dspace.authority.orcid.OrcidAuthorityValue"/>
            <bean class="org.dspace.authority.PersonAuthorityValue"/>
        </list>
    </property>
    <property name="fieldDefaults">
        <map>
            <entry key="dc_contributor_author">
                <bean class="org.dspace.authority.PersonAuthorityValue"/>
            </entry>
        </map>
    </property>
</bean>

...


The map inside the "fieldDefaults" property needs an additional entry for the editor field:

Code Block
<entry key="dc_contributor_editor">
    <bean class="org.dspace.authority.PersonAuthorityValue"/>
</entry>
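The entry key appears to be the metadata field name with dots replaced by underscores, judging from dc_contributor_author and dc_contributor_editor. Under that assumption, the sketch below generates such an entry for any field. The helper name field_default_entry is hypothetical and the output still has to be pasted into the Spring file by hand.

```python
import xml.etree.ElementTree as ET


def field_default_entry(field, bean_class="org.dspace.authority.PersonAuthorityValue"):
    """Build an <entry> element for the fieldDefaults map of a metadata field."""
    # Assumed convention: "dc.contributor.editor" becomes "dc_contributor_editor".
    entry = ET.Element("entry", {"key": field.replace(".", "_")})
    ET.SubElement(entry, "bean", {"class": bean_class})
    return entry


# Print the entry for the editor field as an XML string.
print(ET.tostring(field_default_entry("dc.contributor.editor"), encoding="unicode"))
```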

With this last change everything is set up to work correctly. The rest of this configuration file is meant for Java developers who wish to provide integration with systems other than ORCID. Developers who wish to display fields other than first and last name can also have a look in that section.

Note: Each metadata field has a separate set of authority records. Authority keys are not shared between different metadata fields. For example, multiple dc.contributor.author values can have the same authority key and point to the same authority record in the cache. But when an ORCID is chosen for a dc.contributor.editor field, a separate record is made in the cache. Both records are updated from the same source and will contain the same information. The difference is that a look-up of a person who has been introduced as an authority for an author field, but not yet for an editor field, will show that person as a record that is not yet present in the repository cache.

Integration with other systems beside ORCID

The authority cache and look-up functionality can be extended to use sources other than ORCID or to show more information in the look-up interface. However, some Java development is necessary for this. Specific instructions can be found in the readme file of the org.dspace.authority package.

FAQ

Which information from ORCID is currently indexed in the authority cache?

Here is a breakdown of the fields stored in the solr cache.

The system/dspace related fields are: id, field, value, deleted, creation_date, last_modified_date, authority_type.

The fields with data coming directly from ORCID are: first_name, last_name, name_variant, orcid_id, label_researcher_url, label_keyword, label_external_identifier, label_biography, label_country. The field all_labels contains all the values from the other fields starting with "label_".
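The relationship between all_labels and the other "label_" fields can be illustrated as follows. This is only a sketch of the aggregation described above, using a plain dictionary in place of a Solr document. The function name with_all_labels is hypothetical, and whether individual fields hold one value or several is an assumption.

```python
def with_all_labels(doc):
    """Return a copy of doc with all_labels collecting every 'label_*' value."""
    labels = []
    for name, value in doc.items():
        if name.startswith("label_"):
            # Treat single values and lists of values uniformly.
            labels.extend(value if isinstance(value, list) else [value])
    return {**doc, "all_labels": labels}
```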

How can I index additional fields in the authority cache?

There is currently no configuration to control which fields are indexed. The only way to achieve this is to modify the source code.

List of the files at work for this job:
config/spring/api/orcid-authority-services.xml: OrcidSource contains the URL for orcid's REST API.
org.dspace.authority.orcid.Orcid makes the REST call
+ org.dspace.authority.orcid.xml.XMLtoBio converts the received XML to a java object (Bio).
+ org.dspace.authority.orcid.model.Bio
+ org.dspace.authority.orcid.OrcidAuthorityValue#create(org.dspace.authority.orcid.model.Bio) inserts all the values from Bio into the AuthorityValue subclass.
+ org.dspace.authority.orcid.OrcidAuthorityValue#getSolrInputDocument defines what's included in solr.

The files preceded with a '+' would be necessary to modify to add more info into the cache.

How can I use the information stored in the authority cache?

The look-up UI is currently the only place this information is sent to, and only a limited number of fields are sent. To send more fields, modify org.dspace.authority.orcid.OrcidAuthorityValue#choiceSelectMap in the source code. This is also documented in the readme of the org.dspace.authority package.

How to add additional metadata fields in the authority cache that are not related to ORCID?

Follow the same configuration steps as for adding additional fields under ORCID. Currently the ORCID suggestions cannot be turned off for specific fields; that would require custom code.

What happens to data if another authority control was already present?

As long as the metadata does not get indexed, there will be no changes. However every time any metadata of an item is modified, the metadata under authority control for that item will be re-indexed. When that happens a record will be inserted in the solr cache. That record's ID will be the authority key of the metadata. This can be done for all metadata at once with the index-authority script.

In short: authority keys that exist prior to enabling the solrauthority are kept. They just won't show in the look-up until they are indexed.

Where can I find the URL that is used to lookup ORCIDs?

It is found in the config/spring/api/orcid-authority-services.xml configuration file. Look for the <bean name="OrcidSource">, which is initialized with a URL of http://pub.orcid.org 
