Unreleased Documentation

This documentation is unreleased and still in development. It may describe features which are not yet released in DSpace.

Data Correction

Scenario: A repository manager of a repository indexed in OpenAIRE can subscribe to the events for Missed/More PIDs and Project links in the Content Provider Dashboard, using “a repository callback” as the notification mechanism instead of the current email alert. They log in to the repository and see the list of events received, including a publication with a PMID that was unknown to the repository and a link to a project. They click the “accept the suggestion” button and the new information is stored in the local record. OpenAIRE could “flag” the data as confirmed.

The goal of the Data Correction service is to support the scenario above.

As the OpenAIRE Content Provider Dashboard does not yet allow creating a subscription with a callback mechanism, we agreed with the OpenAIRE team to read the data generated by OpenAIRE's Notification Broker Service from a JSON file, postponing to the last phase of the project the discussion and implementation of the delivery mechanism (polling new versions from a stable URL, receiving the file as a payload at a repository URL, etc.).

Data source

The JSON file contains an array of JSON events, where each event has the following structure:

    {
        "originalId": "oai:www.openstarts.units.it:10077/21838",
        "title": "Egypt, crossroad of translations and literary interweavings (3rd-6th centuries). A reconsideration of earlier Coptic literature",
        "topic": "ENRICH/MORE/PROJECT",
        "trust": 1.0,
        "message": {
            "projects[0].acronym": "PAThs",
            "projects[0].code": "687567",
            "projects[0].funder": "EC",
            "projects[0].fundingProgram": "H2020",
            "projects[0].jurisdiction": "EU",
            "projects[0].openaireId": "40|corda__h2020::6e32f5eb912688f2424c68b851483ea4",
            "projects[0].title": "Tracking Papyrus and Parchment Paths: An Archaeological Atlas of Coptic Literature.\nLiterary Texts in their Geographical Context: Production, Copying, Usage, Dissemination and Storage"
        }
    }

Please note that the message sub-object depends on the event topic. A more complete set of sample events can be seen here: qaevents-sample.json
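Because the message keys are flattened paths such as projects[0].acronym, a consumer has to regroup them before use. The following is a minimal Python sketch of that regrouping, not the DSpace implementation: the helper name project_fields is hypothetical, and the sample is an abbreviated copy of the event shown above.

```python
import json
import re

# Abbreviated copy of the sample event above (only three message keys kept).
SAMPLE = """
[
  {
    "originalId": "oai:www.openstarts.units.it:10077/21838",
    "topic": "ENRICH/MORE/PROJECT",
    "trust": 1.0,
    "message": {
      "projects[0].acronym": "PAThs",
      "projects[0].code": "687567",
      "projects[0].funder": "EC"
    }
  }
]
"""

# Matches flattened keys of the form projects[<index>].<field>
PROJECT_KEY = re.compile(r"^projects\[(\d+)\]\.(\w+)$")


def project_fields(message):
    """Group flattened 'projects[i].field' keys into one dict per project index."""
    projects = {}
    for key, value in message.items():
        match = PROJECT_KEY.match(key)
        if match:
            index, field = int(match.group(1)), match.group(2)
            projects.setdefault(index, {})[field] = value
    # Return the projects ordered by their index in the message.
    return [projects[i] for i in sorted(projects)]


events = json.loads(SAMPLE)
for event in events:
    # The shape of "message" depends on the topic, so branch on it first.
    if event["topic"] == "ENRICH/MORE/PROJECT":
        print(project_fields(event["message"]))
```

Other topics carry different message keys, so a real consumer would need one such handler per topic it supports.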

The Java class org.dspace.app.qaevent.qaeventsRunnableCli provides a convenient way to process this JSON file, loading the data into a dedicated new DSpace Solr core named qaevent. To use it, run the following from the bin folder of the DSpace installation:

./dspace import-qaevents -f <path-to-the-json-file>

The same script is also available via the administrative runnable process UI.



The config/modules/qaevents.cfg file allows configuring which topics should be processed, since some topics may have no configured action in the repository:

qaevents.openaire.import.topic = ENRICH/MISSING/ABSTRACT
qaevents.openaire.import.topic = ENRICH/MORE/PID
qaevents.openaire.import.topic = ...

It also holds a list of URLs used to acknowledge the decisions made by the repository manager via the DSpace UI:

qaevents.openaire.acknowledge-url = https://httpdump.io/...

This configuration file is also expected in the future to hold settings related to the delivery mechanism (such as the URL from which the JSON file can be downloaded, the credentials to use, etc.).
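The effect of the topic setting can be illustrated with a short Python sketch. It assumes only what is described above (a repeated qaevents.openaire.import.topic key, and events skipped when their topic is not listed); the parser is a simplification rather than the configuration library DSpace uses, and the function names and the oai:example identifiers are hypothetical.

```python
def enabled_topics(cfg_text, key="qaevents.openaire.import.topic"):
    """Collect every value given for the repeated topic key."""
    topics = set()
    for line in cfg_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not key = value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip() == key:
            topics.add(value.strip())
    return topics


def filter_events(events, topics):
    """Keep only the events whose topic is enabled in the configuration."""
    return [event for event in events if event["topic"] in topics]


CFG = """
qaevents.openaire.import.topic = ENRICH/MISSING/ABSTRACT
qaevents.openaire.import.topic = ENRICH/MORE/PID
"""

# Hypothetical events, reduced to the two fields used here.
sample_events = [
    {"originalId": "oai:example:1", "topic": "ENRICH/MORE/PID"},
    {"originalId": "oai:example:2", "topic": "ENRICH/MORE/PROJECT"},
]

kept = filter_events(sample_events, enabled_topics(CFG))
print([event["originalId"] for event in kept])
```

With this configuration the PROJECT event is dropped at import time, since no action is configured for it.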

The qaevent core has the following structure:

<fields>
    <field name="event_id" type="string" indexed="true" stored="true" omitNorms="true" />
    <field name="original_id" type="string" indexed="true" stored="true" omitNorms="true" />
    <field name="title" type="string" indexed="true" stored="true" omitNorms="true" />
    <field name="topic" type="string" indexed="true" stored="true" omitNorms="true" />
    <field name="trust" type="double" indexed="true" stored="true" omitNorms="true" />
    <field name="message" type="string" indexed="true" stored="true" omitNorms="true" />
    <field name="resource_uuid" type="string" indexed="true" stored="true" omitNorms="true" />
    <field name="related_uuid" type="string" indexed="true" stored="true" omitNorms="true" />
    <field name="last_update" type="date" indexed="true" stored="true" omitNorms="true" />
</fields>    
<uniqueKey>event_id</uniqueKey>

The event_id is currently generated on the repository side as a hash of the business information included in the event itself, but it is envisioned that this identifier will be made available by OpenAIRE directly in the JSON file, so that feedback from the repository can be linked back to the original event and further processed.
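A deterministic identifier of this kind could be derived as in the sketch below. This is an illustration only: the choice of SHA-256, the set of fields treated as “business information” (originalId, topic and message), and the function name event_id are all assumptions, not the algorithm DSpace applies.

```python
import hashlib
import json


def event_id(event):
    """Derive a stable identifier from the fields assumed to identify an event."""
    # Assumption: these three fields are the "business information".
    business = {
        "originalId": event["originalId"],
        "topic": event["topic"],
        "message": event["message"],
    }
    # Sorting keys makes the serialization independent of key order,
    # so the same event always hashes to the same value.
    canonical = json.dumps(business, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


event = {
    "originalId": "oai:www.openstarts.units.it:10077/21838",
    "topic": "ENRICH/MORE/PROJECT",
    "trust": 1.0,
    "message": {"projects[0].acronym": "PAThs", "projects[0].code": "687567"},
}
print(event_id(event))
```

Because the identifier depends only on the event content, importing the same file twice would produce the same event_id values, which fits its role as the uniqueKey of the core.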

The related_uuid field contains the UUID of the related object that has been associated with the correction suggestion. This is the case for the PROJECT-related topics, where a link between the publication and a project should be established. If the suggested project can be found in the system, the related_uuid field will hold its internal identifier; otherwise the user will be able to create a new item for the project on the fly and connect it to the publication item with a single click.

Two REST endpoints have been developed to expose the collected data:

  • /api/integration/qualityassurancetopic to provide access to summary information about the available topics and the number of events to deal with
  • /api/integration/qualityassuranceevent to provide access to the detailed events so that they can be reviewed and managed by the repository manager

The detailed REST contracts for these endpoints are available in the 4Science Rest7Contract repository and embedded at the bottom of the page for easy reference.

Repository Manager UI

The resulting UI is accessible from the administrative menu if the configuration key qaevents.enabled is true (it is found in the qaevents.cfg file and defaults to false). As the entry point for these features, a “Notifications” menu entry has been added to the DSpace administrative menu, from which the repository manager can manage the OpenAIRE subscription and access the details of received events.


The main page lists the topics found in the events loaded in the system.

By default the system sorts the events within a topic by descending trust (most accurate correction first), but it is also possible to reverse the direction.
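The ordering can be sketched in a few lines of Python. The function name sort_by_trust and the sample identifiers and trust values are hypothetical; only the trust field and the default descending direction come from the description above.

```python
def sort_by_trust(events, descending=True):
    """Order events by their trust score; highest first by default."""
    return sorted(events, key=lambda event: event["trust"], reverse=descending)


# Hypothetical events within one topic, reduced to the fields used here.
topic_events = [
    {"originalId": "oai:example:1", "trust": 0.6},
    {"originalId": "oai:example:2", "trust": 1.0},
    {"originalId": "oai:example:3", "trust": 0.8},
]

# Default view: most trusted suggestion first.
print([e["originalId"] for e in sort_by_trust(topic_events)])
# Reversed direction: least trusted suggestion first.
print([e["originalId"] for e in sort_by_trust(topic_events, descending=False)])
```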


In the detail view of events in a specific topic, links always open in a new tab so that the repository manager can quickly check the details without losing the overview.

Below is a screen of possible missing-abstract events, where the repository manager can check the current local publication record by clicking on the title and scroll through the abstract reported by OpenAIRE. When the suggestion is accepted, the local record is enriched with this extra information. The Ignore suggestions button is instead intended to discard a notification without flagging it as wrong. This is important because the OpenAIRE Graph does not process the data from the repository in real time, so a local record may have been updated recently with information not yet known to OpenAIRE. In such scenarios the repository manager may prefer to keep the local version, but this should not be reported to OpenAIRE as a wrong suggestion, as this feedback can be used to improve OpenAIRE's guessing capabilities. In contrast, a wrong suggestion should be rejected so that OpenAIRE can learn from it.
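The difference between the three outcomes can be summarized in a small Python sketch. The enum and function names are hypothetical and do not reflect the DSpace data model; the sketch only encodes the behaviour described in the paragraph above.

```python
from enum import Enum


class Decision(Enum):
    """Possible outcomes for a suggestion (hypothetical names)."""
    ACCEPTED = "accepted"
    REJECTED = "rejected"
    IGNORED = "ignored"


def updates_local_record(decision):
    """Only an accepted suggestion enriches the local record."""
    return decision is Decision.ACCEPTED


def reported_as_wrong(decision):
    """Only a rejected suggestion is flagged as wrong; an ignored one is not."""
    return decision is Decision.REJECTED


for decision in Decision:
    print(decision.value, updates_local_record(decision), reported_as_wrong(decision))
```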


For PROJECT-related events, additional alternative actions are needed. This is usually the case for information about linked entities, which can be tracked in the local repository either as flat metadata (in which case the “abstract approach UI” will be used) or as individual entities. In the latter case the screen below applies:

The system will attempt to identify a local record for the information reported by OpenAIRE (the project) and will offer the repository manager the option to manually look up the record or fix the automatic match.


If the related project is found in the system, the repository manager can accept the correction, linking the publication to the local copy of the project; otherwise it is possible to import and connect the project in one click, as shown in the first project-related screen above.



For PID-related events, the system offers, where available (doi, handle, pmid, pmc, arXiv, NCID, urn/url), resolution of the identifier to a details page.
