This framework is used by both the REST API and the User Interface to enhance or enrich submissions. One example use is importing items via basic bibliographic formats (Endnote, BibTex, RIS, TSV, CSV) and online services (OAI, arXiv, PubMed, CrossRef, CiNii).
The importer framework does not enforce a specific input format. Each importer implementation defines which input format it expects from a remote source. The import framework uses generics to achieve this: each importer implementation declares, as its type parameter, the record type it receives from the remote source's response. The framework also uses this type parameter to select the correct MetadataFieldMapping for that implementation. Read "Implementation of an import source" below for more information and for how to enable the framework.
The framework produces an 'ImportRecord' that is completely decoupled from DSpace. It contains a set of metadata DTOs, each of which carries a schema, element and qualifier. The specific implementation is responsible for populating this set. It is then very simple to create a DSpace item from this list.
Each importer should extend one of the following:
- org.dspace.importer.external.service.components.AbstractPlainMetadataSource (abstract class): an abstract implementation of MetadataSource that is useful when the source of the metadata is an uploaded file (e.g. BibTex, CSV or similar).
- org.dspace.importer.external.service.AbstractImportMetadataSourceService (abstract class): an abstract implementation of MetadataSource that is useful when the source of the metadata is a query against an external API (e.g. PubMed, arXiv or similar).
Extending one of these abstract classes allows the importer implementation to use the framework's built-in support to transform a record received from the remote source into an object of class org.dspace.importer.external.datamodel.ImportRecord containing DSpace metadata fields, as explained below in "Metadata mapping".
The getImportSource() method should return a unique identifier. Importer implementations should not be called directly; call class org.dspace.importer.external.service.ImportService instead. This class contains the same methods as the importer implementations, but with an extra parameter 'url'. This url parameter should contain the same identifier that is returned by the getImportSource() method of the importer implementation you want to use.
The other inherited methods are used to query the remote source.
At a simple level, metadata mapping configurations all live in Spring config files in [dspace.dir]/config/spring/api/.
In that directory, you'll find a mapping file per import source, e.g. "arxiv-integration.xml", "bibtex-integration.xml", "endnote-integration.xml", "pubmed-integration.xml", etc.
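There are two different mapping types: mappings for file-based sources (e.g. BibTex), which typically use "SimpleMetadataContributor" beans, and mappings for queried remote services (e.g. arXiv), which typically use "SimpleXpathMetadatumContributor" beans.
For a file-based source such as BibTex, the list of all of the enabled mappings can be found in a "MetadataFieldConfig" <util:map>, usually at the top of the config file.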
<util:map id="bibtexMetadataFieldMap" key-type="org.dspace.importer.external.metadatamapping.MetadataFieldConfig"
          value-type="org.dspace.importer.external.metadatamapping.contributor.MetadataContributor">
    <description>Defines which metadatum is mapped on which metadatum. Note that while the key must be unique it
        only matters here for postprocessing of the value. The mapped MetadatumContributor has full control over
        what metadatafield is generated.
    </description>
    <!-- These entry tags are the enabled mappings. The "value-ref" must map to a <bean> ID. -->
    <entry key-ref="dcTitle" value-ref="bibtexTitleContrib" />
    <entry key-ref="dcAuthors" value-ref="bibtexAuthorsContrib" />
    <entry key-ref="dcJournal" value-ref="bibtexJournalContrib" />
    <entry key-ref="dcIssued" value-ref="bibtexIssuedContrib" />
    <entry key-ref="dcJissn" value-ref="bibtexJissnContrib" />
</util:map>
Each field in the file is mapped to a DSpace metadata field in a "SimpleMetadataContributor" bean definition. NOTE: a large number of DSpace defined metadata fields are already configured as MetadataFieldConfig beans in the "dublincore-metadata-mapper.xml" Spring Config in the same directory. These may be reused in other configurations.
<!-- This example bean for BibTex says the "title" key in the BibTex file should be mapped to the DSpace
     metadata field defined in the "dcTitle" bean. This "dcTitle" bean is found in
     "dublincore-metadata-mapper.xml" and maps to "dc.title". -->
<bean id="bibtexTitleContrib" class="org.dspace.importer.external.metadatamapping.contributor.SimpleMetadataContributor">
    <property name="field" ref="dcTitle"/>
    <property name="key" value="title" />
</bean>
For a queried remote service such as arXiv, the list of all of the enabled mappings can likewise be found in a "MetadataFieldConfig" <util:map>, usually at the top of the config file.
<util:map id="arxivMetadataFieldMap" key-type="org.dspace.importer.external.metadatamapping.MetadataFieldConfig"
          value-type="org.dspace.importer.external.metadatamapping.contributor.MetadataContributor">
    <description>Defines which metadatum is mapped on which metadatum. Note that while the key must be unique it
        only matters here for postprocessing of the value. The mapped MetadatumContributor has full control over
        what metadatafield is generated.
    </description>
    <!-- These entry tags are the enabled mappings. The "value-ref" must map to a <bean> ID. -->
    <entry key-ref="arxiv.title" value-ref="arxivTitleContrib"/>
    <entry key-ref="arxiv.summary" value-ref="arxivSummaryContrib"/>
    <entry key-ref="arxiv.published" value-ref="arxivPublishedContrib"/>
    <entry key-ref="arxiv.arxiv.doi" value-ref="arxivDoiContrib"/>
    <entry key-ref="arxiv.arxiv.journal_ref" value-ref="arxivJournalContrib"/>
    <entry key-ref="arxiv.category.term" value-ref="arxivCategoryTermContrib"/>
    <entry key-ref="arxiv.author.name" value-ref="arxivAuthorContrib"/>
    <entry key-ref="arxiv.identifier.other" value-ref="arxivOtherContrib"/>
</util:map>
Each field in the file is mapped to a DSpace metadata field, usually in a "SimpleXpathMetadatumContributor" bean definition which also uses a "MetadataFieldConfig" bean. NOTE: a large number of DSpace-defined metadata fields are already configured as MetadataFieldConfig beans in the "dublincore-metadata-mapper.xml" Spring Config in the same directory. These may be reused in other configurations.
<!-- This first bean defines an XPath query ("ns:title") to map to a field (ID="arxiv.title") in DSpace -->
<bean id="arxivTitleContrib" class="org.dspace.importer.external.metadatamapping.contributor.SimpleXpathMetadatumContributor">
    <property name="field" ref="arxiv.title"/>
    <property name="query" value="ns:title"/>
    <property name="prefixToNamespaceMapping" ref="arxivBasePrefixToNamespaceMapping"/>
</bean>
<!-- This second bean then defines which DSpace field to use when "arxiv.title" is referenced.
     In other words, between these two beans, the "ns:title" XPath query value is saved to "dc.title". -->
<bean id="arxiv.title" class="org.dspace.importer.external.metadatamapping.MetadataFieldConfig">
    <constructor-arg value="dc.title"/>
</bean>
When using an implementation of AbstractImportMetadataSourceService or AbstractPlainMetadataSource, a mapping of remote record fields to DSpace metadata fields can be created.
First create an implementation of class AbstractMetadataFieldMapping with the same record type used as the type parameter of the importer implementation.
Then create a Spring configuration file in [dspace.dir]/config/spring/api.
Each DSpace metadata field that will be used for the mapping must first be configured as a Spring bean of class org.dspace.importer.external.metadatamapping.MetadataFieldConfig.
NOTE: A large number of these MetadataFieldConfig definitions are already provided out-of-the-box in [dspace.dir]/config/spring/api/dublincore-metadata-mapper.xml. This allows most service-specific Spring configurations to just reuse those existing MetadataFieldConfig definitions.
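As a minimal sketch of such a declaration, the fragment below declares two MetadataFieldConfig beans, following the pattern of the "arxiv.title" bean shown earlier. The bean ids "dc.title" and "dc.contributor.author" are illustrative assumptions: any unique id works, and the beans already shipped in dublincore-metadata-mapper.xml can be referenced instead.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                           http://www.springframework.org/schema/beans/spring-beans.xsd">

    <!-- Bean ids are illustrative; the constructor argument is the DSpace
         metadata field written as schema.element[.qualifier]. -->
    <bean id="dc.title" class="org.dspace.importer.external.metadatamapping.MetadataFieldConfig">
        <constructor-arg value="dc.title"/>
    </bean>

    <bean id="dc.contributor.author" class="org.dspace.importer.external.metadatamapping.MetadataFieldConfig">
        <constructor-arg value="dc.contributor.author"/>
    </bean>
</beans>
```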
Now this metadata field can be used to create a mapping. To add a mapping for the "dc.title" field, add a new Spring bean configuration of a class that implements the interface org.dspace.importer.external.metadatamapping.contributor.MetadataContributor. This interface takes a type argument, which needs to match the type used in the implementation of AbstractImportMetadataSourceService. The responsibility of each MetadataContributor implementation is to generate a set of metadata from the retrieved document. How it does that is completely opaque to the AbstractImportMetadataSourceService, but it is assumed that only one entity (i.e. item) is fed to the metadatum contributor.
For example, SimpleXpathMetadatumContributor implements MetadataContributor<OMElement> and can parse a fragment of XML to generate one or more metadata values.
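This bean expects 2 property values: "field", a reference to the MetadataFieldConfig bean of the target DSpace metadata field, and "query", the XPath expression used to select the value from the record returned by the remote source.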
Multiple record fields can also be combined into one value. To implement a combined mapping first create a SimpleXpathMetadatumContributor as explained above for each part of the field.
Note that namespace prefixes used in the xpath queries are configured in bean "FullprefixMapping" in the same spring file.
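Finally, create a Spring bean configuration of class org.dspace.importer.external.metadatamapping.contributor.CombinedMetadatumContributor. This bean expects 3 values: "field" (a reference to the MetadataFieldConfig bean of the target DSpace metadata field), "metadatumContributors" (a reference to a list of the contributors whose values are combined) and "separator" (the characters placed between the values when they are combined).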
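The combined mapping might look roughly like the sketch below, which joins a last name and a first name into a single author value. The bean ids, the XPath queries and the "x" namespace prefix are hypothetical and depend on the remote source's record format. The "FullprefixMapping" and "dc.contributor.author" beans are assumed to be defined elsewhere in the same Spring file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:util="http://www.springframework.org/schema/util"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                           http://www.springframework.org/schema/beans/spring-beans.xsd
                           http://www.springframework.org/schema/util
                           http://www.springframework.org/schema/util/spring-util.xsd">

    <!-- One contributor per part of the combined value (hypothetical XPath queries). -->
    <bean id="lastNameContrib" class="org.dspace.importer.external.metadatamapping.contributor.SimpleXpathMetadatumContributor">
        <property name="field" ref="dc.contributor.author"/>
        <property name="query" value="x:authors/x:author/x:surname"/>
        <property name="prefixToNamespaceMapping" ref="FullprefixMapping"/>
    </bean>
    <bean id="firstNameContrib" class="org.dspace.importer.external.metadatamapping.contributor.SimpleXpathMetadatumContributor">
        <property name="field" ref="dc.contributor.author"/>
        <property name="query" value="x:authors/x:author/x:given-name"/>
        <property name="prefixToNamespaceMapping" ref="FullprefixMapping"/>
    </bean>

    <!-- Ordered list of the parts; the order here is the order in the combined value. -->
    <util:list id="combinedAuthorList" list-class="java.util.LinkedList"
               value-type="org.dspace.importer.external.metadatamapping.contributor.MetadataContributor">
        <ref bean="lastNameContrib"/>
        <ref bean="firstNameContrib"/>
    </util:list>

    <!-- Joins the parts with the separator and writes the result to the target field. -->
    <bean id="authorContrib" class="org.dspace.importer.external.metadatamapping.contributor.CombinedMetadatumContributor">
        <property name="separator" value=", "/>
        <property name="metadatumContributors" ref="combinedAuthorList"/>
        <property name="field" ref="dc.contributor.author"/>
    </bean>
</beans>
```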
Each contributor must also be added to the "MetadataFieldMap" used by the MetadataFieldMapping implementation. Each entry of this map maps a metadata field bean to a contributor. For the contributors created above this results in the following configuration:
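A sketch of such a map, modeled on the "bibtexMetadataFieldMap" shown earlier, could look as follows. The map id and the bean ids "titleContrib" and "authorContrib" are hypothetical names for a title contributor and a combined author contributor. All referenced beans are assumed to be defined elsewhere in the same Spring file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:util="http://www.springframework.org/schema/util"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                           http://www.springframework.org/schema/beans/spring-beans.xsd
                           http://www.springframework.org/schema/util
                           http://www.springframework.org/schema/util/spring-util.xsd">

    <!-- The id is a hypothetical name; it has to be the name your
         MetadataFieldMapping implementation looks the map up by. -->
    <util:map id="exampleMetadataFieldMap"
              key-type="org.dspace.importer.external.metadatamapping.MetadataFieldConfig"
              value-type="org.dspace.importer.external.metadatamapping.contributor.MetadataContributor">
        <!-- key-ref: a MetadataFieldConfig bean; value-ref: the contributor that fills it. -->
        <entry key-ref="dc.title" value-ref="titleContrib"/>
        <entry key-ref="dc.contributor.author" value-ref="authorContrib"/>
    </util:map>
</beans>
```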
Note that the single field mappings used for the combined author mapping are not added to this list.
First read the base documentation on external importing. This documentation explains the implementation of the importer framework using PubMed (http://www.ncbi.nlm.nih.gov/pubmed) as an example.
To do the lookup for our configured import service, we need to know what URL to use to check for publications. This URL is the publication-lookup.url setting defined in [dspace.dir]/config/modules/publication-lookup.cfg. You may choose to modify this setting or override it within your local.cfg.
This setting can be modified in one of two ways:
- Set publication-lookup.url to a specific URL that matches the baseAddress configured for one of the beans within the [src]/dspace-api/src/main/resources/spring/spring-dspace-addon-import-services.xml Spring config file, e.g. publication-lookup.url=http://eutils.ncbi.nlm.nih.gov/entrez/eutils/
- Leave the default: publication-lookup.url is set to an asterisk ('*'). This default value will attempt to look up the publication using ALL configured importServices in the [src]/dspace-api/src/main/resources/spring/spring-dspace-addon-import-services.xml Spring config file.
The PubMed metadata mappings are defined in the [dspace.dir]/config/spring/api/pubmed-integration.xml Spring configuration file. These metadata mappings can be tweaked as desired. The format of this file is described in the "Metadata mapping" section above.
The PubMed-specific classes are simply implementations based on the base classes defined in importer/external. They add the service and mapping behavior that is specific to PubMed data.