...

The framework produces an 'ImportRecord' that is completely decoupled from DSpace. It contains a set of metadata DTOs, each of which carries a schema, element and qualifier. The specific implementation is responsible for populating this set. A DSpace item can then be created directly from this list.

Implementation of an import source

Each importer implementation should extend one of the following:

  • If you are creating a file-based importer, it should extend the org.dspace.importer.external.service.components.AbstractPlainMetadataSource abstract class. This is an abstract implementation of MetadataSource which is useful when the source of the metadata is an uploaded file (e.g. BibTeX, CSV or similar).
  • If you are creating a service-based importer, it should extend the org.dspace.importer.external.service.AbstractImportMetadataSourceService abstract class. This is an abstract implementation of MetadataSource which is useful when the source of the metadata is a query against an external API (e.g. PubMed, arXiv or similar).

Implementing one of these abstract classes

A third option is to extend the class org.dspace.importer.external.service.AbstractImportSourceService. This class already implements both the MetadataSource interface and Source class. AbstractImportSourceService has a generic type set 'RecordType'. In the importer implementation this type set should be the class of the records received from the remote source's response (e.g. when using Axiom to get the records from the remote source's XML response, the importer implementation's type set is org.apache.axiom.om.OMElement).

Extending AbstractImportSourceService allows the importer implementation to use the framework's built-in support for transforming a record received from the remote source into an object of class org.dspace.importer.external.datamodel.ImportRecord containing DSpace metadata fields, as explained below in "Metadata mapping".

Inherited methods

Method getImportSource() should return a unique identifier. Importer implementations should not be called directly; the class org.dspace.importer.external.service.ImportService should be called instead. This class contains the same methods as the importer implementations, but with an extra parameter 'url'. This url parameter should contain the same identifier that is returned by the getImportSource() method of the importer implementation you want to use.

The other inherited methods are used to query the remote source.

Editing Metadata Mapping

At a simple level, metadata mapping configurations are all in Spring configs in [dspace.dir]/config/spring/api/

In that directory, you'll find a mapping file per import source, e.g. "arxiv-integration.xml", "bibtex-integration.xml", "endnote-integration.xml", "pubmed-integration.xml", etc.

There are two different mapping types.

  1. First, mapping from a file-based import (e.g. bibtex, endnote, ris, etc) to a DSpace metadata field.
    1. The list of all of the enabled mappings can be found in a "MetadataFieldConfig" <util:map>, usually at the top of the config file.

      Code Block
      <util:map id="bibtexMetadataFieldMap" key-type="org.dspace.importer.external.metadatamapping.MetadataFieldConfig"
                    value-type="org.dspace.importer.external.metadatamapping.contributor.MetadataContributor">
          <description>Defines which metadatum is mapped on which metadatum. Note that while the key must be unique it
              only matters here for postprocessing of the value. The mapped MetadatumContributor has full control over
              what metadatafield is generated.
          </description>
          <!-- These entry tags are the enabled mappings. The "value-ref" must map to a <bean> ID. -->
          <entry key-ref="dcTitle" value-ref="bibtexTitleContrib" />
          <entry key-ref="dcAuthors" value-ref="bibtexAuthorsContrib" />
          <entry key-ref="dcJournal" value-ref="bibtexJournalContrib" />
          <entry key-ref="dcIssued" value-ref="bibtexIssuedContrib" />
          <entry key-ref="dcJissn" value-ref="bibtexJissnContrib" />     
      </util:map>


    2. Each field in the file is mapped to a DSpace metadata field in a "SimpleMetadataContributor" bean definition.  NOTE: a large number of DSpace defined metadata fields are already configured as MetadataFieldConfig beans in the "dublincore-metadata-mapper.xml" Spring Config in the same directory. These may be reused in other configurations.

      Code Block
      <!-- This example bean for BibTeX says the "title" key in the BibTeX file should be mapped to the DSpace metadata field
           defined in the "dcTitle" bean. This "dcTitle" bean is found in "dublincore-metadata-mapper.xml" and maps to "dc.title". -->
      <bean id="bibtexTitleContrib" class="org.dspace.importer.external.metadatamapping.contributor.SimpleMetadataContributor">
          <property name="field" ref="dcTitle"/>
          <property name="key" value="title" />
      </bean>


  2. Second, mapping from an external API query import (e.g. arxiv, pubmed, etc) to a DSpace metadata field.
    1. Similar to above, the list of all of the enabled mappings can be found in a "MetadataFieldConfig" <util:map>, usually at the top of the config file.

      Code Block
      <util:map id="arxivMetadataFieldMap" key-type="org.dspace.importer.external.metadatamapping.MetadataFieldConfig"
                    value-type="org.dspace.importer.external.metadatamapping.contributor.MetadataContributor">
          <description>Defines which metadatum is mapped on which metadatum. Note that while the key must be unique it
              only matters here for postprocessing of the value. The mapped MetadatumContributor has full control over
              what metadatafield is generated.
          </description>
          <!-- These entry tags are the enabled mappings. The "value-ref" must map to a <bean> ID. -->
          <entry key-ref="arxiv.title" value-ref="arxivTitleContrib"/>
          <entry key-ref="arxiv.summary" value-ref="arxivSummaryContrib"/>
          <entry key-ref="arxiv.published" value-ref="arxivPublishedContrib"/>
          <entry key-ref="arxiv.arxiv.doi" value-ref="arxivDoiContrib"/>
          <entry key-ref="arxiv.arxiv.journal_ref" value-ref="arxivJournalContrib"/>
          <entry key-ref="arxiv.category.term" value-ref="arxivCategoryTermContrib"/>
          <entry key-ref="arxiv.author.name" value-ref="arxivAuthorContrib"/>
          <entry key-ref="arxiv.identifier.other" value-ref="arxivOtherContrib"/>
      </util:map>


    2. Each field in the API response is mapped to a DSpace metadata field, usually in a "SimpleXpathMetadatumContributor" bean definition which also uses a "MetadataFieldConfig" bean.  NOTE: a large number of DSpace defined metadata fields are already configured as MetadataFieldConfig beans in the "dublincore-metadata-mapper.xml" Spring Config in the same directory. These may be reused in other configurations.

      Code Block
      <!-- This first bean defines an XPath query ("ns:title") to map to a field (ID="arxiv.title") in DSpace -->
      <bean id="arxivTitleContrib" class="org.dspace.importer.external.metadatamapping.contributor.SimpleXpathMetadatumContributor">
          <property name="field" ref="arxiv.title"/>
          <property name="query" value="ns:title"/>
          <property name="prefixToNamespaceMapping" ref="arxivBasePrefixToNamespaceMapping"/>
      </bean>
      <!-- This second bean then defines which DSpace field to use when "arxiv.title" is referenced. In other words, between these two beans,
           the "ns:title" XPath query value is saved to "dc.title". -->
      <bean id="arxiv.title" class="org.dspace.importer.external.metadatamapping.MetadataFieldConfig">
          <constructor-arg value="dc.title"/>
      </bean>
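
The two mapping types above follow the same pattern: a target field bean, a contributor bean, and an entry in the map. As a minimal sketch, enabling one additional BibTeX mapping might look like the fragment below. The bean IDs "dcAbstract" and "bibtexAbstractContrib" and the BibTeX key "abstract" are hypothetical names chosen for illustration, not values taken from the shipped configuration. The fragment is assumed to live inside the existing <beans> root of "bibtex-integration.xml", where the util namespace is already declared.

```xml
<!-- Hypothetical example: map the BibTeX "abstract" key to dc.description.abstract. -->

<!-- 1. Target DSpace field. Skip this bean if an equivalent MetadataFieldConfig
        bean is already defined elsewhere (bean IDs must be unique). -->
<bean id="dcAbstract" class="org.dspace.importer.external.metadatamapping.MetadataFieldConfig">
    <constructor-arg value="dc.description.abstract"/>
</bean>

<!-- 2. Contributor bean: reads the "abstract" key from each BibTeX record
        and writes it to the field referenced above. -->
<bean id="bibtexAbstractContrib" class="org.dspace.importer.external.metadatamapping.contributor.SimpleMetadataContributor">
    <property name="field" ref="dcAbstract"/>
    <property name="key" value="abstract"/>
</bean>

<!-- 3. Enable the mapping by adding one entry to the existing map. -->
<util:map id="bibtexMetadataFieldMap" key-type="org.dspace.importer.external.metadatamapping.MetadataFieldConfig"
          value-type="org.dspace.importer.external.metadatamapping.contributor.MetadataContributor">
    <!-- existing entries stay as they are -->
    <entry key-ref="dcTitle" value-ref="bibtexTitleContrib"/>
    <!-- new entry: value-ref must match the contributor bean ID from step 2 -->
    <entry key-ref="dcAbstract" value-ref="bibtexAbstractContrib"/>
</util:map>
```

A contributor bean that is defined but not referenced by an entry in the map is not an enabled mapping, so step 3 is the one that switches the mapping on.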


Creating new Metadata mapping

When using an implementation of AbstractImportMetadataSourceService or AbstractPlainMetadataSource, a mapping of remote record fields to DSpace metadata fields can be created.

First, create an implementation of class AbstractMetadataFieldMapping with the same type set used for the importer implementation.

...

Each DSpace metadata field that will be used for the mapping must first be configured as a Spring bean of class org.dspace.importer.external.metadatamapping.MetadataFieldConfig.


    <bean id="dcTitle" class="org.dspace.importer.external.metadatamapping.MetadataFieldConfig">
        <constructor-arg value="dc.title"/>
    </bean>

NOTE: A large number of these MetadataFieldConfig definitions are already provided out-of-the-box in [dspace.dir]/config/spring/api/dublincore-metadata-mapper.xml. This allows most service-specific Spring configurations to just reuse those existing MetadataFieldConfig definitions.

Now this metadata field can be used to create a mapping. To add a mapping for the "dc.title" field declared above, a new Spring bean configuration of a class implementing org.dspace.importer.external.metadatamapping.contributor.MetadataContributor needs to be added. This interface contains a type argument. The type needs to match the type used in the implementation of AbstractImportSourceService. The responsibility of each MetadataContributor implementation is to generate a set of metadata from the retrieved document. How it does that is completely opaque to the AbstractImportSourceService, but it is assumed that only one entity (i.e. item) is fed to the metadatum contributor.

...

  • field: A reference to the configured Spring bean of the DSpace metadata field, e.g. the "dcTitle" bean declared above.
  • query: The XPath expression used to select the record value returned by the remote source.
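
As an illustration of these two properties, here is a minimal sketch of a contributor for an XML-based remote source. It assumes the contributor class is SimpleXpathMetadatumContributor and reuses the "arxivBasePrefixToNamespaceMapping" bean from the arXiv example earlier on this page. The XPath query "ns:summary" and the target field "dc.description.abstract" are assumptions made for the example and are not copied from the shipped configuration.

```xml
<!-- Hypothetical target field: the DSpace metadata field the value is written to. -->
<bean id="arxiv.summary" class="org.dspace.importer.external.metadatamapping.MetadataFieldConfig">
    <constructor-arg value="dc.description.abstract"/>
</bean>

<!-- Contributor bean:
     "field" points at the MetadataFieldConfig bean above,
     "query" is the XPath expression evaluated against each record of the remote response. -->
<bean id="arxivSummaryContrib" class="org.dspace.importer.external.metadatamapping.contributor.SimpleXpathMetadatumContributor">
    <property name="field" ref="arxiv.summary"/>
    <property name="query" value="ns:summary"/>
    <!-- Resolves the "ns" prefix used in the query (assumed to be defined elsewhere in the same file). -->
    <property name="prefixToNamespaceMapping" ref="arxivBasePrefixToNamespaceMapping"/>
</bean>
```

The query is evaluated per record, so it should be written relative to a single record element rather than to the root of the whole response.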

...