DSpace Intermediate Metadata Format

This page proposes an XML notation for DSpace's internal Item metadata, that is, the metadata fields stored in the database for each Item. It is used by XsltCrosswalk. It is called the Intermediate format because it is intended solely as an intermediate stage in crosswalks based on XML translation. To reiterate:

This is an INTERMEDIATE format; it is NOT for exporting or harvesting metadata!

XSLT translation

Extensible Stylesheet Language (XSL) and its transformation language, XSLT, are a powerful mechanism for transforming one XML document into another. Since many metadata formats are expressed in XML, you can use XSL stylesheets quite effectively to implement crosswalks.

See the XsltCrosswalk for an example of how to do this.

However, you have to start with (or end up with) an XML document. That is why we need the DSpace Intermediate Metadata format. To generate XML metadata from an Item, the steps are:

  1. Generate this Intermediate XML format directly from the item.
  2. Invoke an XSLT engine and your XSL stylesheet to translate that to the target metadata format, e.g. MODS.
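The two dissemination steps can be sketched with the JDK's built-in DOM and javax.xml.transform APIs. This is a minimal sketch, not the XsltCrosswalk implementation. The FieldValue holder and the DimWriter class are hypothetical stand-ins for an Item's metadata. The stylesheet is whatever target-format XSL you supply.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical holder for one Item metadata value (not a DSpace class).
final class FieldValue {
    final String schema, element, qualifier, lang, value;

    FieldValue(String schema, String element, String qualifier, String lang, String value) {
        this.schema = schema;
        this.element = element;
        this.qualifier = qualifier;
        this.lang = lang;
        this.value = value;
    }
}

// Hypothetical helper; not DSpace's XsltCrosswalk code.
final class DimWriter {
    static final String DIM_NS = "http://www.dspace.org/xmlns/dspace/dim";

    // Step 1: generate the intermediate DIM document from the item's values.
    static Document toDim(List<FieldValue> fields) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().newDocument();
        Element root = doc.createElementNS(DIM_NS, "dim:dim");
        // Declare the prefix explicitly so serializers and XSLT engines see it.
        root.setAttributeNS("http://www.w3.org/2000/xmlns/", "xmlns:dim", DIM_NS);
        // This sketch only describes Items.
        root.setAttributeNS(null, "dspaceType", "ITEM");
        doc.appendChild(root);
        for (FieldValue f : fields) {
            Element field = doc.createElementNS(DIM_NS, "dim:field");
            field.setAttributeNS(null, "mdschema", f.schema);
            field.setAttributeNS(null, "element", f.element);
            // Optional attributes are omitted rather than written as empty strings.
            if (f.qualifier != null) field.setAttributeNS(null, "qualifier", f.qualifier);
            if (f.lang != null) field.setAttributeNS(null, "lang", f.lang);
            field.setTextContent(f.value);
            root.appendChild(field);
        }
        return doc;
    }

    // Step 2: apply the caller's XSL stylesheet to the DIM document.
    static String crosswalk(Document dim, String xslt) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(dim), new StreamResult(out));
        return out.toString();
    }
}
```

The stylesheet passed to crosswalk must declare the dim namespace URI to match dim:dim and dim:field. A MODS stylesheet, for instance, would map each dim:field onto the corresponding MODS element.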

On submission, the steps are reversed:

  1. Invoke an XSLT engine and your XSL stylesheet to translate the incoming metadata format to DSpace Intermediate Format.
  2. Hand the intermediate XML to a method that stuffs its field values into the Item (like many calls to Item.addDC).
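The first submission step uses the same XSLT machinery in the other direction, with the stylesheet emitting dim:dim and dim:field elements. In this sketch, the incoming record/title/creator format, the TOY_TO_DIM stylesheet and the IngestToDim class are invented purely for illustration. Only the DIM element and attribute names come from this page.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

// Hypothetical helper; not DSpace's ingestion crosswalk code.
final class IngestToDim {
    // Hypothetical stylesheet mapping a made-up <record> format onto DIM fields.
    static final String TOY_TO_DIM =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:dim='http://www.dspace.org/xmlns/dspace/dim'>"
      + "<xsl:template match='/record'>"
      + "<dim:dim dspaceType='ITEM'>"
      + "<xsl:for-each select='title'>"
      + "<dim:field mdschema='dc' element='title'><xsl:value-of select='.'/></dim:field>"
      + "</xsl:for-each>"
      + "<xsl:for-each select='creator'>"
      + "<dim:field mdschema='dc' element='contributor' qualifier='author'>"
      + "<xsl:value-of select='.'/></dim:field>"
      + "</xsl:for-each>"
      + "</dim:dim>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Step 1 of submission: incoming metadata -> DIM, returned as a DOM for step 2.
    static Document toDim(String incomingXml, String xslt) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        // With no node supplied, DOMResult receives a newly created Document.
        DOMResult result = new DOMResult();
        t.transform(new StreamSource(new StringReader(incomingXml)), result);
        return (Document) result.getNode();
    }
}
```

The returned Document is what step 2 walks, one dim:field at a time, to set the Item's values.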

The Format

It cannot be overemphasized that this is strictly an internal metadata format. It must never be recorded, transmitted, or exposed outside of DSpace, because it is NOT any sort of standard metadata. We must not allow it to "escape", so that it is never mistaken for an actual supported and sanctioned metadata format. It exists to support internal transformations only.

This format is designed to be a straightforward and precise representation of the DSpace data model's Item metadata. It represents the new metadata model described in MetadataSupport, which includes a "metadata schema" field.

Namespace

See XmlNamespaces for details. All elements are in the "dim" namespace, identified by the URI

Code Block
http://www.dspace.org/xmlns/dspace/dim

There will eventually be a schema for this namespace, as soon as dspace.org establishes a place to put schemas. The purpose of the schema is to document the dim element and allow validation.

Elements

Here is an example of a metadata record:

Code Block

 <dim:dim xmlns:dim="http://www.dspace.org/xmlns/dspace/dim" dspaceType="ITEM">
   <dim:field mdschema="dc" element="title" lang="en_US">
     The Endochronic Properties of Resublimated Thiotimoline
   </dim:field>
   <dim:field mdschema="dc" element="contributor" qualifier="author">
     Isaac Asimov
   </dim:field>
   <dim:field mdschema="dc" element="language" qualifier="iso">
     eng
   </dim:field>
   <dim:field mdschema="dc" element="subject" qualifier="other" lang="en_US">
     time-travel scifi hoax 
   </dim:field>
   <dim:field element="publisher">
     Boston University Department of Biochemistry
   </dim:field>
 </dim:dim>

The root element is named "dim" (for DSpace Intermediate Metadata, and also because it is an unappealing name in English, which discourages exposing it!). Besides namespace declarations, this element may carry one attribute:

  • The dspaceType attribute is the type of DSpace object being described. Its possible values are "ITEM", "COLLECTION", and "COMMUNITY".
  • The dim element contains a list of zero or more field elements, each of which describes a single value. In each field element:
  • The mdschema attribute is the metadata schema, aka "namespace", described in MetadataSupport. In this example all fields are "dc", meaning the original DSpace LAP qualified DC. "dc" could be the default when the attribute is omitted, as it is on the publisher field.
  • The element attribute is the Dublin Core element name, or its equivalent in another schema. It is required.
  • The qualifier attribute is the DC qualifier or equivalent. Omitting it means the qualifier is null.
  • The lang attribute is the language code associated with the entry. I deliberately did not use the XML-standard xml:lang name for this attribute, because it implies semantics that we cannot guarantee to support: the value of this attribute is whatever someone put into DSpace.
  • The text value of the field element is the value of the metadata field.
  • Any number of field elements are allowed, even when all their attributes are identical.
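A reader applying the rules above might look like the following sketch, as the second submission step would need. DimReader and Field are hypothetical names. Two behaviours are assumptions, not settled rules of this proposal. First, a missing mdschema defaults to "dc", whereas the page only says this could be the default. Second, surrounding whitespace from pretty-printed input is trimmed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical DIM reader; not part of DSpace.
final class DimReader {
    static final String DIM_NS = "http://www.dspace.org/xmlns/dspace/dim";
    static final List<String> TYPES = Arrays.asList("ITEM", "COLLECTION", "COMMUNITY");

    // One metadata value; qualifier and lang are null when the attribute is absent.
    static final class Field {
        final String schema, element, qualifier, lang, value;

        Field(String schema, String element, String qualifier, String lang, String value) {
            this.schema = schema;
            this.element = element;
            this.qualifier = qualifier;
            this.lang = lang;
            this.value = value;
        }
    }

    // DOM returns "" for a missing attribute; map that to null instead.
    private static String attr(Element e, String name) {
        return e.hasAttribute(name) ? e.getAttribute(name) : null;
    }

    static List<Field> read(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        if (!DIM_NS.equals(root.getNamespaceURI()) || !"dim".equals(root.getLocalName())) {
            throw new IllegalArgumentException("root must be dim:dim");
        }
        String type = attr(root, "dspaceType");  // optional attribute
        if (type != null && !TYPES.contains(type)) {
            throw new IllegalArgumentException("unknown dspaceType: " + type);
        }
        List<Field> out = new ArrayList<>();  // a List, not a Set: repeats are legal
        NodeList nodes = root.getElementsByTagNameNS(DIM_NS, "field");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element f = (Element) nodes.item(i);
            String element = attr(f, "element");
            if (element == null) {
                throw new IllegalArgumentException("dim:field without element attribute");
            }
            String schema = attr(f, "mdschema");
            out.add(new Field(
                    schema == null ? "dc" : schema,  // ASSUMPTION: default schema "dc"
                    element,
                    attr(f, "qualifier"),            // absent means null qualifier
                    attr(f, "lang"),                 // plain "lang"; xml:lang is ignored
                    f.getTextContent().trim()));     // ASSUMPTION: trim pretty-print whitespace
        }
        return out;
    }
}
```

Each returned Field carries what an Item.addDC-style call needs. Reading only the unprefixed lang attribute mirrors the decision above not to reuse xml:lang.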

An XML Schema document (XSD) description will be forthcoming once this design is approved.

Extensions

For ItemBatchUpdate (importing existing bibliographic data and uploading the corresponding files), DIM is "extended" in the following ways:

  1. <dim:list>...</dim:list> can now enclose multiple items, each a <dim:dim>...</dim:dim> element.
  2. <dim:field ... type="field-type">...</dim:field> is used to specify either type="unique" (remove the field's prior content before inserting the new value) or type="key" (the current record replaces any existing record with the same value for this field, if one exists; useful for external identifiers).
  3. <dim:remove mdschema="schema-name" element="element-name" qualifier="qualifier-name" lang="language_country"/> removes the field occurrence(s) matching the specified schema-name, element-name, qualifier-name and (optional) language_country.
  4. <dim:original>file-path</dim:original> specifies the path of the document file to upload (not a "symbolic link").
  5. <dim:licence>file-path</dim:licence> specifies the path of the licence file to upload.
  6. <dim:collection>complete handle or internal number of the collection</dim:collection> specifies an additional collection to link the document to.
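The unique and remove extensions can be modelled over an in-memory list of values. This is a hypothetical sketch, not the XSLTIngestionCrosswalk code. It assumes that type="unique" clears every prior value of the same schema, element and qualifier regardless of language. It also assumes that an omitted qualifier on dim:remove means a null qualifier, as on dim:field. The type="key" case is left out because it concerns matching whole records rather than individual values.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical in-memory model of ItemBatchUpdate's unique/remove handling.
final class BatchUpdateSketch {
    // One metadata value; qualifier and lang may be null (not a DSpace class).
    static final class Value {
        final String schema, element, qualifier, lang, text;

        Value(String schema, String element, String qualifier, String lang, String text) {
            this.schema = schema;
            this.element = element;
            this.qualifier = qualifier;
            this.lang = lang;
            this.text = text;
        }
    }

    // <dim:remove .../>: drop every value whose schema, element and qualifier match.
    // lang is compared only when the instruction supplies one (it is optional).
    // ASSUMPTION: a null qualifier matches only values whose qualifier is null.
    static void remove(List<Value> values, String schema, String element,
                       String qualifier, String lang) {
        values.removeIf(v -> Objects.equals(v.schema, schema)
                && Objects.equals(v.element, element)
                && Objects.equals(v.qualifier, qualifier)
                && (lang == null || lang.equals(v.lang)));
    }

    // <dim:field ... type="unique">: clear prior content of the field, then insert.
    // ASSUMPTION: "prior content" covers all languages of the same field.
    // Any other type (or none) is treated as a plain append in this sketch.
    static void addField(List<Value> values, Value v, String type) {
        if ("unique".equals(type)) {
            remove(values, v.schema, v.element, v.qualifier, null);
        }
        values.add(v);
    }
}
```

Keeping remove optional on language means one dim:remove without lang clears a field in every language. Supplying lang limits the removal to that language.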

These modifications are in XSLTIngestionCrosswalk and therefore apply to both ItemImport and XSLTingest (described in ItemBatchUpdate).

Opinions? Comments?

Should DIM support the types of BITSTREAM and/or BUNDLE? If so, what fields would be used? – ScottPhillips

...