This page proposes an XML notation for DSpace's internal Item metadata, that is, the metadata fields stored in the database for each Item. It is used by XsltCrosswalk. It is called the Intermediate format because it is intended solely as an intermediate stage in XML-translation-based crosswalks. To reiterate: this is an INTERMEDIATE format. It is NOT for exporting or harvesting metadata!
The Extensible Stylesheet Language (XSL), with its transformation language XSLT, is a powerful mechanism for transforming one XML expression into another. Since many metadata formats are expressed in XML, you can use XSL stylesheets quite effectively to implement crosswalks.
See the XsltCrosswalk for an example of how to do this.
However, you have to start with (or end up with) an XML document. That is why we need the DSpace Intermediate Metadata format. To generate XML metadata from an Item, the steps are:
1. Generate the intermediate XML format from the Item's metadata.
2. Apply an XSL transformation to turn it into the desired XML format.
On submission, the steps are reversed:
1. Apply an XSL transformation to turn the submitted XML into the intermediate format.
2. Add the resulting fields to the Item's metadata (e.g. with Item.addDC()).
It cannot be overemphasized that this is strictly an internal metadata format. It must never be recorded, transmitted, or exposed outside of DSpace, because it is NOT any sort of standard metadata. We must not allow it to "escape", so that it is never mistaken for an actual supported and sanctioned metadata format. It exists to support internal transformations only.
This format is designed to be a straightforward and precise representation of the DSpace data model's Item metadata. It represents the new metadata model described in MetadataSupport, which includes a "metadata schema" field.
See XmlNamespaces for details. All elements are in the "dim" namespace, identified by the URI http://www.dspace.org/xmlns/dspace/dim
There will eventually be a schema for this namespace, as soon as dspace.org establishes a place to put schemas. The purpose of the schema is to document the "dim" element and allow validation.
Here is an example of a metadata record:
<dim:dim xmlns:dim="http://www.dspace.org/xmlns/dspace/dim" dspaceType="ITEM">
  <dim:field mdschema="dc" element="title" lang="en_US">The Endochronic Properties of Resublimated Thiotimoline</dim:field>
  <dim:field mdschema="dc" element="contributor" qualifier="author">Isaac Asimov</dim:field>
  <dim:field mdschema="dc" element="language" qualifier="iso">eng</dim:field>
  <dim:field mdschema="dc" element="subject" qualifier="other" lang="en_US">time-travel scifi hoax</dim:field>
  <dim:field element="publisher">Boston University Department of Biochemistry</dim:field>
</dim:dim>
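A record of this shape could be produced from an Item's metadata values along the following lines. This is only a minimal sketch in Python rather than the Java that DSpace itself uses; the function name build_dim and the tuple layout (schema, element, qualifier, lang, value) are assumptions made for illustration, not part of any DSpace API.

```python
import xml.etree.ElementTree as ET

DIM_NS = "http://www.dspace.org/xmlns/dspace/dim"
# Ask ElementTree to serialize the namespace with the "dim" prefix.
ET.register_namespace("dim", DIM_NS)


def build_dim(fields, dspace_type="ITEM"):
    """Build a dim:dim element from (schema, element, qualifier, lang, value) tuples.

    Hypothetical helper: the tuple layout is an assumption of this sketch.
    """
    root = ET.Element(f"{{{DIM_NS}}}dim", {"dspaceType": dspace_type})
    for schema, element, qualifier, lang, value in fields:
        attrs = {"mdschema": schema, "element": element}
        # qualifier and lang are optional: omit the attribute when the value is None.
        if qualifier is not None:
            attrs["qualifier"] = qualifier
        if lang is not None:
            attrs["lang"] = lang
        field = ET.SubElement(root, f"{{{DIM_NS}}}field", attrs)
        field.text = value
    return root


item_metadata = [
    ("dc", "title", None, "en_US",
     "The Endochronic Properties of Resublimated Thiotimoline"),
    ("dc", "contributor", "author", None, "Isaac Asimov"),
    ("dc", "language", "iso", None, "eng"),
]

dim_root = build_dim(item_metadata)
dim_xml = ET.tostring(dim_root, encoding="unicode")
print(dim_xml)
```

The resulting string is what an XSL stylesheet would then receive as its input document; it is not meant to be stored or sent anywhere else.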
The root element is named "dim" (for DSpace Intermediate Metadata, also because it is an unappealing name in English to discourage exposing it!). This element may contain one attribute (along with namespace declarations):
- The dspaceType attribute is the type of DSpace object being described. The possible values of this attribute are: "ITEM", "COLLECTION", or "COMMUNITY".
The dim element contains a list of 0 or more "field" elements, each of which describes a single value. In each field element:
- The mdschema attribute is the metadata schema, aka "namespace", described in MetadataSupport. In this example all fields are "dc", meaning the original DSpace LAP qualified DC. This could be the default when that attribute is omitted.
- The element attribute is the Dublin Core element name, or its equivalent in another schema. It is required.
- The qualifier attribute is the DC qualifier or equivalent. Omitting it means the qualifier is null.
- The lang attribute is the language code associated with the entry. I deliberately did not use the XML standard xml:lang name for this attribute because it implies semantics that we cannot guarantee to support, since the value of this attribute is whatever someone put into DSpace.
- The content of the field element is the value of the metadata field.
- Multiple field elements are allowed, even with all attributes matching.
An XML Schema document (XSD) description will be forthcoming once this design is approved.
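The attribute rules above can be made concrete with a small reader. The following is a hedged sketch in Python, not the DSpace implementation: parse_dim is a hypothetical name, the fallback to "dc" for a missing mdschema is only the default this page suggests as a possibility, and stripping whitespace around values is an assumption of the sketch.

```python
import xml.etree.ElementTree as ET

DIM_NS = "http://www.dspace.org/xmlns/dspace/dim"


def parse_dim(xml_text, default_schema="dc"):
    """Return (dspace_type, fields), with one dict per dim:field in document order."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{DIM_NS}}}dim":
        raise ValueError("root element must be 'dim' in the DIM namespace")
    fields = []
    for field in root.findall(f"{{{DIM_NS}}}field"):
        element = field.get("element")
        if element is None:
            raise ValueError("dim:field requires an 'element' attribute")
        fields.append({
            # Assumption: a missing mdschema falls back to "dc".
            "schema": field.get("mdschema", default_schema),
            "element": element,
            "qualifier": field.get("qualifier"),  # None means a null qualifier
            "lang": field.get("lang"),            # kept verbatim, not validated
            # Assumption: whitespace around the value is not significant.
            "value": (field.text or "").strip(),
        })
    return root.get("dspaceType"), fields


sample = """
<dim:dim xmlns:dim="http://www.dspace.org/xmlns/dspace/dim" dspaceType="ITEM">
  <dim:field mdschema="dc" element="contributor" qualifier="author">Isaac Asimov</dim:field>
  <dim:field mdschema="dc" element="contributor" qualifier="author">Isaac Asimov</dim:field>
  <dim:field element="publisher">
    Boston University Department of Biochemistry
  </dim:field>
</dim:dim>
"""

dspace_type, fields = parse_dim(sample)
for f in fields:
    print(f["schema"], f["element"], f["qualifier"], f["lang"], f["value"])
```

Fields are collected in a list rather than a dictionary keyed by attributes, so that repeated fields with identical attributes are all preserved.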
For ItemBatchUpdate (importing existing bibliographic data and uploading the corresponding files), DIM is "extended" in the following ways:
- <dim:list>...</dim:list> can now enclose multiple items, each one a <dim:dim>...</dim:dim> element.
- <dim:field ... type="field-type">...</dim:field> is used to specify either type="unique" (to remove the prior field content before inserting the new one) or type="key" (to specify that the current record replaces the record, if any, that already exists with the same value; useful for external identifiers).
- <dim:remove mdschema="schema-name" element="element-name" qualifier="qualifier-name" lang="language_country"/> removes the field occurrence(s) corresponding to the specified element-name, qualifier-name and (optional) language_country.
- <dim:original>file-path</dim:original> specifies the path of the document file to upload (not a "symbolic link").
- <dim:licence>file-path</dim:licence> specifies the path of the licence file to upload.
- <dim:collection> collection complete handle or internal number </dim:collection> specifies an additional collection to link with the document.
Modifications are in XSLTIngestionCrosswalk and are therefore common to ItemImport and XSLTingest (described in ItemBatchUpdate).
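To illustrate how the remove and type="unique" extensions might act on an Item's existing metadata, here is a rough Python sketch. It is one reading of the descriptions above, not the XSLTIngestionCrosswalk code: apply_record and the tuple layout are hypothetical, a missing mdschema is assumed to mean "dc", a remove without lang is assumed to match every language, and type="key", original, licence and collection are deliberately left out.

```python
import xml.etree.ElementTree as ET

DIM_NS = "http://www.dspace.org/xmlns/dspace/dim"
NS = {"dim": DIM_NS}


def apply_record(record, existing):
    """Apply one dim:dim record to a list of (schema, element, qualifier, lang, value)."""
    result = list(existing)
    for node in record:
        name = node.tag.split("}", 1)[-1]  # local name without the namespace
        if name not in ("field", "remove"):
            continue  # original, licence and collection are not handled here
        key = (node.get("mdschema", "dc"), node.get("element"), node.get("qualifier"))
        lang = node.get("lang")
        if name == "remove":
            # Drop matching occurrences; lang narrows the match only when given.
            result = [f for f in result
                      if not (f[:3] == key and (lang is None or f[3] == lang))]
        else:
            if node.get("type") == "unique":
                # Remove the prior content of this field before inserting the new value.
                result = [f for f in result if f[:3] != key]
            result.append(key + (lang, (node.text or "").strip()))
    return result


batch = """
<dim:list xmlns:dim="http://www.dspace.org/xmlns/dspace/dim">
  <dim:dim dspaceType="ITEM">
    <dim:remove mdschema="dc" element="subject" qualifier="other"/>
    <dim:field mdschema="dc" element="title" type="unique">New title</dim:field>
    <dim:field mdschema="dc" element="contributor" qualifier="author">Isaac Asimov</dim:field>
  </dim:dim>
</dim:list>
"""

existing = [
    ("dc", "title", None, "en_US", "Old title"),
    ("dc", "subject", "other", "en_US", "hoax"),
]

records = ET.fromstring(batch).findall("dim:dim", NS)
updated = apply_record(records[0], existing)
print(updated)
```

In this sketch the old title is replaced, the subject is removed, and the author is simply appended because its field carries no type attribute.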
Should DIM support the types of BITSTREAM and/or BUNDLE? If so what fields would be used? – ScottPhillips