
Expectations

This proposal is based on the premise that changes to DSpace metadata characteristics must be backward compatible and retain the functionality that previously existed, to ease the transition for all existing users of the platform. So many functional areas of DSpace rely on existing metadata functionality that it is critical that any changes in functionality also have well-defined and scripted updates across releases.

The following are some basic features of the proposal:

  • Metadata fields can include additional properties for:
    • Validation rules, such as syntax or vocabulary encodings
    • A flag designating that the field is required
    • Form field types for input forms
    • Type fields designating Dublin Core or other metadata schema types; types are initially hard-coded, but the new schema registry is extensible
    • MetadataField should be extended with methods to derive its "dc" type. In the absence of an assigned type, all fields default to dc.description typing.
  • The MetadataSchema table will be repurposed and extended to support:
    • Identification of the types of DSO it may be assigned to
    • DSpaceObjects will be extended to allow them to have a specific "MetadataSchema" assigned. For example, different schema can be created for Publications, Theses, Multimedia, and so on, each having a different set of fields.
    • The new MetadataSchema registry will replace the majority of the input-forms.xml file.
    • Additional table(s) will most likely be required to designate the schema that can be used in a specific collection, and thus the input forms that may be enabled in that collection.
    • Inheritance may be used in schema to reduce replication of fields. For example, a base schema could hold the required DSpace fields that are generated during the submission, workflow and archive processes (title, issued, accessioned, available).
    • New schema may inherit from it to reduce replication of metadata fields.
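The field-level properties listed above could be carried on MetadataField roughly as follows. This is a minimal Java sketch, not the DSpace API: the class name ProposedMetadataField, the InputType values and the representation of encodings as plain strings are all hypothetical. It shows the proposed fallback to dc.description typing when no type has been assigned.

```java
import java.util.Optional;

/**
 * Hypothetical sketch of a MetadataField carrying the proposed extra properties.
 * None of these members exist in the current DSpace API; names are illustrative.
 */
class ProposedMetadataField {

    /** Illustrative form field types an input form could render for this field. */
    enum InputType { ONEBOX, TEXTAREA, DATE, DROPDOWN }

    private final String schema;      // namespace prefix, e.g. "dc"
    private final String element;     // e.g. "date"
    private final String qualifier;   // e.g. "issued", or null when unqualified
    private final String dcType;      // assigned DC type, e.g. "dc.date", or null
    private final String encoding;    // validation rule name, e.g. "W3CDTF", or null
    private final boolean required;   // whether a value must be supplied
    private final InputType inputType;

    ProposedMetadataField(String schema, String element, String qualifier,
                          String dcType, String encoding, boolean required,
                          InputType inputType) {
        this.schema = schema;
        this.element = element;
        this.qualifier = qualifier;
        this.dcType = dcType;
        this.encoding = encoding;
        this.required = required;
        this.inputType = inputType;
    }

    /** Dotted name in the familiar schema.element.qualifier form. */
    String getName() {
        return qualifier == null
                ? schema + "." + element
                : schema + "." + element + "." + qualifier;
    }

    /** Derives the "dc" type; fields without an assigned type default to dc.description. */
    String getDcType() {
        return dcType != null ? dcType : "dc.description";
    }

    Optional<String> getEncoding() { return Optional.ofNullable(encoding); }

    boolean isRequired() { return required; }

    InputType getInputType() { return inputType; }
}
```

For example, a hypothetical local.note field created without a type would report dc.description from getDcType().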

Primary Objective

The primary objective of this proposal is that the DSpace metadata registry be "naturally" extended to support a richer and more expressive "Metadata Schema". The technical objectives of the proposal are to provide the following features:

  1. Capability to define "Metadata Profiles" for specific DSpace Objects and/or types of Objects.
  2. Capability to define DCMI "subPropertyOf" relationships outside of the legacy ns.element.qualifier approach.
  3. Capability to have "immutable" DC, DCTERMS and other "well established" namespaces.
  4. Capability to validate existing DSpace Item metadata based on a profile that is assigned either via the parent container or directly to the DSpace Item.
  5. Capability to apply these profiles similarly to DSpace Communities, Collections, Items, Bundles and Bitstreams.
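Objectives 4 and 5 imply a lookup rule: a profile assigned directly to an object applies, and otherwise the profile of the nearest ancestor container is used. The following minimal Java sketch assumes that rule. ProfileResolver and its Node class are hypothetical stand-ins for the Community/Collection/Item/Bundle/Bitstream hierarchy, and profiles are represented only by name.

```java
import java.util.Optional;

/** Hypothetical sketch: resolve the metadata profile that applies to a DSpace object. */
class ProfileResolver {

    /** Minimal stand-in for an object in the DSpace containment hierarchy. */
    static final class Node {
        final String name;
        final Node parent;      // null for a top-level Community
        final String profile;   // directly assigned profile name, or null

        Node(String name, Node parent, String profile) {
            this.name = name;
            this.parent = parent;
            this.profile = profile;
        }
    }

    /** A directly assigned profile wins; otherwise inherit from the nearest ancestor. */
    static Optional<String> resolve(Node node) {
        for (Node n = node; n != null; n = n.parent) {
            if (n.profile != null) {
                return Optional.of(n.profile);
            }
        }
        return Optional.empty(); // no profile anywhere up the hierarchy
    }
}
```

Under this rule an Item in a Collection assigned "Thesis Item Profile" validates against that profile, while a Bitstream with its own "Streaming Video Profile" keeps it.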

Another critical feature of this proposal is that the new Schema model should support the above features without requiring significant transformation of either existing DSpace Item metadata or the registry itself.

Conceptual Definition of "Schema"

The DSpace MetadataSchema registry was designed based on an outdated concept of "Application Profiles" and "Qualified Dublin Core" that predated the current DCMI Abstract Model. Because of this, there are a number of significant shortcomings in the current implementation:

  1. Namespaces are not "Schema"
  2. Qualification does not effectively meet needs for use of alternative namespaces while still providing clear mappings to DC for exposing metadata in OAI_DC.
  3. The Schema and Fields defined are insufficient to support validation of DSpace metadata fields in relation to Item Submission or other methods of Deposit.

The current "DSpace Schema" does not meet the requirements that a schema is traditionally used for. Schema are traditionally used to define a scaffolding or framework of rules against which actual content can be validated. While the current MetadataSchema/Field does restrict what can be assigned to any item in DSpace, it does not provide any support for validating these assignments, nor does it allow us to further define the encoding of the metadata values or whether they are required. At this time, much of the validation, rules and encoding is instead assigned, poorly, at the UI/presentation level in the DSpace Submission input-forms.xml file, and is only enforced in the Describe Step of the Submission workflow.

This proposal seeks to extend the definition of the DSpace Metadata Schema to include support for these features, previously found only in the Submission input-forms.xml, and to formalize a strategy for metadata validation as a new core feature of DSpace.
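Moving validation from input-forms.xml into the schema means each field rule must be checkable on its own, outside the Describe Step. A minimal Java sketch, assuming a rule consists of a field name, an encoding name (such as W3CDTF, URI, Literal or TEXT) and a required flag; MetadataValidator and Rule are hypothetical names, and the W3CDTF check is purely syntactic.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch of schema-level metadata validation; not an existing DSpace API. */
class MetadataValidator {

    /** One rule from a profile: field name, encoding name and required flag. */
    record Rule(String field, String encoding, boolean required) {}

    // Syntactic W3CDTF check: YYYY, YYYY-MM, YYYY-MM-DD, or a date-time with a
    // time zone designator. Value ranges (e.g. month 13) are not checked.
    private static final Pattern W3CDTF = Pattern.compile(
            "\\d{4}(-\\d{2}(-\\d{2}(T\\d{2}:\\d{2}(:\\d{2}(\\.\\d+)?)?(Z|[+-]\\d{2}:\\d{2}))?)?)?");

    /** Checks one value against a named encoding; unknown encodings accept anything. */
    static boolean matchesEncoding(String encoding, String value) {
        if (encoding == null) {
            return true;
        }
        switch (encoding) {
            case "W3CDTF":
                return W3CDTF.matcher(value).matches();
            case "URI":
                try {
                    return new URI(value).isAbsolute();
                } catch (URISyntaxException e) {
                    return false;
                }
            default: // e.g. "Literal" or "TEXT"
                return true;
        }
    }

    /** Returns human-readable errors; an empty list means the metadata passes. */
    static List<String> validate(List<Rule> rules, Map<String, List<String>> metadata) {
        List<String> errors = new ArrayList<>();
        for (Rule rule : rules) {
            List<String> values = metadata.getOrDefault(rule.field(), List.of());
            if (rule.required() && values.isEmpty()) {
                errors.add(rule.field() + " is required");
            }
            for (String value : values) {
                if (!matchesEncoding(rule.encoding(), value)) {
                    errors.add(rule.field() + ": '" + value + "' is not valid " + rule.encoding());
                }
            }
        }
        return errors;
    }
}
```

Because the rules live in the schema rather than in a form definition, the same check could in principle run for any method of deposit, not only interactive submission.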

...

Repurposing of MetadataSchema and MetadataField as Custom Metadata Template

Rather than having MetadataSchema represent the namespace of the metadata fields allowed by the repository, we recommend that this table be repurposed to embody "templates" of MetadataFields that should be used for specific types of DSpace Objects. Typing would be based on:

These types will be expressed through the addition of properties to the MetadataSchemaRegistry and MetadataFieldRegistry tables, providing the facility to expand on and add additional schema. Some hypothetical examples of such schema would be:

  • Community or Collection Profiles
    • Document Collection Profile
    • Journal Issue Profile
    • Image Gallery Profile
  • Item Profiles
    • Scholarly Item Profile
    • Website Item Profile
    • Thesis Item Profile
    • Technical Report Item Profile
    • Journal Article Profile
    • Learning Object Item Profile
  • Bitstream Profiles
    • Multimedia Profile
      • Streaming Video Profile
      • Image Profile
    • Document Profile
      • Article
      • Spreadsheet
      • Etc
  • Custom
    • Custom Profile for any new type of DSpace content

The above profiles could be applied heterogeneously through metadata attached to any level of the DSpace object hierarchy.
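The nesting in the profile list (for example, Streaming Video Profile under Multimedia Profile), together with the base-schema inheritance described earlier, suggests that a profile's effective fields are its parent's fields plus its own. A minimal Java sketch under that assumption; MetadataProfile is a hypothetical class, and a field is modelled only by its name and required flag.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of profile (schema) inheritance: a profile lists its own
 * fields and inherits the fields of its parent profile.
 */
class MetadataProfile {
    private final String name;
    private final MetadataProfile parent;          // null for a base profile
    private final Map<String, Boolean> ownFields;  // field name -> required flag

    MetadataProfile(String name, MetadataProfile parent, Map<String, Boolean> ownFields) {
        this.name = name;
        this.parent = parent;
        this.ownFields = new LinkedHashMap<>(ownFields);
    }

    String getName() { return name; }

    /**
     * Effective fields: the parent's fields first, then this profile's own.
     * A child entry overrides the parent's required flag for the same field.
     */
    Map<String, Boolean> effectiveFields() {
        Map<String, Boolean> result = new LinkedHashMap<>();
        if (parent != null) {
            result.putAll(parent.effectiveFields());
        }
        result.putAll(ownFields);
        return result;
    }
}
```

Since the parent must already exist when a profile is constructed, this shape cannot form inheritance cycles; a registry-backed version would need an explicit cycle check.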

Metadata Field Inheritance

Individual Metadata Fields, like DCMI metadata properties, will support subtyping or inheritance. For example, the DCMI website gives the following definition:

http://dublincore.org/documents/dcmi-terms/#terms-title

Term Name: title
URI: http://purl.org/dc/terms/title
Label: Title
Definition: A name given to the resource.
Type of Term: Property
Refines: http://purl.org/dc/elements/1.1/title
Version: http://dublincore.org/usage/terms/history/#titleT-002
Has Range: http://www.w3.org/2000/01/rdf-schema#Literal

In the case of DSpace, a similar level of refinement can be supported by adding new MetadataFieldRegistry properties capable of storing this relationship.
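If each MetadataFieldRegistry entry stored the property it refines, a field's refinement chain could be followed up to the broadest DC element, which is what a mapping to OAI_DC needs. A minimal Java sketch with a hypothetical FieldRefinements class; the example relationships in the usage below come from DCMI Metadata Terms (dcterms:title refines dc:title, and dcterms:alternative refines dcterms:title).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of "refines" (subPropertyOf) relationships between fields. */
class FieldRefinements {
    private final Map<String, String> refines = new HashMap<>(); // field -> refined property

    void addRefinement(String field, String refinedProperty) {
        refines.put(field, refinedProperty);
    }

    /** Properties the field refines, nearest first; stops if a cycle is detected. */
    List<String> chain(String field) {
        List<String> result = new ArrayList<>();
        String current = refines.get(field);
        while (current != null && !current.equals(field) && !result.contains(current)) {
            result.add(current);
            current = refines.get(current);
        }
        return result;
    }

    /** The broadest property reached, e.g. the DC element used when exposing oai_dc. */
    String root(String field) {
        List<String> c = chain(field);
        return c.isEmpty() ? field : c.get(c.size() - 1);
    }
}
```

Usage: after addRefinement("dcterms:alternative", "dcterms:title") and addRefinement("dcterms:title", "dc:title"), root("dcterms:alternative") returns "dc:title", so a value stored under a refinement still has a clear DC mapping.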



[Diagram: New Metadata Class Diagram]

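In the case of DSpace, each field in the registry could record the property it refines, an encoding, a default value and a required flag, for example:

ID | Field | Refines | Encoding | Default | Required | Scope Note
15 | dc.date.issued | dc:date | W3CDTF | ${now} | true | Date of publication or distribution.
10 | dc.date | dc:date | W3CDTF | ${now} |  | Use qualified form if possible.
25 | dc.identifier.uri | dc:identifier | URI |  | true | Uniform Resource Identifier
17 | dc.identifier | dc:identifier | Literal |  |  |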
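The ${now} default in the table above presumably stands for the date at which a value is created. Assuming that reading, the following minimal Java sketch (DefaultValues is a hypothetical name) expands it to an ISO 8601 calendar date, which is also a valid W3CDTF value, and applies a default only when a field has no values yet.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of default-value handling; "${now}" is assumed to mean today's date. */
class DefaultValues {

    /** Expands a default expression; other non-empty expressions are used literally. */
    static Optional<String> expand(String defaultExpr, LocalDate today) {
        if (defaultExpr == null || defaultExpr.isEmpty()) {
            return Optional.empty();
        }
        if (defaultExpr.equals("${now}")) {
            // LocalDate.toString() yields yyyy-MM-dd, which is a valid W3CDTF date
            return Optional.of(today.toString());
        }
        return Optional.of(defaultExpr);
    }

    /** Applies the default only when the field has no values yet. */
    static List<String> withDefault(List<String> values, String defaultExpr, LocalDate today) {
        if (!values.isEmpty()) {
            return values;
        }
        Optional<String> expanded = expand(defaultExpr, today);
        if (expanded.isPresent()) {
            return List.of(expanded.get());
        }
        return List.of();
    }
}
```

Passing the date in explicitly, rather than calling LocalDate.now() inside, keeps the expansion deterministic and easy to test.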

 

ID | Field | Refines | Encoding | Default | Required | Scope Note
73 | dc.accessibility.imageequivalents |  |  |  |  | Boolean field, true if images have equivalents
74 | dc.accessibility.imageequivalentspresentation |  |  |  |  | Indicates the way image equivalents are presented
72 | dc.accessibility.imagespresent |  |  |  |  | Boolean accessibility field
2 | dc.contributor.advisor |  |  |  |  | Use primarily for thesis advisor.
3 | dc.contributor.author |  |  |  |  |
4 | dc.contributor.editor |  |  |  |  |
5 | dc.contributor.illustrator |  |  |  |  |
6 | dc.contributor.other |  |  |  |  |
82 | dc.contributor.sponsor |  |  |  |  |
1 | dc.contributor |  |  |  |  | A person, organization, or service responsible for the content of the resource. Catch-all for unspecified contributors.
7 | dc.coverage.spatial |  |  |  |  | Spatial characteristics of content.
8 | dc.coverage.temporal |  |  |  |  | Temporal characteristics of content.
9 | dc.creator |  |  |  |  | Do not use; only for harvested metadata.
11 | dc.date.accessioned |  |  |  |  | Date DSpace takes possession of item.
12 | dc.date.available |  |  |  |  | Date or date range item became available to the public.
13 | dc.date.copyright |  |  |  |  | Date of copyright.
14 | dc.date.created |  |  |  |  | Date of creation or manufacture of intellectual content if different from date.issued.
77 | dc.date.embargountil |  |  |  |  | Date Embargo will be lifted.
15 | dc.date.issued |  |  |  |  | Date of publication or distribution.
16 | dc.date.submitted |  |  |  |  | Recommend for theses/dissertations.
67 | dc.date.updated |  |  |  |  | The last time the item was updated via the SWORD interface
10 | dc.date |  |  |  |  | Use qualified form if possible.
27 | dc.description.abstract |  |  |  |  | Abstract or summary.
76 | dc.description.embargoterms |  |  |  |  | Description of Embargo Terms
28 | dc.description.provenance |  |  |  |  | The history of custody of the item since its creation, including any changes successive custodians made to it.
29 | dc.description.sponsorship |  |  |  |  | Information about sponsoring agencies, individuals, or contractual arrangements for the item.
30 | dc.description.statementofresponsibility |  |  |  |  | To preserve statement of responsibility from MARC records.
31 | dc.description.tableofcontents |  |  |  |  | A table of contents for a given item.
32 | dc.description.uri |  |  |  |  | Uniform Resource Identifier pointing to description of this item.
68 | dc.description.version |  |  |  |  | The Peer Reviewed status of an item
26 | dc.description |  |  |  |  | Catch-all for any description not defined by qualifiers.
34 | dc.format.extent |  |  |  |  | Size or duration.
35 | dc.format.medium |  |  |  |  | Physical medium.
36 | dc.format.mimetype |  |  |  |  | Registered MIME type identifiers.
33 | dc.format |  |  |  |  | Catch-all for any format information not defined by qualifiers.
18 | dc.identifier.citation |  |  |  |  | Human-readable, standard bibliographic citation of non-DSpace format of this item
19 | dc.identifier.govdoc |  |  |  |  | A government document number
20 | dc.identifier.isbn |  |  |  |  | International Standard Book Number
23 | dc.identifier.ismn |  |  |  |  | International Standard Music Number
21 | dc.identifier.issn |  |  |  |  | International Standard Serial Number
24 | dc.identifier.other |  |  |  |  | A known identifier type common to a local collection.
22 | dc.identifier.sici |  |  |  |  | Serial Item and Contribution Identifier
69 | dc.identifier.slug |  |  |  |  | a uri supplied via the sword slug header, as a suggested uri for the item
25 | dc.identifier.uri |  |  |  |  | Uniform Resource Identifier
17 | dc.identifier |  |  |  |  | Catch-all for unambiguous identifiers not defined by qualified form; use identifier.other for a known identifier common to a local collection instead of unqualified form.
38 | dc.language.iso | dc:language | RFC5646 | en |  | Current ISO standard for language of intellectual content, including country codes (e.g. "en_US").
70 | dc.language.rfc3066 |  |  |  |  | the rfc3066 form of the language for the item
37 | dc.language | dc:language | RFC5646 | en |  | Catch-all for non-ISO forms of the language of the item, accommodating harvested values.
39 | dc.publisher |  |  |  |  | Entity responsible for publication, distribution, or imprint.
44 | dc.relation.haspart |  |  |  |  | References physically or logically contained item.
46 | dc.relation.hasversion | dc:relation | URI |  |  | References later version.
47 | dc.relation.isbasedon |  |  |  |  | References source.
41 | dc.relation.isformatof |  |  |  |  | References additional physical form.
42 | dc.relation.ispartof |  |  |  |  | References physically or logically containing item.
43 | dc.relation.ispartofseries |  |  |  |  | Series name and number within that series, if available.
48 | dc.relation.isreferencedby |  |  |  |  | Pointed to by referenced resource.
51 | dc.relation.isreplacedby |  |  |  |  | References succeeding item.
45 | dc.relation.isversionof |  |  |  |  | References earlier version.
50 | dc.relation.replaces |  |  |  |  | References preceding item.
49 | dc.relation.requires |  |  |  |  | Referenced resource is required to support function, delivery, or coherence of item.
52 | dc.relation.uri |  |  |  |  | References Uniform Resource Identifier for related item.
40 | dc.relation | dc:relation | URI |  |  | Catch-all for references to other related items.
71 | dc.rights.holder |  |  |  |  | The owner of the copyright
54 | dc.rights.uri |  |  |  |  | References terms governing use and reproduction.
53 | dc.rights |  |  |  |  | Terms governing use and reproduction.
56 | dc.source.uri |  |  |  |  | Do not use; only for harvested metadata.
55 | dc.source |  |  |  |  | Do not use; only for harvested metadata.
58 | dc.subject.classification |  |  |  |  | Catch-all for value from local classification system; global classification systems will receive specific qualifier.
59 | dc.subject.ddc |  |  |  |  | Dewey Decimal Classification Number
60 | dc.subject.lcc |  |  |  |  | Library of Congress Classification Number
61 | dc.subject.lcsh | dc:subject | Literal |  |  | Library of Congress Subject Headings
62 | dc.subject.mesh | dc:subject | URI |  |  | Medical Subject Headings
63 | dc.subject.other | dc:subject | Literal |  |  | Local controlled vocabulary; global vocabularies will receive specific qualifier.
57 | dc.subject |  |  |  |  | Uncontrolled index term.
65 | dc.title.alternative | dc:title | TEXT |  |  | Varying (or substitute) form of title proper appearing in item, e.g. abbreviation or translation
64 | dc.title | dc:title | TEXT |  | true | Title statement/title proper.
66 | dc.type | dc:type | Class |  |  | Nature or genre of content.