Primary Objective

The primary objective of this proposal is that the DSpace metadata registry be "naturally" extended to support a richer and more expressive "Metadata Schema". Technical Objectives of the proposal are to provide the following features:

  1. Capability to Define Metadata "Profiles" that may be assigned to individual DSpace Objects
  2. Capability to Define "subPropertyOf" relationships in place of the legacy ns.element.qualifier approach for more expressive inheritance and mapping to OAI_DC
  3. Capability to have "immutable" DC, DCTERMS and other "well established" namespaces to treat as sources for "subPropertyOf" assignments.
  4. Capability to Restrict and Validate Existing DSpace Object Metadata based on the assigned "Profile".
  5. Capability to Apply these profiles similarly to any DSO; Communities, Collections, Items, Bundles and Bitstreams, even Groups and EPeople.

Expectations for Backward Compatibility

This proposal is based on the premise that changes to DSpace metadata characteristics must be backward compatible and retain the same functionality as previously existed, to ease the transition for all existing users of the platform. So many different functional areas of DSpace rely on existing metadata functionality that it is critical that any changes in functionality also have well-defined and scripted updates across releases. Thus another very critical feature of this proposal is that the new Schema model should support the above features without any significant need to transform existing DSpace Item metadata or the registry itself.

The following are some basic  features of the proposal:

  • Metadata fields can include additional properties for
    • Validation rules such as syntax or vocabulary encodings
    • Flag to designate the field is required.
    • Form field types for input forms
    • Type fields to designate Dublin Core or other metadata schema types; types are initially hard coded, but the new schema registry is extensible
    • MetadataField should be extended with methods to derive its "dc" type. In the absence of an assigned type, all fields default to dc.description typing.
  • MetadataSchema will be repurposed and extended to support
    • Identification of the types of DSO it may be assigned to
    • DSpaceObjects will be extended to allow them to have a specific "MetadataSchema" assigned. For example, different schema can be created for Publications, Theses, Multimedia, and so on, each having a different set of fields.
    • DSpace will be able to use the new MetadataSchema registry to replace the majority of the input-forms.xml file.
    • Additional table(s) will more than likely be required to designate the schema that can be used in a specific collection, and thus the input forms that may be enabled in that collection.
    • Inheritance may be used in schema to reduce replication of fields. For example, a base schema can hold the Required DSpace fields that are generated during the submission, workflow and archive processes (title, issued, accessioned, available).
    • New schema may inherit from it to reduce replication of metadata fields.
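
The inheritance idea in the list above can be sketched as follows. This is only an illustration, not the proposed DSpace implementation: the class names FieldRule and Schema, their attributes, and the "thesis" schema are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FieldRule:
    """Hypothetical per-field properties taken from the feature list."""
    name: str
    required: bool = False
    encoding: Optional[str] = None   # e.g. "W3CDTF" or "URI"
    input_type: str = "onebox"       # form field type (assumed default)


@dataclass
class Schema:
    """Hypothetical schema that may inherit fields from a parent schema."""
    name: str
    fields: Dict[str, FieldRule] = field(default_factory=dict)
    parent: Optional["Schema"] = None

    def add(self, rule: FieldRule) -> None:
        self.fields[rule.name] = rule

    def effective_fields(self) -> Dict[str, FieldRule]:
        """Merge inherited fields with local ones; local definitions win."""
        merged = dict(self.parent.effective_fields()) if self.parent else {}
        merged.update(self.fields)
        return merged


# Base schema holding the fields generated during submission and archiving.
base = Schema("base")
for field_name in ("title", "issued", "accessioned", "available"):
    base.add(FieldRule(field_name, required=True))

# A derived schema only declares what it adds.
thesis = Schema("thesis", parent=base)
thesis.add(FieldRule("advisor"))

thesis_fields = thesis.effective_fields()
```

Here the derived schema sees the four base fields plus its own "advisor" field, while the base schema itself is left unchanged.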

Gliffy Diagram: New Metadata Class Diagram

Repurposing of MetadataSchema and MetadataField as Custom Metadata Template

...

Conceptual Definition of "Schema"

The DSpace MetadataSchema registry was designed based on an outdated concept of "Application Profiles" and "Qualified Dublin Core" that predated the current DCMI Abstract Model. Because of this, there are a number of significant shortcomings in the current implementation.

  1. Namespaces are not really "Schema"
  2. Schema may be validated; however, there are no actual rules in the DSpace "MetadataSchema" or "MetadataField" data models.
  3. Qualification does not effectively meet the need to use alternative namespaces, nor does it support programmatic mapping to DC for exposing metadata from other namespaces within OAI_DC.
  4. The Schema and Fields defined are insufficient to support attributes and rules for validation of DSpace metadata fields in relation to Item Submission or other methods of Deposit.

The current "DSpace Schema" does not meet the requirements that a Schema is traditionally used for. Schema are traditionally used to define a scaffolding or framework of rules which actual content can be validated against. While the current MetadataSchema/Field does restrict what can be assigned to any item in DSpace, it does not provide any support for validation of these assignments, nor does it allow us to define the encoding of the metadata values or whether they are required. At this time, much of the validation, rules and encoding is instead poorly assigned at the UI/Presentation level in the DSpace Submission input-forms.xml file and only enforced in the Describe Step of the Submission workflow.

This proposal seeks to extend the definition of the DSpace Metadata Schema to include support for these features, previously found only in the Submission input-forms.xml, formalizing a strategy for metadata validation as a new core feature of DSpace.

Content Models For DSpace (Extending on MetadataSchema and MetadataField to provide "Metadata Profiles")

Rather than having the current MetadataSchema apply to the namespace of the metadata fields that are allowed by the entire repository, it is instead recommended that this table be repurposed and expanded to support creation of "Named Profiles" that can be easily assigned to DSpaceObjects as Metadata or Content Models. In this case, typing would initially be based on:

These above types will be expressed through the addition of properties to the MetadataSchemaRegistry and MetadataFieldRegistry tables to provide the facility to expand on and add additional Schema.  Some Hypothetical examples of such schema would be:

| Community or Collection Profiles | Description |
| --- | --- |
| Document Collection Profile | Metadata Fields Appropriate to Describe Details of a Generic Document Collection for the purposes of a Finding Aid |
| Journal Issue Profile | Metadata Fields Appropriate to Describe Details of a Journal or Journal Issue for the purposes of a Finding Aid |
| Image Gallery Profile | Metadata Fields Appropriate to Describe Details of an Image or Multimedia Collection, allowing UI Hooks that are beneficial for Multimedia (Slide Decks, Light tables, Viewers, etc) |
| Etc,... | |

| Item Profiles | Description |
| --- | --- |
| Scholarly Item | Metadata Fields Appropriate to Describe a Scholarly Research Article |
| Website Item | Metadata Fields Appropriate to Describe a number of individual Bitstream files that constitute a website. |
| Thesis Item | Metadata Fields Appropriate to Describe a Dissertation or Thesis Item based on conventional ETDMS terminology |
| Technical Report Item | |
| Journal Article | |
| Learning Object Item | |
| Etc,... | |

| Bitstream Profiles | Description |
| --- | --- |
| Streaming Video Profile | Metadata Fields Appropriate to Describe a moving picture or video |
| Image Profile | Metadata Fields Appropriate to Describe an individual image |
| Document | Metadata Fields Appropriate to Describe an individual document |
| Spreadsheet | Metadata Fields Appropriate to Describe a Data file |
| Etc,... | |

Likewise, the above profiles could be applied heterogeneously through metadata attached to any level of the DSpace object hierarchy.

Metadata Field Inheritance

Individual Metadata Fields, like DCMI metadata properties, will support subTyping or inheritance. For example, the DCMI website gives the following definition:

http://dublincore.org/documents/dcmi-terms/#terms-title

Term Name: title
URI: http://purl.org/dc/terms/title
Label: Title
Definition: A name given to the resource.
Type of Term: Property
Refines: http://purl.org/dc/elements/1.1/title
Version: http://dublincore.org/usage/terms/history/#titleT-002
Has Range: http://www.w3.org/2000/01/rdf-schema#Literal

In the case of DSpace, a similar level of refinement for DSpace Metadata can be supported through the addition of new MetadataFieldRegistry properties that are capable of storing this relationship.
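
To illustrate how a stored "refines" relationship could drive mapping to OAI_DC, here is a minimal sketch that walks a refinement chain until it reaches a term in the DC elements namespace. The REFINES dictionary stands in for the proposed registry property, and the function name resolve_to_dc is hypothetical. The item-level entry mirrors the "alternative refines dcterms:title" example used later in this proposal; the dcterms-to-dc link is the one shown in the DCMI definition above.

```python
from typing import Dict, Optional

DC_ELEMENTS_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical "refines" links as they might be stored in the registry.
REFINES: Dict[str, Optional[str]] = {
    "http://mydspace/schema/item/alternative": "http://purl.org/dc/terms/title",
    "http://purl.org/dc/terms/title": "http://purl.org/dc/elements/1.1/title",
    "http://purl.org/dc/elements/1.1/title": None,
}


def resolve_to_dc(uri: str,
                  refines: Optional[Dict[str, Optional[str]]] = None) -> Optional[str]:
    """Follow refinement links until a DC elements term is reached.

    Returns None when the chain ends (or loops) without reaching one.
    """
    links = REFINES if refines is None else refines
    seen = set()
    current: Optional[str] = uri
    while current is not None and current not in seen:
        if current.startswith(DC_ELEMENTS_NS):
            return current
        seen.add(current)               # guards against cyclic definitions
        current = links.get(current)
    return None


oai_dc_term = resolve_to_dc("http://mydspace/schema/item/alternative")
```

With links like these, a locally defined field could be exposed in OAI_DC under the DC element it ultimately refines, without relying on the element.qualifier naming convention.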

Data Model Changes to Support This Proposal

To support this proposal, only additional fields and relational tables need to be added to the existing DSpace schema.

MetadataProfile:

Profile will be used to identify a set of MetadataFieldProfile that define the fields allowed on a DSpace Object.

MetadataFieldProfile:

Individual field profile used to identify the basic rules allowed for a field assigned to a DSpace Object.

Profile2dso:  

This table will be used to directly map any specified schema as a validation target for any existing DSpace Item. One or more Schema assignments will be allowed, creating a situation where an Item may be polymorphic and support more than one type.

Profile2container:

This table will support the identification of which profile should be applied to new Items being created in any Collection within DSpace. This will be extended when support for metadata at all levels of DSpace is introduced, allowing assignment of Collection and Community "Types" to Community containers and likewise, support for Specific Bitstream types to be allowed in Item Containers.
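
The four structures described above can be sketched in memory as follows. This is an assumption-laden illustration rather than the proposed database schema: the attribute names, the string identifiers, and the helper functions assign_profile and allowed_fields are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class MetadataFieldProfile:
    """Basic rules for one field (attribute names are illustrative)."""
    field_name: str
    required: bool = False


@dataclass(frozen=True)
class MetadataProfile:
    """A named set of field profiles."""
    profile_id: int
    name: str
    fields: Tuple[MetadataFieldProfile, ...] = ()


# In-memory stand-ins for the proposed tables (hypothetical layout).
profiles: Dict[int, MetadataProfile] = {}
profile2dso: Set[Tuple[int, str]] = set()        # (profile_id, dso_id)
profile2container: Set[Tuple[int, str]] = set()  # (profile_id, container_id)


def assign_profile(profile_id: int, dso_id: str, container_id: str) -> None:
    """Attach a profile to an object only if its container allows it."""
    if (profile_id, container_id) not in profile2container:
        raise ValueError(f"profile {profile_id} is not allowed in {container_id}")
    profile2dso.add((profile_id, dso_id))


def allowed_fields(dso_id: str) -> List[str]:
    """Union of field names over every profile assigned to the object."""
    names = {
        f.field_name
        for pid, d in profile2dso if d == dso_id
        for f in profiles[pid].fields
    }
    return sorted(names)


profiles[1] = MetadataProfile(1, "Scholarly Item", (MetadataFieldProfile("title", True),))
profiles[2] = MetadataProfile(2, "Thesis Item", (MetadataFieldProfile("advisor"),))
profile2container.update({(1, "collection-a"), (2, "collection-a")})

assign_profile(1, "item-42", "collection-a")
assign_profile(2, "item-42", "collection-a")  # second profile: polymorphic item
```

Modelling both mappings as simple pairs keeps the polymorphic case trivial: an Item with two profiles is just two rows in profile2dso.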

 

A tentative list of new fields and tables is exemplified in the class diagram below.

Gliffy Diagram: New Metadata Class Diagram

The above solution can be easily encoded into the database schema, while the existing MetadataSchema, MetadataField and MetadataValue objects should be easily extendable to support new methods and business logic.

An Example Use-Case

Metadata Schema Registry

In the following example, an additional "dcterms" schema has been created to house the proper dcterms predicates, while the "dc" schema continues to hold the existing qualified dc for legacy purposes.

Metadata Schema: "dcterms"

Here, "dcterms:xxx" refinements point to a new Schema in the repository that contains the fields required for the typical dcterms namespace. In the current case, with the "item" and "item2" schema, this schema is not applied directly to Items, but is inherited into the defined "item" fields through "refinement".

| ID | Field | refines | encoding | default | required | Scope Note |
| --- | --- | --- | --- | --- | --- | --- |
| 15 | dcterms.date | rdf:Property | W3CDTF | ${now} | true | Date of publication or distribution. |
| 25 | dcterms.identifier | rdf:Property | URI | | true | Uniform Resource Identifier |
| 37 | dcterms.language | rdf:Property | RFC5646 | en | | Catch-all for non-ISO forms of the language of the item, accommodating harvested values. |
| 40 | dcterms.relation | rdf:Property | URI | | | Catch-all for references to other related items. |
| 57 | dcterms.subject | rdf:Property | Literal | | | Uncontrolled index term. |
| 64 | dcterms.title | rdf:Property | Literal | | true | Title statement/title proper. |
| 66 | dcterms.type | rdf:Property | Class | | | Nature or genre of content. |
| ... | ... | ... | ... | ... | ... | ... |
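
As a rough sketch of what the "encoding" and "default" columns in the registry table above could mean in practice, the following checks a value against a named encoding and expands a "${now}" default. The patterns are deliberately loose syntax checks, not full implementations of W3CDTF or RFC 5646. Treating "${now}" as "the current UTC timestamp" is an assumption, as are the function names. The "Class" encoding is left out because the proposal does not say how it would be checked.

```python
import re
from datetime import datetime, timezone
from typing import Callable, Dict, Optional
from urllib.parse import urlparse

# W3CDTF syntax only: YYYY, YYYY-MM, YYYY-MM-DD, or date-time with a zone.
W3CDTF_RE = re.compile(
    r"^\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?)?)?$"
)
# Loose approximation of a language tag: letters, then hyphenated subtags.
LANG_TAG_RE = re.compile(r"^[A-Za-z]{2,8}(-[A-Za-z0-9]{1,8})*$")


def _is_uri(value: str) -> bool:
    parts = urlparse(value)
    return bool(parts.scheme) and bool(parts.netloc or parts.path)


VALIDATORS: Dict[str, Callable[[str], bool]] = {
    "W3CDTF": lambda v: bool(W3CDTF_RE.match(v)),
    "URI": _is_uri,
    "RFC5646": lambda v: bool(LANG_TAG_RE.match(v)),
    "Literal": lambda v: True,   # any string is accepted
}


def validate(value: str, encoding: str) -> bool:
    """Return True when the value matches the named encoding."""
    if encoding not in VALIDATORS:
        raise ValueError(f"no validator registered for encoding {encoding!r}")
    return VALIDATORS[encoding](value)


def expand_default(default: str, now: Optional[datetime] = None) -> str:
    """Expand the "${now}" placeholder; other defaults pass through."""
    if default == "${now}":
        moment = now or datetime.now(timezone.utc)
        return moment.strftime("%Y-%m-%dT%H:%M:%SZ")
    return default


issued_default = expand_default("${now}")
```

Keeping the validators in a dictionary keyed by encoding name means new encodings could be registered without touching the validation call itself.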

Metadata Profile Registry

The profile registry defines fields that may be attached to a DSpace Item.

  • A new "Profile" has been defined with its own namespace to be allowed on Collections A and B.
  • Each custom "Profile" can be applied to a specific DSO type (in this case, Item) via an "Applies To" mapping to objects that are of its type (in the diagram above, this is the profile2dso mapping).
  • Each custom "Profile" can be enabled in a specific Container (Community, Collection, Item) via an "Allowed In" mapping (in the diagram above, this is the profile2container mapping).

| ID | Namespace | Name | Applies To | Allowed In |
| --- | --- | --- | --- | --- |
| 1 | http://mydspace/schema/item | Generic Item | Item | All Collections |
| 2 | http://mydspace/schema/item2 | Simple Item | Item | Collection A, Collection B |

Item Metadata Profile "Generic Item"

The following exemplifies a Profile for generic items that may have many optional fields attached to them.

 

| ID | Field | Scope Note |
| --- | --- | --- |
| 73 | dc.accessibility.imageequivalents | Boolean field, true if images have equivalents |
| 74 | dc.accessibility.imageequivalentspresentation | Indicates the way image equivalents are presented |
| 72 | dc.accessibility.imagespresent | Boolean accessibility field |
| 2 | dc.contributor.advisor | Use primarily for thesis advisor. |
| 3 | dc.contributor.author | |
| 4 | dc.contributor.editor | |
| 5 | dc.contributor.illustrator | |
| 6 | dc.contributor.other | |
| 82 | dc.contributor.sponsor | |
| 1 | dc.contributor | A person, organization, or service responsible for the content of the resource. Catch-all for unspecified contributors. |
| 7 | dc.coverage.spatial | Spatial characteristics of content. |
| 8 | dc.coverage.temporal | Temporal characteristics of content. |
| 9 | dc.creator | Do not use; only for harvested metadata. |
| 11 | dc.date.accessioned | Date DSpace takes possession of item. |
| 12 | dc.date.available | Date or date range item became available to the public. |
| 13 | dc.date.copyright | Date of copyright. |
| 14 | dc.date.created | Date of creation or manufacture of intellectual content if different from date.issued. |
| 77 | dc.date.embargountil | Date Embargo will be lifted. |
| 15 | dc.date.issued | Date of publication or distribution. |
| 16 | dc.date.submitted | Recommend for theses/dissertations. |
| 67 | dc.date.updated | The last time the item was updated via the SWORD interface |
| 10 | dc.date | Use qualified form if possible. |
| 27 | dc.description.abstract | Abstract or summary. |
| 76 | dc.description.embargoterms | Description of Embargo Terms |
| 28 | dc.description.provenance | The history of custody of the item since its creation, including any changes successive custodians made to it. |
| 29 | dc.description.sponsorship | Information about sponsoring agencies, individuals, or contractual arrangements for the item. |
| 30 | dc.description.statementofresponsibility | To preserve statement of responsibility from MARC records. |
| 31 | dc.description.tableofcontents | A table of contents for a given item. |
| 32 | dc.description.uri | Uniform Resource Identifier pointing to description of this item. |
| 68 | dc.description.version | The Peer Reviewed status of an item |
| 26 | dc.description | Catch-all for any description not defined by qualifiers. |
| 34 | dc.format.extent | Size or duration. |
| 35 | dc.format.medium | Physical medium. |
| 36 | dc.format.mimetype | Registered MIME type identifiers. |
| 33 | dc.format | Catch-all for any format information not defined by qualifiers. |
| 18 | dc.identifier.citation | Human-readable, standard bibliographic citation of non-DSpace format of this item |
| 19 | dc.identifier.govdoc | A government document number |
| 20 | dc.identifier.isbn | International Standard Book Number |
| 23 | dc.identifier.ismn | International Standard Music Number |
| 21 | dc.identifier.issn | International Standard Serial Number |
| 24 | dc.identifier.other | A known identifier type common to a local collection. |
| 22 | dc.identifier.sici | Serial Item and Contribution Identifier |
| 69 | dc.identifier.slug | a uri supplied via the sword slug header, as a suggested uri for the item |
| 25 | dc.identifier.uri | Uniform Resource Identifier |
| 17 | dc.identifier | Catch-all for unambiguous identifiers not defined by qualified form; use identifier.other for a known identifier common to a local collection instead of unqualified form. |
| 38 | dc.language.iso | Current ISO standard for language of intellectual content, including country codes (e.g. "en_US"). |
| 70 | dc.language.rfc3066 | the rfc3066 form of the language for the item |
| 37 | dc.language | Catch-all for non-ISO forms of the language of the item, accommodating harvested values. |
| 39 | dc.publisher | Entity responsible for publication, distribution, or imprint. |
| 44 | dc.relation.haspart | References physically or logically contained item. |
| 47 | dc.relation.isbasedon | References source. |
| 46 | dc.relation.hasversion | References later version. |
| 41 | dc.relation.isformatof | References additional physical form. |
| 42 | dc.relation.ispartof | References physically or logically containing item. |
| 43 | dc.relation.ispartofseries | Series name and number within that series, if available. |
| 48 | dc.relation.isreferencedby | Pointed to by referenced resource. |
| 51 | dc.relation.isreplacedby | References succeeding item. |
| 45 | dc.relation.isversionof | References earlier version. |

| element | refines | encoding | default | required | Scope Note |
| --- | --- | --- | --- | --- | --- |
| issued | dcterms:issued | W3CDTF | ${now} | true | Date of publication or distribution. |
| date | dcterms:date | W3CDTF | ${now} | | Use qualified form if possible. |
| uri | dcterms:identifier | URI | | true | Uniform Resource Identifier |
| identifier | dcterms:identifier | Literal | | | Catch-all for unambiguous identifiers not defined by qualified form; use identifier.other for a known identifier common to a local collection instead of unqualified form. |
| iso | | | en | | Current ISO standard for language of intellectual content, including country codes (e.g. "en_US"). |
| relation | dcterms:relation | URI | | | Catch-all for references to other related items. |
| mesh | dcterms:subject | URI | | | MEdical Subject Headings |
| other | dcterms:subject | Literal | | | Local controlled vocabulary; global vocabularies will receive specific qualifier. |
| subject | dcterms:subject | Literal | | | Uncontrolled index term. |
| alternative | dcterms:title | Literal | | | Varying (or substitute) form of title proper appearing in item, e.g. abbreviation or translation |
| title | dcterms:title | Literal | | true | Title statement/title proper. |
| type | dcterms:type | Class | | | Nature or genre of content. |
| ... | ... | ... | ... | ... | ... |

Item Metadata Profile "Simple Item"

The second Item profile exemplifies a simple item with a smaller set of fields allowed, but with stricter requirements for populating those fields.

| Field | refines | encoding | default | required | Scope Note |
| --- | --- | --- | --- | --- | --- |
| issued | dcterms:date | W3CDTF | ${now} | true | Date of publication or distribution. |
| uri | dcterms:identifier | URI | | true | Uniform Resource Identifier |
| language | dcterms:language | RFC5646 | en | true | Catch-all for non-ISO forms of the language of the item, accommodating harvested values. |
| mesh | dcterms:subject | URI | | true | MEdical Subject Headings |
| title | dcterms:title | Literal | | true | Title statement/title proper. |
| type | dcterms:type | Class | | true | Nature or genre of content. |
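
To show how a profile like "Simple Item" could restrict and validate an Item, the sketch below encodes the table above as data and reports fields that are not allowed and required fields that are missing. The FieldProfile class and check_item function are hypothetical, and so is the rule that a required field with a default counts as satisfied (the proposal does not specify this).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass(frozen=True)
class FieldProfile:
    """One row of the profile table (attribute names are illustrative)."""
    field: str
    refines: str
    encoding: str
    default: Optional[str]
    required: bool


SIMPLE_ITEM = (
    FieldProfile("issued", "dcterms:date", "W3CDTF", "${now}", True),
    FieldProfile("uri", "dcterms:identifier", "URI", None, True),
    FieldProfile("language", "dcterms:language", "RFC5646", "en", True),
    FieldProfile("mesh", "dcterms:subject", "URI", None, True),
    FieldProfile("title", "dcterms:title", "Literal", None, True),
    FieldProfile("type", "dcterms:type", "Class", None, True),
)


def check_item(metadata: Dict[str, List[str]],
               profile: Sequence[FieldProfile]) -> Dict[str, List[str]]:
    """Report fields outside the profile and required fields left empty.

    Assumption: a required field that declares a default is treated as
    satisfied, because the default could be filled in on deposit.
    """
    allowed = {fp.field for fp in profile}
    not_allowed = sorted(name for name in metadata if name not in allowed)
    missing = sorted(
        fp.field
        for fp in profile
        if fp.required and fp.default is None and not metadata.get(fp.field)
    )
    return {"not_allowed": not_allowed, "missing": missing}


item = {
    "title": ["A Study of Metadata Profiles"],
    "uri": ["http://example.org/item/1"],
    "abstract": ["Not part of the Simple Item profile."],
}
report = check_item(item, SIMPLE_ITEM)
```

For this item the report flags "abstract" as not allowed and "mesh" and "type" as missing, while "issued" and "language" pass because they carry defaults.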

Steps to Getting There

  • New fields and tables are added to database.
  • New attributes for existing DC schema and addition of the DCTerms Schema should be added to Registry after it has been extended.
  • Creation of several "Item Profiles" that can exemplify different types of Items in DSpace. Each should utilize the new DCTERMS Schema wherever possible.
  • Update DSpace build process to populate any necessary fields in new MetadataField and Profile tables.
  • Improve the User interface and DSO data model so that details about Profile types are returned to inform the User interface
  • Creation of new Describe Step and ItemEdit interfaces that enforce validation requirements expressed in the Metadata Profile
  • Creation of MetadataProfile Administrative Interfaces for managing Profiles.

Summary

The above proposal clarifies that new capabilities may emerge for "Typing", "Restriction" and "Validation" of DSpace objects through extension of the existing data model. The proposed strategy will support stronger typing of not only DSpaceObjects, but also the values of metadata fields, through validation rules such as syntax or vocabulary encodings, requiredness, and Dublin Core or other metadata schema types. DSpace should be able to utilize the new MetadataProfileRegistry as a means to replace large portions of the functionality found in the input-forms.xml file in future DSpace versions.

| ID | Field | Scope Note |
| --- | --- | --- |
| 50 | dc.relation.replaces | References preceding item. |
| 49 | dc.relation.requires | Referenced resource is required to support function, delivery, or coherence of item. |
| 52 | dc.relation.uri | References Uniform Resource Identifier for related item. |
| 40 | dc.relation | Catch-all for references to other related items. |
| 71 | dc.rights.holder | The owner of the copyright |
| 54 | dc.rights.uri | References terms governing use and reproduction. |
| 53 | dc.rights | Terms governing use and reproduction. |
| 56 | dc.source.uri | Do not use; only for harvested metadata. |
| 55 | dc.source | Do not use; only for harvested metadata. |
| 58 | dc.subject.classification | Catch-all for value from local classification system; global classification systems will receive specific qualifier |
| 59 | dc.subject.ddc | Dewey Decimal Classification Number |
| 60 | dc.subject.lcc | Library of Congress Classification Number |
| 61 | dc.subject.lcsh | Library of Congress Subject Headings |
| 62 | dc.subject.mesh | MEdical Subject Headings |
| 63 | dc.subject.other | Local controlled vocabulary; global vocabularies will receive specific qualifier. |
| 57 | dc.subject | Uncontrolled index term. |
| 65 | dc.title.alternative | Varying (or substitute) form of title proper appearing in item, e.g. abbreviation or translation |
| 64 | dc.title | Title statement/title proper. |
| 66 | dc.type | Nature or genre of content. |