Support for Multiple Schemas

A simple enhancement proposal to add support for multiple (flat) schemas.
Current assignee: Martin Hald. Basic implementation complete; an alpha version is available as a patch on SourceForge: http://sourceforge.net/tracker/index.php?func=detail&aid=1242663&group_id=19984&atid=319984

The basic idea is to generalise the code and DB tables around Dublin Core. Conceptually speaking, a new column is added to DCTypeRegistry, and the same mechanisms used for Dublin Core may be used for other schemas. I believe Jim Downing has already implemented something along these lines.

Backwards compatibility is important and not actually too difficult. Most of the 'under the hood' changes can be made with no need to change any UI code or other code in org.dspace.app.*.

Database changes

From a backwards-compatibility point of view, keeping the table names the same might be easiest. However, changes to table names are masked from most code through the org.dspace.content.Item API, and we may be able to use DB views for other cases. Some refactoring would be needed in a couple of other areas, but from an architectural consistency and code manageability / understandability viewpoint, I think renaming the tables would be worth it. So that's what I've assumed below.

A new table, MetadataSchemaRegistry, is added, and DCTypeRegistry is renamed MetadataFieldRegistry and modified to relate to the schema registry. Note that the UNIQUE constraint on element / qualifier is removed, since more than one schema could easily have a "title" element.

Code Block
 -------------------------------------------------------
 -- MetadataSchemaRegistry table
 -------------------------------------------------------
 CREATE TABLE MetadataSchemaRegistry
 (
   metadata_schema_id INTEGER PRIMARY KEY,
   namespace          VARCHAR(256),
   short_id           VARCHAR(32)    -- e.g. 'dc'
 );
 -------------------------------------------------------
 -- MetadataFieldRegistry table
 -------------------------------------------------------
 CREATE TABLE MetadataFieldRegistry
 (
   metadata_field_id   INTEGER PRIMARY KEY,
   metadata_schema_id  INTEGER REFERENCES MetadataSchemaRegistry(metadata_schema_id),
   element             VARCHAR(64),
   qualifier           VARCHAR(64),
   scope_note          TEXT
 );

DCValue would be renamed to MetadataValue, but otherwise remain the same. (Note that source_id is removed since it's an architectural relic.)

Code Block
 -------------------------------------------------------
 -- MetadataValue table
 -------------------------------------------------------
 CREATE TABLE MetadataValue
 (
   metadata_value_id  INTEGER PRIMARY KEY,
   item_id            INTEGER REFERENCES Item(item_id),
   metadata_field_id  INTEGER REFERENCES MetadataFieldRegistry(metadata_field_id),
   text_value         TEXT,
   text_lang          VARCHAR(24),
   place              INTEGER
 );

We can create a view DCValue for backwards compatibility:

Code Block
 -------------------------------------------------------
 -- DCValue view
 -------------------------------------------------------
 CREATE VIEW DCValue AS
   SELECT MetadataValue.*
   FROM MetadataValue, MetadataFieldRegistry
   WHERE MetadataValue.metadata_field_id = MetadataFieldRegistry.metadata_field_id
   AND MetadataFieldRegistry.metadata_schema_id = 1;

We could define '1' as a special value of metadata_schema_id for Dublin Core. (Can we make metadata_value_id appear as dc_value_id? Not that it probably matters.)
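
To make the intent of the view concrete, here is a small in-memory Java sketch of the join and filter it performs. It is only an illustration: FieldRow, ValueRow, DcViewSketch and the constant DC_SCHEMA_ID are names made up for this example (not DSpace classes), and the rows carry only a subset of the columns. The point is that values attached to fields of other schemas stay invisible to legacy Dublin Core consumers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for one MetadataFieldRegistry row.
final class FieldRow {
    final int metadataFieldId;
    final int metadataSchemaId;
    final String element;
    final String qualifier; // null when the field is unqualified

    FieldRow(int metadataFieldId, int metadataSchemaId, String element, String qualifier) {
        this.metadataFieldId = metadataFieldId;
        this.metadataSchemaId = metadataSchemaId;
        this.element = element;
        this.qualifier = qualifier;
    }
}

// Hypothetical in-memory stand-in for one MetadataValue row (columns trimmed).
final class ValueRow {
    final int metadataValueId;
    final int itemId;
    final int metadataFieldId;
    final String textValue;

    ValueRow(int metadataValueId, int itemId, int metadataFieldId, String textValue) {
        this.metadataValueId = metadataValueId;
        this.itemId = itemId;
        this.metadataFieldId = metadataFieldId;
        this.textValue = textValue;
    }
}

final class DcViewSketch {
    // Assumption taken from the proposal: schema id 1 is reserved for Dublin Core.
    static final int DC_SCHEMA_ID = 1;

    // Mimics the DCValue view: join values to fields, keep only Dublin Core fields.
    static List<ValueRow> dcValues(List<ValueRow> values, List<FieldRow> fields) {
        Map<Integer, FieldRow> fieldsById = new HashMap<>();
        for (FieldRow field : fields) {
            fieldsById.put(field.metadataFieldId, field);
        }
        List<ValueRow> result = new ArrayList<>();
        for (ValueRow value : values) {
            FieldRow field = fieldsById.get(value.metadataFieldId);
            if (field != null && field.metadataSchemaId == DC_SCHEMA_ID) {
                result.add(value);
            }
        }
        return result;
    }
}
```

With one "title" field registered under schema 1 and another under schema 2, only values for the first come back from dcValues, which is what existing DC-only code would expect to see.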

Code Changes

By definition, anything in the DSpace application/interface layer (org.dspace.app) won't be affected, as it uses the org.dspace.content.Item.getDC method. Of course, additional functionality will be needed in the UI (administration UI etc.) to realise the schema support, but everything should work as before once the relevant changes are made elsewhere. Care will be needed to do everything in a way that doesn't impact performance. (Don't want to add to ScalabilityIssues1.4!)

org.dspace.administer

New class MetadataSchema, very much along the lines of the existing DCType and other DSpace Java objects. (Maybe it belongs in org.dspace.content?)

DCType becomes MetadataField.

getMetadataSchema and setMetadataSchema methods are added.

loadDC needs updating (see below).

Maybe create a backwards-compatible class DCType?
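
As a rough illustration of how these classes might relate, here is a minimal Java sketch. It is not the patch's actual code: the constructors, getQualifiedName and the DUBLIN_CORE constant are assumptions made for the example, and persistence is left out entirely. The schema id 1 for Dublin Core follows the convention suggested in the database section.

```java
// Hypothetical sketch of one MetadataSchemaRegistry row as a Java object.
final class MetadataSchema {
    private final int schemaId;
    private final String namespace;
    private final String shortId; // e.g. "dc"

    MetadataSchema(int schemaId, String namespace, String shortId) {
        this.schemaId = schemaId;
        this.namespace = namespace;
        this.shortId = shortId;
    }

    int getSchemaId() { return schemaId; }
    String getNamespace() { return namespace; }
    String getShortId() { return shortId; }
}

// Hypothetical sketch of the renamed DCType: a field that knows its schema.
class MetadataField {
    private MetadataSchema schema;
    private final String element;
    private final String qualifier; // null when unqualified

    MetadataField(MetadataSchema schema, String element, String qualifier) {
        this.schema = schema;
        this.element = element;
        this.qualifier = qualifier;
    }

    MetadataSchema getMetadataSchema() { return schema; }
    void setMetadataSchema(MetadataSchema schema) { this.schema = schema; }
    String getElement() { return element; }
    String getQualifier() { return qualifier; }

    // Illustrative helper (an assumption): "schema.element[.qualifier]".
    String getQualifiedName() {
        String name = schema.getShortId() + "." + element;
        return qualifier == null ? name : name + "." + qualifier;
    }
}

// One possible backwards-compatible DCType: a field pinned to Dublin Core.
class DCType extends MetadataField {
    static final MetadataSchema DUBLIN_CORE =
            new MetadataSchema(1, "http://purl.org/dc/elements/1.1/", "dc");

    DCType(String element, String qualifier) {
        super(DUBLIN_CORE, element, qualifier);
    }
}
```

Under these assumptions, new DCType("title", null) reports the qualified name dc.title, while a MetadataField built against another schema reports that schema's short id instead, so old callers never need to mention a schema at all.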

org.dspace.content

  • org.dspace.content.Item will need a few changes. getDC()/setDC() etc. need to work exactly as before, which is not difficult. It will also need some extra get/set methods for MetadataField (and maybe MetadataSchema?).
  • New class MetadataValue: identical to DCValue, except with a MetadataSchema value. DCValue can be made a subclass of MetadataValue for backwards compatibility.
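
The two points above might fit together roughly as in the following Java sketch. Everything here is a simplification for illustration: ItemSketch is a made-up stand-in for org.dspace.content.Item, the method signatures are assumptions rather than the real Item API, the schema is carried as a plain short id string, and wildcard matching is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical value holder: like DCValue, plus the schema it belongs to.
class MetadataValue {
    public String schema;    // short id, e.g. "dc"
    public String element;
    public String qualifier; // null when unqualified
    public String value;
    public String language;
}

// Backwards-compatible subclass: a MetadataValue that is always Dublin Core.
class DCValue extends MetadataValue {
    DCValue() {
        schema = "dc";
    }
}

// Simplified stand-in for Item (an assumption, not the real class).
class ItemSketch {
    private final List<MetadataValue> values = new ArrayList<>();

    // New schema-aware method.
    void addMetadata(String schema, String element, String qualifier,
                     String language, String value) {
        // Dublin Core values are stored as DCValue so legacy getters can return them.
        MetadataValue v = "dc".equals(schema) ? new DCValue() : new MetadataValue();
        v.schema = schema;
        v.element = element;
        v.qualifier = qualifier;
        v.language = language;
        v.value = value;
        values.add(v);
    }

    // New schema-aware lookup (exact match only in this sketch).
    List<MetadataValue> getMetadata(String schema, String element, String qualifier) {
        List<MetadataValue> result = new ArrayList<>();
        for (MetadataValue v : values) {
            if (v.schema.equals(schema) && v.element.equals(element)
                    && Objects.equals(v.qualifier, qualifier)) {
                result.add(v);
            }
        }
        return result;
    }

    // Legacy-style methods keep their meaning by delegating with schema "dc".
    void addDC(String element, String qualifier, String language, String value) {
        addMetadata("dc", element, qualifier, language, value);
    }

    List<DCValue> getDC(String element, String qualifier) {
        List<DCValue> result = new ArrayList<>();
        for (MetadataValue v : getMetadata("dc", element, qualifier)) {
            result.add((DCValue) v); // safe: "dc" values are created as DCValue above
        }
        return result;
    }
}
```

The design choice being illustrated is that the DC methods become thin wrappers, so existing callers see only Dublin Core values even when an item also carries values from other schemas.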

org.dspace.search

Will work with DC with no changes, as it uses APIs and not direct DB access. It will need to be modified to use values from the new metadata schemas.

dspace.cfg search parameters can be changed to index new schemas, e.g.:

Code Block
 search.index.1 = author:dc.contributor.*
 search.index.2 = author:dc.creator.*
 search.index.3 = title:dc.title.*
 search.index.4 = medium:vracore.material.medium
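
A sketch of how such a value could be parsed, assuming the format is index:schema.element[.qualifier] as in the lines above. SearchIndexSpec and its methods are hypothetical names for this example, and treating "*" as "any qualifier, including none" is an assumption about the intended semantics rather than something taken from the search code.

```java
import java.util.Objects;

// Hypothetical parsed form of one search.index.N value.
final class SearchIndexSpec {
    final String indexName;
    final String schema;
    final String element;
    final String qualifier; // null when absent, "*" for a wildcard

    private SearchIndexSpec(String indexName, String schema, String element, String qualifier) {
        this.indexName = indexName;
        this.schema = schema;
        this.element = element;
        this.qualifier = qualifier;
    }

    // Parses e.g. "medium:vracore.material.medium".
    static SearchIndexSpec parse(String value) {
        String trimmed = value.trim();
        int colon = trimmed.indexOf(':');
        if (colon <= 0) {
            throw new IllegalArgumentException("Expected index:schema.element[.qualifier]: " + value);
        }
        String indexName = trimmed.substring(0, colon).trim();
        String[] parts = trimmed.substring(colon + 1).trim().split("\\.");
        if (parts.length < 2 || parts.length > 3) {
            throw new IllegalArgumentException("Expected schema.element[.qualifier]: " + value);
        }
        String qualifier = parts.length == 3 ? parts[2] : null;
        return new SearchIndexSpec(indexName, parts[0], parts[1], qualifier);
    }

    // True when a metadata value with this schema/element/qualifier feeds the index.
    boolean matches(String schema, String element, String qualifier) {
        if (!this.schema.equals(schema) || !this.element.equals(element)) {
            return false;
        }
        if ("*".equals(this.qualifier)) {
            return true; // assumption: wildcard covers every qualifier, including none
        }
        return Objects.equals(this.qualifier, qualifier);
    }
}
```

Carrying the schema prefix in the configuration value is what lets two schemas with the same element name feed different indexes, or the same one, without ambiguity.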

org.dspace.browse

Code and table views will need alteration.

Other changes

  • Format of [dspace]/config/registries/dublin-core-types.xml will need to be altered. (Perhaps XSLT could even be used for backwards compatibility.)
  • The custom submission forms could be extended to take advantage of the new schemas.
  • Batch importer/exporter: it should be easy to retain backwards compatibility (with "dublin_core.xml" clearly indicating Dublin Core, a different filename can be used for other metadata).
  • History system??
  • UI changes to item display page
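
For the batch importer point above, one way the filename could select the schema is sketched below. Only "dublin_core.xml" comes from the existing importer; the "metadata_<short_id>.xml" pattern and the ImportFileNames class are purely hypothetical choices made for this illustration.

```java
// Hypothetical helper: decide which schema a batch-import metadata file describes.
final class ImportFileNames {
    static final String DC_FILE = "dublin_core.xml";

    // Assumed naming pattern for non-DC schemas (not an agreed convention).
    private static final String PREFIX = "metadata_";
    private static final String SUFFIX = ".xml";

    // Returns the schema short id, or null if the file is not a metadata file.
    static String schemaFor(String fileName) {
        if (DC_FILE.equals(fileName)) {
            return "dc"; // existing archives keep working unchanged
        }
        boolean longEnough = fileName.length() > PREFIX.length() + SUFFIX.length();
        if (longEnough && fileName.startsWith(PREFIX) && fileName.endsWith(SUFFIX)) {
            return fileName.substring(PREFIX.length(), fileName.length() - SUFFIX.length());
        }
        return null;
    }
}
```

Existing import directories contain only dublin_core.xml, so under this scheme they would be read exactly as before.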