<?xml version="1.0" encoding="utf-8"?>
<html>
This page documents the new choice management and authority control
features for Item ("DC") metadata values in DSpace 1.6.
These are actually two separate features: the choice management
mechanism is completely independent of authority control and can be
configured and deployed independently of it.

If you are wondering what authority control is and why it is important,
please see Dorothea Salo's comprehensive paper,
Name authority control in institutional repositories.

Note that this is a general mechanism for authority control of
any metadata field. Although author (and other personal) names are the most
common (and most urgently needed) example, the prototype can be applied to
any of the DC fields.
It requires some minor, backward-compatible changes to the DSpace 1.5
data model and API. A prototype implementation is available for DSpace 1.6.

Introduction and Motivation

Definitions

  • Choice Management

This is a mechanism that generates a list of choices for a value to be entered in a given metadata field.

...

Depending on your implementation, the exact choice list might be determined by a proposed value or query, or it could be a fixed list that is the same for every query.

...

It may also be closed (limited to choices produced internally) or open, allowing the user-supplied query to be included as a choice.

  • Authority Control

Authority control works on top of choice management: it supplies an authority key along with the chosen value, and that key is also stored in the Item's metadata field entry. Any authority-controlled field is also inherently choice-controlled.

About Authority Control

The advantages we seek from an authority-controlled metadata field are:

  1. There is a simple and positive way to test whether two values are identical, by comparing authority keys.
    • Comparing plain text values can give false positive results, e.g. when two different people have names that are written the same way.
    • It can also give false negative results when the same name is written in different ways, e.g. "J. Smith" vs. "John Smith".
  2. Help in entering correct metadata values. The submission and admin UIs may call on the authority to check a proposed value and list possible matches to help the user select one.
  3. Improved interoperability. By sharing a name authority with other applications, your DSpace can interoperate with them more cleanly.
    • For example, a DSpace institutional repository sharing a naming authority with the campus social network would let the social network construct a list of all DSpace Items matching the shared author identifier, rather than by error-prone name matching.
    • When the name authority is shared with a campus directory, DSpace can look up the email address of an author to send automatic email about works of theirs submitted by a third party. That author does not have to be an EPerson.
  4. Authority keys are normally invisible in the public web UIs. They are only seen by administrators editing metadata. The value of an authority key is not expected to be meaningful to an end-user or site visitor.
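
The first advantage, comparing authority keys instead of text, can be illustrated with a minimal sketch. The MetadataValue type and the key strings below are hypothetical and are not the DSpace data model; they only show why key comparison avoids both kinds of error described above.

```java
class KeyComparisonDemo {
    public static void main(String[] args) {
        // Keys are invented for illustration.
        MetadataValue a = new MetadataValue("J. Smith", "n-0001");
        MetadataValue b = new MetadataValue("John Smith", "n-0001");
        MetadataValue c = new MetadataValue("John Smith", "n-0002");

        System.out.println(a.sameEntityAs(b));          // same key, different spelling
        System.out.println(b.sameEntityAs(c));          // same spelling, different key
        System.out.println(b.text().equals(c.text()));  // text comparison cannot tell b and c apart
    }
}

// Hypothetical model of one metadata value: display text plus an optional authority key.
record MetadataValue(String text, String authorityKey) {

    // Two values denote the same entity only when both carry the same non-null key.
    boolean sameEntityAs(MetadataValue other) {
        return authorityKey != null && authorityKey.equals(other.authorityKey());
    }
}
```

A value without a key never compares as the same entity here, which is one possible reading of "uncontrolled"; a site could choose to fall back to text comparison instead.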

Authority control is different from the controlled vocabulary of keywords
already implemented in the submission UI:

  1. Authorities are external to DSpace. The source of authority control is typically an external database or network resource.
    • Plug-in architecture makes it easy to integrate new authorities without modifying any core code.
  2. This authority proposal impacts all phases of metadata management.
    • The keyword vocabularies are only for the submission UI.
    • Authority control is asserted everywhere metadata values are changed, including unattended/batch submission, LNI and SWORD package submission, and the administrative UI.

Some Terminology

  • Authority

An authority is a source of fixed values for a given domain, each unique value identified by a key.
One example is the OCLC LC Name Authority Service.

  • Authority Record

The information associated with one of the values in an authority; may include alternate spellings and equivalent forms of the value, etc.

  • Authority Key

An opaque, hopefully persistent, identifier corresponding to exactly one record in the authority.
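
To make the three terms concrete, here is a small sketch of how they relate. The AuthorityRecord and InMemoryAuthority types are assumptions made for this example; a real authority is typically an external database or network service, not an in-memory list.

```java
import java.util.List;
import java.util.Optional;

class TerminologyDemo {
    public static void main(String[] args) {
        InMemoryAuthority authority = new InMemoryAuthority(List.of(
            new AuthorityRecord("n-1", "Smith, John", List.of("J. Smith", "John Smith"))));

        // A variant spelling resolves to the same record, and so to the same key.
        System.out.println(authority.find("J. Smith").map(AuthorityRecord::key));
        // A key corresponds to exactly one record.
        System.out.println(authority.byKey("n-1").map(AuthorityRecord::value));
    }
}

// Hypothetical authority record: an opaque key, the canonical value, and alternate spellings.
record AuthorityRecord(String key, String value, List<String> variants) {
    boolean matches(String text) {
        return value.equalsIgnoreCase(text)
            || variants.stream().anyMatch(v -> v.equalsIgnoreCase(text));
    }
}

// Toy authority holding its records in memory.
class InMemoryAuthority {
    private final List<AuthorityRecord> records;

    InMemoryAuthority(List<AuthorityRecord> records) {
        this.records = List.copyOf(records);
    }

    // Resolve a text value (canonical or variant spelling) to its record, if any.
    Optional<AuthorityRecord> find(String text) {
        return records.stream().filter(r -> r.matches(text)).findFirst();
    }

    // Resolve a key to its record, if it exists.
    Optional<AuthorityRecord> byKey(String key) {
        return records.stream().filter(r -> r.key().equals(key)).findFirst();
    }
}
```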

Choice Management: Design Principles and Behavior

Choice management may be applied to any metadata field by setting
DSpace configuration properties. This configuration is effective in all
community and collection contexts.

Source of Choices

You configure a metadata field for choice management by selecting a
choice authority plugin for it. This plugin serves as a source of
choices. Whenever the user is entering a metadata value for that field,
e.g. in the interactive submission UI or when editing the Item's
metadata, the UI consults that choice authority plugin to get a list
of available choices to present to the user. This list may or may not be
affected by the current (or proposed) value of the field.
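
As a rough sketch of what "a source of choices" can mean, the code below shows a fixed list, a list that depends on the proposed value, and an "open" wrapper that also offers the user's own text. The ChoiceSource interface and the class names are hypothetical and are not the DSpace plugin API.

```java
import java.util.ArrayList;
import java.util.List;

class ChoiceSourceDemo {
    public static void main(String[] args) {
        List<String> names = List.of("Salo, Dorothea", "Smith, Jane", "Smith, John");
        ChoiceSource fixed = new FixedChoices(names);
        ChoiceSource byPrefix = new PrefixChoices(names);
        ChoiceSource open = new OpenChoices(byPrefix);

        System.out.println(fixed.choicesFor("smi"));     // same list for every query
        System.out.println(byPrefix.choicesFor("smi"));  // list depends on the proposed value
        System.out.println(open.choicesFor("Smithers")); // user's own text offered as a choice
    }
}

// Hypothetical plugin contract: given the current or proposed field value, return choices.
interface ChoiceSource {
    List<String> choicesFor(String proposed);
}

// A fixed list: the same choices for every query.
class FixedChoices implements ChoiceSource {
    private final List<String> values;
    FixedChoices(List<String> values) { this.values = List.copyOf(values); }
    public List<String> choicesFor(String proposed) { return values; }
}

// A query-driven list: only values starting with the proposed text, ignoring case.
class PrefixChoices implements ChoiceSource {
    private final List<String> values;
    PrefixChoices(List<String> values) { this.values = List.copyOf(values); }
    public List<String> choicesFor(String proposed) {
        String prefix = proposed.toLowerCase();
        return values.stream().filter(v -> v.toLowerCase().startsWith(prefix)).toList();
    }
}

// "Open" wrapper: adds the user's own text as a choice when it is not already listed.
class OpenChoices implements ChoiceSource {
    private final ChoiceSource inner;
    OpenChoices(ChoiceSource inner) { this.inner = inner; }
    public List<String> choicesFor(String proposed) {
        List<String> out = new ArrayList<>(inner.choicesFor(proposed));
        if (!proposed.isBlank() && !out.contains(proposed)) {
            out.add(proposed);
        }
        return out;
    }
}
```

Without the OpenChoices wrapper the list is "closed": the user can only pick what the source itself produces.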

Presentation Style

You may configure a presentation style for the metadata field that
governs how the UI displays choices and interacts with the user to pick one.

The available values are:

  • lookup

The user enters a proposed value and clicks a button to "look up" choices based on that value; a pop-up window then lets the user navigate through the choices.

  • suggest

As the user types in a text-input field, a menu of suggested choices is automatically generated. It acts like the Google Suggest feature.

  • select

Puts up a drop-down menu (or multi-pick selection box) of choices using the HTML SELECT widget.

...

This style should only be used for plugins that have a relatively small, fixed set of choices.

...

It does not support authority values and should not be used for authority-controlled fields.
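
The restriction on the select style could be checked when configuration is read. The following is only a sketch under assumed names (Presentation, PresentationCheck); it does not describe how DSpace itself validates its configuration.

```java
import java.util.Locale;
import java.util.Optional;

class PresentationCheck {
    // Returns a problem description, or empty when the combination is acceptable.
    static Optional<String> validate(String styleName, boolean authorityControlled) {
        Presentation style = Presentation.parse(styleName);
        if (authorityControlled && !style.supportsAuthority()) {
            return Optional.of("style '" + styleName + "' cannot carry authority keys");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(validate("select", true));   // problem reported
        System.out.println(validate("lookup", true));   // acceptable
        System.out.println(validate("select", false));  // acceptable: choice management only
    }
}

// The three presentation styles named in the text.
enum Presentation {
    LOOKUP, SUGGEST, SELECT;

    // Parse a configured style name such as "lookup"; throws IllegalArgumentException if unknown.
    static Presentation parse(String name) {
        return valueOf(name.trim().toUpperCase(Locale.ROOT));
    }

    // Per the text, "select" does not support authority values.
    boolean supportsAuthority() {
        return this != SELECT;
    }
}
```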

Authority Control: Design Principles and Behavior

  1. Not a replacement for text metadata value. Metadata fields still have text values.
    • The text value of a metadata field does not have to be derived from the authority, even if authority control is required for that field.
  2. Configured by field. The authority control status of each field is independently configured, but it affects all values of that field.
  3. Authority control can be optional or required. When optional, metadata values may take on values that did not come from the authority.
  4. Authority values are ubiquitous. Authority values are accessible by crosswalk plugins, in the UI, through OAI-PMH, etc.
    • All of those contexts can detect whether a value is authority-controlled or not by testing for the presence of an authority key.
  5. Text-based searching and indexing is unchanged. Since metadata values still have text values, the browse and search systems will work unchanged.
  6. Choice behavior decoupled. The selection and choice mechanisms can be invoked independently of authority control (e.g. in the submit UI).
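
Principles 1, 3 and 4 above can be summarized in a short sketch: a value always has text, a key is optional, and a per-field policy decides whether a missing key is acceptable. FieldValue and FieldPolicy are hypothetical names chosen for this example.

```java
class AuthorityPolicyDemo {
    public static void main(String[] args) {
        FieldPolicy required = new FieldPolicy(true, true);
        FieldPolicy optional = new FieldPolicy(true, false);
        FieldValue keyed = new FieldValue("Smith, J.", "n-2");
        FieldValue plain = new FieldValue("Smith, J.", null);

        System.out.println(required.accepts(keyed));
        System.out.println(required.accepts(plain));
        System.out.println(optional.accepts(plain));
    }
}

// A value always has text; the authority key is present only for controlled values.
record FieldValue(String text, String authorityKey) {
    // Any context can detect authority control by testing for the key.
    boolean isAuthorityControlled() {
        return authorityKey != null;
    }
}

// Hypothetical per-field configuration: is the field controlled, and is a key required?
record FieldPolicy(boolean controlled, boolean required) {
    boolean accepts(FieldValue value) {
        if (!controlled || !required) {
            return true; // plain text is acceptable
        }
        return value.isAuthorityControlled();
    }
}
```

Note that the policy never inspects the text itself, in line with the principle that the text value does not have to be derived from the authority.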

...

When collecting a value for an authority-controlled field,
the interactive submission UI has to help the user choose a value from the
authority set. Typically the user enters a clue or partial value and is
then presented with a list of matches from which to choose. Each
potential answer may include not only the value of the metadata field, but
also some associated information that helps discriminate between identical
values. For example, an authority on personal names might include title,
department, age, and other details to help the user choose between two
records with identical names.

An authority service must be able to provide both search and
browse functions, the former returning all authority records
matching a proposed value, and the latter used to populate menus of
choices. (It is not always practical to browse choices, since there may
be a large number of them, so some fields may only offer the search
option.)

Unattended Submission

All of the batch and package-oriented submission methods are considered
unattended since there is no provision for a user interface to get
a person to make decisions. Metadata values are typically assigned by
crosswalks which translate from external metadata formats to DSpace's "Dublin
Core"-style fields.

When the crosswalk adds a metadata value to an authority-controlled field
in an Item, the authority is given a chance to test the value. It returns
both an authority key and a confidence value, which is a measure of
how well the authority key (if any) represents the metadata value. There
are more details in the API section below.

The crosswalk can also be coded to look up an authority key itself
and supply it explicitly when setting the metadata value. In this case
the crosswalk determines the authority key (and confidence level) that gets
set in the Item. For example, if you implement a SWORD client that consults
the same name authority as DSpace for personal names, it can just pass the
authority key through the encoded metadata to a compatible crosswalk that
puts that key directly into the appropriate Item metadata field.

Otherwise, when ingesting a raw text value into an authority-controlled field,
the unattended environment must be prepared for any of these conditions:

...

The exact handling of the various levels of failure for each kind of
metadata field is a matter of policy that is ultimately decided by the
crosswalk implementation. It is influenced by the confidence
metric provided by the authority plugin. The authority can advise the
crosswalk when to declare an error, or just accept a dubious match.
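
The interplay between the authority's confidence value and the crosswalk's policy might look like the sketch below. Everything in it is an assumption made for illustration: the three confidence levels, the Decision values, and the ToyAuthority and CrosswalkPolicy classes are not the levels or classes DSpace defines, and the policy shown is only one of many a crosswalk could implement.

```java
import java.util.List;
import java.util.Map;

class UnattendedIngestDemo {
    public static void main(String[] args) {
        ToyAuthority authority = new ToyAuthority(Map.of(
            "n-1", "Salo, Dorothea",
            "n-2", "Smith, John",
            "n-3", "Smith, John"));

        for (String text : List.of("Salo, Dorothea", "Smith, John", "Nobody, A.")) {
            Match match = authority.bestMatch(text);
            System.out.println(text + " -> " + match.confidence()
                + " / " + CrosswalkPolicy.decide(match, true));
        }
    }
}

// Illustrative confidence levels only.
enum Confidence { UNIQUE, AMBIGUOUS, NOT_FOUND }

// What a crosswalk might do with a value after consulting the authority.
enum Decision { STORE_WITH_KEY, STORE_TEXT_ONLY, REJECT }

// The authority's answer: a key (null when none was chosen) and a confidence.
record Match(String authorityKey, Confidence confidence) {}

// Toy authority mapping keys to names; two different people may share a name.
class ToyAuthority {
    private final Map<String, String> namesByKey;

    ToyAuthority(Map<String, String> namesByKey) {
        this.namesByKey = Map.copyOf(namesByKey);
    }

    Match bestMatch(String text) {
        List<String> keys = namesByKey.entrySet().stream()
            .filter(e -> e.getValue().equalsIgnoreCase(text))
            .map(Map.Entry::getKey)
            .toList();
        if (keys.isEmpty()) return new Match(null, Confidence.NOT_FOUND);
        if (keys.size() > 1) return new Match(null, Confidence.AMBIGUOUS);
        return new Match(keys.get(0), Confidence.UNIQUE);
    }
}

// One possible policy: keep dubious values as plain text, reject unknown ones when strict.
class CrosswalkPolicy {
    static Decision decide(Match match, boolean strict) {
        return switch (match.confidence()) {
            case UNIQUE -> Decision.STORE_WITH_KEY;
            case AMBIGUOUS -> Decision.STORE_TEXT_ONLY;
            case NOT_FOUND -> strict ? Decision.REJECT : Decision.STORE_TEXT_ONLY;
        };
    }
}
```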

...

The administration UI includes a page that lets administrators edit
an Item's "Dublin Core" metadata, changing values and adding fields.
It must also enforce authority control on the fields for which it is configured.
When an authority key is available, it is shown in the UI as a read-only
text field. Fields with authority control also present a "Lookup" button,
which invokes a generic UI that takes a clue from the user, looks it up,
and displays the matching values from the authority.
The free-text value of the field may also be changed independently
of the authority key.

Batch Metadata Editing

See Batch Metadata Editing Feature for details about this proposal.
The implications are essentially the same as for unattended submission:
a batch edit can apply authority keys directly when they are available, and
otherwise performs a lookup (explicitly or implicitly) of the values set on
authority-controlled fields.

...

When an unattended submission or edit leaves an
authority-controlled field with a problem,
e.g. an ambiguous or unidentified value,
there are two ways to correct it:

...

We will consider adding an administrative UI to detect and list
these metadata problems.

Display and Crosswalk

Every metadata value potentially includes an authority key and an
authority confidence. They are only present when the field is under
authority control and there is an actual authority value; otherwise they
are absent (e.g. in the DIM XML representation) or null.
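
The "present only when there is an authority value" rule can be sketched as a serializer that emits the extra attributes conditionally. The element and attribute names below are invented for this example and should not be read as the DIM schema; ControlledValue is likewise a hypothetical type.

```java
class SerializeDemo {
    public static void main(String[] args) {
        ControlledValue plain =
            new ControlledValue("dc.contributor.author", "Smith, John", null, null);
        ControlledValue keyed =
            new ControlledValue("dc.contributor.author", "Smith, John", "n-2", "UNIQUE");

        System.out.println(plain.toXml()); // no authority attributes at all
        System.out.println(keyed.toXml()); // key and confidence included
    }
}

// A metadata value whose authority key and confidence may both be null.
record ControlledValue(String field, String text, String authorityKey, String confidence) {

    // Emit authority attributes only when an authority key is actually present.
    String toXml() {
        StringBuilder sb = new StringBuilder("<field name=\"").append(escape(field)).append('"');
        if (authorityKey != null) {
            sb.append(" authority=\"").append(escape(authorityKey)).append('"');
            if (confidence != null) {
                sb.append(" confidence=\"").append(escape(confidence)).append('"');
            }
        }
        return sb.append('>').append(escape(text)).append("</field>").toString();
    }

    // Minimal XML escaping for text and attribute values.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```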

The presentation UI can call on a generic method to get the canonical
display string for an authority key, but it is welcome to interpret it
in custom code to present a more detailed view. For example, a
site may want to customize its Item display so that a personal name
appears with a link to that person's page on the institution's social
networking site, which it obtains through the authority key.

...