...

  1. Not a replacement for text metadata value. Metadata fields still have text values.
    • The text value of a metadata field does not have to be derived from the authority, even if authority control is required for that field.
  2. Configured by field. The authority control status of each field is independently configured, but it affects all values of that field.
  3. Authority control can be optional or required. When optional, metadata values may take on values that did not come from the authority.
  4. Authority values are ubiquitous. Authority values are accessible by crosswalk plugins, in the UI, through OAI-PMH, etc.
    • All of those contexts can detect whether a value is authority-controlled by testing for the presence of an authority key.
  5. Text-based searching and indexing is unchanged. Since metadata values still have text values, the browse and search systems will work unchanged.
  6. Choice behavior decoupled. The selection and choice mechanisms can be invoked independently (e.g. in the submit UI) of authority control.

...

When collecting a value for an authority-controlled field, the interactive submission UI has to help the user choose a value from the
authority set. Typically the user enters a clue or partial value and is then presented with a list of matches from which to choose. Each potential answer may include not only the value of the metadata field, but also some associated information that helps discriminate between identical values. For example, an authority on personal names might include title, department, age, and other details to help the user choose between two records with identical names.

...

The presentation UI can call on a generic method to get the canonical
display string for an authority key, but it is welcome to interpret it
in custom code to present a more detailed view. For example, one
site may want to customize their Item display so a personal name
appears with a link to their page on the institution's social networking site,
which it obtains through the authority key.

Dissemination crosswalks will also receive the authority key so they can
pass that knowledge on through OAI-PMH, exported packages, and any other
dissemination vehicles.

Browse

There is a new field type in the Configurable Browse facility that has the
browse mechanism index Items by their authority key values instead of the
text value of the metadata field. This gives a truly authority-controlled
browse UI.

The support for authority control in search is very subtle:

...

Since the resources to design and implement this prototype are sharply
limited it is necessary to pare it down to the features essential for
our site.

No Changes to Search UI

Although search indexes are built for the authority keys of authority-controlled
fields, there is no explicit UI to access these indexes. For example, they do not appear in the
index list of Advanced Search in the XMLUI. This is appropriate since most users would not know
how to get authority key values to search on anyway.

...

The "batch" import mechanism only supports authority control implicitly, through the same
backward-compatibility mechanism that attempts to assign authority values
automatically on any unattended ingest. This implementation has no way to
include an authority value in the DC metadata, although of course this feature
can always be added later if there is enough demand and a willing developer.

Since the batch ingester has substantially been replaced
by the package ingester, which is available through the LNI and SWORD as well
as through command-line invocation, we will concentrate on adapting
crosswalk plugins, and thus the package ingester, to work with authority control.

...

The data model (as detailed below) makes the simplifying assumption that
there is one canonical displayable representation for each key describing
an entry in the metadata-value authority. In practice this is not always
the case, e.g. in an author name authority, a single identifier
can have multiple records which are considered canonical. In the oft-cited
example of an author writing under pseudonyms, both "Philip Jose Farmer" and
"Kilgore Trout" might be bound to the authority record for the same
individual, at equal levels of confidence. DSpace relies on the plug-in
implementation communicating with that authority to order the choices, and
in some situations (e.g. unattended submission) it must blindly choose the first.

Data Model

Relational Tables

The basic implementation only adds two columns to the <tt>MetadataValue</tt> table:

Oracle:

Panelcode

 ALTER TABLE MetadataValue
   ADD ( authority VARCHAR(100),
         confidence INTEGER DEFAULT -1);

Postgres:

Panelcode

 ALTER TABLE MetadataValue
   ADD COLUMN authority VARCHAR(100),
   ADD COLUMN confidence INTEGER DEFAULT -1;

These allow an authority key and confidence metric to be associated
with each value. Note that some other significant state associated with
the authority control of a metadata field is in the DSpace Configuration
because it is a property of the software, not the data.

Some indexes will probably be needed once we get some experience with the
prototype implementation to see what the query behavior is like.

...

The key is a text string whose interpretation is left up to the
authority plugin serving that metadata field.

...

An integer describing the level of "quality", or confidence, that the authority
key is the correct and unique representation of the text value of the
metadata field. See the API for a description of its meaningful values.

The confidence value is mainly necessary because we support setting an authority value in
an unattended environment, so it shows how much "confidence" we should have in the authority value.
The act of choosing an authority entry
to match the submitted metadata value is inherently imprecise:
the proffered value might match multiple entries, or none, or the operation
might fail because an external resource is unavailable.

We do not want the entire Item ingestion to fail (at least, not always)
because of what may be
a minor problem in the metadata. However, we also do not want to
let incorrect or incomplete metadata get recorded, unremarked. The solution
is to add a mechanism to grade the metadata, so that we can detect
problems and direct human operators to fix them. The confidence metric
is that mechanism, upon which we can implement any policy.

...

There are symbolic constants (and corresponding String symbolic names) for the confidence levels defined in the <tt>Choices</tt> class:

  • ACCEPTED
    - This authority value has been confirmed as accurate by an interactive user or authoritative policy
  • UNCERTAIN
    - Authority value is singular and valid but has not been seen and accepted by a human, so its provenance is uncertain
  • AMBIGUOUS
    - There are multiple matching authority values of equal validity
  • NOTFOUND
    - There are no matching answers from the authority
  • FAILED
    - The authority encountered an internal failure in trying to match the value
  • REJECTED
    - The authority recommends this submission be rejected
  • NOVALUE
    - No reasonable confidence value is available
  • UNSET
    - No confidence value has been set (default value in the DB table)

Separation of Choices from Authority Control

The prototype implementation has a feature that was not in the original
design proposal: a choices mechanism that is distinct from the
machinery of authority control. This is an advantage because:

...

All configuration is determined by MetadataField:
it is the field that gets declared as the object of a particular
set of choices, or as authority-controlled. This ensures the field has
uniform treatment throughout the DSpace platform, e.g. in browse indexing,
submission, dissemination, etc.

User Interface

Public (Artifact Browser) UI

Choice control has no visible effect on the public UI, except that metadata values of choice-controlled fields will be restricted to the controlled set. This can help to clean up and normalize indexes for browsing or crosswalking/interfacing to other systems.

...

  1. Browse indexes can be configured as authority-controlled and thus gain the benefits of authority keys.
  2. In XMLUI, the "full" metadata view of an Item shows a confidence icon next to a metadata value with a non-empty authority key.

Submission UI

Fields configured with choice plugins will appear on the submission "Describe" pages as dictated by their chosen presentation style.

Fields configured as authority-controlled will also display a Lookup button (or Lookup and Add for
repeatable fields), and a confidence icon will appear when an authority key has been determined.

Administrative UI

The "Edit Item Metadata" page is affected as follows:

...

Fortunately, the OpenSearch API lets you submit a query directly to the Lucene search engine, and this
may include the authority-controlled indexes.

...

For this example, suppose the DC metadata field <tt>dc.contributor.author</tt> is authority-controlled.
The search index <tt>author</tt> is configured by default to include
the fields <tt>dc.contributor.*</tt>, so it effectively inherits authority control.
Thus, there will be a separate search index <tt>author_authority</tt> created for the authority keys. Note that only the Items with authority key values are represented there, so it will probably be a subset of the <tt>author</tt> index.

To search on this index, you would submit an OpenSearch query such as this to retrieve Items with an author whose authority key matches <tt>no2004117088</tt>:

Panelcode

 http://dspace.myuni.edu/open-search/?query=author_authority:no2004117088
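Building such a query URL in code can be sketched as follows. This is only an illustration, not part of DSpace: the class <tt>AuthoritySearchUrl</tt> and its <tt>build</tt> method are hypothetical names, and the host is the example host used above. It assumes the authority index is named by appending <tt>_authority</tt> to the search index name, as in the example, and it percent-encodes the query value, so the colon is sent as <tt>%3A</tt>.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not a DSpace class.
class AuthoritySearchUrl {

    // Builds an OpenSearch URL that queries the "<index>_authority" index
    // for a single authority key.
    static String build(String baseUrl, String index, String authorityKey) {
        String query = index + "_authority:" + authorityKey;
        // Percent-encode the query so it is safe inside a URL.
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return baseUrl + "/open-search/?query=" + encoded;
    }

    public static void main(String[] args) {
        System.out.println(build("http://dspace.myuni.edu", "author", "no2004117088"));
    }
}
```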

Obtaining Authority Keys

How do you get the authority key value on which to search? If it comes from an external source (e.g. the Library of Congress Naming Authority), and you happen to know the details of how it is derived, you can derive it directly from the source.
Otherwise, you can use the Browse UI on an authority-controlled browse index to display a list of authority keys in the first-order browse list. Each value is a link whose URL is of the form:

Panelcode

 http://dspace.myuni.edu/browse?type=*BROWSE_INDEX_NAME*&authority=*AUTHORITY_KEY*

So you can extract the authority key value right out of the URL.
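A script could pull the key out of such a link along these lines. This is a minimal sketch, assuming the link carries the key in an <tt>authority</tt> query parameter as shown above; the class name <tt>BrowseLinkAuthorityKey</tt> is hypothetical and not part of DSpace.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper, not a DSpace class.
class BrowseLinkAuthorityKey {

    // Returns the value of the "authority" query parameter, if present.
    static Optional<String> extract(String browseUrl) {
        String query = URI.create(browseUrl).getRawQuery();
        if (query == null) {
            return Optional.empty();
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals("authority")) {
                // Undo any percent-encoding applied to the key.
                return Optional.of(URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extract("http://dspace.myuni.edu/browse?type=author&authority=no2004117088"));
    }
}
```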

...

The Choice Management and Authority Control subsystems are just a framework.
Without configuration, they have no effect on the operation of your
DSpace site. The desired behavior for each metadata field is driven
entirely by the DSpace Configuration properties and the nature of the
chosen ChoiceAuthority plugin.

...

Since choice management and authority control affect the operation of
the interactive submission pages, there is naturally some interdependence
with that configuration as well. This table shows how the data type
chosen for a metadata field affects choice and authority management:

...

Not every ChoiceAuthority plugin can present the select
presentation style. The plugin must be able to respond with a
complete list of choices even when no query value was specified. If that
is not possible (e.g. it searches a network resource with tens
of thousands of journal titles), then it should not be configured
with the "select" style.

"Name" input type

A plugin intended to manage choices for a personal-name field must be
coded to expect its query value in the DSpace canonical name format, i.e.
"Last, Firsts", e.g. "Doe, John", "Adams, John Quincy", "King, Martin Luther Jr.".

...

First, configure all your ChoiceAuthority plugins in the usual PluginManager
style. The example plugins provided in the DSpace code are configured here:

Panelcode

 plugin.named.org.dspace.content.authority.ChoiceAuthority = \
  org.dspace.content.authority.SampleAuthority = Sample, \
  org.dspace.content.authority.LCNameAuthority = LCNameAuthority, \
  org.dspace.content.authority.SHERPARoMEOPublisher = SRPublisher, \
  org.dspace.content.authority.SHERPARoMEOJournalTitle = SRJournalTitle

Automatic Choice Authority from Configurable Submission value-pairs

The default configuration also includes a special self-named plugin that
picks up all the value-pairs elements defined in your
<tt>input-forms.xml</tt> configuration and makes them available
as choice authorities (especially suitable for the select presentation style):

Panelcode

 plugin.selfnamed.org.dspace.content.authority.ChoiceAuthority = \
  org.dspace.content.authority.DCInputAuthority

Some of the ChoiceAuthority instances available in the default configuration are:

  • LCNameAuthority
    - Sample Library of Congress (USA) name authority - NOT for serious use.
  • SRPublisher
    - Journal Publisher names based on SHERPA/RoMEO database
  • SRJournalTitle
    - Journal Titles based on SHERPA/RoMEO database

The <tt>org.dspace.content.authority.DCInputAuthority</tt> plugin picks up all of the <tt>value-pairs</tt> tags from the <tt>input-forms.xml</tt> configuration and automatically creates plugin instances out of them, named by the name attribute they have in the config. The default DSpace config includes these choice authorities:

  • common_types
    - List of dc.type values from <tt>input-forms.xml</tt>
  • common_iso_languages
    - List of dc.language.iso values from <tt>input-forms.xml</tt>
  • common_identifiers
    - List of dc.identifier.X qualifiers from <tt>input-forms.xml</tt>

Selecting the choice plugin

First, any authority-controlled field must also be configured with
a source of choices. This also defines a simple choice field:

Panelcode

 choices.plugin._schema.element.qualifier_ = _plugin-name_

e.g.

Panelcode

 choices.plugin.dc.relation.journal = SRJournalTitle

Selecting Choice Presentation Style

This determines the UI presentation of the choice (mainly in the interactive
submission UI).

Panelcode

 choices.presentation._schema.element.qualifier_ = select | suggest | lookup

e.g.

Panelcode

 choices.presentation.dc.relation.journal = suggest

The available values are:

  • <tt>lookup</tt>
    - User enters a proposed value and clicks a button to "look up" choices based on that value; a pop-up window then lets the user navigate through the choices.
  • <tt>suggest</tt>
    - As the user types in a text-input field, a menu of suggested choices is automatically generated. It acts like the Google Suggest feature.
  • <tt>select</tt>
    - Puts up a drop-down menu (or multi-pick selection box) of choices using the HTML SELECT widget.

...

Finally, the choices for a metadata field may be specified as open (i.e.
values not included in the choices are allowed) or closed (restricted to
the set of values offered). The default is open. This means that
when a proposed value is not already in the choices, it is added
as an allowable choice.

Panelcode

 choices.closed._schema.element.qualifier_ = true | false

e.g.

Panelcode

 choices.closed.dc.relation.journal = false
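The difference between open and closed choices can be illustrated with a small sketch. It is not DSpace code: <tt>ChoiceSetSketch</tt> is a hypothetical class that models only the rule described above, where an open set accepts an unknown value and adds it as an allowable choice, while a closed set rejects it.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of an open or closed set of choices; not a DSpace class.
class ChoiceSetSketch {
    private final Set<String> choices;
    private final boolean closed;

    ChoiceSetSketch(List<String> initialChoices, boolean closed) {
        this.choices = new LinkedHashSet<>(initialChoices);
        this.closed = closed;
    }

    // Returns true if the proposed value may be stored in the field.
    boolean accept(String proposed) {
        if (choices.contains(proposed)) {
            return true;            // already an offered choice
        }
        if (closed) {
            return false;           // closed: only offered values are allowed
        }
        choices.add(proposed);      // open: the new value becomes a choice
        return true;
    }

    boolean offers(String value) {
        return choices.contains(value);
    }

    public static void main(String[] args) {
        ChoiceSetSketch open = new ChoiceSetSketch(List.of("Nature"), false);
        System.out.println(open.accept("Science") + " " + open.offers("Science"));
    }
}
```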

Authority Control configuration

...

To declare a field as authority-controlled, just add a property
like this, in addition to its Choices plugin declaration:

Panelcode

 authority.controlled._schema.element.qualifier_ = true

e.g.

Panelcode

 authority.controlled.dc.relation.journal = true

Requiring Authority Value

To further constrain an authority-controlled field so that it must
have an authority key whenever setting a metadata value, add the property:

Panelcode

 authority.required._schema.element.qualifier_ = true

CAUTION: Making an authority required might cause an unexpected error if
that metadata field is set to a value for which the choices plugin
cannot find any authority keys.

...

  • accepted
  • uncertain
  • ambiguous
  • notfound
  • failed
  • rejected
  • novalue
  • unset

For example:

Panelcode

  authority.minconfidence.dc.contributor.author = accepted

Default Minimum Confidence

...

. The built-in default is <tt>ACCEPTED</tt>, but if you find this is too high, you may prefer <tt>AMBIGUOUS</tt> to give automatically-derived authority keys the benefit of the doubt, e.g.

Panelcode

  authority.minconfidence = ambiguous
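How a minimum-confidence setting might be applied can be sketched as below. This is an illustration under stated assumptions, not the DSpace implementation: it assumes the symbolic names are ordered as listed above, from <tt>accepted</tt> (highest) down to <tt>unset</tt> (lowest), and that a value passes when its confidence is at or above the configured minimum. The class <tt>MinConfidenceSketch</tt> and its members are hypothetical names.

```java
import java.util.Locale;

// Hypothetical model of a minimum-confidence check; not a DSpace class.
class MinConfidenceSketch {

    // Declared from lowest to highest so that enum ordering
    // doubles as the confidence ordering (an assumption).
    enum Level {
        UNSET, NOVALUE, REJECTED, FAILED, NOTFOUND, AMBIGUOUS, UNCERTAIN, ACCEPTED;

        // Parses a symbolic name as written in the configuration, e.g. "ambiguous".
        static Level fromName(String name) {
            return valueOf(name.trim().toUpperCase(Locale.ROOT));
        }

        // True when this level is at least as high as the required minimum.
        boolean meets(Level minimum) {
            return compareTo(minimum) >= 0;
        }
    }

    public static void main(String[] args) {
        Level min = Level.fromName("ambiguous");
        System.out.println(Level.UNCERTAIN.meets(min));
        System.out.println(Level.NOTFOUND.meets(min));
    }
}
```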

Example of some field configurations

Panelcode

 # Simple configuration of authority-controlled author, authority not required
 choices.plugin.dc.contributor.author = LCNameAuthority
 choices.presentation.dc.contributor.author = lookup
 authority.controlled.dc.contributor.author = true
 #
 # As an example, get journal title for dc.title.alternative
 choices.plugin.dc.title.alternative = SRJournalTitle
 choices.presentation.dc.title.alternative = suggest
 #
 # This employs a select to restrict choices for dc.type field on EditItemMetadata page:
 choices.plugin.dc.type = common_types
 choices.closed.dc.type = true
 choices.presentation.dc.type = select

Customization

Adding ChoiceAuthority Plugin

You'll probably want to implement or adapt your own version of
a <tt>ChoiceAuthority</tt> plugin. See the API description for more details,
and consult the sample implementations
in the <tt>org.dspace.content.authority</tt> package.

...

For the XMLUI, the prototype includes sample styles and images
in the <tt>Reference</tt> theme.

...

You can control the images shown for various authority confidence levels
with style tags such as <tt>img.ds-authority-confidence.cf-NAME</tt>, where
NAME is the symbolic name of the confidence level, such as <tt>accepted</tt> (see above for the list). For example, this displays a thumbs-up
icon to mark a human-approved level of confidence in the authority value:

Panelcode

 img.ds-authority-confidence.cf-accepted
  { background: transparent url(../images/confidence/6-thumb2.gif); }

Debugging Authority Values

You can turn on the display of authority-value fields,
ideally just for debugging since it clutters the display. Adjust the value
of the <tt>display</tt> style to <tt>inline</tt> to see authority
value fields on Submission UI forms. (Note that the Edit Item Metadata
page already displays authority values.)

Panelcode

 input.ds-authority-value { display: none; }

Suggest / Autocomplete

The prototype CSS also includes some styles for the Scriptaculous JavaScript
autocomplete feature. These lines should be copied (and customized
as necessary) into your theme's CSS:

Panelcode

 div.autocomplete


 div.autocomplete ul


 div.autocomplete ul li.selected


 div.autocomplete ul li


 div.autocomplete ul li span.value

JSPUI

For customization, examine the contents of these pages in the webapp:

Panelcode

 /tools/lookup.jsp


 /tools/edit-item-form.jsp


 /submit/edit-metadata.jsp

"Lookup" page

The popup page generated for the lookup presentation style is
created by the underlying UI mechanism in both JSPUI and XMLUI. All of
the usual customization and styling tricks for that UI thus apply to it.

Prototype API

New Classes

Metadata Authority Control class

Actual authority control is mainly a matter of the
<tt>MetadataAuthorityManager</tt> broker class, which
interprets the DSpace configuration and reports the authority status
of a field. That's about all it does.
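The broker's role can be pictured with a small stand-in that reads the configuration properties described earlier. This is a sketch only, not the prototype's code: <tt>FieldAuthorityStatus</tt> and its methods are hypothetical names, the configuration is loaded from a string rather than from <tt>dspace.cfg</tt>, and it assumes that <tt>authority.required</tt> only has an effect on fields that are also authority-controlled.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical stand-in for a configuration broker; not the DSpace class.
class FieldAuthorityStatus {
    private final Properties config;

    FieldAuthorityStatus(Properties config) {
        this.config = config;
    }

    // Parses configuration text in Java properties syntax.
    static FieldAuthorityStatus parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new FieldAuthorityStatus(p);
    }

    private boolean flag(String key) {
        return Boolean.parseBoolean(config.getProperty(key, "false").trim());
    }

    // field is written as schema.element.qualifier, e.g. "dc.contributor.author"
    boolean isControlled(String field) {
        return flag("authority.controlled." + field);
    }

    // Assumption: "required" is meaningful only for controlled fields.
    boolean isRequired(String field) {
        return isControlled(field) && flag("authority.required." + field);
    }

    // Name of the configured choices plugin, or null if none is configured.
    String choicesPlugin(String field) {
        String name = config.getProperty("choices.plugin." + field);
        return name == null ? null : name.trim();
    }

    public static void main(String[] args) {
        FieldAuthorityStatus s = parse(
            "choices.plugin.dc.contributor.author = LCNameAuthority\n"
          + "authority.controlled.dc.contributor.author = true\n");
        System.out.println(s.isControlled("dc.contributor.author"));
    }
}
```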

Choice Authority Manager

The choices framework consists of a
<tt>ChoiceAuthorityManager</tt> class which serves as a broker
and accesses the configuration, and individual plugins implementing the
<tt>ChoiceAuthority</tt> interface. Each plugin represents a
choice authority, which is a source of value options or choices.
See the prototype code for details.

Choice Authority plugin

These are the significant methods in
the <tt>ChoiceAuthority</tt> interface:

getMatches

Returns the set of choices matching, i.e. possibly relevant to, a
proposed (and maybe partial) metadata value.

The exact requirements and expectations of this method's implementation depend
on how the fields that call it are configured: if the suggest
UI presentation is employed, it will get called with partial values. If only
the lookup presentation is used, it will see complete values.
Also, the suggest mechanism is not usually capable of taking "paged"
result sets (i.e. where start is greater than 0).

Note that getMatches() is given a collection argument, which
contains the owning Collection of the Item (or SubmissionItem) for which
a metadata value is being assigned. This is intended to give the plugin
some context to adjust its criteria for assembling a set of choices.
For example, a personal-name choice authority may restrict its search to
members of a certain department if the indicated collection is only intended
for works by members of that department.

The collection is supplied as a database ID to avoid the overhead and
database access required to create a DSpaceObject instance. The collection
is probably used only rarely, and the speed of a choice request is
very important in the interactive context, so the expense of querying the
collection is deferred and incurred only by
those implementations which really need it.

Panelcode

 public Choices getMatches(String text, int collection, int start, int limit, String locale);
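To make the roles of the arguments concrete, here is a minimal in-memory sketch. It is not a real plugin: <tt>InMemoryChoiceSketch</tt> is a hypothetical class that returns a plain list of strings instead of the <tt>Choices</tt> type, matches by case-insensitive prefix (an assumption; a real authority may match differently), and ignores the <tt>collection</tt> and <tt>locale</tt> arguments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a choice authority; not the DSpace interface.
class InMemoryChoiceSketch {
    private final List<String> values;

    InMemoryChoiceSketch(List<String> values) {
        this.values = new ArrayList<>(values);
    }

    // text:       proposed (possibly partial) value
    // collection: database ID of the owning collection (unused here)
    // start:      number of matches to skip, for paged results
    // limit:      maximum number of matches to return
    // locale:     requested locale (unused here)
    List<String> getMatches(String text, int collection, int start, int limit, String locale) {
        String prefix = text.toLowerCase(Locale.ROOT);
        List<String> page = new ArrayList<>();
        int seen = 0;
        for (String v : values) {
            if (!v.toLowerCase(Locale.ROOT).startsWith(prefix)) {
                continue;               // not a match
            }
            if (seen++ < start) {
                continue;               // before the requested page
            }
            if (page.size() >= limit) {
                break;                  // page is full
            }
            page.add(v);
        }
        return page;
    }

    public static void main(String[] args) {
        InMemoryChoiceSketch names = new InMemoryChoiceSketch(
            List.of("Adams, John", "Adams, John Quincy", "Doe, John"));
        System.out.println(names.getMatches("adams", -1, 0, 10, "en"));
    }
}
```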

getBestMatch

Gets the single "best" match (if any) of a value in the authority
to the given user value. This is also expected to return a
meaningful "confidence" expressing the circumstances of
this match, i.e. if it is ambiguous or not.

This call is typically used only in non-interactive metadata ingests
and other contexts where there is no opportunity for an
interactive agent to choose from among options.

Note that getBestMatch() is given a collection argument, just like
the collection context given to getMatches() – see above for details.

Panelcode

 public Choices getBestMatch(String text, int collection, String locale);

getLabel

Get the canonical, human-readable label (i.e. short descriptive text)
corresponding to the authority key of a value. This is only
called for fields defined as authority-controlled; a choice plugin that
is never used for authority-controlled fields does not need to
implement this.

Panelcode

 public String getLabel(String key, String locale);

API Changes

Changes to <tt>Item</tt>

Add methods that take authority-key and confidence arguments:

Panelcode

 public void addMetadata(String schema, String element, String qualifier, String lang, String value, String authority, int confidence)
 public void addMetadata(String schema, String element, String qualifier, String lang, String[] values, String[] authorities, int[] confidences)

IMPORTANT NOTE on Backward Compatibility

The old <tt>addMetadata</tt> methods are still available, of course,
although they have a new failure mode. When the field is authority-controlled
they will call the field's configured <tt>ChoiceAuthority</tt>'s
<tt>getBestMatch()</tt> method to generate an authority key and confidence
value. If the field is configured to require an authority key,
an exception is thrown if <tt>getBestMatch()</tt> doesn't return one.
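The control flow described in this note can be sketched as follows. It is an illustration only, not the code in <tt>Item</tt>: the class <tt>LegacyAddMetadataSketch</tt>, its nested types, and the use of <tt>IllegalStateException</tt> are all assumptions, and the best-match lookup is passed in as a plain function rather than a real plugin.

```java
import java.util.function.Function;

// Hypothetical model of the backward-compatible path; not DSpace code.
class LegacyAddMetadataSketch {

    // Stand-in for a best-match result.
    static final class Match {
        final String authorityKey;  // null when nothing matched
        final String confidence;    // symbolic name, e.g. "uncertain"

        Match(String authorityKey, String confidence) {
            this.authorityKey = authorityKey;
            this.confidence = confidence;
        }
    }

    // What ends up recorded for the metadata value.
    static final class StoredValue {
        final String text;
        final String authority;
        final String confidence;

        StoredValue(String text, String authority, String confidence) {
            this.text = text;
            this.authority = authority;
            this.confidence = confidence;
        }
    }

    static StoredValue addLegacy(String text, boolean controlled, boolean required,
                                 Function<String, Match> getBestMatch) {
        if (!controlled) {
            // Not authority-controlled: store the text value only.
            return new StoredValue(text, null, "unset");
        }
        Match m = getBestMatch.apply(text);
        if (m.authorityKey == null && required) {
            // The new failure mode: a key is required but none was found.
            throw new IllegalStateException("No authority key found for: " + text);
        }
        return new StoredValue(text, m.authorityKey, m.confidence);
    }

    public static void main(String[] args) {
        Function<String, Match> lookup = t -> t.equals("Twain, Mark")
            ? new Match("n79-21164", "uncertain")
            : new Match(null, "notfound");
        System.out.println(addLegacy("Twain, Mark", true, true, lookup).authority);
    }
}
```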

Changes to <tt>DCValue</tt>

Add fields:

Panelcode

 public String authority;


 public int confidence;

Changes to <tt>MetadataValue</tt>

Panelcode

 public String getAuthority()


 public void setAuthority(String value)


 public int getConfidence()


 public void setConfidence(int value)

Changes to <tt>DIM</tt> XML "schema"

The DIM Field element gains two new attributes to represent the
authority and confidence (when available) of DC metadata values, e.g.

Panelcode

 <dim:field schema="dc" element="contributor" qualifier="author" authority="n79-21164" confidence="6">
 Mark Twain
 </dim:field>

Changes to DRI Schema

The XMLUI's DRI schema has some attributes added to support
choice options and authority control in input fields and their values.
Since the pages in the Item Submission and Item Edit Metadata UIs effectively
round-trip all of the Item's DC metadata values through a Web page, it
is essential for the page's fields to preserve any authority key and
confidence values as well.

DRI <tt>params</tt> Element

The <tt>params</tt> subelement of <tt>field</tt> gets some new
attributes:

...

  • authorityControlled - Boolean, true if the field is authority-controlled.
  • authorityRequired - Boolean, true if the field requires an authority key value.
  • choices - Name of the metadata field whose choice configuration to use.
  • choicesPresentation - Type of choices presentation style to use, either <tt>suggest</tt> or <tt>select</tt>
  • choicesClosed - Boolean, true if choice is configured as closed, i.e. no values outside of presented choices are to be allowed.

The <tt>value</tt> subelement of <tt>instance</tt> and of some <tt>field</tt>
elements (e.g. text, textarea) gets some new
attributes:

  • <tt>type='authority'</tt> - This new keyword value of the type attribute means the value is an authority-key value.
  • confidence - The given confidence number applies to this element, which should also have a type of <tt>"authority"</tt>.

...

In the <tt>org.dspace.content.authority</tt> Package

...

  • Choice - record class for a single choice value
  • Choices - record class for result of choices query
  • ChoiceAuthority - interface of choice authority plugin
  • ChoiceAuthorityManager - choices factory and access
  • MetadataAuthorityManager - metadata authority control factory

...

  • New <tt>/choices/field</tt> URL added to sitemap to retrieve choices data as XML, for AJAX browser scripting.
    • Implemented by <tt>org.dspace.app.xmlui.cocoon.AJAXMenuGenerator</tt>
  • Show icons depicting confidence level of authority-controlled metadata values, in long Item view and MD edit pages.
  • Option to show authority values in submission UI, for debugging; see the <tt>input.ds-authority-value</tt> stanza in the CSS file.

XMLUI Sample Implementation

Any Theme for a DSpace using the Choice or Authority Control features
must include the CSS declarations prototyped in the Reference theme.
The corresponding images must also be available in your theme.
All of the other elements (JavaScript, translations, etc) are implemented below
the theme layer.

Installing the Prototype

...

  1. Check out from the special "sandbox" on svn:
  2. Modify your <tt>dspace.cfg</tt> properties to enable choice and/or authority control on some fields. Unless you configure choice control, nothing will be any different. (See "Configuration" section above for details)
  3. Configure and build as usual:
    • Be sure to choose the <tt>Reference</tt> theme, otherwise you'll have to copy over the choice/authority additions to your chosen theme (see above).
  4. After the fresh install, convert the database schema to 1.6 by running the SQL statements in the file: Oracle: <tt>/dspace/etc/oracle/database_schema_15_16.sql</tt> or Postgres: <tt>/dspace/etc/database_schema_15_16.sql</tt>
  5. Start the webserver as usual.

...

Please use the Discussion page.</html>