...
- Not a replacement for text metadata values. Metadata fields still have text values.
- The text value of a metadata field does not have to be derived from the authority, even if authority control is required for that field.
- Configured by field. The authority control status of each field is independently configured, but it affects all values of that field.
- *Authority control can be optional or required.* When optional, metadata values may take on values that did not come from the authority.
- Authority values are ubiquitous. Authority values are accessible by crosswalk plugins, in the UI, through OAI-PMH, etc.
- All of those contexts can detect whether a value is authority-controlled by testing for the presence of an authority key.
- Text-based searching and indexing is unchanged. Since metadata values still have text values, the browse and search systems will work unchanged.
- Choice behavior decoupled. The selection and choice mechanisms can be invoked independently of authority control (e.g. in the submission UI).
...
When collecting a value for an authority-controlled field, the interactive submission UI has to help the user choose a value from the
authority set. Typically the user enters a clue or partial value and is then presented with a list of matches from which to choose. Each potential answer may include not only the value of the metadata field, but also some associated information that helps discriminate between identical values. For example, an authority on personal names might include title, department, age, and other details to help the user choose between two records with identical names.
...
The presentation UI can call on a generic method to get the canonical
display string for an authority key, but it is welcome to interpret it
in custom code to present a more detailed view. For example, a
site may want to customize its Item display so that a personal name
links to that person's page on the institution's social networking site,
which it locates through the authority key.
Dissemination crosswalks will also receive the authority key so they can
pass that knowledge on through OAI-PMH, exported packages, and any other
dissemination vehicles.
Browse
There is a new field type in the Configurable Browse facility that has the
browse mechanism index Items by their authority key values instead of the
text value of the metadata field. This gives a truly authority-controlled
browse UI.
Search
The support for authority control in search is very subtle:
...
Since the resources to design and implement this prototype are sharply
limited it is necessary to pare it down to the features essential for
our site.
No Changes to Search UI
Although search indexes are built for the authority keys of authority-controlled
fields, there is no explicit UI to access these indexes. For example, they do not appear in the
index list of Advanced Search in the XMLUI. This is appropriate since most users would not know
how to get authority key values to search on anyway.
...
The "batch" import mechanism only supports authority control implicitly, through the same
backward-compatibility mechanism that attempts to assign authority values
automatically on any unattended ingest. This implementation has no way to
include an authority value in the DC metadata, although of course this feature
can always be added later if there is enough demand and a willing developer.
Since the batch ingester has substantially been replaced
by the package ingester, which is available through the LNI and SWORD as well
as through command-line invocation, we will concentrate on adapting
crosswalk plugins, and thus the package ingester, to work with authority control.
...
The data model (as detailed below) makes the simplifying assumption that
there is one canonical displayable representation for each key describing
an entry in the metadata-value authority. In practice this is not always
the case: in an author name authority, for example, a single identifier
can have multiple name forms which are all considered canonical. In the oft-cited
example of an author writing under pseudonyms, both "Philip Jose Farmer" and
"Kilgore Trout" might be bound to the authority record for the same
individual, at equal levels of confidence. DSpace relies on the plug-in
implementation communicating with that authority to order the choices, and
in some situations (e.g. unattended submission) it must blindly choose the first.
Data Model
Relational Tables
The basic implementation only adds two columns to the <tt>MetadataValue</tt> table:
Oracle:
```sql
ALTER TABLE MetadataValue ADD (authority VARCHAR(100), confidence INTEGER DEFAULT -1);
```
Postgres:
```sql
ALTER TABLE MetadataValue ADD COLUMN authority VARCHAR(100), ADD COLUMN confidence INTEGER DEFAULT -1;
```
These allow an authority key and confidence metric to be associated
with each value. Note that some other significant state associated with
the authority control of a metadata field is in the DSpace Configuration
because it is a property of the software, not the data.
Some indexes will probably be needed once we get some experience with the
prototype implementation to see what the query behavior is like.
...
The key is a text string whose interpretation is left up to the
authority plugin serving that metadata field.
...
An integer describing the level of "quality", or confidence, that the authority
key is the correct and unique representation of the text value of the
metadata field. See the API for a description of its meaningful values.
The confidence value is needed mainly because we support setting an authority value in
an unattended environment; it records how much "confidence" we should have in that authority value.
The act of choosing an authority entry
to match the submitted metadata value is inherently imprecise:
the proffered value might match multiple entries, or none, or the operation
might fail because an external resource is unavailable.
We do not want the entire Item ingestion to fail (at least, not always)
because of what may be
a minor problem in the metadata. However, we also do not want to
let incorrect or incomplete metadata get recorded, unremarked. The solution
is to add a mechanism to grade the metadata, so that we can detect
problems and direct human operators to fix them. The confidence metric
is that mechanism, upon which we can implement any policy.
...
There are symbolic constants (and corresponding String symbolic names) for the confidence levels defined in the <tt>Choices</tt> class:
- ACCEPTED - This authority value has been confirmed as accurate by an interactive user or authoritative policy.
- UNCERTAIN - The authority value is singular and valid but has not been seen and accepted by a human, so its provenance is uncertain.
- AMBIGUOUS - There are multiple matching authority values of equal validity.
- NOTFOUND - There are no matching answers from the authority.
- FAILED - The authority encountered an internal failure in trying to match the value.
- REJECTED - The authority recommends this submission be rejected.
- NOVALUE - No reasonable confidence value is available.
- UNSET - No confidence value has been set (the default value in the DB table).
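The ordering implied by this list can be sketched as a Java enum. This is a hypothetical stand-in, not the actual <tt>Choices</tt> class: the name <tt>ConfidenceLevel</tt>, its helper methods, and the assumption that the list above runs from highest to lowest confidence are illustrative only, and the real numeric values may differ.

```java
import java.util.Locale;

/**
 * Hypothetical stand-in for the confidence constants in the Choices class.
 * Declared from lowest to highest, assuming the list above runs from
 * highest (ACCEPTED) to lowest (UNSET) confidence.
 */
enum ConfidenceLevel {
    UNSET, NOVALUE, REJECTED, FAILED, NOTFOUND, AMBIGUOUS, UNCERTAIN, ACCEPTED;

    /** Parses a case-insensitive symbolic name such as "accepted". */
    static ConfidenceLevel fromName(String name) {
        return valueOf(name.trim().toUpperCase(Locale.ROOT));
    }

    /** Lower-case name, as used in configuration properties and CSS classes. */
    String symbolicName() {
        return name().toLowerCase(Locale.ROOT);
    }

    /** True if this level is at least as strong as the given threshold. */
    boolean atLeast(ConfidenceLevel threshold) {
        return compareTo(threshold) >= 0;
    }
}
```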
Separation of Choices from Authority Control
The prototype implementation has a feature that was not in the original
design proposal: a choices mechanism that is distinct from the
machinery of authority control. This is an advantage because:
...
All configuration is determined by MetadataField:
it is the field that gets declared as the object of a particular
set of choices, or as authority-controlled. This ensures the field has
uniform treatment throughout the DSpace platform, e.g. in browse indexing,
submission, dissemination, etc.
User Interface
Public (Artifact Browser) UI
Choice control has no visible effect on the public UI, except that metadata values of choice-controlled fields will be restricted to the controlled set. This can help to clean up and normalize indexes for browsing or crosswalking/interfacing to other systems.
...
- Browse indexes can be configured as authority-controlled and thus gain the benefits of authority keys.
- In XMLUI, the "full" metadata view of an Item shows a confidence icon next to a metadata value with a non-empty authority key.
Submission UI
Fields configured with choice plugins will appear on the submission "Describe" pages as dictated by their chosen presentation style.
Fields configured as authority-controlled will also display a Lookup button (or Lookup and Add for
repeatable fields), and a confidence icon will appear when an authority key has been determined.
Administrative UI
The "Edit Item Metadata" page is affected as follows:
...
Fortunately, the OpenSearch API lets you submit a query directly to the Lucene search engine, and this
may include the authority-controlled indexes.
...
For this example, suppose the DC metadata field <tt>dc.contributor.author</tt> is authority-controlled.
The search index <tt>author</tt> is configured by default to include
the fields <tt>dc.contributor.*</tt>, so it effectively inherits authority control.
Thus, there will be a separate search index <tt>author_authority</tt> created for the authority keys. Note that only the Items with authority key values are represented there, so it will probably be a subset of the <tt>author</tt> index.
To search on this index, you would submit an OpenSearch query such as this to retrieve Items with an author whose authority key matches <tt>no2004117088</tt>:
```
http://dspace.myuni.edu/open-search/?query=author_authority:no2004117088
```
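If you generate such queries programmatically, the query string should be URL-encoded. A minimal sketch, assuming only the URL pattern shown above; the class <tt>AuthoritySearchUrl</tt> is hypothetical and not part of DSpace:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that builds an OpenSearch query on an authority-key index. */
final class AuthoritySearchUrl {
    private AuthoritySearchUrl() {}

    /**
     * @param baseUrl OpenSearch endpoint, e.g. "http://dspace.myuni.edu/open-search/"
     * @param index   authority index name, e.g. "author_authority"
     * @param key     authority key to match
     */
    static String build(String baseUrl, String index, String key) {
        // "index:key" is the field:term form used in the example URL above;
        // percent-encode it so keys containing spaces or '&' survive the URL.
        String luceneQuery = index + ":" + key;
        return baseUrl + "?query=" + URLEncoder.encode(luceneQuery, StandardCharsets.UTF_8);
    }
}
```

For the example above, <tt>build("http://dspace.myuni.edu/open-search/", "author_authority", "no2004117088")</tt> yields the same URL with the colon encoded as <tt>%3A</tt>.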
Obtaining Authority Keys
How do you get the authority key value on which to search? If it comes from an external source (e.g. the Library of Congress Name Authority File), and you happen to know the details of how it is derived, you can derive it directly from the source.
Otherwise, you can use the Browse UI on an authority-controlled browse index to display a list of authority keys in the first-order browse list. Each value is a link whose URL is of the form:
```
http://dspace.myuni.edu/browse?type=BROWSE_INDEX_NAME&authority=AUTHORITY_KEY
```
So you can extract the authority key value right out of the URL.
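Extracting the key can also be scripted. This sketch assumes the browse URL form shown above and uses only the JDK's <tt>java.net</tt> classes; <tt>BrowseUrlKeys</tt> is a hypothetical helper name:

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

/** Hypothetical helper that reads the authority key out of a browse link. */
final class BrowseUrlKeys {
    private BrowseUrlKeys() {}

    /** Returns the decoded value of the "authority" query parameter, if present. */
    static Optional<String> authorityKey(String browseUrl) {
        String rawQuery = URI.create(browseUrl).getRawQuery();
        if (rawQuery == null) {
            return Optional.empty();
        }
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            // Only a parameter literally named "authority" with a value counts.
            if (eq >= 0 && pair.substring(0, eq).equals("authority")) {
                return Optional.of(URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8));
            }
        }
        return Optional.empty();
    }
}
```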
...
The Choice Management and Authority Control subsystems are just a framework.
Without configuration, they have no effect on the operation of your
DSpace site. The desired behavior for each metadata field is driven
entirely by the DSpace Configuration properties and the nature of the
chosen ChoiceAuthority plugin.
...
Since choice management and authority control affect the operation of
the interactive submission pages, there is naturally some interdependence
with that configuration as well. This table shows how the data type
chosen for a metadata field affects choice and authority management:
...
Not every ChoiceAuthority plugin can present the select
presentation style. The plugin must be able to respond with a
complete list of choices even when no query value was specified. If that
is not possible (e.g. it searches a network resource with tens
of thousands of journal titles), then it should not be configured
with the "select" style.
"Name" input type
A plugin intended to manage choices for a personal-name field must be
coded to expect its query value in the DSpace canonical name format, i.e.
"Last, Firsts", e.g. "Doe, John", "Adams, John Quincy", "King, Martin Luther Jr.".
...
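A plugin handling personal names might begin by splitting the canonical-format query on its first comma. The record <tt>CanonicalName</tt> below is a hypothetical helper, not a DSpace class; it assumes that everything before the first comma is the last name, and treats a query without a comma (common for partial values from the suggest style) as a partial last name:

```java
/** Hypothetical helper splitting a DSpace canonical name "Last, Firsts". */
record CanonicalName(String last, String firsts) {

    static CanonicalName parse(String query) {
        int comma = query.indexOf(',');
        if (comma < 0) {
            // No comma: assume the whole query is a (possibly partial) last name.
            return new CanonicalName(query.trim(), "");
        }
        // Everything after the first comma, e.g. "Martin Luther Jr.", is the first-names part.
        return new CanonicalName(query.substring(0, comma).trim(),
                                 query.substring(comma + 1).trim());
    }
}
```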
First, configure all your ChoiceAuthority plugins in the usual PluginManager
style. The example plugins provided in the DSpace code are configured here:
```properties
plugin.named.org.dspace.content.authority.ChoiceAuthority = \
  org.dspace.content.authority.SampleAuthority = Sample, \
  org.dspace.content.authority.LCNameAuthority = LCNameAuthority, \
  org.dspace.content.authority.SHERPARoMEOPublisher = SRPublisher, \
  org.dspace.content.authority.SHERPARoMEOJournalTitle = SRJournalTitle
```
Automatic Choice Authority from Configurable Submission value-pairs
The default configuration also includes a special self-named plugin that
picks up all the value-pairs elements defined in your <tt>input-forms.xml</tt>
configuration and makes them available as choice authorities (especially
suitable for the select presentation style):
```properties
plugin.selfnamed.org.dspace.content.authority.ChoiceAuthority = \
  org.dspace.content.authority.DCInputAuthority
```
Some of the ChoiceAuthority instances available in the default configuration are:
- LCNameAuthority - Sample Library of Congress (USA) name authority; NOT for serious use.
- SRPublisher - Journal publisher names based on the SHERPA/RoMEO database.
- SRJournalTitle - Journal titles based on the SHERPA/RoMEO database.
The <tt>org.dspace.content.authority.DCInputAuthority</tt> plugin picks up all of the <tt>value-pairs</tt> tags from the <tt>input-forms.xml</tt> configuration and automatically creates plugin instances out of them, named by the name attribute they have in the config. The default DSpace config includes these choice authorities:
- common_types - List of dc.type values from <tt>input-forms.xml</tt>
- common_iso_languages - List of dc.language.iso values from <tt>input-forms.xml</tt>
- common_identifiers - List of dc.identifier.X qualifiers from <tt>input-forms.xml</tt>
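To illustrate the idea, the sketch below reads <tt>value-pairs</tt> elements with the JDK's DOM parser and indexes each list by its name. It assumes an <tt>input-forms.xml</tt> layout of <tt>value-pairs</tt> elements carrying a <tt>value-pairs-name</tt> attribute and containing <tt>pair</tt> elements with <tt>displayed-value</tt> and <tt>stored-value</tt> children; check this against your own file. <tt>ValuePairsReader</tt> is hypothetical and is not how <tt>DCInputAuthority</tt> is actually implemented:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical reader turning value-pairs lists into named choice sources. */
final class ValuePairsReader {
    private ValuePairsReader() {}

    /** Returns list name -> (stored value -> displayed value), in document order. */
    static Map<String, Map<String, String>> read(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse value-pairs XML", e);
        }
        Map<String, Map<String, String>> authorities = new LinkedHashMap<>();
        NodeList lists = doc.getElementsByTagName("value-pairs");
        for (int i = 0; i < lists.getLength(); i++) {
            Element list = (Element) lists.item(i);
            Map<String, String> pairs = new LinkedHashMap<>();
            NodeList pairNodes = list.getElementsByTagName("pair");
            for (int j = 0; j < pairNodes.getLength(); j++) {
                Element pair = (Element) pairNodes.item(j);
                pairs.put(text(pair, "stored-value"), text(pair, "displayed-value"));
            }
            // Each list becomes one choice source, named by its value-pairs-name attribute.
            authorities.put(list.getAttribute("value-pairs-name"), pairs);
        }
        return authorities;
    }

    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }
}
```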
Selecting the choice plugin
First, any authority-controlled field must also be configured with
a source of choices. This also defines a simple choice field:
```properties
choices.plugin.SCHEMA.ELEMENT[.QUALIFIER] = PLUGIN_NAME
```
e.g.
```properties
choices.plugin.dc.relation.journal = SRJournalTitle
```
Selecting Choice Presentation Style
This determines the UI presentation of the choice (mainly in the interactive
submission UI).
```properties
choices.presentation.SCHEMA.ELEMENT[.QUALIFIER] = select | suggest | lookup
```
e.g.
```properties
choices.presentation.dc.relation.journal = suggest
```
The available values are:
- <tt>lookup</tt> - The user enters a proposed value and clicks a button to "look up" choices based on that value, and is presented with a pop-up window that lets her navigate through the choices.
- <tt>suggest</tt> - As the user types in a text-input field, a menu of suggested choices is automatically generated. It acts like the Google Suggest feature.
- <tt>select</tt> - Puts up a drop-down menu (or multi-pick selection box) of choices using the HTML SELECT widget.
...
Finally, the choices for a metadata field may be specified as open (i.e.
values not included in the choices are allowed) or closed (restricted to
the set of values offered). The default is open. This means that
when a proposed value is not already in the choices, it is added
as an allowable choice.
```properties
choices.closed.SCHEMA.ELEMENT[.QUALIFIER] = true | false
```
e.g.
```properties
choices.closed.dc.relation.journal = false
```
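The open/closed rule amounts to a simple check. A minimal sketch, assuming the property names shown above; <tt>ChoiceRule</tt> and its methods are hypothetical, and the sketch only decides whether a value is acceptable (it does not add the value to any choice list):

```java
import java.util.Properties;
import java.util.Set;

/** Hypothetical helper applying the open/closed choice rule to one field. */
final class ChoiceRule {
    private ChoiceRule() {}

    /** Reads choices.closed for the field; a missing property means open. */
    static boolean isClosed(Properties cfg, String field) {
        return Boolean.parseBoolean(cfg.getProperty("choices.closed." + field, "false"));
    }

    /** An open field accepts any value; a closed field accepts only offered values. */
    static boolean isAllowed(String proposed, Set<String> offered, boolean closed) {
        return !closed || offered.contains(proposed);
    }
}
```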
Authority Control configuration
...
To declare a field as authority-controlled, just add a property
like this, in addition to its Choices plugin declaration:
```properties
authority.controlled.SCHEMA.ELEMENT[.QUALIFIER] = true
```
e.g.
```properties
authority.controlled.dc.relation.journal = true
```
Requiring Authority Value
To further constrain an authority-controlled field so that it must
have an authority key whenever setting a metadata value, add the property:
```properties
authority.required.SCHEMA.ELEMENT[.QUALIFIER] = true
```
CAUTION: Making an authority required might cause an unexpected error if
that metadata field is set to a value for which the choices plugin
cannot find any authority keys.
...
- accepted
- uncertain
- ambiguous
- notfound
- failed
- rejected
- novalue
- unset
For example:
```properties
authority.minconfidence.dc.contributor.author = accepted
```
Default Minimum Confidence
...
. The built-in default is <tt>ACCEPTED</tt>, but if you find this is too high, you may prefer <tt>AMBIGUOUS</tt> to give automatically-derived authority keys the benefit of the doubt, e.g.
```properties
authority.minconfidence = ambiguous
```
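The threshold lookup can be sketched as follows, assuming that a per-field setting overrides the site-wide one and that the built-in <tt>ACCEPTED</tt> applies when neither is set. The nested <tt>Level</tt> enum and its lowest-to-highest ordering are assumptions based on the order of the confidence list above, and <tt>MinConfidence</tt> is a hypothetical helper, not DSpace code:

```java
import java.util.Locale;
import java.util.Properties;

/** Hypothetical resolver for the minimum-confidence properties. */
final class MinConfidence {
    /** Assumed ordering, lowest to highest, mirroring the confidence list. */
    enum Level { UNSET, NOVALUE, REJECTED, FAILED, NOTFOUND, AMBIGUOUS, UNCERTAIN, ACCEPTED }

    private MinConfidence() {}

    /** Per-field property, else the site-wide property, else built-in "accepted". */
    static Level threshold(Properties cfg, String field) {
        String value = cfg.getProperty("authority.minconfidence." + field,
                cfg.getProperty("authority.minconfidence", "accepted"));
        return Level.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    /** True if a value's confidence is at or above the field's threshold. */
    static boolean meetsThreshold(Properties cfg, String field, Level confidence) {
        return confidence.compareTo(threshold(cfg, field)) >= 0;
    }
}
```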
Example of some field configurations
```properties
# Simple configuration of authority-controlled author, authority not required
choices.plugin.dc.contributor.author = LCNameAuthority
choices.presentation.dc.contributor.author = lookup
authority.controlled.dc.contributor.author = true

# As an example, get journal title for dc.title.alternative
choices.plugin.dc.title.alternative = SRJournalTitle
choices.presentation.dc.title.alternative = suggest

# This employs a select to restrict choices for dc.type field on EditItemMetadata page:
choices.plugin.dc.type = common_types
choices.closed.dc.type = true
choices.presentation.dc.type = select
```
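To see how these properties combine for one field, here is a sketch that loads them with <tt>java.util.Properties</tt> and summarizes a field's settings. The record <tt>FieldChoiceConfig</tt> is hypothetical; it also enforces the rule stated earlier that an authority-controlled field needs a choices plugin, which is an assumption about where such a check would happen:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical summary of the choice/authority properties for one field. */
record FieldChoiceConfig(String field, String plugin, String presentation,
                         boolean closed, boolean authorityControlled, boolean authorityRequired) {

    static FieldChoiceConfig of(Properties cfg, String field) {
        String plugin = cfg.getProperty("choices.plugin." + field);
        // Boolean.parseBoolean(null) is false, so absent properties mean "off".
        boolean controlled = Boolean.parseBoolean(cfg.getProperty("authority.controlled." + field));
        if (controlled && plugin == null) {
            // Authority control needs a choices plugin as its source of keys.
            throw new IllegalStateException(field + " is authority-controlled but has no choices.plugin");
        }
        return new FieldChoiceConfig(field, plugin,
                cfg.getProperty("choices.presentation." + field),
                Boolean.parseBoolean(cfg.getProperty("choices.closed." + field)),
                controlled,
                Boolean.parseBoolean(cfg.getProperty("authority.required." + field)));
    }

    /** Loads properties text in the same key = value format as dspace.cfg. */
    static Properties load(String text) {
        Properties cfg = new Properties();
        try {
            cfg.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return cfg;
    }
}
```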
Customization
Adding ChoiceAuthority Plugin
You'll probably want to implement or adapt your own version of
a <tt>ChoiceAuthority</tt> plugin. See the API description for more details,
and consult the sample implementations
in the <tt>org.dspace.content.authority</tt> package.
...
For the XMLUI, the prototype includes sample styles and images
in the <tt>Reference</tt> theme.
...
You can control the images shown for various authority confidence levels
with style tags such as <tt>img.ds-authority-confidence.cf-NAME</tt>, where
NAME is the symbolic name of the confidence level, such as <tt>accepted</tt> (see above for the list). For example, this displays a thumbs-up
icon to mark a human-approved level of confidence in the authority value:
```css
img.ds-authority-confidence.cf-accepted {
  background: transparent url(../images/confidence/6-thumb2.gif);
}
```
Debugging Authority Values
You can turn on the display of authority-value fields,
ideally just for debugging since it clutters the display. Adjust the value
of the <tt>display</tt> style to <tt>inline</tt> to see authority
value fields on Submission UI forms. (Note that the Edit Item Metadata
page already displays authority values.)
```css
input.ds-authority-value {
  display: none;
}
```
Suggest / Autocomplete
The prototype CSS also includes some styles for the Scriptaculous JavaScript
autocomplete feature. The rules for these selectors should be copied (and
customized as necessary) into your theme's CSS:
```
div.autocomplete
div.autocomplete ul
div.autocomplete ul li.selected
div.autocomplete ul li
div.autocomplete ul li span.value
```
JSPUI
For customization, examine the contents of these pages in the webapp:
```
/tools/lookup.jsp
/tools/edit-item-form.jsp
/submit/edit-metadata.jsp
```
"Lookup" page
The popup page generated for the lookup presentation style is
created by the underlying UI mechanism in both JSPUI and XMLUI. All of
the usual customization and styling tricks for that UI thus apply to it.
Prototype API
New Classes
Metadata Authority Control class
Actual authority control is mainly a matter of the <tt>MetadataAuthorityManager</tt>
broker class, which interprets the DSpace configuration and reports the authority status
of a field. That's about all it does.
Choice Authority Manager
The choices framework consists of a <tt>ChoiceAuthorityManager</tt> class, which serves as a broker
and accesses the configuration, and individual plugins implementing the
<tt>ChoiceAuthority</tt> interface. Each plugin represents a
choice authority, which is a source of value options or choices.
See the prototype code for details.
Choice Authority plugin
These are the significant methods in the <tt>ChoiceAuthority</tt> interface:
getMatches
Returns the set of choices matching, i.e. possibly relevant to, a
proposed (and maybe partial) metadata value.
The exact requirements and expectations of this method's implementation depend
on how the fields that call it are configured: if the suggest
UI presentation is employed, it will get called with partial values. If only
the lookup presentation is used, it will see complete values.
Also, the suggest mechanism is not usually capable of taking "paged"
result sets (i.e. where start is greater than 0).
Note that getMatches() is given a collection argument, which
contains the owning Collection of the Item (or SubmissionItem) for which
a metadata value is being assigned. This is intended to give the plugin
some context to adjust its criteria for assembling a set of choices.
For example, a personal-name choice authority may restrict its search to
members of a certain department if the indicated collection is only intended
for works by members of that department.
The collection is supplied as a database ID in order to save the
overhead and database access which would be required to create a DSpaceObject
instance. The assumption is that the collection will not always be used
(in fact, probably rarely), and that the speed of a choice request is
very important in the interactive context, so the expense of querying the
collection is deferred and only incurred by those implementations which
really need it.
```java
public Choices getMatches(String text, int collection, int start, int limit, String locale);
```
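The deferral described above can be illustrated with a small memoizing wrapper. This is a hypothetical sketch: <tt>LazyCollection</tt> and its <tt>loader</tt> function stand in for whatever database lookup a real plugin would perform on the collection ID, and the class is not thread-safe:

```java
import java.util.function.IntFunction;

/** Hypothetical wrapper that resolves a collection ID only when first needed. */
final class LazyCollection<T> {
    private final int id;
    private final IntFunction<T> loader; // stand-in for an expensive database lookup
    private T value;
    private boolean loaded;

    LazyCollection(int id, IntFunction<T> loader) {
        this.id = id;
        this.loader = loader;
    }

    /** The cheap part: plugins that ignore the collection only ever see this. */
    int id() {
        return id;
    }

    /** Performs the expensive lookup at most once, and only if asked (not thread-safe). */
    T get() {
        if (!loaded) {
            value = loader.apply(id);
            loaded = true;
        }
        return value;
    }
}
```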
getBestMatch
Gets the single "best" match (if any) of a value in the authority
to the given user value. This is also expected to return a
meaningful "confidence" expressing the circumstances of
this match, i.e. if it is ambiguous or not.
This call is typically used only in non-interactive metadata ingests
and other contexts where there is no opportunity for an
interactive agent to choose from among options.
Note that getBestMatch() is given a collection argument, just like
the collection context given to getMatches(); see above for details.
```java
public Choices getBestMatch(String text, int collection, String locale);
```
getLabel
Get the canonical, human-readable label (i.e. short descriptive text)
corresponding to the authority key of a value. This is only
called for fields defined as authority-controlled; a choice plugin that
is never used for authority-controlled fields does not need to
implement this.
```java
public String getLabel(String key, String locale);
```
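Putting the three methods together, a toy plugin over a fixed in-memory map might look like the sketch below. The records <tt>Choice</tt> and <tt>Choices</tt> and the interface <tt>SimpleChoiceAuthority</tt> are simplified stand-ins that only mirror the signatures listed above; they are not the real <tt>org.dspace.content.authority</tt> classes, and confidence is reported as a symbolic name rather than a numeric constant:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Simplified stand-in for a single choice: authority key, stored value, display label. */
record Choice(String authority, String value, String label) {}

/** Simplified stand-in for a query result: the choices plus a symbolic confidence name. */
record Choices(List<Choice> values, String confidence) {}

/** Mirrors the three methods described above (stand-in, not the real interface). */
interface SimpleChoiceAuthority {
    Choices getMatches(String text, int collection, int start, int limit, String locale);
    Choices getBestMatch(String text, int collection, String locale);
    String getLabel(String key, String locale);
}

/** Toy authority over a fixed key-to-name map; ignores collection and locale. */
final class MapChoiceAuthority implements SimpleChoiceAuthority {
    private final Map<String, String> names; // authority key -> canonical name, sorted by key

    MapChoiceAuthority(Map<String, String> names) {
        this.names = new TreeMap<>(names);
    }

    @Override
    public Choices getMatches(String text, int collection, int start, int limit, String locale) {
        String prefix = text.toLowerCase(Locale.ROOT);
        List<Choice> all = new ArrayList<>();
        for (Map.Entry<String, String> e : names.entrySet()) {
            // Prefix match, so partial values from the suggest style work;
            // an empty query returns every entry, as the select style needs.
            if (e.getValue().toLowerCase(Locale.ROOT).startsWith(prefix)) {
                all.add(new Choice(e.getKey(), e.getValue(), e.getValue()));
            }
        }
        // Paging: skip 'start' matches and return at most 'limit' of them.
        int from = Math.min(start, all.size());
        int to = (int) Math.min((long) from + limit, all.size());
        return new Choices(List.copyOf(all.subList(from, to)), confidenceFor(all.size()));
    }

    @Override
    public Choices getBestMatch(String text, int collection, String locale) {
        List<Choice> exact = new ArrayList<>();
        for (Map.Entry<String, String> e : names.entrySet()) {
            if (e.getValue().equalsIgnoreCase(text)) {
                exact.add(new Choice(e.getKey(), e.getValue(), e.getValue()));
            }
        }
        return new Choices(List.copyOf(exact), confidenceFor(exact.size()));
    }

    @Override
    public String getLabel(String key, String locale) {
        return names.get(key); // null if the key is unknown
    }

    /** No match, exactly one match, or several equally good matches. */
    private static String confidenceFor(int matches) {
        if (matches == 0) {
            return "NOTFOUND";
        }
        return matches == 1 ? "UNCERTAIN" : "AMBIGUOUS";
    }
}
```

Two keys that share the same name make <tt>getBestMatch</tt> report AMBIGUOUS, which is the situation where an unattended ingest has to fall back on the first choice.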
API Changes
Changes to <tt>Item</tt>
Add methods that take authority-key and confidence arguments:
```java
public void addMetadata(String schema, String element, String qualifier, String lang,
                        String value, String authority, int confidence);

public void addMetadata(String schema, String element, String qualifier, String lang,
                        String[] values, String[] authorities, int[] confidences);
```
IMPORTANT NOTE on Backward Compatibility
The old <tt>addMetadata</tt> methods are still available, of course,
although they have a new failure mode. When the field is authority-controlled,
they will call the <tt>getBestMatch()</tt> method of the field's configured
<tt>ChoiceAuthority</tt> to generate an authority key and confidence
value. If the field is configured to require an authority key,
an exception is thrown if <tt>getBestMatch()</tt> doesn't return one.
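The new failure mode can be modeled in a few lines. This sketch is hypothetical: <tt>LegacyAddMetadata</tt>, its records, the symbolic confidence strings, and the use of <tt>IllegalArgumentException</tt> are illustrative assumptions, since the exception type thrown by DSpace is not specified here:

```java
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical model of the old addMetadata path on an authority-controlled field. */
final class LegacyAddMetadata {
    /** A stored value with its (possibly null) authority key and confidence name. */
    record StoredValue(String value, String authority, String confidence) {}

    /** Result of a best-match lookup: an optional key plus a confidence name. */
    record BestMatch(Optional<String> authority, String confidence) {}

    private LegacyAddMetadata() {}

    static StoredValue add(String value, boolean controlled, boolean required,
                           Function<String, BestMatch> getBestMatch) {
        if (!controlled) {
            // Fields without authority control keep only a text value.
            return new StoredValue(value, null, "UNSET");
        }
        BestMatch match = getBestMatch.apply(value);
        if (required && match.authority().isEmpty()) {
            // The new failure mode: a required authority key could not be found.
            throw new IllegalArgumentException("No authority key found for: " + value);
        }
        return new StoredValue(value, match.authority().orElse(null), match.confidence());
    }
}
```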
Changes to <tt>DCValue</tt>
Add fields:
```java
public String authority;
public int confidence;
```
Changes to <tt>MetadataValue</tt>
```java
public String getAuthority()
public void setAuthority(String value)
public int getConfidence()
public void setConfidence(int value)
```
Changes to <tt>DIM</tt> XML "schema"
The DIM Field element gains two new attributes to represent the
authority and confidence (when available) of DC metadata values, e.g.
```xml
<dim:field schema="dc" element="contributor" qualifier="author" authority="n79-21164" confidence="6">Mark Twain</dim:field>
```
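An exporter could produce such an element with the JDK's DOM and identity-transform APIs. A minimal sketch; <tt>DimFieldWriter</tt> is hypothetical, and the DIM namespace URI in it is an assumption to check against your DSpace version:

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical writer for a single DIM field element. */
final class DimFieldWriter {
    // Assumed DIM namespace URI; verify against your DSpace version.
    static final String DIM_NS = "http://www.dspace.org/xmlns/dspace/dim";

    private DimFieldWriter() {}

    /** Serializes one dim:field, adding authority/confidence only when a key is present. */
    static String write(String element, String qualifier, String value,
                        String authority, int confidence) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();
            Element field = doc.createElementNS(DIM_NS, "dim:field");
            field.setAttributeNS("http://www.w3.org/2000/xmlns/", "xmlns:dim", DIM_NS);
            field.setAttribute("schema", "dc");
            field.setAttribute("element", element);
            if (qualifier != null) {
                field.setAttribute("qualifier", qualifier);
            }
            if (authority != null && !authority.isEmpty()) {
                field.setAttribute("authority", authority);
                field.setAttribute("confidence", Integer.toString(confidence));
            }
            field.setTextContent(value);
            doc.appendChild(field);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```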
Changes to DRI Schema
The XMLUI's DRI schema has some attributes added to support
choice options and authority control in input fields and their values.
Since the pages in the Item Submission and Item Edit Metadata UIs effectively
round-trip all of the Item's DC metadata values through a Web page, it
is essential for the page's fields to preserve any authority key and
confidence values as well.
DRI <tt>params</tt> Element
The <tt>params</tt> subelement of <tt>field</tt> gets some new
attributes:
- <tt>authorityControlled</tt> - Boolean, true if the field is authority-controlled.
- <tt>authorityRequired</tt> - Boolean, true if the field requires an authority key value.
- <tt>choices</tt> - Name of the metadata field whose choice configuration to use.
- <tt>choicesPresentation</tt> - Type of choices presentation style to use, either <tt>suggest</tt> or <tt>select</tt>.
- <tt>choicesClosed</tt> - Boolean, true if choice is configured as closed, i.e. no values outside of presented choices are to be allowed.
The <tt>value</tt> subelement of <tt>instance</tt> and of some <tt>field</tt>
elements (e.g. text, textarea) gets some new attributes:
- <tt>type='authority'</tt> - This new keyword value of the type attribute means the value is an authority-key value.
- <tt>confidence</tt> - The given confidence number applies to this element, which should also have a type of <tt>"authority"</tt>.
...
In the <tt>org.dspace.content.authority</tt> Package
...
- <tt>Choice</tt> - record class for a single choice value
- <tt>Choices</tt> - record class for the result of a choices query
- <tt>ChoiceAuthority</tt> - interface of a choice authority plugin
- <tt>ChoiceAuthorityManager</tt> - choices factory and access
- <tt>MetadataAuthorityManager</tt> - metadata authority control factory
...
- New <tt>/choices/field</tt> URL added to sitemap to retrieve choices data as XML, for AJAX browser scripting. Implemented by <tt>org.dspace.app.xmlui.cocoon.AJAXMenuGenerator</tt>.
- Show icons depicting confidence level of authority-controlled metadata values, in long Item view and MD edit pages.
- Option to show authority values in submission UI, for debugging; see the <tt>input.ds-authority-value</tt> stanza in the CSS file.
XMLUI Sample Implementation
Any Theme for a DSpace using the Choice or Authority Control features
must include the CSS declarations prototyped in the Reference theme.
The corresponding images must also be available in your theme.
All of the other elements (JavaScript, translations, etc) are implemented below
the theme layer.
Installing the Prototype
...
- Check out from the special "sandbox" in SVN:
- Modify your <tt>dspace.cfg</tt> properties to enable choice and/or authority control on some fields. Unless you configure choice control, nothing will be any different. (See "Configuration" section above for details)
- Configure and build as usual:
- Be sure to choose the <tt>Reference</tt> theme, otherwise you'll have to copy over the choice/authority additions to your chosen theme (see above).
- After the fresh install, convert the database schema to 1.6 by running the SQL statements in the file <tt>/dspace/etc/oracle/database_schema_15_16.sql</tt> (Oracle) or <tt>/dspace/etc/database_schema_15_16.sql</tt> (Postgres).
- Start the webserver as usual.
...
Please use the Discussion page.</html>