Archived / Obsolete Documentation

Documentation in this space is no longer accurate.

New Features for the Curation System

Introduced in DSpace 1.7, and expanded in 1.8, the Curation System (CS) is still a comparatively new denizen in the DSpace ecosystem. As more tasks and 'suites' are produced, we are learning a lot about what additional functionality the framework could offer to support more powerful, flexible, and easily implemented tasks. This page is intended to be a place to collect these insights, as well as designs that address these needs. Many new features are already being developed, and we welcome participation in their evolution.

Object Selectors

In CS, the unit of curation is a DSpaceObject (which may be an Item, Collection, or Community). Thus the API offers these basic methods (on the Curator class):

public void curate(DSpaceObject object) throws IOException;

public void curate(Context c, String id) throws IOException;

A task may restrict its scope to a particular type or subset of objects (typically items only, not containers) by filtering, in its business logic, the objects it is given. Often, however, we wish to perform a task on a set of objects that does not correspond to any natural container, and then filtering is of no help. For example, we may wish to perform a task on all recently installed items, whatever their collection. We could of course write custom code that pulls the necessary items and feeds them one by one to a curator, but such code is not very portable or reusable: we could not, for example, easily use the same code in both a command-line context and a UI context, as we have come to expect with CS.

This is the primary motivation for a new feature of the curation API known as object selectors. 'ObjectSelector' is a new interface, essentially just exposing a DSpaceObject Iterator, that is directly supported by the curation API:

public void curate(ObjectSelector selector) throws IOException;
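To make the shape of this method concrete, the following is a minimal, self-contained sketch rather than the actual DSpace code. It assumes that ObjectSelector simply extends Iterator, since this page says only that it "exposes a DSpaceObject Iterator". DSpaceObject is reduced to a stand-in class holding a handle, and ListSelector, SketchCurator, and SelectorDemo are hypothetical names invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Stand-in for org.dspace.content.DSpaceObject, reduced to a handle
// so that the sketch compiles without DSpace on the classpath.
class DSpaceObject {
    private final String handle;
    DSpaceObject(String handle) { this.handle = handle; }
    String getHandle() { return handle; }
}

// ASSUMPTION: the interface is modelled as a plain Iterator of DSpaceObjects.
interface ObjectSelector extends Iterator<DSpaceObject> { }

// Hypothetical selector that delivers a fixed, caller-supplied list of objects.
class ListSelector implements ObjectSelector {
    private final Iterator<DSpaceObject> delegate;
    ListSelector(List<DSpaceObject> objects) { this.delegate = objects.iterator(); }
    @Override public boolean hasNext() { return delegate.hasNext(); }
    @Override public DSpaceObject next() { return delegate.next(); }
}

// Stand-in for the Curator class: "curating" only records the handle.
class SketchCurator {
    final List<String> curated = new ArrayList<>();

    void curate(DSpaceObject dso) { curated.add(dso.getHandle()); }

    // The selector-based method drains the selector, curating each object in turn.
    void curate(ObjectSelector selector) {
        while (selector.hasNext()) {
            curate(selector.next());
        }
    }
}

class SelectorDemo {
    public static void main(String[] args) {
        SketchCurator curator = new SketchCurator();
        curator.curate(new ListSelector(Arrays.asList(
                new DSpaceObject("123456789/1"),
                new DSpaceObject("123456789/2"))));
        System.out.println(curator.curated);
    }
}
```

The point of the design is that the curator never needs to know how the objects were chosen, so the same selector can be used from the command line or from a UI.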

The curator will perform the configured tasks on all the DSpaceObjects delivered by the selector, and the selector can deliver any set of objects it wishes. Because ObjectSelector is an interface, CS users may write and deploy their own custom selector implementations, but we propose to bundle a few general-purpose implementations with the curation system. Currently these are:

SearchSelector

This selector invokes the DSpace native (Lucene) search APIs to obtain sets of objects. In this way, one can easily perform curation tasks on any set of search results. For ease of reuse, the search query string can be stored in a configuration file, and each such configuration can be given a different name. This technique, known as 'named selectors', allows for easy integration in other CS tools. For example (in the command-line tool via the DSpace launcher):

[dspace]/bin/dspace curate -o nanotechnology -t textextract

The argument to the '-o' (object selector) option is the name of a selector, which we can imagine is a search for all the items whose title contains 'nanotechnology'.

It should be noted that SearchSelector can also be used for 'non-canned' searches: we could expose a search box in a web page, have the user type in a search string and configure a search selector to use this 'live' query.
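The named-selector idea described above can be sketched as a simple lookup from selector name to stored query string. This is an illustration only: the class NamedSelectorConfig, the one-line-per-selector properties layout, and the query text 'title:nanotechnology' are all assumptions, not the configuration format that the curation system uses.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical holder for named search queries, one "name = query" entry per selector.
class NamedSelectorConfig {
    private final Properties queries = new Properties();

    // Parses configuration text in java.util.Properties syntax (an assumed layout).
    NamedSelectorConfig(String configText) {
        try {
            queries.load(new StringReader(configText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the stored query for the given selector name, failing loudly if it is unknown.
    String queryFor(String name) {
        String query = queries.getProperty(name);
        if (query == null) {
            throw new IllegalArgumentException("No selector named '" + name + "'");
        }
        return query;
    }

    public static void main(String[] args) {
        NamedSelectorConfig config =
                new NamedSelectorConfig("nanotechnology = title:nanotechnology\n");
        // The name passed with '-o' would be looked up like this.
        System.out.println(config.queryFor("nanotechnology"));
    }
}
```

A 'live' query typed by a user would bypass the lookup and hand its string directly to the selector.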

QuerySelector

This selector queries the database to obtain its objects. In essence, the selector transforms a very simplified user-supplied query string into the SQL necessary to perform the database query. An example can illustrate:

in_archive = '1' AND last_modified > ${today - 7} AND dc.contributor.author = 'Jones'

This query would retrieve all archived items authored by Jones that were modified within the last week. The actual SQL is more complex, since joins with the metadata tables are required. For the curious, the syntax of the query language is given below (in Extended Backus-Naur Form):

(* Query syntax EBNF *)
  query = expr , { "AND" , expr } ;
  expr = ( field name | metadata name ) , oper , value ;
  field name = characters , { "_" , characters } ;
  metadata name = characters , "." , characters , [ "." , characters ] ;
  oper = "=" | "<>" | ">" | "<" | ">=" | "<=" | "BETWEEN" | "LIKE" | "IN" ;
  value = literal | variable ;
  literal = "'" , characters , { whitespace , characters } , "'" ;
  variable = "${" , varname , [ ( "+" | "-" ) , number ] , "}" ;
  varname = "today" | handle ;
(* end syntax EBNF *)
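As an illustration of the 'variable' rule, the following sketch expands '${today}' expressions into quoted date literals before any SQL would be generated. It is not the QuerySelector implementation: the class and method names are hypothetical, the offset is assumed to be a number of days (which matches the "last week" reading of '${today - 7}'), and the ISO date output format is also an assumption. Handle variables are left untouched.

```java
import java.time.LocalDate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that resolves ${today}, ${today - N} and ${today + N}.
class QueryVariables {
    // Whitespace around the sign is tolerated, as in the example '${today - 7}'.
    private static final Pattern TODAY =
            Pattern.compile("\\$\\{\\s*today\\s*(?:([+-])\\s*(\\d+)\\s*)?\\}");

    // Replaces each 'today' variable with a quoted ISO date; 'today' is passed in
    // so that the result is reproducible.
    static String resolve(String query, LocalDate today) {
        Matcher m = TODAY.matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            LocalDate date = today;
            if (m.group(1) != null) {
                // ASSUMPTION: the number is an offset in days.
                long days = Long.parseLong(m.group(2));
                date = m.group(1).equals("+") ? today.plusDays(days) : today.minusDays(days);
            }
            m.appendReplacement(out, Matcher.quoteReplacement("'" + date + "'"));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String query = "in_archive = '1' AND last_modified > ${today - 7}";
        System.out.println(resolve(query, LocalDate.now()));
    }
}
```

Passing the current date as a parameter, rather than reading the clock inside the method, keeps the expansion easy to test.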