Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

As of release 1.7, DSpace includes support for supports running curation tasks, which are described in this section. DSpace 1.7 and subsequent distributions will bundle (include) several useful tasks, but the system also is designed to allow new tasks to be added between releases, both general purpose tasks that come from the community, and locally written and deployed tasks.

...

The goal of the curation system ('CS') is to provide a simple, extensible, way to manage routine content operations on a repository. These operations are known to CS as 'tasks', and they can operate on any DSpaceObject (i.e. subclasses of DSpaceObject) - which means Communities, Collections, and Items - viz. core data model objects. Tasks may elect to work on only one type of DSpace object - typically an Item - and in this case they may simply ignore other data types (tasks have the ability to 'skip' objects for any reason). The DSpace core distribution will provide a number of useful tasks, but the system is designed to encourage local extension - tasks can be written for any purpose, and placed in any java package. This gives DSpace sites the ability to customize the behavior of their repository without having to alter - and therefore manage synchronization with - the DSpace source code. What sorts of activities are appropriate for tasks?

...

  • apply a virus scan to item bitstreams (this will be our example below)
  • profile a collection based on format types - good for identifying format migrations
  • ensure a given set of metadata fields are present in every item, or even that they have particular values
  • call a network service to enhance/replace/normalize an items's metadata or content
  • ensure all item bitstreams are readable and their checksums agree with the ingest values

Since tasks have access to, and can modify, DSpace content, performing tasks is considered an administrative function to be performed only by knowledgeable collection editors, repository administrators, sysadmins, etc. No tasks are exposed in the public interfaces.

Activation

Wiki Markup
For CS to run a task, the code for the task must of course be included with other deployed code (to {{\[dspace\]/lib}}, WAR, etc) but it must also be declared and given a name. This is done via a configuration property in {{\[dspace\]/config/modules/curate.cfg}} as follows:

...

For each activated task, a 'class'='taskname' line is added. The taskname is used elsewhere to configure the use of the task, as will be seen below. Note that the curate.cfg configuration file, while in the config directory, is located under 'modules'. The intent is that tasks, as well as any configuration they require, will be optional or 'add-onons' to the basic system configuration. Adding or removing tasks has no impact on dspace.cfg.

Wiki Markup
For many tasks, this activation configuration is all that will be required to use it. But for others, the task needs specific configuration itself. A concrete example is described below, but here note that these task-specific configuration property files also reside in {{\[dspace\]/config/modules}}

...

This was mentioned above. This is returned to CS whenever a task is called. In addition to the task-assigned codes, there are The complete list of values:

Code Block
      -3 NOTASK - CS could not find the requested task
      -2 UNSET  - task did not return a status code because it has not yet run
      -1 ERROR - task could not be performed
       0 SUCCESS - task performed successfully
       1 FAIL - task performed, but failed
       2 SKIP - task not performed due to object not being eligible

Result String

The task may define a string indicating details of the outcome. This result is displayed, e.g. in the 'curation widget' described above:

...