Unsupported Release

This documentation relates to an old, unsupported version of DSpace, version 1.7.x.

As of January 2014, the DSpace 1.7.x platform is no longer supported. We recommend upgrading to a more recent version of DSpace.

Curation System

As of release 1.7, DSpace supports running curation tasks, which are described below. The main distribution will include several useful tasks, and the system is also designed to allow locally written and deployed tasks that follow the API described below.

Tasks

The goal of the curation system ('CS') is to provide a simple, extensible way to manage routine content operations on a repository. These operations are known to CS as 'tasks', and they can operate on any DSpaceObject (i.e. any subclass of DSpaceObject), which means Communities, Collections, and Items: the core data model objects. A task may elect to work on only one type of DSpace object, typically an Item, and in that case it can simply ignore other types (tasks have the ability to 'skip' objects for any reason).

The DSpace core distribution will provide a number of useful tasks, but the system is designed to encourage local extension: tasks can be written for any purpose and placed in any Java package. This gives a DSpace site the ability to customize the behavior of its repository without having to alter, and therefore manage, the DSpace source code.

What sorts of activities are appropriate for tasks?

Some examples:

  • apply a virus scan to item bitstreams (this will be our example below)
  • profile a collection based on format types - good for identifying format migrations
  • ensure a given set of metadata fields are present in every item, or even that they have particular values
  • call a network service to enhance/replace/normalize an item's metadata or content
  • ensure all item bitstreams are readable and their checksums agree with the ingest values

Activation

For CS to run a task, its code must be deployed along with the other DSpace code (in dspace/lib, the WAR, etc.), and the task must also be declared and given a name.
This is done via a configuration property in

 [dspace]/config/modules/curate.cfg 

as follows:

plugin.named.org.dspace.curate.CurationTask = \
    org.dspace.curate.ProfileFormats = format-profile, \
    org.dspace.curate.RequiredMetadata = req-metadata, \
    org.dspace.curate.ClamScan = vscan

A task can be arbitrary code, but the class implementing it must have two properties:

First, it must provide a no-argument constructor, so it can be loaded by the PluginManager. All tasks are therefore 'named' plugins, and each must be declared in the configuration property shown above. The 'plugin name' (vscan, req-metadata, etc.) is called the task name, and is used instead of the qualified class name wherever a task is referenced (on the command line, etc.); the CS always dereferences it.

Second, it must implement the interface 'org.dspace.curate.CurationTask'
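
The first property matters because a named plugin is created from its configured class name rather than by a direct call in code. The sketch below illustrates that idea only. It is not the PluginManager implementation: 'TaskRegistry' and 'NoOpTask' are hypothetical names invented for this example, and the lookup is a plain map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical task class with the required public no-argument constructor.
class NoOpTask {
    public NoOpTask() { }
}

// Hypothetical registry; an illustration only, NOT the DSpace PluginManager.
class TaskRegistry {
    private final Map<String, String> classByTaskName = new HashMap<>();

    // Records a "class name = task name" declaration.
    void declare(String taskName, String className) {
        classByTaskName.put(taskName, className);
    }

    // Dereferences a task name to a new instance of the declared class.
    Object resolve(String taskName) {
        String className = classByTaskName.get(taskName);
        if (className == null) {
            throw new IllegalArgumentException("Unknown task name: " + taskName);
        }
        try {
            // Reflection can only do this if a no-argument constructor exists.
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }
}
```

A class without a no-argument constructor would reach the catch branch here, which is the situation the first requirement is meant to rule out.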

The CurationTask interface is almost a 'tagging' interface, and only requires a few very high-level methods be implemented. The most significant is:

 int perform(DSpaceObject dso); 

The return value should be a code describing one of 4 conditions:

  • 0 : SUCCESS the task completed successfully
  • 1 : FAIL the task failed (it is up to the task to decide what 'counts' as failure - an example might be that the virus scan finds an infected file)
  • 2 : SKIPPED the task could not be performed on the object, perhaps because it was not applicable
  • -1 : ERROR the task could not be completed due to an error

If a task extends the AbstractCurationTask class, that is the only method it needs to define.
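
As a rough illustration of this contract, the sketch below shows a task that checks whether an item has a title and returns one of the four codes listed above. It is not DSpace code: 'SimpleItem' and 'RequiredTitleTask' are hypothetical stand-ins so that the example compiles on its own, and the 'dc.title' key is an assumption. A real task would implement org.dspace.curate.CurationTask (or extend AbstractCurationTask) and receive a DSpaceObject.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an item; a real task receives a DSpaceObject.
class SimpleItem {
    final Map<String, String> metadata = new HashMap<>();
}

// Hypothetical task sketch; a real task would extend AbstractCurationTask.
class RequiredTitleTask {
    // The four status codes listed above.
    static final int SUCCESS = 0;
    static final int FAIL = 1;
    static final int SKIPPED = 2;
    static final int ERROR = -1;

    // Returns a status code describing the outcome for the given object.
    int perform(Object dso) {
        if (!(dso instanceof SimpleItem)) {
            return SKIPPED; // this task only applies to items
        }
        try {
            String title = ((SimpleItem) dso).metadata.get("dc.title");
            boolean missing = (title == null || title.trim().isEmpty());
            return missing ? FAIL : SUCCESS;
        } catch (RuntimeException e) {
            return ERROR; // the check itself could not be completed
        }
    }
}
```

Note the difference between FAIL and ERROR in this sketch: a missing title is a result the task was designed to report, whereas ERROR is reserved for the case where the check could not be carried out at all.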

Task Invocation

Tasks are invoked through CS framework classes that manage a few details (to be described below). Invocation can occur wherever it is needed, and CS supports several ways of doing so 'out of the box':

On the command line

A simple tool, 'CurationCli', provides access to CS from the command line. For example, to perform a virus check on the collection with handle 123456789/4:

 [dspace]/bin/dspace curate -t vscan -i 123456789/4 

As with other command-line tools, these invocations could be placed in a cron table and run on a fixed schedule, or run on demand by an administrator.

In the admin UI

In the XMLUI, there is a 'Curate' tab (appearing within 'Edit Community/Collection/Item') that exposes a drop-down list of configured tasks, with a button to 'perform' the task or to queue it for later operation (see section below). A configuration property lets you filter out defined tasks that are not appropriate for UI use, and also lets you give each task a 'prettier' name than its PluginManager task name. The property resides in [dspace]/config/modules/curate.cfg:

ui.tasknames = \
     format-profile = Profile Bitstream Formats, \
     req-metadata = Check for Required Metadata

In workflow

CS provides the ability to attach any number of tasks to standard DSpace workflows. Using a configuration file (workflow-curation.xml), you can declaratively (without coding) wire tasks to any step in a workflow. An example:

<taskset name="cautious">
  <flowstep name="step1">
    <task name="vscan">
      <workflow>reject</workflow>
      <notify on="fail">$flowgroup</notify>
      <notify on="fail">$colladmin</notify>
      <notify on="error">$siteadmin</notify>
    </task>
  </flowstep>
</taskset>

This markup would cause the virus scan to occur during step one of the workflow, and would automatically reject any submission with infected files. It would further notify (via email) both the reviewers (step 1 group) and the collection administrators, if either of these is defined. If the scan could not be performed, the site administrator would be notified.
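
To make these rule semantics concrete, the sketch below models only the 'cautious' task set shown above: who is notified for a given vscan status, and whether the submission is rejected. 'CautiousRules' is a hypothetical class written for this example. It is not how DSpace evaluates workflow-curation.xml, and it does not check whether the groups are actually defined.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the 'cautious' task set; an illustration, not DSpace code.
class CautiousRules {
    static final int FAIL = 1;   // e.g. an infected file was found
    static final int ERROR = -1; // the scan could not be performed

    // Recipients for a given status, following the <notify> elements.
    static List<String> recipients(int status) {
        if (status == FAIL) {
            return Arrays.asList("$flowgroup", "$colladmin");
        }
        if (status == ERROR) {
            return Collections.singletonList("$siteadmin");
        }
        return Collections.emptyList();
    }

    // <workflow>reject</workflow>: the submission is rejected when the task fails.
    static boolean rejects(int status) {
        return status == FAIL;
    }
}
```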

The notifications use the same procedures that other workflow notifications do - namely email. There is a new email template defined for curation task use (in dspace/config/emails): 'flowtask_notify'. This may be language-localized or otherwise modified like any other email template.

As with configurable submission, you can assign these task rules per collection, as well as define a default for any collection.

In arbitrary user code

If these pre-defined ways are not sufficient, you can of course manage curation directly in your code. You would use the CS helper classes. For example:

Collection coll = (Collection)HandleManager.resolveToObject(context, "123456789/4");
Curator curator = new Curator();
curator.addTask("vscan").curate(coll);
System.out.println("Result: " + curator.getResult("vscan"));

This does approximately what the command-line invocation did. The 'curate' method simply performs all the tasks configured on the curator (you can add multiple tasks to a curator).
