Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

The goal of the curation system ('CS') is to provide a simple, extensible, way to manage routine content operations on a repository. These operations are known to CS as 'tasks', and they can operate on any DSpaceObject (i.e. subclasses of DSpaceObject) - which means Communities, Collections, and Items - viz. core data model objects. Tasks may essentially elect to work on only one type of DSpace object - typically an Item - and in this case they may simply ignore other data types (tasks have the ability to 'skip' objects for any reason). The DSpace core distribution ought to will provide a number of useful tasks, but the system is designed to encourage local extension - tasks can be written for any purpose, and placed in any java package. This gives a DSpace site the ability to customize the behavior of their repository without having to alter - and therefore manage - the DSpace source code. What sorts of things activities are appropriate for tasks?

Some examples:

  • apply a virus scan to item bitstreams (this will be our example below)
  • profile a collection based on format types - good for identifying format migrations
  • ensure a given set of metadata fields are present in every item, or even that they have particular values
  • call a network service to enhance/replace/normalize an items's metadata or content
  • ensure all item bitstreams are readable and their checksums agree with the ingest values

A task can be arbitrary code, but the class implementing it must have 2 properties:

Activation

For CS to run a task, the code must of course be included with other deployed code (to dspace/lib, WAR, etc) but it must also be declared and given a name.
This is done via a configuration property in

Code Block
 [dspace]/config/modules/curate.cfg 

as followsFirst, it must provide a no argument constructor, so it can be loaded by the PluginManager. Thus, all tasks are 'named' plugins, meaning that each must be configured in dspace.cfg as:

Code Block
plugin.named.org.dspace.curate.CurationTask = \
org.dspace.curate.ProfileFormats = format-profile \
org.dspace.curate.RequiredMetadata = req-metadata \
org.dspace.curate.ClamScan = vscan

A task can be arbitrary code, but the class implementing it must have 2 properties:

First, it must provide a no argument constructor, so it can be loaded by the PluginManager. Thus, all tasks are 'named' plugins, meaning that each must be configured in dspace.cfg as:

The 'plugin name'(vscan, req-metadata, etc) is called the task name, and is used instead of the qualified class name wherever it is needed (on the command line, etc) - the CS always dereferences it.

...