Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Code Block
-t taskname: name of task to perform
-T filename: name of file containing list of tasknames
-e epersonID: (email address) will be superuser if unspecified
-i identifier: Id of object to curate. May be (1) a handle (2) a workflow Id or (3) 'all' to operate on the whole repository
-q queue: name of queue to process - -i and -q are mutually exclusive
-v emit verbose output
-r - emit reporting to standard out

As with other command-line tools, these invocations could be placed in a cron table and run on a fixed schedule, or run on demand by an administrator.

In the admin UI

Info
titleNot available for JSPUI

At this point in time, Curation Tasks cannot be run from the JSPUI Admin interface. However, users of the JSPUI can still run Curation Tasks from the Command Line or from Workflow.

In the XMLUI, there are several ways to execute configured Curation Tasks:

l limit: maximum number of objects in Context cache. If absent, unlimited objects may be added.
-s scope: declare a scope for database transactions. Scope must be: (1) 'open' (default value) (2) 'curation' or (3) 'object'
-v emit verbose output
-r - emit reporting to standard out

As with other command-line tools, these invocations could be placed in a cron table and run on a fixed schedule, or run on demand by an administrator.

In the admin UI

Info
titleNot available for JSPUI

At this point in time, Curation Tasks cannot be run from the JSPUI Admin interface. However, users of the JSPUI can still run Curation Tasks from the Command Line or from Workflow.

In the XMLUI, there are several ways to execute configured Curation Tasks:

  1. From the 'Curate' tab that appears on each 'Edit Community/Collection/Item' page: this tab allows an Administrator, Community Administrator or Collection Administrator to run a Curation Task on that particular Community, Collection or Item. When running a task on a Community or Collection, that task will also execute on all its child objects, unless the Task itself states otherwise (e.g. running a task on a Collection will also run it across all Items within that Collection).
    • NOTE: Community Administrators and Collection Administrators can only run Curation Tasks on the Community or Collection which they administer, along with any child objects of that Community or Collection. For example, a Collection Administrator can run a task on that specific Collection, or on any of the Items within that Collection.
  2. From the Administrator
  3. From the 'Curate' tab that appears on each 'Edit Community/Collection/Item' page: this tab allows an Administrator, Community Administrator or Collection Administrator to run a Curation Task on that particular Community, Collection or Item. When running a task on a Community or Collection, that task will also execute on all its child objects, unless the Task itself states otherwise (e.g. running a task on a Collection will also run it across all Items within that Collection).
    • NOTE: Community Administrators and Collection Administrators can only run Curation Tasks on the Community or Collection which they administer, along with any child objects of that Community or Collection. For example, a Collection Administrator can run a task on that specific Collection, or on any of the Items within that Collection.
  4. From the Administrator's 'Curation Tasks' page: This option is only available to DSpace Administrators, and appears in the Administrative side-menu. This page allows an Administrator to run a Curation Task across a single object, or all objects within the entire DSpace site.
    • Wiki Markup
      In order to run a task from this interface, you must enter in the handle for the DSpace object. To run a task site-wide, you can use the handle: {{\[your-handle-prefix\]/0}}

...

Code Block
Collection coll = (Collection)HandleManager.resolveToObject(context, "123456789/4");
Curator curator = new Curator();
curator.addTask("vscan").curate(coll);
System.out.println("Result: " + curator.getResult("vscan"));

would do approximately what the command line invocation did. the method 'curate' just performs all the tasks configured
(you can add multiple tasks to a curator).

Asynchronous (Deferred) Operation

Because some tasks may consume a fair amount of time, it may not be desirable to run them in an interactive context. CS provides a simple API and means to defer task execution, by a queuing system. Thus, using the previous example:

Code Block

Curator curator = new Curator();
     curator.addTask("vscan").queue(context, "monthly", "123456789/4");

would place a request on a named queue "monthly" to virus scan the collection. To read (and process) the queue, we could for example:

Code Block
[dspace]/bin/dspace curate -q monthly

use the command-line tool, but we could also read the queue programmatically. Any number of queues can be defined and used as needed.
In the administrative UI curation 'widget', there is the ability to both perform a task, but also place it on a queue for later processing.

Task Output and Reporting

Few assumptions are made by CS about what the 'outcome' of a task may be (if any) - it. could e.g. produce a report to a temporary file, it could modify DSpace content silently, etc But the CS runtime does provide a few pieces of information whenever a task is performed:

Status Code

This was mentioned above. This is returned to CS whenever a task is called. The complete list of values:

Code Block

-3 NOTASK - CS could not find the requested task
      -2 UNSET  - task did not return a status code because it has not yet run
      -1 ERROR - task could not be performed
       0 SUCCESS - task performed successfully
       1 FAIL - task performed, but failed
       2 SKIP - task not performed due to object not being eligible

In the administrative UI, this code is translated into the word or phrase configured by the ui.statusmessages property (discussed above) for display.

Result String

The task may define a string indicating details of the outcome. This result is displayed, in the 'curation widget' described above:

Code Block

"Virus 12312 detected on Bitstream 4 of 1234567789/3"

CS does not interpret or assign result strings, the task does it. A task may not assign a result, but the 'best practice' for tasks is to assign one whenever possible.

Reporting Stream

For very fine-grained information, a task may write to a reporting stream. This stream is sent to standard out, so is only available when running a task from the command line. Unlike the result string, there is no limit to the amount of data that may be pushed to this stream.

The status code, and the result string are accessed (or set) by methods on the Curation object:

...

 " + curator.getResult("vscan"));

would do approximately what the command line invocation did. the method 'curate' just performs all the tasks configured
(you can add multiple tasks to a curator).

Asynchronous (Deferred) Operation

Because some tasks may consume a fair amount of time, it may not be desirable to run them in an interactive context. CS provides a simple API and means to defer task execution, by a queuing system. Thus, using the previous example:

Code Block

Curator curator = new Curator();
     curator.addTask("vscan").queue(context, "monthly", "123456789/4");

would place a request on a named queue "monthly" to virus scan the collection. To read (and process) the queue, we could for example:

Code Block
[dspace]/bin/dspace curate -q monthly

use the command-line tool, but we could also read the queue programmatically. Any number of queues can be defined and used as needed.
In the administrative UI curation 'widget', there is the ability to both perform a task, but also place it on a queue for later processing.

Task Output and Reporting

Few assumptions are made by CS about what the 'outcome' of a task may be (if any) - it. could e.g. produce a report to a temporary file, it could modify DSpace content silently, etc But the CS runtime does provide a few pieces of information whenever a task is performed:

Status Code

This was mentioned above. This is returned to CS whenever a task is called. The complete list of values:

Code Block

  -3 NOTASK - CS could not find the requested task
  -2 UNSET  - task did not return a status code because it has not yet run
  -1 ERROR - task could not be performed
   0 SUCCESS - task performed successfully
   1 FAIL - task performed, but failed
   2 SKIP - task not performed due to object not being eligible

In the administrative UI, this code is translated into the word or phrase configured by the ui.statusmessages property (discussed above) for display.

Result String

The task may define a string indicating details of the outcome. This result is displayed, in the 'curation widget' described above:

Code Block

"Virus 12312 detected on Bitstream 4 of 1234567789/3"

CS does not interpret or assign result strings, the task does it. A task may not assign a result, but the 'best practice' for tasks is to assign one whenever possible.

Reporting Stream

For very fine-grained information, a task may write to a reporting stream. This stream is sent to standard out, so is only available when running a task from the command line. Unlike the result string, there is no limit to the amount of data that may be pushed to this stream.

The status code, and the result string are accessed (or set) by methods on the Curation object:

Code Block

Curator curator = new Curator();
     curator.addTask("vscan").curate(coll);
     int status = curator.getStatus("vscan");
     String result - curator.getResult("vscan");

Task Properties

Wiki Markup
DSpace 1.8 introduces a new 'idiom' for tasks that require configuration data. It is available to any task whose implementation extends _AbstractCurationTask_, but is completely optional. There are a number of problems that task properties are designed to solve, but to make the discussion concrete we will start with a particular one: the problem of hard-coded configuration file names. A task that relies on configuration data will typically encode a fixed reference to a configuration file name. For example, the virus scan task reads a file called 'clamav.cfg', which lives in {{\[dspace\]/config/modules}}. And thus in the implementation one would find:

Code Block

host = ConfigurationManager.getProperty("clamav", "service.host");

and similar. But tasks are supposed to be written by anyone in the community and shared around (without prior coordination), so if another task uses the same configuration file name, there is a name collision here that can't be easily fixed, since the reference is hard-coded in each task. In this case, if we wanted to use both at a given site, we would have to alter the source of one of them - which introduces needless code localization and maintenance.

Task properties gives us a simple solution. Here is how it works: suppose that both colliding tasks instead use this method provided by AbstractCurationTask in their task implementation code (e.g. in virus scanner):

Code Block

host = taskProperty("service.host");

Note that there is no name of the configuration file even mentioned, just the property name whose value we want. At runtime, the curation system resolves this call to a configuration file, and it uses the name the task has been configured as as the name of the config file. So, for example, if both were installed (in curate.cfg) as:

Code Block

org.dspace.ctask.general.ClamAv = vscan,
org.community.ctask.ConflictTask = virusscan,
....

Wiki Markup
then 'taskProperty()' will resolve to {{\[dspace\]/config/modules/vscan.cfg}} when called from ClamAv task, but {{\[dspace\]/config/modules/virusscan.cfg}} when called from ConflictTask's code. Note that the 'vscan' etc are locally assigned names, so we can always prevent the 'collisions'mentioned, and we make the tasks much more portable, since we remove the 'hard-coding' of config names.

The entire 'API' for task properties is:

Code Block

public String taskProperty(String name);
public int taskIntProperty(String name, int defaultValue);
public long taskLongProperty(String name, long defaultValue);
public boolean taskBooleanProperty(String name, boolean default);

Wiki Markup
Another use of task properties is to support multiple task profiles. Suppose we have a task that we want to operate in one of two modes. A good example would be a mediafilter task that produces a thumbnail. We can either create one if it doesn't exist, or run with '-force' which will create one regardless. Suppose this behavior was controlled by a property in a config file. If we configured the task as 'thumbnail', then we would have in {{\[dspace\]/config/modules/thumbnail.cfg}}:

Code Block

...other properties...
thumbnail.maxheight = 80
thumbnail.maxwidth = 80
forceupdate=false

Then, following the pattern above, the thumbnail generating task code would look like:

Code Block

if (taskBooleanProperty("forceupdate")) {
// do something
}

But an obvious use-case would be to want to run force mode and non-force mode from the admin UI on different occasions. To do this, one would have to stop Tomcat, change the property value in the config file, and restart, etc However, we can use task properties to elegantly rescue us here. All we need to do is go into the config/modules directory, and create a new file called: thumbnail.force.cfg. In this file, we put only one property:

Code Block

forceupdate=true

Then we add a new task (really just a new name, no new code) in curate.cfg:

Code Block

org.dspace.ctask.general.ThumbnailTask = thumbnail,
org.dspace.ctask.general.ThumbnailTask = thumbnail.force

Consider what happens: when we perform the task 'thumbnail' (using taskProperties), it reads the config file thumbnail.cfg and operates in 'non-force' profile (since the value is false), but when we run the task 'thumbnail.force' the curation system first reads thumbnail.cfg, then reads thumbnail.force.cfg which overrides the value of the 'forceupdate' property. Notice that we did all this via local configuration - we have not had to touch the source code at all to obtain as many 'profiles' as we would like.

Task Annotations

CS looks for, and will use, certain java annotations in the task Class definition that can help it invoke tasks more intelligently. An example may explain best. Since tasks operate on DSOs that can either be simple (Items) or containers (Collections, and Communities), there is a fundamental problem or ambiguity in how a task is invoked: if the DSO is a collection, should the CS invoke the task on each member of the collection, or does the task 'know' how to do that itself? The decision is made by looking for the @Distributive annotation: if present, CS assumes that the task will manage the details, otherwise CS will walk the collection, and invoke the task on each member. The java class would be defined:

...

Code Block
10 (K) Portable Network Graphics
     5  (S) Plain Text

where the left column is the count of bitstreams of the named format and the letter in parentheses is an abbreviation of the repository-assigned support level for that format:

Code Block
U  Unsupported
    K  Known
    S  Supported

The profiler will operate on any DSpace object. If the object is an item, then only that item's bitstreams are profiled; if a collection, all the bitstreams of all the items; if a community, all the items of all the collections of the community.

...