Warning: This page may describe a proposed implementation or an implementation for an older version of DSpace. For official information, refer to the official documentation for your particular DSpace version:
- DSpace 1.7.x: Curation System
- DSpace 1.8.x: Curation System
- DSpace 3.x: Curation System
Curation System for DSpace 1.7
...
- Virus Scanning: Virus Scan Curation Task
- Content Replication: ReplicationTaskSuite
Tasks
...
First, it must provide a no-arg constructor, so that it can be loaded by the PluginManager. Tasks are thus 'named' plugins, and each must be configured in dspace.cfg as:
```
plugin.named.org.dspace.curate.CurationTask = \
    org.dspace.curate.ProfileFormats = format-profile \
    org.dspace.curate.RequiredMetadata = req-metadata \
    org.dspace.ctask.replicate.Audit = audit \
    org.dspace.ctask.replicate.Estimate = estimate \
    org.dspace.ctask.replicate.Generate = generate \
    org.dspace.ctask.integrity.Checksum = checksum \
    org.dspace.ctask.integrity.ClamScan = vscan
```
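To illustrate why the no-arg constructor matters, here is a simplified, hypothetical sketch of how a named-plugin loader can map a configured name (such as vscan) to a class and instantiate it by reflection. It is not DSpace's PluginManager: Task, VirusScanTask and NamedPluginRegistry are illustrative names, and the parsing of "classname = name" entries is an assumption based on the configuration above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified task interface for this sketch (not DSpace's CurationTask).
interface Task {
    int perform(String objectId);
}

// Example task exposing the public no-arg constructor a loader needs.
class VirusScanTask implements Task {
    public VirusScanTask() { }

    public int perform(String objectId) {
        return 0; // placeholder: always reports "success" in this sketch
    }
}

// Hypothetical registry mapping plugin names to class names, mirroring
// "classname = name" entries such as "org.dspace.ctask.integrity.ClamScan = vscan".
class NamedPluginRegistry {
    private final Map<String, String> nameToClass = new HashMap<>();

    // Parses one "classname = name" entry.
    void register(String entry) {
        String[] parts = entry.split("=");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected 'classname = name': " + entry);
        }
        nameToClass.put(parts[1].trim(), parts[0].trim());
    }

    // Creates a fresh instance of the named plugin, or returns null for an unknown name.
    // The loader has no constructor arguments to pass, so it relies on a no-arg constructor.
    Task getNamedPlugin(String name) {
        String className = nameToClass.get(name);
        if (className == null) {
            return null;
        }
        try {
            Class<?> cls = Class.forName(className);
            return (Task) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }
}
```

A task that needs settings therefore has to obtain them after construction (for example from configuration) rather than through constructor parameters.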
...
The CurationTask interface is almost a 'tagging' interface, and only requires that a few very high-level methods be implemented. The most significant is:
```java
int perform(DSpaceObject dso);
```
The return value should be a code describing one of 4 conditions:
...
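A minimal sketch of a task body may help make the perform contract concrete. It uses hypothetical stand-ins (Status, SimpleObject, RequiredFieldsTask) instead of the real DSpace types so that it compiles on its own, and it assumes four outcomes named success, fail, skip and error. The numeric values are placeholders; a real task should return the framework's own constants.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Placeholder status codes for this sketch; real tasks should use the
// constants defined by the curation framework.
final class Status {
    static final int ERROR = -1;
    static final int SUCCESS = 0;
    static final int FAIL = 1;
    static final int SKIP = 2;

    private Status() { }
}

// Hypothetical, simplified stand-in for a DSpaceObject.
class SimpleObject {
    final boolean isItem;
    final Map<String, String> metadata;

    SimpleObject(boolean isItem, Map<String, String> metadata) {
        this.isItem = isItem;
        this.metadata = metadata;
    }
}

// Task in the spirit of req-metadata: an item fails if any required field is missing.
class RequiredFieldsTask {
    // Hypothetical required fields, fixed so the class keeps a no-arg constructor.
    private static final List<String> REQUIRED = Arrays.asList("dc.title", "dc.date.issued");

    private String result = "";

    public RequiredFieldsTask() { }

    public int perform(SimpleObject dso) {
        if (dso == null || dso.metadata == null) {
            result = "No object or metadata supplied";
            return Status.ERROR;
        }
        if (!dso.isItem) {
            result = "Not an item; task does not apply";
            return Status.SKIP;
        }
        for (String field : REQUIRED) {
            String value = dso.metadata.get(field);
            if (value == null || value.isEmpty()) {
                result = "Missing required field: " + field;
                return Status.FAIL;
            }
        }
        result = "All required fields present";
        return Status.SUCCESS;
    }

    // Human-readable detail of the last outcome.
    public String getResult() {
        return result;
    }
}
```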
Tasks are invoked using CS framework classes that manage a few details (described below). This invocation can occur wherever needed, and CS offers great versatility "out of the box":
On the command line
A simple tool, "CurationCli", provides access to CS via the command line. For example, to perform a virus check on the collection with handle 123456789/4:
```
[dspace]/bin/dspace curate -t vscan -i 123456789/4
```
or
```
[dspace]/bin/dspace dsrun org.dspace.curate.CurationCli -t vscan -i 123456789/4
```
...
CS provides the ability to attach any number of tasks to standard DSpace workflows. Using a configuration file (workflow-curation.xml), you can declaratively (without coding) wire tasks to any step in a workflow. An example:
```xml
<taskset name="cautious">
  <flowstep name="step1">
    <task name="vscan">
      <workflow>reject</workflow>
      <!-- on a FAIL status: notify the workflow step's group and the collection administrators -->
      <notify on="fail">$flowgroup</notify>
      <notify on="fail">$colladmin</notify>
      <!-- on an ERROR status: notify the site administrator -->
      <notify on="error">$siteadmin</notify>
    </task>
  </flowstep>
</taskset>
```
...
If these predefined entry points are not sufficient, you can manage curation directly in your own code using the CS helper classes. For example:
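```java
import org.dspace.content.Collection;
import org.dspace.curate.Curator;
import org.dspace.handle.HandleManager;

// Assumes an open org.dspace.core.Context named `context`; runs "vscan" on the collection.
Collection coll = (Collection) HandleManager.resolveToObject(context, "123456789/4");
Curator curator = new Curator();
curator.addTask("vscan").curate(coll);
System.out.println("Result: " + curator.getResult("vscan"));
```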
...
Because some tasks may take a fair amount of time, it may not be desirable to run them in an interactive context. CS therefore provides a simple API for deferring task execution through a queuing system. Using the previous example:
```java
Curator curator = new Curator();
curator.addTask("vscan").queue(context, "monthly", "123456789/4");
```
This would place a request on the queue named "monthly" to virus-scan the collection. To read (and process) the queue, you could, for example, run:
```
[dspace]/bin/dspace dsrun org.dspace.curate.CurationCli -q monthly
```
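The deferral mechanism can be pictured with a small in-memory sketch: queueing only records a (task, handle) request under a queue name, and a later, separate step drains that queue and runs each request in order. InMemoryTaskQueue and QueueEntry are hypothetical names. A real queue would have to persist its entries so that a different process, such as the command-line reader above, can see them.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// One deferred request: which task to run on which object (by handle).
final class QueueEntry {
    final String taskName;
    final String handle;

    QueueEntry(String taskName, String handle) {
        this.taskName = taskName;
        this.handle = handle;
    }
}

// Hypothetical in-memory queues keyed by queue name (entries are lost on exit).
class InMemoryTaskQueue {
    private final Map<String, Deque<QueueEntry>> queues = new HashMap<>();

    // Records a request without running the task.
    void enqueue(String queueName, String taskName, String handle) {
        queues.computeIfAbsent(queueName, k -> new ArrayDeque<>())
              .addLast(new QueueEntry(taskName, handle));
    }

    // Drains the named queue in FIFO order, running each request through `runner`
    // (task name, handle) -> status code, and collects the returned codes.
    List<Integer> process(String queueName, BiFunction<String, String, Integer> runner) {
        List<Integer> statuses = new ArrayList<>();
        Deque<QueueEntry> q = queues.get(queueName);
        if (q == null) {
            return statuses; // nothing was queued under this name
        }
        while (!q.isEmpty()) {
            QueueEntry e = q.pollFirst();
            statuses.add(runner.apply(e.taskName, e.handle));
        }
        return statuses;
    }

    int size(String queueName) {
        Deque<QueueEntry> q = queues.get(queueName);
        return q == null ? 0 : q.size();
    }
}
```

Separating enqueue from process is what lets an interactive request return immediately while the expensive work runs later, for example from a scheduled job.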
...
As mentioned above, a status code is returned to CS whenever a task is called. In addition to the task-assigned codes, there are two further values:
```
NOTASK - CS could not find the requested task
UNSET - task did not return a status code because it has not yet run
```
...
The task may define a string describing the details of the outcome. This result is displayed, for example, in the 'curation widget' described above:
```
"Virus 12312 detected on Bitstream 4 of 123456789/3"
```
...
All three are accessed (or set) by methods on the Curator object:
```java
Curator curator = new Curator();
curator.addTask("vscan").curate(coll);
int status = curator.getStatus("vscan");
```
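The bookkeeping behind these accessors can be sketched as a small per-task table of status and result. MiniCurator and Outcome below are hypothetical and are not DSpace's Curator. The sketch assumes that a task registered but not yet run reports UNSET and that an unknown task name reports NOTASK; the numeric codes are placeholders.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch (not DSpace's Curator): records the status code and
// result string of each task it runs, keyed by task name.
class MiniCurator {
    // Placeholder codes; the framework defines its own values.
    static final int NOTASK = -3;
    static final int UNSET = -2;

    // What a task reports back: a status code plus a human-readable result.
    static final class Outcome {
        final int status;
        final String result;

        Outcome(int status, String result) {
            this.status = status;
            this.result = result;
        }
    }

    private final Map<String, Function<String, Outcome>> tasks = new LinkedHashMap<>();
    private final Map<String, Integer> statuses = new HashMap<>();
    private final Map<String, String> results = new HashMap<>();

    // Registers a task; fluent so calls can be chained as in addTask(...).curate(...).
    MiniCurator addTask(String name, Function<String, Outcome> task) {
        tasks.put(name, task);
        statuses.put(name, UNSET); // registered but not yet run
        return this;
    }

    // Runs every registered task on the object and stores its outcome.
    MiniCurator curate(String objectId) {
        for (Map.Entry<String, Function<String, Outcome>> e : tasks.entrySet()) {
            Outcome o = e.getValue().apply(objectId);
            statuses.put(e.getKey(), o.status);
            results.put(e.getKey(), o.result);
        }
        return this;
    }

    int getStatus(String name) {
        return statuses.getOrDefault(name, NOTASK); // unknown task name
    }

    String getResult(String name) {
        return results.get(name); // null until the task has run
    }
}
```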
...
CS looks for, and will use, certain Java annotations in the task class definition that help it invoke tasks more intelligently. An example explains this best. Since tasks operate on DSOs that can be either simple (Items) or containers (Collections and Communities), there is a fundamental ambiguity in how a task is invoked: if the DSO is a collection, should CS invoke the task on each member of the collection, or does the task 'know' how to do that itself? CS decides by looking for the @Distributive annotation: if it is present, CS assumes that the task will manage the details; otherwise CS walks the collection and invokes the task on each member. The Java class would be defined as:
```java
@Distributive
public class MyTask implements CurationTask {
    // task implementation
}
```
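The dispatch decision can be sketched with plain Java reflection. The sketch declares its own local Distributive annotation (a stand-in for the framework's, retained at runtime so it can be read reflectively), a hypothetical Node model for items and containers, and a Dispatcher that either hands the whole container to a distributive task or walks the members itself. It illustrates the rule described above; it is not the CS implementation.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

// Local stand-in for the framework's @Distributive marker. RUNTIME retention
// makes it visible to reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Distributive { }

// Hypothetical object model: an item, or a container with members.
class Node {
    final String id;
    final boolean container;
    final List<Node> members = new ArrayList<>();

    Node(String id, boolean container) {
        this.id = id;
        this.container = container;
    }
}

interface SimpleTask {
    int perform(Node dso);
}

class Dispatcher {
    static final int NOT_RUN = -2; // placeholder code: no member was processed

    // @Distributive tasks get the container once and handle members themselves;
    // other tasks are invoked by the dispatcher on each non-container member.
    static int run(SimpleTask task, Node dso) {
        if (task.getClass().isAnnotationPresent(Distributive.class)) {
            return task.perform(dso);
        }
        return walk(task, dso);
    }

    private static int walk(SimpleTask task, Node dso) {
        if (!dso.container) {
            return task.perform(dso);
        }
        int status = NOT_RUN;
        for (Node member : dso.members) {
            status = walk(task, member); // only the last member's status survives
        }
        return status;
    }
}

// Example distributive task: called once even when handed a container.
@Distributive
class WholeContainerTask implements SimpleTask {
    int calls = 0;

    public int perform(Node dso) {
        calls++;
        return 0;
    }
}
```

In the walking branch only the last member's status is returned, which is the reporting problem that @Suspendable addresses.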
A related issue concerns how non-distributive tasks report their status and results: the status will normally reflect only the last invocation of the task in the container, so important outcomes could be lost. If a task declares itself @Suspendable, however, CS will stop processing when it encounters a FAIL status. When used in the UI, for example, this means that a virus scan running over a collection would stop and return its status (and result) on the first infected item it encounters. You can tune @Suspendable tasks more precisely by specifying which invocation modes should trigger suspension. For example:
```java
@Suspendable(invoked=Curator.Invoked.INTERACTIVE)
public class MyTask implements CurationTask {
    // task implementation
}
```
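Suspension can be sketched in the same style. Invoked, Suspendable, ItemTask, SuspendingRunner and ScanTask are local, hypothetical stand-ins. The sketch assumes that a Suspendable annotation without an explicit mode applies to every invocation (modelled here as ANY), and the FAIL value is a placeholder.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.List;

// Local stand-ins for this sketch (hypothetical; they mirror the names used above).
enum Invoked { INTERACTIVE, BATCH, ANY }

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Suspendable {
    Invoked invoked() default Invoked.ANY;
}

interface ItemTask {
    int perform(String itemId);
}

class SuspendingRunner {
    static final int FAIL = 1;      // placeholder status code
    static final int NOT_RUN = -2;  // placeholder "nothing ran" code

    // Runs the task over each member; stops at the first FAIL when the task is
    // @Suspendable for the current invocation mode.
    static int runAll(ItemTask task, List<String> members, Invoked mode) {
        Suspendable s = task.getClass().getAnnotation(Suspendable.class);
        boolean suspend = s != null && (s.invoked() == Invoked.ANY || s.invoked() == mode);
        int status = NOT_RUN;
        for (String id : members) {
            status = task.perform(id);
            if (suspend && status == FAIL) {
                break; // report the first failure instead of losing it
            }
        }
        return status;
    }
}

// Example task: "fails" on any id containing "infected"; suspends only when interactive.
@Suspendable(invoked = Invoked.INTERACTIVE)
class ScanTask implements ItemTask {
    int calls = 0;

    public int perform(String itemId) {
        calls++;
        return itemId.contains("infected") ? SuspendingRunner.FAIL : 0;
    }
}
```

With the INTERACTIVE setting, an interactive run stops at the first failing item, while a BATCH run visits every item and reports only the final status.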
...