...

The filter() method accepts a set of TaskQueueEntry objects (which is what the TaskQueue 'dequeue()' method returns) and returns a possibly modified set, retrievable through an iterator. Returning an iterator rather than just a new set matters, since it allows (but does not require) the filter to impose an order on the entries. Filters will be applied when 'CurationCli' is invoked on a particular queue (we can add a new, optional, '-f filter' command-line switch), so flexibility comes from the ability to set different filters, or none, on different queues. It may be possible to 'chain' filters, but those use cases would need further definition.
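As an illustration of the contract described above, here is a minimal sketch of what a filter might look like. The names `QueueEntry`, `QueueFilter`, and `OldestFirstFilter` are hypothetical: `QueueEntry` is a simplified stand-in for TaskQueueEntry carrying only the fields this example needs, and the interface shape is an assumption rather than the proposed API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for TaskQueueEntry; only the fields used below.
final class QueueEntry {
    final String submitter;
    final long epoch;       // submission time in milliseconds
    final String objectId;

    QueueEntry(String submitter, long epoch, String objectId) {
        this.submitter = submitter;
        this.epoch = epoch;
        this.objectId = objectId;
    }
}

// Hypothetical filter contract: take the dequeued set and return an iterator
// over a possibly reduced, possibly reordered view of it.
interface QueueFilter {
    Iterator<QueueEntry> filter(Set<QueueEntry> entries);
}

// Example filter: drops entries from one submitter and orders the rest
// oldest-first. The ordering is only expressible because an iterator,
// not a set, is returned.
final class OldestFirstFilter implements QueueFilter {
    private final String excludedSubmitter;

    OldestFirstFilter(String excludedSubmitter) {
        this.excludedSubmitter = excludedSubmitter;
    }

    @Override
    public Iterator<QueueEntry> filter(Set<QueueEntry> entries) {
        List<QueueEntry> kept = new ArrayList<>();
        for (QueueEntry entry : entries) {
            if (!entry.submitter.equals(excludedSubmitter)) {
                kept.add(entry);
            }
        }
        kept.sort(Comparator.comparingLong((QueueEntry e) -> e.epoch));
        return kept.iterator();
    }
}
```

A filter that does not care about order could simply return `entries.iterator()`, which is why the iterator allows ordering without requiring it.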

Programs

A common need is to coordinate the activities of multiple tasks against particular object sets: we may wish to ensure one task is performed before another, or only conditionally performed, possibly based on the 'outcome' of another task. CS currently has no ability to specify or enforce these constraints; in fact, it explicitly disavows any such guarantee. In this situation:

Code Block
Curator curator = new Curator();
curator.addTask("task1");
curator.addTask("task2");
curator.curate(myDso);

the curator makes no promises that 'task1' will run before 'task2' - the order could in fact be reversed. Nor does a task have any way of 'discovering' whether another task has run, so coordination can't be managed in the task logic itself. There are sound reasons why simple ordering is not supported: there are too many 'contingencies' that it cannot cope with. For example, suppose that in the above case 'task1' encountered an error and never ran properly - then the assumptions made by 'task2' would be mistaken.

A more full-featured and robust mechanism than simple ordering is needed: thus the proposal to add task 'programs'. A program is a set of instructions about how and whether to run sets of tasks. The CS will be responsible for 'compiling' and running these programs, and a 'program' will have the exact same semantics as an atomic task. Namely:

  • It will return a status code with the same value set as tasks
  • It will optionally return a 'result' string
  • It will have a locally-bound logical name
  • It will be possible to invoke a program wherever a task can be - in admin UI, workflow, batch, etc.
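To make the "same semantics as an atomic task" idea concrete, the sketch below gives tasks and programs one shared contract and resolves both by logical name. Every name here (`Status`, `CurationUnit`, `FixedTask`, `SequenceProgram`, `Registry`) is hypothetical and does not reflect the actual CS API; the program logic is deliberately trivial, since the point is only that a caller cannot tell a program from a task.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical status set, standing in for the status codes tasks return.
enum Status { SUCCESS, FAIL, SKIP, ERROR }

// Hypothetical shared contract for anything that can be invoked by name.
interface CurationUnit {
    Status perform(String objectId);
    String result();   // optional result string; may be null
}

// An atomic task that always returns a fixed status (for illustration only).
final class FixedTask implements CurationUnit {
    private final Status status;

    FixedTask(Status status) { this.status = status; }

    @Override
    public Status perform(String objectId) { return status; }

    @Override
    public String result() { return null; }
}

// A trivial 'program': runs its steps in order and stops at the first
// step that does not succeed, recording which step that was.
final class SequenceProgram implements CurationUnit {
    private final CurationUnit[] steps;
    private String result;

    SequenceProgram(CurationUnit... steps) { this.steps = steps; }

    @Override
    public Status perform(String objectId) {
        result = null;
        for (int i = 0; i < steps.length; i++) {
            Status status = steps[i].perform(objectId);
            if (status != Status.SUCCESS) {
                result = "step " + i + " returned " + status;
                return status;
            }
        }
        return Status.SUCCESS;
    }

    @Override
    public String result() { return result; }
}

// Binds local logical names to units; tasks and programs share one namespace.
final class Registry {
    private final Map<String, CurationUnit> byName = new HashMap<>();

    void bind(String name, CurationUnit unit) { byName.put(name, unit); }

    CurationUnit resolve(String name) {
        CurationUnit unit = byName.get(name);
        if (unit == null) {
            throw new IllegalArgumentException("unknown task or program: " + name);
        }
        return unit;
    }
}
```

With this shape, a caller such as the admin UI or a batch job would call `registry.resolve(name).perform(id)` and inspect the status and optional result, without knowing whether the name is bound to a task or a program.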

What would a task program look like, i.e. what is the program syntax? Here is a straw-man example:

Code Block
# Task Program Example
# MIT Libraries - January 2013
first-task
if not @SUCCESS
  report "problem out of the gate"
  return @ERROR:"first-task did not succeed"
end
second-task
if @FAIL
  cleanup-task
elif @ERROR
  report "error on second task"
elif @SKIP
  another-task
  if @SUCCESS
    return cleanup-task
  end
else
  cleanup-task
end
end
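
One way to read the straw-man program above is to translate it by hand into ordinary control flow. The Java sketch below does that under several assumptions that the straw-man does not settle: status tokens such as @SUCCESS refer to the most recently run task, 'report' appends a message to a report, 'return cleanup-task' returns the status of running that task, and falling off the end yields the status of the last task run. The names `Status`, `Outcome`, and `ExampleProgram`, and the `runTask` function, are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical status set matching the tokens used in the program text.
enum Status { SUCCESS, FAIL, SKIP, ERROR }

// A status code plus the optional result string (null when absent).
final class Outcome {
    final Status status;
    final String result;

    Outcome(Status status, String result) {
        this.status = status;
        this.result = result;
    }
}

final class ExampleProgram {
    // Hypothetical hook: runs the named task and returns its status.
    private final Function<String, Status> runTask;
    final List<String> reports = new ArrayList<>();

    ExampleProgram(Function<String, Status> runTask) {
        this.runTask = runTask;
    }

    Outcome run() {
        Status last = runTask.apply("first-task");
        if (last != Status.SUCCESS) {
            reports.add("problem out of the gate");
            return new Outcome(Status.ERROR, "first-task did not succeed");
        }
        last = runTask.apply("second-task");
        if (last == Status.FAIL) {
            last = runTask.apply("cleanup-task");
        } else if (last == Status.ERROR) {
            reports.add("error on second task");
        } else if (last == Status.SKIP) {
            last = runTask.apply("another-task");
            if (last == Status.SUCCESS) {
                // 'return cleanup-task': the program's status is that task's status.
                return new Outcome(runTask.apply("cleanup-task"), null);
            }
        } else {
            last = runTask.apply("cleanup-task");
        }
        // Assumption: with no explicit return, the last task's status is used.
        return new Outcome(last, null);
    }
}
```

Unlike simple ordering, this handles the contingency discussed earlier: if 'first-task' does not succeed, 'second-task' is never attempted and the program itself reports an error.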

 

Object Selectors

In CS, the unit of curation is a DSpaceObject (which may be an Item, Collection, or Community). Thus the API offers these basic methods (on the Curator class):

...