
Code Block
@Records({
   @Record(type="PREMIS", value="Replication"),
   @Record(type="LOC", value="duplication")
})

Resource Management

One of the core design objectives of CS was to make tasks as simple to implement as possible: in practice this meant keeping the API 'footprint' (the number of methods that a task has to implement) very small. In fact, it consists of just three methods, the last two being overloads of perform:

Code Block

void init(Curator curator, String taskId) throws IOException;

int perform(DSpaceObject dso) throws IOException;

int perform(Context ctx, String id) throws IOException;

where the third method can usually be converted into the second, typically by resolving the id to a DSpaceObject and delegating. One consequence of this small footprint is a lack of what one would consider full lifecycle semantics: there is no method by which a task can 'clean itself up' after use. In certain circumstances this entails a few gyrations, or at any rate a certain discipline in task design. Let us take a concrete example: a task that needs to write some data to a stream for each object it performs on. The simplest apparent way to code this is:

Code Block

public class StreamTaskTake1 implements CurationTask
{
   private OutputStream out;

   public void init(Curator curator, String taskId) throws IOException
   {
       out = new FileOutputStream("somewhere");
   }

   public int perform(DSpaceObject dso) throws IOException
   {
       // ...
       out.write(dso.getHandle().getBytes());
       // ...
   }
}

but of course this isn't very satisfactory, since the task never closes the stream it opened. The task will never have a way of knowing when it is last called, so there isn't an obvious way around this. (There are in fact several ways - e.g. the task can annotate itself as @Distributive and have complete control over how it is called, but this can add substantial complexity). So we are usually led to a solution like this:

Code Block

public class StreamTaskTake2 implements CurationTask
{
   private OutputStream out;

   public void init(Curator curator, String taskId) throws IOException
   {
   }

   public int perform(DSpaceObject dso) throws IOException
   {
       // ...
       // open in append mode so each call adds to the file instead of truncating it
       out = new FileOutputStream("somewhere", true);
       out.write(dso.getHandle().getBytes());
       out.close();
       // ...
   }
}

This version is formally correct, and in fact exhibits the quite desirable trait of not holding a file descriptor it isn't using, but we might chafe at the thought that we are doing fairly inefficient IO if this task is invoked on a collection of 1000 items. Thus the idea of curator resource management: suppose we could simply ask the curation system to deal with the problem?

Code Block

public class StreamTaskTake3 implements CurationTask
{
   private OutputStream out;

   public void init(Curator curator, String taskId) throws IOException
   {
       out = new FileOutputStream("somewhere");
       // let the curator worry about this..
       curator.enrollResource(out, "close");
   }

   public int perform(DSpaceObject dso) throws IOException
   {
       // ...
       out.write(dso.getHandle().getBytes());
       // ...
   }
}

That is, the enrollResource method asks the CS to call 'out.close()' on the stream once the CS has finished its work. The "close" argument is called the policy, and it is the job of CS to enforce it. So far only 'close' and 'flush' have been considered as policies, but it is not difficult to imagine others.
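
To make the idea of policy enforcement concrete, here is a minimal sketch of the bookkeeping a curator might do behind enrollResource. It is an illustration only, not the actual CS implementation: the class name ResourceRegistry and the method releaseAll are hypothetical, and the sketch assumes that 'close' applies to any java.io.Closeable and 'flush' to any java.io.Flushable.

```java
import java.io.Closeable;
import java.io.Flushable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the curator's resource bookkeeping; not DSpace code. */
class ResourceRegistry {

    /** One enrolled resource together with the policy to apply to it. */
    private static final class Enrolled {
        final Object resource;
        final String policy;

        Enrolled(Object resource, String policy) {
            this.resource = resource;
            this.policy = policy;
        }
    }

    private final List<Enrolled> enrolled = new ArrayList<>();

    /** Remember the resource; only the two policies named in the text are accepted. */
    void enrollResource(Object resource, String policy) {
        if (!"close".equals(policy) && !"flush".equals(policy)) {
            throw new IllegalArgumentException("Unknown policy: " + policy);
        }
        enrolled.add(new Enrolled(resource, policy));
    }

    /**
     * Assumed to be called once, after the last perform() of a curation run.
     * Applies each policy and returns how many resources were handled successfully.
     */
    int releaseAll() {
        int applied = 0;
        for (Enrolled e : enrolled) {
            try {
                if ("close".equals(e.policy) && e.resource instanceof Closeable) {
                    ((Closeable) e.resource).close();
                    applied++;
                } else if ("flush".equals(e.policy) && e.resource instanceof Flushable) {
                    ((Flushable) e.resource).flush();
                    applied++;
                }
            } catch (IOException ex) {
                // One failing resource should not prevent the others from being released.
            }
        }
        enrolled.clear();
        return applied;
    }
}
```

With something like this in place, a task such as StreamTaskTake3 only has to enroll its stream in init; the curator would call the release step itself once the whole run is complete. Supporting a further policy would mean adding one more branch to the release loop.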