This article describes the necessary steps for configuring and running the DuraCloud Mill. The DuraCloud Mill is a collection of applications that work in concert with DuraCloud to perform a variety of bulk processing tasks, including audit log writing, manifest maintenance, duplication, and bit integrity checks. While the system has been designed to run in an auto-scalable environment such as AWS EC2, it can also be run as a suite of stand-alone applications on any machine with sufficient computing resources. This article only describes the various applications and how to configure them; it does not cover approaches to leveraging AWS autoscaling or supporting DevOps tools such as Puppet. If you are not yet familiar with the DuraCloud Mill, please refer to the DuraCloud Mill Overview article.

Download, Build, Configure

...

Workman is responsible for reading tasks off a set of queues, delegating them to task processors, and removing them once they have reached a completed state. In the case of failures, tasks are retried three times before they are sent to the dead letter queue. A single instance of Workman can run multiple tasks in parallel; how many depends on the max-workers setting in the mill-config.properties file. It is also safe to run multiple instances of Workman, whether on a single machine or across multiple machines. We recommend running a single instance of Workman on each machine instance, with max-workers set in accordance with the available resources.
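
The retry and parallelism behaviour described above can be modelled in a few lines of Python. This is only an illustrative sketch, not Workman's actual code: an in-memory list stands in for the SQS queues, the names `run_task`, `drain`, and `MAX_ATTEMPTS` are hypothetical, and it assumes a task gets three attempts in total (the text does not say whether the first attempt counts toward the three).

```python
from concurrent.futures import ThreadPoolExecutor

# Assumption: three attempts in total before a task is dead-lettered.
MAX_ATTEMPTS = 3


def run_task(task, processor, dead_letter):
    """Try a task up to MAX_ATTEMPTS times; return True if it succeeded."""
    for _ in range(MAX_ATTEMPTS):
        try:
            processor(task)
            return True
        except Exception:
            # A failed attempt leaves the task eligible for another try.
            continue
    # Every attempt failed: park the task on the dead letter list.
    dead_letter.append(task)
    return False


def drain(tasks, processor, max_workers=5):
    """Process tasks in parallel with at most max_workers threads."""
    dead_letter = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(
            pool.map(lambda t: run_task(t, processor, dead_letter), tasks)
        )
    return results, dead_letter
```

Here `max_workers` plays the role of the max-workers setting: raising it lets more tasks run at once, at the cost of more memory and I/O on the machine.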

  1.  Queue names refer to the AWS SQS queue names defined in your account. You must create these queues in SQS and then configure them in the queues section by replacing the brackets ([]) with the queue names.

    Code Block
    #########
    # Queues
    #########
    queue.name.audit=[]
    queue.name.bit-integrity=[]
    queue.name.dup-high-priority=[]
    queue.name.dup-low-priority=[]
    queue.name.bit-error=[]
    queue.name.bit-report=[]
    queue.name.dead-letter=[]
  2.  Then, for a given instance of Workman, you must specify which queues will be consumed and the order in which they will be read. In other words, a given instance of Workman can focus on specific kinds of tasks and can also decide which tasks have a higher priority. In this way, instances of Workman can be configured to run on hardware that is suited to the load and kinds of tasks they will need to bear. Make sure you use the keys defined above rather than the queue names themselves.

    Code Block
    ## A comma-separated, prioritized list of task queue keys (i.e. do not use the
    ## concrete AWS queue names; use the queue.name.* keys).
    ## The first item in the list has the highest priority; the last has the lowest.
    queue.task.ordered=[]
     
  3. As we mentioned before,  max-workers sets the number of task processing threads that can run simultaneously.

    Code Block
    # The max number of worker threads that can run at a time. The default value is 5. Setting this value will override duracloud.maxWorkers if it is set in the configuration file.
    max-workers=[]
  4. The duplication policy manager writes policies to an S3 bucket.  Both the loopingduptaskproducer and workman use those policies for making decisions about duplication. 

    Code Block
    # The last portion of the name of the S3 bucket where duplication policies can be found.
    duplication-policy.bucket-suffix=duplication-policy-repo
    # The frequency in milliseconds between refreshes of duplication policies.
    duplication-policy.refresh-frequency=[]
  5. You can also set workdir, which defines where temporary data will be written, as well as notification.recipients, a list of email addresses.

    Code Block
    # Directory that will be used to temporarily store files as they are being processed.
    workdir=[]
    # A comma-separated list of email addresses
    notification.recipients=[]
    
    
  6. Once these settings are in place, you can run Workman with the following Java command:

    Code Block
     java -Dlog.level=INFO -jar workman-{mill version here}.jar -c /path/to/mill-config.properties
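
Before starting Workman, it can help to sanity-check the properties file. The following Python sketch is not part of the Mill. It parses a sample configuration and reports values that are still empty or hold the `[]` placeholder, entries in `queue.task.ordered` that are not `queue.name.*` keys, and a `max-workers` value that is not a positive integer. All queue names, the work directory, the email address, and the helper names are hypothetical, and the sketch assumes that `queue.task.ordered` takes the full `queue.name.*` keys, as the comment in step 2 suggests. It handles only simple `key=value` lines, not the full Java properties syntax.

```python
# Hypothetical sample values; replace them with the names from your account.
SAMPLE = """
queue.name.audit=example-audit
queue.name.bit-integrity=example-bit-integrity
queue.name.dup-high-priority=example-dup-high
queue.name.dup-low-priority=example-dup-low
queue.name.bit-error=example-bit-error
queue.name.bit-report=example-bit-report
queue.name.dead-letter=example-dead-letter
queue.task.ordered=queue.name.dup-high-priority,queue.name.bit-integrity,queue.name.dup-low-priority
max-workers=5
workdir=/tmp/mill-work
notification.recipients=ops@example.org
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_config(props):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # Any value left empty or as the [] placeholder has not been filled in.
    for key, value in props.items():
        if value in ("", "[]"):
            problems.append(f"{key} is not set")
    # queue.task.ordered must list keys, not concrete queue names.
    if "queue.task.ordered" not in props:
        problems.append("queue.task.ordered is missing")
    queue_keys = {k for k in props if k.startswith("queue.name.")}
    for entry in props.get("queue.task.ordered", "").split(","):
        entry = entry.strip()
        if entry and entry != "[]" and entry not in queue_keys:
            problems.append(
                f"queue.task.ordered: {entry!r} is not a queue.name.* key"
            )
    # max-workers is optional, but must be a positive integer when given.
    workers = props.get("max-workers", "")
    if workers not in ("", "[]") and not (workers.isdigit() and int(workers) > 0):
        problems.append("max-workers must be a positive integer")
    return problems


for problem in check_config(parse_properties(SAMPLE)):
    print(problem)
```

To check a real file, read its contents and pass the text to `parse_properties` in place of `SAMPLE`.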

...