...

  1.  Queue names refer to the AWS SQS queue names defined in your account.  You must create and configure the following queues as defined in the queue section, replacing the brackets ([]) with the queue names.

    Code Block
    #########
    # Queues
    #########
    queue.name.audit=[]
    queue.name.bit-integrity=[]
    queue.name.dup-high-priority=[]
    queue.name.dup-low-priority=[]
    queue.name.bit-error=[]
    queue.name.bit-report=[]
    queue.name.dead-letter=[]
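
    For illustration only, a filled-in version might look like the following. The queue names here are hypothetical examples, not values from any particular installation:

    ```properties
    #########
    # Queues
    #########
    queue.name.audit=mill-audit
    queue.name.bit-integrity=mill-bit-integrity
    queue.name.dup-high-priority=mill-dup-high-priority
    queue.name.dup-low-priority=mill-dup-low-priority
    queue.name.bit-error=mill-bit-error
    queue.name.bit-report=mill-bit-report
    queue.name.dead-letter=mill-dead-letter
    ```

    Each value must match an SQS queue that already exists in your AWS account.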
  2.  For a given instance of workman, you must specify which queues will be consumed and the order in which they will be read.  In other words, a given instance of workman can focus on specific kinds of tasks and give some tasks higher priority than others.  In this way, instances of workman can be configured to run on hardware suited to the load and the kinds of tasks they will need to bear.  Make sure you use the keys defined above rather than the queue names themselves.

    Code Block
    ## A comma-separated, prioritized list of task queue keys (i.e. do not use the
    ## concrete AWS queue names - use the queue.name.* keys).
    ## The first items in the list have the highest priority; the last, the lowest.
    queue.task.ordered=[]
     
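    As a sketch, an instance dedicated mostly to duplication work might prioritize the duplication queues ahead of audit.  The ordering below is an example, not a recommendation:

    ```properties
    # Keys, not concrete queue names; highest priority first.
    queue.task.ordered=queue.name.dup-high-priority,queue.name.dup-low-priority,queue.name.audit
    ```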
  3. max-workers sets the number of task-processing threads that can run simultaneously.

    Code Block
    # The max number of worker threads that can run at a time. The default value is 5. 
    max-workers=[]
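
    For example, to double the default of 5 worker threads (the right value depends on your hardware and the mix of tasks this instance consumes):

    ```properties
    max-workers=10
    ```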
  4. The duplication policy manager writes policies to an S3 bucket.  Both loopingduptaskproducer and workman use those policies when making duplication decisions. 

    Code Block
    # The last portion of the name of the S3 bucket where duplication policies can be found.
    duplication-policy.bucket-suffix=duplication-policy-repo
    # The frequency in milliseconds between refreshes of duplication policies.
    duplication-policy.refresh-frequency=[]
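
    As a hypothetical example, a five-minute refresh interval would be written in milliseconds:

    ```properties
    duplication-policy.refresh-frequency=300000
    ```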
  5. You can also set workdir, which defines where temporary data will be written, as well as notification.recipients.

    Code Block
    # Directory that will be used to temporarily store files as they are being processed.
    workdir=[]
    # A comma-separated list of email addresses
    notification.recipients=[]
    
    
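    A sketch with hypothetical values; the path and email addresses are examples only:

    ```properties
    workdir=/var/tmp/mill-work
    notification.recipients=ops@example.org,admin@example.org
    ```
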
  6. Once these settings are in place, you can run workman by invoking the following Java command: 

    Code Block
     java -Dlog.level=INFO -jar workman-{mill version here}.jar -c /path/to/mill-config.properties

...

Once you have an instance of workman running, you can perform an explicit duplication run.  Spaces that have been configured with duplication policies (see the Mill Overview for details) will generate duplication events when the audit tasks associated with them are processed.  If you add a new duplication policy to a space that already has content items, however, you'll need to perform a duplication run to ensure that those items get duplicated.  The loopingduptaskproducer fulfills this function: based on the set of duplication policies, it will generate duplication tasks for all matching spaces.

It keeps track of which accounts, spaces, and items have been processed in a given run, so it does not need to run in daemon mode.  It will run until it has reached the maximum number of allowable items on the queue and then exit; the next time it is run, it will pick up where it left off.  You may want to dial down the max queue size if you have so many items, and so little computing power to process them, that you risk exceeding the maximum life of an SQS message (which happens to be 14 days).  It should also be noted that items are added roughly 1000 at a time for each space in a round-robin fashion to ensure that all spaces are processed in a timely way; this strategy ensures that small spaces flanked by large spaces are processed quickly.  It is also important that only one instance of loopingduptaskproducer is running at any moment in time.  Two settings to be concerned with for the looping dup task producer: 

...

Configuring and Running Bit Integrity 

Bit integrity works similarly to duplication runs.  loopingbittaskproducer has settings similar to those mentioned above, plus two others: looping.bit.inclusion-list-file and looping.bit.exclusion-list-file.  These two config files let you be more surgical about what you include in and exclude from your bit integrity run, and they function similarly to the duplication policies.  The important thing to note is that if there are no entries in the inclusion list, all accounts, stores, and spaces are included.  It is also important that only one instance of loopingbittaskproducer is running at any moment in time.  

...