Introduction

The DuraCloud application provides a set of services which can be deployed and used for a variety of purposes, primarily to process the content which has been loaded into DuraCloud storage. The following list of services describes how each service is expected to be used and the options available for tailoring the service to your needs.

Note that all services currently have a "Location" configuration option, which is intended to allow services to be deployed at varying locations. At the moment, however, services can only be deployed on the primary service instance. Because this configuration option is the same across all services, it is not included in the listing for each service.

Also note that, as DuraCloud services are currently configured, they will not restart automatically if they fail during processing. If you notice a failed job state, simply redeploy the service. Automatic service recovery is on the DuraCloud development roadmap for the near future and will be made available as soon as possible.

Duplicate on Upload

Description:

The Duplicate on Upload service provides a way to ensure that the content added to DuraCloud is stored with at least two storage providers. The Duplicate on Upload service performs on-ingest duplication of content. This means that once the Duplicate on Upload service is deployed, it watches for all content that is added to your DuraCloud account, determines if it should be copied to another DuraCloud store, and if so, performs the copy. All content that is copied will be placed in an identically named space in the secondary storage location.

Configuration Options:

  1. Duplicate from this store: The primary storage location which DuraCloud will monitor for file additions. When files are added to this store, they will be copied to the secondary store.
  2. Duplication to this store: The secondary store where content will be copied after it has been added to the primary store.
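
The on-ingest behavior described above can be sketched roughly as follows. This is only an illustration using in-memory dictionaries, not DuraCloud's implementation; the `Store` class, the `on_content_added` function, and the store IDs are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical in-memory stand-in for a storage provider."""
    store_id: str
    # Maps space name -> {content ID -> content bytes}
    spaces: dict = field(default_factory=dict)

    def put(self, space, content_id, data):
        # Create the space on first use, then store the item.
        self.spaces.setdefault(space, {})[content_id] = data


def on_content_added(event_store_id, space, content_id, data,
                     from_store, to_store):
    """Copy a newly added item to the secondary store if it arrived
    in the monitored (primary) store. Returns True if a copy was made."""
    if event_store_id != from_store.store_id:
        return False  # Not the monitored store: nothing to duplicate.
    # The copy lands in an identically named space in the secondary store.
    to_store.put(space, content_id, data)
    return True


primary = Store("primary")      # plays the role of "Duplicate from this store"
secondary = Store("secondary")  # plays the role of the secondary store

primary.put("photos", "cat.jpg", b"...")
copied = on_content_added("primary", "photos", "cat.jpg", b"...",
                          primary, secondary)
ignored = on_content_added("other", "photos", "dog.jpg", b"...",
                           primary, secondary)
```

In this sketch only additions to the monitored store trigger a copy; an addition reported for any other store is ignored.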

Duplicate on Demand

Description:

The Duplicate On Demand service provides a simple way to duplicate content from one space to another. This service is primarily focused on allowing the duplication of content from the primary storage provider to a secondary provider. To begin, a source space is chosen, along with a store and space to which content will be duplicated. The service then performs a copy of all content and metadata in the source space to the duplication space, creating the space if necessary. When the service has completed its work, a results file will be stored in the chosen space and a set of files (primarily logs) created as part of the process will be stored in the work space.

Configuration Options:

  1. Source Space: DuraCloud space where source files can be found
  2. Replicate to this store: DuraCloud store to which content will be copied
  3. Replicate to this space: DuraCloud space where content will be copied
  4. Store results file in this space: DuraCloud space (on the primary store) where results file will be placed
  5. Standard vs. Advanced Bulk configuration
    1. Standard allows the user to choose "Optimize for cost" or "Optimize for speed"
    2. Advanced allows the user to configure the number and type of servers to use
      1. Number of Server Instances: The number of servers to use to perform the replication task.
      2. Type of Server: The type (size) of server used to perform the task. The larger the server, the faster the processing will occur. Larger servers also cost more than smaller servers to run. For more information, see the Amazon EC2 documentation.
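
As a rough sketch of the copy step described above, the following models each store as a plain dictionary and copies every item, with its metadata, into the destination space, creating that space if needed. The function name `duplicate_space`, the dictionary layout, and the shape of the returned results are assumptions made for illustration; the actual results file format is not described here.

```python
def duplicate_space(source_store, dest_store, source_space, dest_space):
    """Copy every item (content and metadata) from source_space to
    dest_space and return a list of (content ID, status) rows.

    Stores are modelled as plain dicts:
    {space name: {content ID: (content bytes, metadata dict)}}.
    """
    # Create the destination space if it does not exist yet.
    target = dest_store.setdefault(dest_space, {})
    results = []
    for content_id, (data, metadata) in sorted(source_store[source_space].items()):
        # Copy the metadata dict so the two stores do not share one object.
        target[content_id] = (data, dict(metadata))
        results.append((content_id, "copied"))
    return results


primary = {
    "reports": {
        "a.txt": (b"alpha", {"mime": "text/plain"}),
        "b.txt": (b"beta", {}),
    }
}
secondary = {}
results = duplicate_space(primary, secondary, "reports", "reports-copy")
```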

Image Server

Description:

The Image Server provides a viewer for image files through use of the Djatoka image server. While this service is geared towards serving JPEG 2000 images, it supports multiple image file types by converting them to JPEG 2000 format on the fly.

Note that the current implementation of this service requires that spaces be set to OPEN in order to use the viewer to view image files.

Configuration Options:

None

Media Streamer

Description:

The Media Streamer provides streaming capabilities for video and audio files. The service takes advantage of Amazon CloudFront streaming, so files to be streamed must be within a space on an Amazon provider. Also, all media to be streamed by this service must be within a single space.

Amazon CloudFront streaming uses the Flash Media Server to host streaming files over RTMP. Supported file formats include MP3, MP4 and FLV, among others. For a full listing of supported file types, see the Flash Media Server documentation.

Configuration Options:

  1. Source Media Space: The DuraCloud space where the source video and audio files to be streamed are stored. The Media Streamer service attempts to stream all files in this space.
  2. Viewer Space: A DuraCloud space where example viewer files will be stored. After the service has started, this space will contain a playlist of all items in the source media space, as well as example HTML and JavaScript files which can be used to display a viewer.

Bit Integrity Checker

Description:

The Bit Integrity Checker provides the ability to verify that the content held within DuraCloud has maintained its bit integrity. There are five modes of operation.
Modes:

  1. Verify the bit integrity of a list of items
  2. Verify the bit integrity of an entire space
  3. Generate bit integrity information for a list of items
  4. Generate bit integrity information for an entire space
  5. Compare two different bit integrity reports
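
To make the last mode more concrete, here is a minimal sketch of comparing two MD5 listings. The listing format (one `item-name<TAB>md5` pair per line), the function names, and the status strings are assumptions for illustration only; the checksum values are shortened placeholders, not real MD5s.

```python
def parse_listing(text):
    """Parse lines of the assumed form 'item-name<TAB>md5' into a dict."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # Skip blank lines.
        name, md5 = line.rsplit("\t", 1)
        entries[name] = md5.strip()
    return entries


def compare_listings(first, second):
    """Return {item name: status} for every item in either listing."""
    report = {}
    for name in sorted(set(first) | set(second)):
        if name not in first:
            report[name] = "missing from first listing"
        elif name not in second:
            report[name] = "missing from second listing"
        elif first[name] == second[name]:
            report[name] = "match"
        else:
            report[name] = "mismatch"
    return report


# Shortened placeholder checksums, used only to exercise the comparison.
listing_a = parse_listing("a.txt\t0cc175b9\nb.txt\t92eb5ffe\n")
listing_b = parse_listing("a.txt\t0cc175b9\nb.txt\tdeadbeef\nc.txt\t4a8a08f0\n")
report = compare_listings(listing_a, listing_b)
```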

Configuration Options:

  1. Mode 1
    1. "Get integrity information from..." : defines the source of the MD5: pre-stored metadata, regenerated, regenerated with salt
    2. "Salt" : any character string that will be appended to the item stream during the MD5 regeneration
    3. "Space with input listing" : space holding the list of items over which to run the service
    4. "Input listing name" : item name of list of items over which to run the service
    5. "Output space" : destination space of service outputs
    6. "Output listing name" : destination item name of MD5s listing
    7. "Output report name" : destination item name of fixity report
    8. "Store" : underlying storage provider over which service will run
  2. Mode 2
    1. "Get integrity information from..." : defines the source of the MD5: pre-stored metadata, regenerated, regenerated with salt
    2. "Salt" : any character string that will be appended to the item stream during the MD5 regeneration
    3. "Space with input listing" : space holding the list of items over which to run the service
    4. "Space containing content items" : source space of items over which to run the service
    5. "Input listing name" : item name of list of items over which to run the service
    6. "Output space" : destination space of service outputs
    7. "Output listing name" : destination item name of MD5s listing
    8. "Output report name" : destination item name of fixity report
    9. "Store" : underlying storage provider over which service will run
  3. Mode 3
    1. "Get integrity information from..." : defines the source of the MD5: pre-stored metadata, regenerated, regenerated with salt
    2. "Salt" : any character string that will be appended to the item stream during the MD5 regeneration
    3. "Space with input listing" : space holding the list of items over which to run the service
    4. "Input listing name" : item name of list of items over which to run the service
    5. "Output space" : destination space of service outputs
    6. "Output listing name" : destination item name of MD5s listing
    7. "Store" : underlying storage provider over which service will run
  4. Mode 4
    1. "Get integrity information from..." : defines the source of the MD5: pre-stored metadata, regenerated, regenerated with salt
    2. "Salt" : any character string that will be appended to the item stream during the MD5 regeneration
    3. "Space containing content items" : source space of items over which to run the service
    4. "Output space" : destination space of service outputs
    5. "Output listing name" : destination item name of MD5s listing
    6. "Store" : underlying storage provider over which service will run
  5. Mode 5
    1. "Space with input listing" : space holding first list of MD5s
    2. "Space with second input listing" : space holding second list of MD5s
    3. "Input listing name" : item name of first list of MD5s
    4. "Second input listing name" : item name of second list of MD5s
    5. "Output space" : destination space of service outputs
    6. "Output report name" : destination item name of fixity report
    7. "Store" : underlying storage provider over which service will run
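
The "regenerated with salt" option above can be illustrated with a short sketch: the MD5 is computed over the item's bytes followed by the salt string. This uses Python's standard `hashlib` module; the function name `md5_of_stream` and the choice to encode the salt as UTF-8 are assumptions of this example, not a description of how the service itself encodes the salt.

```python
import hashlib
import io


def md5_of_stream(stream, salt="", chunk_size=8192):
    """Return the hex MD5 of a binary stream, with an optional salt
    appended after the last byte of content."""
    digest = hashlib.md5()
    # Read in chunks so large items need not fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    if salt:
        # Encoding the salt as UTF-8 is an assumption of this sketch.
        digest.update(salt.encode("utf-8"))
    return digest.hexdigest()


content = b"example item content"
plain = md5_of_stream(io.BytesIO(content))
salted = md5_of_stream(io.BytesIO(content), salt="s3cret")
```

Because the salt changes the input to the hash, the salted checksum differs from the plain one and cannot be reproduced without knowing the salt.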

Bulk Bit Integrity Checker

Description:

The Bulk Bit Integrity Checker provides a simple way to determine checksums (MD5s) for all content items in a given space by leveraging an Amazon Hadoop cluster. This service is designed for large datasets (10GB or more).

Configuration Options:

  1. Source Space: DuraCloud space where source files are stored
  2. Destination Space: DuraCloud space where report file will be placed
  3. Standard vs. Advanced Bulk configuration
    1. Standard allows the user to choose "Optimize for cost" or "Optimize for speed"
    2. Advanced allows the user to configure the number and type of servers to use
      1. Number of Server Instances: The number of servers to use to perform the MD5 generation task.
      2. Type of Server: The type (size) of server used to perform the task. The larger the server, the faster the processing will occur. Larger servers also cost more than smaller servers to run. For more information, see the Amazon EC2 documentation.

Image Transformer

Description:

The Image Transformer provides a simple way to transform relatively small numbers of image files from one format to another.

Note that the ImageMagick service must be deployed prior to using the Image Transformer.

Configuration Options:

  1. Source Space: DuraCloud space where source image files are stored
  2. Destination Space: DuraCloud space where transformed image files will be placed, along with a file which details the results of the conversion process
  3. Destination Format: The image format to which the source files will be transformed
  4. Destination Color Space: The colorspace of the transformed files, either "Source Image Color Space", meaning that the colorspace of the original image will be used, or sRGB, meaning that the colorspace will be transformed to sRGB.
  5. Source file name prefix: Only files beginning with the value provided here will be transformed. For example, if you enter ABC, only files whose names begin with the string ABC will be processed. This field is optional.
  6. Source file name suffix: Only files ending with the value provided here will be transformed. For example, if you enter .jpg, only files whose names end with the string .jpg will be processed. This field is optional.
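
The prefix and suffix filters can be pictured with the following sketch. It assumes that matching is case-sensitive and that, when both values are given, a file must satisfy both; neither detail is stated above. The function name `select_files` is hypothetical.

```python
def select_files(names, prefix="", suffix=""):
    """Return the names that start with prefix and end with suffix.

    An empty prefix or suffix matches every name, mirroring the
    optional fields.
    """
    return [n for n in names if n.startswith(prefix) and n.endswith(suffix)]


names = ["ABC-001.jpg", "ABC-002.png", "XYZ-001.jpg"]
by_prefix = select_files(names, prefix="ABC")
by_suffix = select_files(names, suffix=".jpg")
both = select_files(names, prefix="ABC", suffix=".jpg")
```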

Bulk Image Transformer

Description:

The Bulk Image Transformer provides a simple way to transform image files from one format to another in bulk. This service uses Amazon's Elastic MapReduce capability to run the image transformation task within a Hadoop cluster.

Configuration Options:

  1. Source Space: DuraCloud space where source image files are stored
  2. Destination Space: DuraCloud space where transformed image files will be placed, along with a file which details the results of the transformation process
  3. Destination Format: The image format to which the source files will be transformed
  4. Destination Color Space: The colorspace of the transformed files, either "Source Image Color Space", meaning that the colorspace of the original image will be used, or sRGB, meaning that the colorspace will be transformed to sRGB.
  5. Source file name prefix: Only files beginning with the value provided here will be transformed. For example, if you enter ABC, only files whose names begin with the string ABC will be processed. This field is optional.
  6. Source file name suffix: Only files ending with the value provided here will be transformed. For example, if you enter .jpg, only files whose names end with the string .jpg will be processed. This field is optional.
  7. Standard vs. Advanced Bulk configuration
    1. Standard allows the user to choose "Optimize for cost" or "Optimize for speed"
    2. Advanced allows the user to configure the number and type of servers to use
      1. Number of Server Instances: The number of servers to use to perform the image transformation task.
      2. Type of Server: The type (size) of server used to perform the task. The larger the server, the faster the processing will occur. Larger servers also cost more than smaller servers to run. For more information, see the Amazon EC2 documentation.

Warning: Issues were discovered during testing of the Bulk Image Transformer. If you choose to run this service, it is recommended that each image be kept under 100MB. The likelihood of success appears to increase with server size, and setting the number of servers to 3 or more is recommended. If you do run this service, please note the data set and configuration and make us aware of the outcome.

System Transformer Utility

Description:

The System Transformer Utility deploys the ImageMagick application on a DuraCloud service instance, which allows other services to take advantage of its features. The Image Transformer requires that this service be deployed in order to operate correctly.

Configuration Options:

None

System WebApp Utility

Description:

The System WebApp Utility coordinates the installation, de-installation, startup and shutdown of Apache Tomcat servers on a DuraCloud service instance. These Tomcat servers are created to allow other DuraCloud services to deploy web applications. The Image Server requires that this service be deployed in order to operate correctly.

Configuration Options:

None
