This documentation space is deprecated. Please make all updates to DuraCloud documentation on the live DuraCloud documentation space.

Introduction

The DuraCloud application provides a set of services which can be deployed and used for a variety of purposes, primarily to process the content which has been loaded into DuraCloud storage. The following list of services describes how each service is expected to be used and the options available for tailoring the service to your needs.

Note that, as DuraCloud services are currently configured, they will not restart automatically if they fail during processing. If you notice a failed job state, simply redeploy the service. Automatic service recovery is on the DuraCloud development roadmap and will be made available as soon as possible.

Duplicate on Upload

Description:

The Duplicate on Upload service provides a way to ensure that the content added to DuraCloud is stored with at least two storage providers. It performs on-ingest duplication of content: once the service is deployed, it watches for all content added to your DuraCloud account, determines whether each item should be copied to another DuraCloud store, and if so, performs the copy. All copied content is placed in an identically named space in the secondary storage location.

Configuration Options:

  1. Watch this store for uploads: The primary storage location which DuraCloud will monitor for file additions. When files are added to this store, they will be copied to the secondary store.
  2. Copy to this store: The secondary store where content will be copied after it has been added to the primary store.
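
The on-ingest behaviour described above can be sketched as a small event handler. This is a minimal illustration only: the `Store` class, the `DuplicateOnUpload` class, the `on_content_added` hook and the store names are hypothetical in-memory stand-ins, not the DuraCloud API. In the real service the copy is triggered by the upload itself; here the hook is called by hand.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical in-memory stand-in for a DuraCloud store: space -> {content_id: bytes}."""
    name: str
    spaces: dict = field(default_factory=dict)

    def put(self, space_id: str, content_id: str, data: bytes) -> None:
        # Create the space on first use, so copies land in an identically named space.
        self.spaces.setdefault(space_id, {})[content_id] = data


class DuplicateOnUpload:
    """Sketch of on-ingest duplication: each add to the watched store is copied to the secondary."""

    def __init__(self, primary: Store, secondary: Store):
        self.primary = primary      # "Watch this store for uploads"
        self.secondary = secondary  # "Copy to this store"

    def on_content_added(self, store: Store, space_id: str, content_id: str) -> bool:
        # Only uploads into the watched (primary) store trigger a copy.
        if store is not self.primary:
            return False
        data = self.primary.spaces[space_id][content_id]
        self.secondary.put(space_id, content_id, data)
        return True


primary = Store("primary-store")
secondary = Store("secondary-store")
service = DuplicateOnUpload(primary, secondary)

primary.put("photos", "a.jpg", b"...")
service.on_content_added(primary, "photos", "a.jpg")
print(secondary.spaces)  # {'photos': {'a.jpg': b'...'}}
```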

Duplicate on Demand

Description:

The Duplicate on Demand service provides a simple way to duplicate content from one space to another. It is primarily intended for duplicating content from the primary storage provider to a secondary provider. To begin, choose a source space along with the store and space to which content will be duplicated. The service then copies all content and metadata in the source space to the destination space, creating that space if necessary. When the service has completed its work, a results file is stored in the chosen space, and a set of files created during the process (primarily logs) is stored in the work space.

Configuration Options:

  1. Source Space: DuraCloud space where source files can be found
  2. Copy to this store: DuraCloud store to which content will be copied
  3. Copy to this space: DuraCloud space where content will be copied
  4. Standard vs. Advanced configuration
    1. Standard mode automatically sets up the service to be run
    2. Advanced mode allows the user to configure the number and type of servers that will be used to run the job
      1. Number of Server Instances: The number of servers to use to perform the duplication task.
      2. Type of Server: The type (size) of server used to perform the task. The larger the server, the faster the processing will occur, but larger servers also cost more to run. For more information, see the Amazon EC2 documentation.
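
The core of the duplication step (copy every item with its metadata, creating the destination space if necessary, and record a result for each item) can be sketched as follows. The data model, where a space is a dict mapping content ID to a `(bytes, metadata)` pair, and the `duplicate_space` function and result-line format are hypothetical. The sketch does not model the Hadoop servers the service actually uses.

```python
def duplicate_space(source: dict, dest_store: dict, dest_space: str) -> list[str]:
    """Copy every item (content + metadata) from `source` into dest_store[dest_space].

    Hypothetical data model: a space is a dict mapping content_id -> (bytes, metadata dict).
    Returns result lines standing in for the results file the service writes.
    """
    target = dest_store.setdefault(dest_space, {})  # create the destination space if necessary
    results = []
    for content_id, (data, metadata) in sorted(source.items()):
        # Copy the metadata dict so source and destination do not share state.
        target[content_id] = (data, dict(metadata))
        results.append(f"{content_id},copied,{len(data)}")
    return results


source = {"a.txt": (b"hello", {"mimetype": "text/plain"}), "b.bin": (b"", {})}
dest_store = {}
results = duplicate_space(source, dest_store, "backup")
```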

Image Server

Description:

The Image Server provides a viewer for image files through use of the Djatoka image server. While this service is geared towards serving JPEG 2000 images, it supports multiple image file types by converting them to JPEG 2000 format on the fly.

Note that the current implementation of this service requires that spaces be set to OPEN in order to use the viewer to view image files.

Configuration Options:

None

Media Streamer

Description:

The Media Streamer provides streaming capabilities for video and audio files. The service takes advantage of Amazon CloudFront streaming, so files to be streamed must be within a space on an Amazon storage provider. In addition, all media to be streamed by this service must be within a single space.

Amazon CloudFront streaming uses Flash Media Server to serve streaming files over RTMP. Supported file formats include MP3, MP4 and FLV, among others. For a full listing of supported file types, see the Flash Media Server documentation.

Configuration Options:

  1. Source Media Space: The DuraCloud space where the source video and audio files to be streamed are stored. The Media Streamer service attempts to stream all files in this space.
  2. Viewer Space: A DuraCloud space where example viewer files will be stored. After the service has started, this space will contain a playlist of all items in the source media space, as well as example HTML and JavaScript files that can be used to display a viewer.

Output Files: once the Media Streamer is running, the following files can be found in the configured Viewer Space:

  • player.swf - The flash-based video player JWPlayer
  • playlist.xml - A playlist, created by DuraCloud, which includes all of the items in your Source Media Space
  • playlistplayer.html - An HTML file, created by DuraCloud, which uses JWPlayer to display the items in the playlist
  • singleplayer.html - An HTML file, created by DuraCloud, which uses JWPlayer to display a single media file (typically, the first item in your Source Media Space)
  • stylish.swf - A supplementary flash file used to style the JWPlayer
  • swfobject.js - A JavaScript file (from the SWFObject library) used to embed the JWPlayer on a web page
  • viewer.js - A javascript file, created by DuraCloud, used to simplify the loading of JWPlayer
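
The idea behind playlist.xml, one entry for every item in the Source Media Space, can be sketched with the standard library. The element names, the `build_playlist` function and the `streaming_host` parameter are hypothetical. The playlist.xml that DuraCloud actually writes may use a different schema, so treat this only as an outline of the approach.

```python
import xml.etree.ElementTree as ET


def build_playlist(space_items: list[str], streaming_host: str) -> str:
    """Build a minimal playlist XML listing every item in the source media space.

    Element names and `streaming_host` are hypothetical; the real playlist.xml
    written by DuraCloud may use a different structure.
    """
    playlist = ET.Element("playlist")
    for content_id in sorted(space_items):
        item = ET.SubElement(playlist, "item")
        ET.SubElement(item, "title").text = content_id
        ET.SubElement(item, "file").text = content_id        # path relative to the stream
        ET.SubElement(item, "streamer").text = streaming_host  # RTMP endpoint (hypothetical)
    return ET.tostring(playlist, encoding="unicode")


xml_text = build_playlist(["b.mp4", "a.mp3"], "rtmp://streaming.example.net/app")
```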

All of the output files are intended as examples only. Their purpose is to give developers a starting point for embedding video streamed by DuraCloud in their own web pages. Feel free to use, modify, ignore, or delete these files.

Bit Integrity Checker

Description:

The Bit Integrity Checker provides the ability to verify that the content held within DuraCloud has maintained its bit integrity. There are two modes of operation.
Modes:

  1. Verify integrity of a Space
  2. Verify integrity of an item list

When running in the Verify integrity of a Space mode, the checker performs the following steps:

  • collect the content hash values for each item from the underlying storage provider
  • stream through each item, recalculating its hash
  • compare the two listings

When running in the Verify integrity of an item list mode, the checker performs the following steps:

  • stream through each item in the provided listing, recalculating its hash
  • compare the newly generated listing with the provided listing
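
The comparison step shared by both modes can be sketched as below. The only difference between the two modes is where the expected hashes come from: the storage provider in the Space mode, or the provided listing in the item list mode. The `verify_space` function, its dict inputs and the VALID/MISMATCH/MISSING status labels are hypothetical and are not the exact format of the fixity report. The hashes are MD5, as elsewhere on this page.

```python
import hashlib


def recompute_md5(data: bytes) -> str:
    """Recalculate an item's hash from its content stream (here, in-memory bytes)."""
    return hashlib.md5(data).hexdigest()


def verify_space(expected_hashes: dict[str, str], contents: dict[str, bytes]) -> list[tuple[str, str]]:
    """Compare expected MD5s with MD5s recomputed from the content.

    Hypothetical inputs: expected_hashes maps content_id -> MD5 (from the provider
    or an item listing); contents maps content_id -> bytes streamed from storage.
    Returns (content_id, status) rows, a simplified stand-in for the fixity report.
    """
    report = []
    for content_id in sorted(expected_hashes.keys() | contents.keys()):
        expected = expected_hashes.get(content_id)
        data = contents.get(content_id)
        if expected is None or data is None:
            status = "MISSING"   # present in only one of the two listings
        elif recompute_md5(data) == expected:
            status = "VALID"
        else:
            status = "MISMATCH"
        report.append((content_id, status))
    return report
```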

Configuration Options:

  1. Stores: The underlying storage provider over which the service will run
  2. Space containing content items: The DuraCloud space in which the content items to be verified reside
  3. Verify integrity of an item list mode
    1. Input listing name: Name of the content item which contains the listing of items over which to run the service

Service Outputs
All outputs of this service are placed in the system space, x-service-out.

  1. bitintegrity/fingerprints-gen-<spaceId>-<date>.csv
    • interim listing generated with hash values from underlying storage provider
  2. bitintegrity/fingerprints-<spaceId>-<date>.csv
    • interim listing with hashes recalculated from content streams
  3. bitintegrity/fixity-report-<spaceId>-<date>.csv
    • final report with status of integrity check

Bit Integrity Checker - Tools

Description:

The Bit Integrity Checker Tools provide additional bit integrity checking utilities which can be used to perform specific integrity checking tasks.

Modes:

  1. Generate integrity information for a Space
  2. Generate integrity information for an item list
  3. Compare two integrity reports

Configuration Options:

  1. Mode 1 - Generate integrity information for a Space
    1. Get integrity information from...
      1. The storage provider: Determine the file MD5 by asking the storage provider for its stored MD5 value
      2. The files themselves: Determine the file MD5 by retrieving them from the storage provider and computing the MD5
    2. Stores: The underlying storage provider in which the following space resides
      1. Space containing content items: The DuraCloud space in which the content items to be considered reside
  2. Mode 2 - Generate integrity information for an item list
    1. Get integrity information from...
      1. The storage provider: Determine the file MD5 by asking the storage provider for its stored MD5 value
      2. The files themselves: Determine the file MD5 by retrieving them from the storage provider and computing the MD5
    2. Input listing name: Name of the content item which contains the listing of items over which to run the service
    3. Stores: The underlying storage provider in which the following space resides
      1. Space with input listing: The DuraCloud space in which the input listing file resides
  3. Mode 3 - Compare two integrity reports
    1. Input listing name: Name of the first content item which contains a listing of items to be compared to the second listing
    2. Second input listing name: Name of the second content item which contains a listing of items to be compared to the first listing
    3. Stores: The underlying storage provider in which the following spaces reside
      1. Space with input listing: The DuraCloud space in which the first input listing file resides
      2. Space with second input listing: The DuraCloud space in which the second input listing file resides
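
The "files themselves" option computes each MD5 from the retrieved content rather than trusting the provider's stored value. A common way to do this without loading a whole file into memory is to hash it in chunks, as in the sketch below. The `streaming_md5` function and the chunk size are illustrative choices, not the service's implementation.

```python
import hashlib
import io
from typing import BinaryIO


def streaming_md5(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> str:
    """Compute an MD5 by reading the stream in fixed-size chunks.

    Mirrors the "files themselves" option: content is retrieved and hashed
    incrementally. chunk_size (1 MiB by default) is an arbitrary choice.
    """
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Chunked hashing gives the same result as hashing the whole payload at once.
data = b"hello world"
assert streaming_md5(io.BytesIO(data), chunk_size=4) == hashlib.md5(data).hexdigest()
```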

Service Outputs

  1. bitintegrity/fingerprints-<spaceId>-<date>.csv
    • listing of hashes produced in Mode 1 (Space) or Mode 2 (item list)
  2. bitintegrity/fixity-report-<listingId-0>vs<listingId-1>-<date>.csv
    • comparison report of two hash listings
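
Mode 3 compares two hash listings item by item. The sketch below assumes a CSV listing with a header row and the hypothetical columns `space-id,content-id,md5`. The real column layout of the fingerprint files may differ. The `load_listing` and `compare_listings` functions and the status labels are likewise hypothetical.

```python
import csv
import io


def load_listing(text: str) -> dict[str, str]:
    """Parse a listing with hypothetical columns space-id,content-id,md5 (header row first)."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["content-id"]: row["md5"] for row in reader}


def compare_listings(first: str, second: str) -> list[tuple[str, str]]:
    """Compare two listings and return (content_id, status) rows for a comparison report."""
    a, b = load_listing(first), load_listing(second)
    rows = []
    for content_id in sorted(a.keys() | b.keys()):
        if content_id not in a or content_id not in b:
            status = "MISSING"    # item appears in only one listing
        elif a[content_id] == b[content_id]:
            status = "VALID"
        else:
            status = "MISMATCH"
        rows.append((content_id, status))
    return rows
```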

Bit Integrity Checker - Bulk

Description:

The Bulk Bit Integrity Checker provides a simple way to determine checksums (MD5s) for all content items in a given space by leveraging a Hadoop cluster on Amazon Elastic MapReduce. This service is designed for large datasets (10 GB or more).

Configuration Options:

  1. Space to verify: DuraCloud space where source files are stored
  2. Service Mode
    1. Verify integrity of a Space: Retrieves all items in a space, computes the checksum of each, and compares that value with the MD5 value available from the storage provider
    2. Verify integrity from an item list: Retrieves all items listed in the item list, computes the checksum of each, and compares that value with the MD5 value provided in the item list
      1. Space with input listing: The DuraCloud space in which the input listing file resides
      2. Input listing name: Name of the content item which contains the listing of items over which to run the service
  3. Standard vs. Advanced configuration
    1. Standard mode automatically sets up the service to be run
    2. Advanced mode allows the user to configure the number and type of servers that will be used to run the job
      1. Number of Server Instances: The number of servers to use to perform the integrity check.
      2. Type of Server: The type (size) of server used to perform the task. The larger the server, the faster the processing will occur, but larger servers also cost more to run. For more information, see the Amazon EC2 documentation.

Service Outputs

  1. bitIntegrity-bulk/bitIntegrity-report-<date>.csv
    • final report with status of integrity check
  2. bitIntegrity-bulk/bitIntegrity-results.csv
    • interim listing with hashes recalculated from content streams
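
The bulk service splits the hashing work across a cluster. The map/reduce shape of that work can be sketched on a single machine with a thread pool. This is an analogy only, not how Elastic MapReduce schedules tasks. The `map_item` and `bulk_verify` functions are hypothetical. The `workers` parameter loosely stands in for "Number of Server Instances".

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def map_item(item: tuple[str, bytes]) -> tuple[str, str]:
    """'Map' step: compute the MD5 of one content item."""
    content_id, data = item
    return content_id, hashlib.md5(data).hexdigest()


def bulk_verify(items: dict[str, bytes], expected: dict[str, str], workers: int = 4) -> dict[str, bool]:
    """Hash items in parallel, then 'reduce' by comparing against the expected MD5s.

    `workers` is a local thread count standing in for the number of server
    instances; it does not model a Hadoop cluster.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        computed = dict(pool.map(map_item, items.items()))
    # An item missing from `items` compares as None and is reported as False.
    return {cid: computed.get(cid) == md5 for cid, md5 in expected.items()}
```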

Image Transformer

Description:

The Image Transformer provides a simple way to transform relatively small numbers of image files from one format to another.

Note that the System Transformer Utility, which deploys ImageMagick, must be deployed before using the Image Transformer.

Configuration Options:

  1. Source Space: DuraCloud space where source image files are stored
  2. Destination Space: DuraCloud space where transformed image files will be placed, along with a file which details the results of the conversion process
  3. Destination Format: The image format to which the source files will be transformed
  4. Destination Color Space: The colorspace of the transformed files, either "Source Image Color Space", meaning that the colorspace of the original image will be used, or sRGB, meaning that the colorspace will be transformed to sRGB.
  5. Source file name prefix: Only files beginning with the value provided here will be transformed. For example, if you enter ABC, only files whose names begin with the string ABC will be processed. This field is optional.
  6. Source file name suffix: Only files ending with the value provided here will be transformed. For example, if you enter .jpg, only files whose names end with the string .jpg will be processed. This field is optional.
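
The effect of the two optional filter fields can be sketched as a simple selection over the content IDs in the source space. In the sketch, an empty prefix or suffix means "no restriction". The `select_sources` function is hypothetical, and so is the case-sensitive matching: the page does not say whether the service ignores case.

```python
def select_sources(content_ids: list[str], prefix: str = "", suffix: str = "") -> list[str]:
    """Return the IDs that would be transformed, given the optional prefix/suffix fields.

    An empty prefix or suffix matches everything, mirroring the fields being optional.
    Matching is case-sensitive here; that is an assumption about the service.
    """
    return [cid for cid in content_ids if cid.startswith(prefix) and cid.endswith(suffix)]


ids = ["ABC-1.jpg", "ABC-2.tif", "XYZ-3.jpg"]
selected = select_sources(ids, prefix="ABC", suffix=".jpg")
```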

Image Transformer - Bulk

Description:

The Bulk Image Transformer provides a simple way to transform image files from one format to another in bulk. This service uses Amazon's Elastic Map Reduce capability to run the image transformation task within a Hadoop cluster.

Configuration Options:

  1. Source Space: DuraCloud space where source image files are stored
  2. Destination Space: DuraCloud space where transformed image files will be placed, along with a file which details the results of the transformation process
  3. Destination Format: The image format to which the source files will be transformed
  4. Destination Color Space: The colorspace of the transformed files, either "Source Image Color Space", meaning that the colorspace of the original image will be used, or sRGB, meaning that the colorspace will be transformed to sRGB.
  5. Source file name prefix: Only files beginning with the value provided here will be transformed. For example, if you enter ABC, only files whose names begin with the string ABC will be processed. This field is optional.
  6. Source file name suffix: Only files ending with the value provided here will be transformed. For example, if you enter .jpg, only files whose names end with the string .jpg will be processed. This field is optional.
  7. Standard vs. Advanced configuration
    1. Standard mode automatically sets up the service to be run
    2. Advanced mode allows the user to configure the number and type of servers that will be used to run the job
      1. Number of Server Instances: The number of servers to use to perform the transformation task.
      2. Type of Server: The type (size) of server used to perform the task. The larger the server, the faster the processing will occur, but larger servers also cost more to run. For more information, see the Amazon EC2 documentation.
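
The bulk transformer has to plan which files each server converts and what the transformed files will be called. One way to sketch that planning is shown below. Both the naming rule (swap the source extension for the Destination Format) and the round-robin split across instances are hypothetical. Elastic MapReduce performs its own task placement, and the service may name output files differently. The `plan_transformations` function is also hypothetical.

```python
import posixpath


def plan_transformations(content_ids: list[str], dest_format: str, instances: int) -> list[list[tuple[str, str]]]:
    """Assign (source, destination-name) pairs to `instances` workers round-robin.

    The naming rule and the round-robin split are hypothetical illustrations.
    """
    if instances < 1:
        raise ValueError("instances must be >= 1")
    plan = [[] for _ in range(instances)]
    for i, cid in enumerate(sorted(content_ids)):
        base, _ext = posixpath.splitext(cid)  # drop the source extension
        plan[i % instances].append((cid, f"{base}.{dest_format}"))
    return plan


plan = plan_transformations(["b.tif", "a.tif", "c/d.png"], "jp2", 2)
```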

System Transformer Utility

Description:

The System Transformer Utility deploys the ImageMagick application on a DuraCloud service instance, which allows other services to take advantage of its features. The Image Transformer requires that this service be deployed in order to operate correctly.

Configuration Options:

None

System WebApp Utility

Description:

The System WebApp Utility coordinates the installation, de-installation, startup and shutdown of Apache Tomcat servers on a DuraCloud service instance. These Tomcat servers are created to allow other DuraCloud services to deploy web applications. The Image Server requires that this service be deployed in order to operate correctly.

Configuration Options:

None