This documentation space is deprecated. Please make all updates to DuraCloud documentation on the live DuraCloud documentation space.

Introduction

The DuraCloud application provides a set of services which can be deployed and used for a variety of purposes, primarily to process the content which has been loaded into DuraCloud storage. The following list of services describes how each service is expected to be used and the options available for tailoring the service to your needs.

Note that, as DuraCloud services are currently configured, they do not restart automatically if they fail during processing. If you notice a service in a failed state, simply redeploy it. Automatic service recovery is on the DuraCloud development roadmap and will be made available as soon as possible.

Duplicate on Change

Description:

The Duplicate on Change service provides a way to ensure that the content stored in DuraCloud is synchronized between two different storage providers. The Duplicate on Change service duplicates any changes made to spaces, content, or properties on the store being watched by the service. This means that once the Duplicate on Change service is deployed, it notices all content that is added, updated, or deleted on the watched DuraCloud provider and performs the same functions on a secondary provider. All content that is copied will be placed in an identically named space in the secondary storage location with the same property fields attached. The duplication provided by this service is one-way; changes made to the secondary provider will not be reflected in the primary.

Configuration Options:

  1. Watch this store: The primary storage location which DuraCloud will monitor for changes. When spaces, content, or properties are added, updated, or deleted in this store, the same actions will be taken in the secondary store.
  2. Apply to this store: The secondary store where changes from the primary store will be applied.
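
The one-way mirroring described above can be sketched as follows. This is only an illustration under stated assumptions, not the service's actual implementation: the stores are modelled as in-memory dictionaries, and the names `Change` and `apply_change` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical in-memory model: space id -> content id -> (data, properties)
Store = Dict[str, Dict[str, Tuple[Optional[bytes], dict]]]


@dataclass
class Change:
    """One change observed on the watched (primary) store."""
    action: str                      # "add", "update", or "delete"
    space: str
    content_id: str
    data: Optional[bytes] = None
    properties: dict = field(default_factory=dict)


def apply_change(secondary: Store, change: Change) -> None:
    """Repeat a change from the watched store on the secondary store."""
    if change.action in ("add", "update"):
        # Content goes into an identically named space, created on demand,
        # with the same properties attached.
        space = secondary.setdefault(change.space, {})
        space[change.content_id] = (change.data, dict(change.properties))
    elif change.action == "delete":
        secondary.get(change.space, {}).pop(change.content_id, None)
    else:
        raise ValueError(f"unknown action: {change.action}")


secondary: Store = {}
apply_change(secondary, Change("add", "photos", "a.jpg", b"\xff\xd8", {"owner": "lib"}))
apply_change(secondary, Change("delete", "photos", "a.jpg"))
```

Nothing in the sketch ever writes back to the primary store, which is what makes the duplication one-way.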

Duplicate on Demand

Description:

The Duplicate on Demand service provides a simple way to duplicate content from one space to another. This service is primarily focused on allowing the duplication of content from the primary storage provider to a secondary provider. To begin, a source space is chosen, along with a store and space to which content will be duplicated. The service then copies all content and properties in the source space to the duplication space, creating the space if necessary. When the service has completed its work, a results file will be stored in the chosen space and a set of files (primarily logs) created as part of the process will be stored in the work space.

Configuration Options:

  1. Source Space: DuraCloud space where source files can be found
  2. Copy to this store: DuraCloud store to which content will be copied
  3. Copy to this space: DuraCloud space where content will be copied
  4. Standard vs. Advanced configuration
    1. Standard mode automatically sets up the service to be run
    2. Advanced mode allows the user to configure the number and type of servers that will be used to run the job
      1. Number of Server Instances: The number of servers to use to perform the duplication task.
      2. Type of Server: The type (size) of server used to perform the task. The larger the server, the faster the processing will occur. Larger servers also cost more than smaller servers to run. For more information, see the Amazon EC2 documentation.

Service Output
All outputs of this service are placed in the system space, x-service-out.

  1. duplicate-on-demand/duplicate-results-<date>.csv
    • Final report indicating which files were duplicated, as well as any failures encountered
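
As a rough sketch of the copy-and-report behaviour described for this service, the example below copies every item of a source space into a destination space and builds a CSV report. The in-memory data model, the function name `duplicate_space`, and the report columns are all assumptions made for illustration; the real report layout is not specified here.

```python
import csv
import io
from typing import Dict, Tuple

Space = Dict[str, Tuple[bytes, dict]]   # content id -> (data, properties)
Store = Dict[str, Space]                # space id -> space


def duplicate_space(source: Space, dest_store: Store, dest_space: str) -> str:
    """Copy every item and its properties; return a CSV report as text."""
    target = dest_store.setdefault(dest_space, {})   # create the space if necessary
    report = io.StringIO()
    writer = csv.writer(report, lineterminator="\n")
    writer.writerow(["content-id", "status"])        # assumed column names
    for content_id, (data, props) in sorted(source.items()):
        try:
            # A call to a real storage provider could fail at this point.
            target[content_id] = (data, dict(props))
            writer.writerow([content_id, "success"])
        except Exception as exc:
            # Record the failure and continue with the remaining items.
            writer.writerow([content_id, f"failure: {exc}"])
    return report.getvalue()


source = {"b.txt": (b"2", {}), "a.txt": (b"1", {"creator": "me"})}
store: Store = {}
print(duplicate_space(source, store, "backup-space"))
```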

Image Server

Description:

The Image Server provides a viewer for image files through use of the Djatoka image server. While this service is geared towards serving JPEG 2000 images, it supports multiple image file types by converting them to JPEG 2000 format on the fly.

Note that the current implementation of this service requires that spaces be set to OPEN in order to use the viewer to view image files.

Configuration Options:

None

Media Streamer

Description:

The Media Streamer provides streaming capabilities for video and audio files. The service takes advantage of Amazon CloudFront streaming, so files to be streamed must be within spaces on an Amazon provider.

Amazon CloudFront streaming uses the Flash Media Server to host streaming files over RTMP. Supported file formats include MP3, MP4, and FLV, among others. For a full listing of supported file types, see the Flash Media Server documentation.

Configuration Options:

  1. Source Media Space: The DuraCloud space(s) where the source video and audio files to be streamed are stored. The Media Streamer service attempts to stream all files in the selected space(s).

Integration Files: The following files are intended as a starting point for integrating streaming media into your own website:

  • player.swf - The flash-based video player JWPlayer
  • playlist.xml - An example playlist which would include a list of items in your Source Media Space
  • playlistplayer.html - An HTML file which uses JWPlayer to display the items in the playlist
    • The variable, $STREAM-HOST, should be substituted with the hostname of the streaming server displayed in the Media Streamer service properties (e.g. 's1f6w2c2bo9vb7.cloudfront.net')
  • singleplayer.html - An HTML file which uses JWPlayer to display a single media file
    • The variable, $STREAM-HOST, should be substituted with the hostname of the streaming server displayed in the Media Streamer service properties (e.g. 's1f6w2c2bo9vb7.cloudfront.net')
    • The variable, $CONTENT-ID, should be substituted with the content-id of a single item in a streamed space (e.g. 'starwars.mp4')
  • stylish.swf - A supplementary flash file used to style the JWPlayer
  • swfobject.js - A javascript file (available from here) used to embed the JWPlayer on a web page
  • viewer.js - A javascript file used to simplify the loading of JWPlayer

All of the above files are intended as examples only. Their purpose is to give developers a starting point for embedding video streamed by DuraCloud on their own web pages.
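
The substitution of $STREAM-HOST and $CONTENT-ID in the example HTML files can be done by hand or scripted. The following is a minimal sketch, assuming the placeholders appear literally in the file text; the function name `fill_player_template` and the sample fragment are hypothetical and do not reflect the actual contents of singleplayer.html.

```python
def fill_player_template(html: str, stream_host: str, content_id: str = "") -> str:
    """Replace the $STREAM-HOST and $CONTENT-ID placeholders in a player page."""
    # Plain str.replace is used because string.Template would stop reading
    # the placeholder name at the hyphen.
    filled = html.replace("$STREAM-HOST", stream_host)
    if content_id:
        filled = filled.replace("$CONTENT-ID", content_id)
    return filled


# Hypothetical fragment standing in for the real file contents.
template = "host=$STREAM-HOST;file=$CONTENT-ID"
print(fill_player_template(template, "s1f6w2c2bo9vb7.cloudfront.net", "starwars.mp4"))
```

For playlistplayer.html only the host needs replacing, so `content_id` can be left empty.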

If you add files when the Media Streamer service is already running, they too will automatically be available for streaming.

Bit Integrity Checker

Description:

The Bit Integrity Checker provides the ability to verify that the content held within DuraCloud has maintained its bit integrity. There are two modes of operation.
Modes:

  1. Verify integrity of a Space
  2. Verify integrity of an item list

When running in the Verify integrity of a Space mode, the checker performs the following steps:

  • collect the content hash value for each item from the underlying storage provider
  • stream through each item, recalculating its hash
  • compare the two listings

When running in the Verify integrity of an item list mode, the checker performs the following steps:

  • stream through each item in the provided listing, recalculating their hashes
  • compare the newly generated listing with the provided listing
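
The steps above amount to recalculating a hash from each content stream and comparing it with a reference value. The sketch below illustrates the idea with MD5, assuming items are available as readable byte streams; the function names and the dictionary-based listings are hypothetical and are not the service's own data structures.

```python
import hashlib
import io
from typing import BinaryIO, Dict


def md5_of_stream(stream: BinaryIO, chunk_size: int = 8192) -> str:
    """Recalculate an MD5 by reading the stream in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_items(reference_md5s: Dict[str, str],
                 streams: Dict[str, BinaryIO]) -> Dict[str, bool]:
    """Map each content id to True when its recalculated hash matches the reference."""
    return {
        content_id: md5_of_stream(streams[content_id]) == expected
        for content_id, expected in reference_md5s.items()
    }


reference = {"hello.txt": "5d41402abc4b2a76b9719d911017c592"}
print(verify_items(reference, {"hello.txt": io.BytesIO(b"hello")}))
```

In the space mode the reference values would come from the storage provider; in the item list mode they would come from the provided listing. The comparison itself is the same in both cases.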

Configuration Options:

  1. Stores: The underlying storage provider over which the service will run
  2. Space containing content items: The DuraCloud space in which the content items to be verified reside
  3. Verify integrity of an item list mode
    1. Input listing name: Name of the content item which contains the listing of items over which to run the service

Service Outputs
All outputs of this service are placed in the system space, x-service-out.

  1. bitintegrity/fingerprints-gen-<spaceId>-<date>.csv
    • Interim listing generated with hash values from underlying storage provider
  2. bitintegrity/fingerprints-<spaceId>-<date>.csv
    • Interim listing with hashes recalculated from content streams
  3. bitintegrity/fixity-report-<spaceId>-<date>.csv
    • Final report with status of integrity check

Bit Integrity Checker - Tools

Description:

The Bit Integrity Checker Tools provide additional bit integrity checking utilities which can be used to perform specific integrity checking tasks.

Modes:

  1. Generate integrity information for a Space
  2. Generate integrity information for an item list
  3. Compare two integrity reports

Configuration Options:

  1. Mode 1 - Generate integrity information for a Space
    1. Get integrity information from...
      1. The storage provider: Determine the file MD5 by asking the storage provider for its stored MD5 value
      2. The files themselves: Determine the file MD5 by retrieving them from the storage provider and computing the MD5
    2. Stores: The underlying storage provider in which the following space resides
      1. Space containing content items: The DuraCloud space in which the content items to be considered reside
  2. Mode 2 - Generate integrity information for an item list
    1. Get integrity information from...
      1. The storage provider: Determine the file MD5 by asking the storage provider for its stored MD5 value
      2. The files themselves: Determine the file MD5 by retrieving them from the storage provider and computing the MD5
      3. Input listing name: Name of the content item which contains the listing of items over which to run the service
    2. Stores: The underlying storage provider in which the following space resides
      1. Space with input listing: The DuraCloud space in which the input listing file resides
  3. Mode 3 - Compare two integrity reports
    1. Input listing name: Name of the first content item which contains a listing of items to be compared to the second listing
    2. Second input listing name: Name of the second content item which contains a listing of items to be compared to the first listing
    3. Stores: The underlying storage provider in which the following spaces reside
      1. Space with input listing: The DuraCloud space in which the first input listing file resides
      2. Space with second input listing: The DuraCloud space in which the second input listing file resides

Service Outputs
All outputs of this service are placed in the system space, x-service-out.

  1. bitintegrity/fingerprints-<spaceId>-<date>.csv
    • Listing of hashes when running in from space or from list modes
  2. bitintegrity/fixity-report-<listingId-0>-vs-<listingId-1>-<date>.csv
    • Comparison report of two hash listings
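
To illustrate what comparing two integrity reports involves (Mode 3), the following sketch compares two hash listings item by item. It assumes each listing is CSV text with the content id in the first column and the MD5 in the second; that layout, the status labels, and the function names are assumptions for illustration only and may not match the files the service produces.

```python
import csv
import io
from typing import Dict, List, Tuple


def load_listing(text: str) -> Dict[str, str]:
    """Parse 'content-id,md5' rows (assumed layout) into a dictionary."""
    rows = csv.reader(io.StringIO(text))
    return {row[0]: row[1] for row in rows if len(row) >= 2}


def compare_reports(first: str, second: str) -> List[Tuple[str, str]]:
    """Return (content id, status) pairs covering every item in either listing."""
    a, b = load_listing(first), load_listing(second)
    results = []
    for content_id in sorted(a.keys() | b.keys()):
        if content_id not in a or content_id not in b:
            status = "missing"       # present in only one listing
        elif a[content_id] == b[content_id]:
            status = "match"
        else:
            status = "mismatch"
        results.append((content_id, status))
    return results


for row in compare_reports("a.txt,1111\nb.txt,2222\n", "a.txt,1111\nb.txt,9999\n"):
    print(row)
```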

Bit Integrity Checker - Bulk

Description:

The Bulk Bit Integrity Checker provides a simple way to determine checksums (MD5s) for all content items in any particular space by leveraging an Amazon Hadoop cluster. This service is designed for large datasets (10 GB or more).

Configuration Options:

  1. Space to verify: DuraCloud space where source files are stored
  2. Service Mode
    1. Verify integrity of a Space: Retrieves all items in a space, computes the checksum of each, and compares that value with the MD5 value available from the storage provider
    2. Verify integrity from an item list: Retrieves all items listed in the item list, computes the checksum of each, and compares that value with the MD5 value provided in the item list
      1. Space with input listing: The DuraCloud space in which the input listing file resides
      2. Input listing name: Name of the content item which contains the listing of items over which to run the service
  3. Standard vs. Advanced configuration
    1. Standard mode automatically sets up the service to be run
    2. Advanced mode allows the user to configure the number and type of servers that will be used to run the job
      1. Number of Server Instances: The number of servers to use to perform the integrity checking task.
      2. Type of Server: The type (size) of server used to perform the task. The larger the server, the faster the processing will occur. Larger servers also cost more than smaller servers to run. For more information, see the Amazon EC2 documentation.

Service Outputs
All outputs of this service are placed in the system space, x-service-out.

  1. bitIntegrity-bulk/bitIntegrity-report-<date>.csv
    • Final report with status of integrity check
  2. bitIntegrity-bulk/bitIntegrity-results.csv
    • Interim listing with hashes recalculated from content streams

Image Transformer

Description:

The Image Transformer provides a simple way to transform relatively small numbers of image files from one format to another.

Note that the ImageMagick service must be deployed prior to using the Image Transformer.

Configuration Options:

  1. Source Space: DuraCloud space where source image files are stored
  2. Destination Space: DuraCloud space where transformed image files will be placed, along with a file which details the results of the conversion process
  3. Destination Format: The image format to which the source files will be transformed
  4. Destination Color Space: The colorspace of the transformed files, either "Source Image Color Space", meaning that the colorspace of the original image will be used, or sRGB, meaning that the colorspace will be transformed to sRGB.
  5. Source file name prefix: Only files beginning with the value provided here will be transformed. For example, if you enter ABC, only files whose names begin with the string ABC will be processed. This field is optional.
  6. Source file name suffix: Only files ending with the value provided here will be transformed. For example, if you enter .jpg, only files whose names end with the string .jpg will be processed. This field is optional.
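
The effect of the prefix and suffix options can be sketched as a simple filter over file names. This is an illustration only: it assumes that matching is case-sensitive and that, when both values are supplied, a file must satisfy both. Neither detail is stated above. The function name `select_sources` is hypothetical.

```python
from typing import Iterable, List


def select_sources(names: Iterable[str], prefix: str = "", suffix: str = "") -> List[str]:
    """Keep names that start with prefix and end with suffix.

    An empty prefix or suffix places no restriction, which mirrors
    both fields being optional.
    """
    return [n for n in names if n.startswith(prefix) and n.endswith(suffix)]


files = ["ABC-001.jpg", "ABC-002.png", "XYZ-001.jpg"]
print(select_sources(files, prefix="ABC", suffix=".jpg"))
```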

Service Output
All outputs of this service are placed in the system space, x-service-out.

  1. image-transformer/image-transformer-results-<date>.csv
    • Final report indicating images converted and any errors encountered.

Image Transformer - Bulk

Description:

The Bulk Image Transformer provides a simple way to transform image files from one format to another in bulk. This service uses Amazon's Elastic Map Reduce capability to run the image transformation task within a Hadoop cluster.

Configuration Options:

  1. Source Space: DuraCloud space where source image files are stored
  2. Destination Space: DuraCloud space where transformed image files will be placed, along with a file which details the results of the transformation process
  3. Destination Format: The image format to which the source files will be transformed
  4. Destination Color Space: The colorspace of the transformed files, either "Source Image Color Space", meaning that the colorspace of the original image will be used, or sRGB, meaning that the colorspace will be transformed to sRGB.
  5. Source file name prefix: Only files beginning with the value provided here will be transformed. For example, if you enter ABC, only files whose names begin with the string ABC will be processed. This field is optional.
  6. Source file name suffix: Only files ending with the value provided here will be transformed. For example, if you enter .jpg, only files whose names end with the string .jpg will be processed. This field is optional.
  7. Standard vs. Advanced configuration
    1. Standard mode automatically sets up the service to be run
    2. Advanced mode allows the user to configure the number and type of servers that will be used to run the job
      1. Number of Server Instances: The number of servers to use to perform the transformation task.
      2. Type of Server: The type (size) of server used to perform the task. The larger the server, the faster the processing will occur. Larger servers also cost more than smaller servers to run. For more information, see the Amazon EC2 documentation.

Service Output
All outputs of this service are placed in the system space, x-service-out.

  1. image-transformer-bulk/image-transformer-results-<date>.csv
    • Final report indicating images converted and any errors encountered.

CloudSync

Description:

The CloudSync service starts and runs the CloudSync application, which provides capabilities to allow the backup and restore of content from a Fedora repository into DuraCloud. For more information about CloudSync, please refer to the CloudSync documentation.
