Introduction

The DuraCloud application provides a set of services which can be deployed and used for a variety of purposes, primarily to process the content which has been loaded into DuraCloud storage. The following list of services describes how each service is expected to be used and the options available for tailoring the service to your needs.

If you start a service, you will receive an email when it completes processing. Services that are run independently will not restart automatically if they fail during processing; if you learn that a job has failed, you can redeploy the service. Some services, in particular the Media Streamer and the Bit Integrity Tools, are run automatically by the DuraCloud Executor, which manages their state.

Info

Not all services are available in all service plans.

Duplicate on Change

Description:

The Duplicate on Change service provides a way to ensure that the content stored in DuraCloud is synchronized between different storage providers. The Duplicate on Change service duplicates any changes made to spaces, content, or properties for the spaces it is configured to watch. This means that once the Duplicate on Change service is deployed, it notices all content that is added, updated, or deleted for each configured space in the watched DuraCloud provider and performs the same functions on the selected secondary provider. All content that is copied will be placed in an identically named space in the secondary storage location with the same property fields attached. The duplication provided by this service is one-way; only the provider that is selected to be watched is monitored for changes.

Note that this service only duplicates content after it has been deployed. Content that exists in the provider before the service is deployed will not be duplicated. To duplicate existing content, see Duplicate on Demand.
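The one-way behavior described above can be illustrated with a minimal sketch. It is not DuraCloud code: the Store class, the event tuple layout, and the duplicate_change function are hypothetical names chosen for illustration, and the storage providers are replaced by in-memory dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical in-memory stand-in for a storage provider."""
    # Maps space ID -> {content ID: (data, properties)}
    spaces: dict = field(default_factory=dict)


def duplicate_change(event, secondary, watched_spaces):
    """Replay one change event from the watched store on the secondary store.

    `event` is an assumed (action, space_id, content_id, payload) tuple.
    Returns True if the event was applied, False if the space is not watched.
    """
    action, space_id, content_id, payload = event
    if space_id not in watched_spaces:
        return False  # space is not configured for duplication
    # Content lands in an identically named space in the secondary store.
    space = secondary.spaces.setdefault(space_id, {})
    if action in ("add", "update"):
        space[content_id] = payload  # data and properties travel together
    elif action == "delete":
        space.pop(content_id, None)
    else:
        raise ValueError(f"unknown action: {action}")
    return True
```

Because the function only reacts to events, anything already present in the watched store before the first event arrives is never copied, which mirrors the note above about existing content.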

Configuration Options:

  1. Store to Watch: The primary storage location which DuraCloud will monitor for changes. When spaces, content, or properties are added, updated, or deleted in this store, the same actions will be taken for the configured spaces in a secondary store.
  2. Space to store selection: Each space in the watched provider will be duplicated in the storage provider(s) selected. For each space, 0, 1, or more providers may be selected.

Duplicate on Demand

Description:

The Duplicate on Demand service provides a simple way to duplicate content from one space to another. Its primary purpose is to duplicate content from the primary storage provider to a secondary provider. To begin, a source space is chosen, along with a store and space to which content will be duplicated. The service then copies all content and properties in the source space to the duplication space, creating the space if necessary. When the service has completed its work, a results file is stored in the chosen space, and a set of files (primarily logs) created as part of the process is stored in the work space.
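As a rough illustration of the copy-then-report flow described above, the sketch below copies every item from a source space into a destination space and builds a CSV results listing. The function names, the in-memory dictionaries, and the CSV column names are assumptions made for this example; the actual report layout produced by the service is not specified here.

```python
import csv
import io


def duplicate_space(source_space, dest_store, dest_space_id, copy_item=lambda item: item):
    """Copy every item in source_space into dest_store[dest_space_id].

    The destination space is created if it does not exist. Returns a list of
    (content_id, status, message) rows, one per item, including failures.
    """
    dest_space = dest_store.setdefault(dest_space_id, {})
    results = []
    for content_id, item in source_space.items():
        try:
            dest_space[content_id] = copy_item(item)
            results.append((content_id, "success", ""))
        except Exception as error:  # record the failure and keep going
            results.append((content_id, "failure", str(error)))
    return results


def results_csv(results):
    """Render the result rows as CSV text (column names are hypothetical)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["content-id", "status", "message"])
    writer.writerows(results)
    return buffer.getvalue()
```

Recording failures per item instead of aborting matches the description of a final report that lists both duplicated files and failures.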

Configuration Options:

  1. Source Space: DuraCloud space where source files can be found
  2. Copy to this store: DuraCloud store to which content will be copied
  3. Copy to this space: DuraCloud space where content will be copied
  4. Standard vs. Advanced configuration
    1. Standard mode automatically sets up the service to be run
    2. Advanced mode allows the user to configure the number and type of servers that will be used to run the job
      1. Number of Server Instances: The number of servers to use to perform the duplication task.
      2. Type of Server: The type (size) of server used to perform the task. The larger the server, the faster the processing will occur. Larger servers also cost more than smaller servers to run. For more information, see the Amazon EC2 documentation.

Service Output
All outputs of this service are placed in the system space, x-service-out.

  1. duplicate-on-demand/duplicate-results-<date>.csv
    • Final report indicating which files were duplicated, as well as any failures encountered

Image Server

Description:

The Image Server provides a viewer for image files through use of the Djatoka image server. While this service is geared towards serving JPEG 2000 images, it supports multiple image file types by converting them to JPEG 2000 format on the fly.

Note that the current implementation of this service requires that spaces be set to OPEN in order to use the viewer to view image files.

Configuration Options:

None

Media Streamer

Description:

The Media Streamer provides streaming capabilities for video and audio files. The service takes advantage of Amazon Cloudfront streaming, so files to be streamed must be within spaces on an Amazon provider.

Amazon Cloudfront streaming uses the Flash Media Server to host streaming files over RTMP. File formats supported include MP3, MP4 and FLV among others. For a full listing of supported file types see the Flash Media Server documentation.

Configuration Options:

...


Open and Secure Streaming

DuraCloud supports two different types of streaming, open and secure. Open streaming allows anyone with access to the URL for a streamed content item to stream the content. This works well for open access content which is intended to be shared and accessed widely. Secure streaming requires that a request be made to DuraCloud to retrieve a signed URL. That signed URL can then be used to stream the file. The request to retrieve a signed URL can specify how long the content will be available to stream (the default is 8 hours) as well as the IP address or IP address range where the streaming is allowed to take place. The purpose of secure streaming is to restrict the use of the stream. This is ideal for scenarios where streamed content is not free to use or must only be provided to a limited audience. Note that both types of streaming, open and secure, use the RTMP protocol, which protects the source file from being downloaded. The RTMP protocol requires that a flash-based streaming media player be used to play the streamed content.

Using Media Streaming

Follow these steps to stream media files with DuraCloud:

  1. Create a space in DuraCloud which you will use to host streamed files
  2. Transfer media files to the space. Be sure that the files use supported formats (see the link above).
  3. Enable streaming
    1. To enable open streaming: Select the space in the DuraCloud interface and click the "ON" button next to "Streaming:" in the top row of buttons.
    2. To enable secure streaming: Perform a POST HTTP call to the URL https://{institution}.duracloud.org/durastore/task/enable-streaming. The body of the POST request should include this JSON document: {"spaceId" : "","secure" : "true"}. Fill in the ID of the space to stream. Note that this call is using the DuraCloud REST API.
  4. Wait up to 15 minutes. If this is the first time the space has been streamed, it can take up to 15 minutes for the files to be available on the Amazon edge servers.
  5. Stream a file
    1. When using open streaming: 
      1. Select a media file in the space. A video player will appear in the Content Detail pane. Verify that you are able to play the streamed file.
      2. Look in the space properties for the RTMP streaming address. This is the path you will use for streaming files. Alternatively, you can perform a get-url task call through the DuraCloud REST API to retrieve the streaming URL for each content item to be streamed. These URLs are predictable and do not expire.
    2. When using secure streaming:
      1. Spaces using secure streaming do not provide playback via the DuraCloud UI. You will need to perform a "get-signed-url" call to retrieve a signed URL for each content item to be streamed, and stream the file through an RTMP compatible player. More details about the get-signed-url call can be found in the Amazon S3 Storage Provider tasks section of the DuraCloud REST API.
  6. Set up your website or application to provide access to the streamed files. Some example files to get you started are listed below.
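The enable-streaming call from step 3 could be prepared along the following lines. This is a sketch under stated assumptions, not a reference client: the URL pattern and the JSON body come from the step above, while the use of HTTP Basic authentication, the Content-Type header, and the function names are assumptions. The request is only built, not sent.

```python
import base64
import json
import urllib.request


def build_task_request(institution, task, body, username, password):
    """Build (but do not send) a POST request for a DuraCloud storage task."""
    url = f"https://{institution}.duracloud.org/durastore/task/{task}"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Basic {token}",   # assumption: HTTP Basic auth
            "Content-Type": "application/json",  # assumption
        },
    )


def enable_secure_streaming_request(institution, space_id, username, password):
    """Request body matches the JSON document shown in step 3."""
    body = {"spaceId": space_id, "secure": "true"}
    return build_task_request(institution, "enable-streaming", body, username, password)
```

Passing the returned object to urllib.request.urlopen would send the call. The same builder could in principle be reused for the get-url and get-signed-url tasks, but their endpoint paths and body fields should be taken from the DuraCloud REST API documentation, not from this sketch.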

Warning

The Flash Media Server used by Amazon Cloudfront and media players like JWPlayer and Flowplayer require specific conventions for requesting streamed files. There are two primary variables. The first is a prefix which may need to precede the file name (example prefix values are "mp3:" and "mp4:"). The second is whether a file extension is allowed on the file name. Getting these combinations right is particularly important when using secure streaming, as the player cannot request the file with alternative file names to match its preferences. Not all file types use the same combination of prefix and file extension settings. For example, it is common for MP4 files to require a prefix and extension (example file name: "mp4:videofile.mp4") while MP3 files require a prefix but no extension (example file name: "mp3:audiofile").

The prefix value, when needed, should be added to the stream path by using the "resourcePrefix" parameter on the get-url or get-signed-url call made through the DuraCloud REST API.

In most cases, the file extension will need to be part of the stored file name. Even if files are named with a file extension (which is typically the case), calls to retrieve a streaming URL can specify the file name with no extension

...

.
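The prefix and extension conventions from the warning above can be captured in a small helper. This is only an illustration of how the resulting stream names are formed; the function name and parameters are hypothetical, and in practice the prefix is supplied through the "resourcePrefix" parameter of the REST call as described above.

```python
import posixpath


def stream_name(content_id, prefix="", keep_extension=True):
    """Return the name a player would request for a stored content item.

    prefix         -- for example "mp4:" or "mp3:", or "" when none is needed
    keep_extension -- whether the file extension stays on the name
    """
    name = content_id if keep_extension else posixpath.splitext(content_id)[0]
    return f"{prefix}{name}"
```

For the two examples in the warning, stream_name("videofile.mp4", "mp4:") produces "mp4:videofile.mp4" and stream_name("audiofile.mp3", "mp3:", keep_extension=False) produces "mp3:audiofile".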

Integration Files

The following files are available as a bundle on the downloads page.
They are intended as a starting point for integrating streaming media into your own website.

...

All of the above files are intended as examples only. Their purpose is to give developers a starting point for embedding video streamed by DuraCloud on their own web pages.

Info

If you add files when the Media Streamer service is already running, they too will automatically be available for streaming.

Bit Integrity Checker

Description:

...

to

...

a

...

When running in the Verify integrity of a Space mode, the checker performs the following steps:

  • collect the content hash values for each item from the underlying storage provider
  • stream through each item, recalculating its hash
  • compare the two listings

When running in the Verify integrity of an item list mode, the checker performs the following steps:

  • stream through each item in the provided listing, recalculating its hash
  • compare the newly generated listing with the provided listing
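The recalculate-and-compare steps above can be sketched as follows. This is an illustrative outline, not the checker's implementation: the function names and the status strings are assumptions, and listings are modeled as plain dictionaries mapping content IDs to MD5 hex digests.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=8192):
    """Compute an MD5 hex digest by reading the stream in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def compare_listings(expected, recalculated):
    """Compare two {content_id: md5} listings and return {content_id: status}."""
    report = {}
    for content_id in sorted(set(expected) | set(recalculated)):
        if content_id not in recalculated:
            report[content_id] = "missing from recalculated listing"
        elif content_id not in expected:
            report[content_id] = "missing from expected listing"
        elif expected[content_id] == recalculated[content_id]:
            report[content_id] = "match"
        else:
            report[content_id] = "mismatch"
    return report
```

In the space mode, the expected listing would hold the hashes reported by the storage provider; in the item list mode, it would hold the hashes from the provided listing. Reading in chunks keeps memory use flat regardless of item size.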

Configuration Options:

...

space

...

  1. Input listing name: Name of the content item which contains the listing of items over which to run the service

Service Outputs
All outputs of this service are placed in the system space, x-service-out.

  1. bitintegrity/fingerprints-gen-<spaceId>-<date>.csv
    • Interim listing generated with hash values from underlying storage provider
  2. bitintegrity/fingerprints-<spaceId>-<date>.csv
    • Interim listing with hashes recalculated from content streams
  3. bitintegrity/fixity-report-<spaceId>-<date>.csv
    • Final report with status of integrity check

Bit Integrity Checker - Tools

Description:

The Bit Integrity Checker Tools provide additional bit integrity checking utilities which can be used to perform specific integrity checking tasks.

Modes:

  1. Generate integrity information for a Space
  2. Generate integrity information for an item list
  3. Compare two integrity reports

Configuration Options:

  1. Mode 1 - Generate integrity information for a Space
    1. Get integrity information from...
      1. The storage provider: Determine the file MD5 by asking the storage provider for its stored MD5 value
      2. The files themselves: Determine the file MD5 by retrieving them from the storage provider and computing the MD5
    2. Stores: The underlying storage provider in which the following space resides
      1. Space containing content items: The DuraCloud space in which the content items to be considered reside
  2. Mode 2 - Generate integrity information for an item list
    1. Get integrity information from...
      1. The storage provider: Determine the file MD5 by asking the storage provider for its stored MD5 value
      2. The files themselves: Determine the file MD5 by retrieving them from the storage provider and computing the MD5
      3. Input listing name: Name of the content item which contains the listing of items over which to run the service
    2. Stores: The underlying storage provider in which the following space resides
      1. Space with input listing: The DuraCloud space in which the input listing file resides
  3. Mode 3 - Compare two integrity reports
    1. Input listing name: Name of the first content item which contains a listing of items to be compared to the second listing
    2. Second input listing name: Name of the second content item which contains a listing of items to be compared to the first listing
    3. Stores: The underlying storage provider in which the following spaces reside
      1. Space with input listing: The DuraCloud space in which the first input listing file resides
      2. Space with second input listing: The DuraCloud space in which the second input listing file resides

Service Outputs
All outputs of this service are placed in the system space, x-service-out.

  1. bitintegrity/fingerprints-<spaceId>-<date>.csv
    • Listing of hashes produced when running in Mode 1 (from a space) or Mode 2 (from an item list)
  2. bitintegrity/fixity-report-<listingId-0>-vs-<listingId-1>-<date>.csv
    • Comparison report of two hash listings

Bit Integrity Checker - Bulk

Description:

The Bulk Bit Integrity Checker provides a simple way to determine checksums (MD5s) for all content items in any particular space by leveraging an Amazon Hadoop cluster. This service is designed for large datasets (10 GB or more).

Configuration Options:

  1. Space to verify: DuraCloud space where source files are stored
  2. Service Mode
    1. Verify integrity of a Space: Retrieves all items in a space, computes the checksum of each, and compares that value with the MD5 value available from the storage provider
    2. Verify integrity from an item list: Retrieves all items listed in the item list, computes the checksum of each, and compares that value with the MD5 value provided in the item list
      1. Space with input listing: The DuraCloud space in which the input listing file resides
      2. Input listing name: Name of the content item which contains the listing of items over which to run the service
  3. Standard vs. Advanced configuration
    1. Standard mode automatically sets up the service to be run
    2. Advanced mode allows the user to configure the number and type of servers that will be used to run the job
      1. Number of Server Instances: The number of servers to use to perform the integrity checking task.
      2. Type of Server: The type (size) of server used to perform the task. The larger the server, the faster the processing will occur. Larger servers also cost more than smaller servers to run. For more information, see the Amazon EC2 documentation.

Service Outputs
All outputs of this service are placed in the system space, x-service-out.

  1. bitIntegrity-bulk/bitIntegrity-report-<date>.csv
    • Final report with status of integrity check
  2. bitIntegrity-bulk/bitIntegrity-results.csv
    • Interim listing with hashes recalculated from content streams

CloudSync

Description:

...

with streaming turned on, those files will automatically be made available for streaming as well.