Introduction

The DuraCloud application provides a set of services which can be deployed and used for a variety of purposes, primarily to process the content which has been loaded into DuraCloud storage. The following list of services describes how each service is expected to be used and the options available for tailoring the service to your needs.

...

  1. Store to Watch: The primary storage location which DuraCloud will monitor for changes. When spaces, content, or properties are added, updated, or deleted in this store, the same actions will be taken for the configured spaces in a secondary store.
  2. Space to store selection: Each space in the watched provider will be duplicated in the storage provider(s) selected. For each space, 0, 1, or more providers may be selected.
  3. Default space to store selection: Any new space added will be configured based on this default setting. This works the same as the space to store setting above, but applies only to spaces added after the Duplicate on Change service is deployed.
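As a rough illustration of how the per-space selection and the default selection might interact, the following Python sketch resolves the stores to which a space would be duplicated. The function name, the dictionary layout, and the store IDs are hypothetical and are not part of the DuraCloud API. The sketch assumes that a space with an explicit selection, even an empty one, never falls back to the default.

```python
from typing import Dict, List


def stores_for_space(space_id: str,
                     space_to_stores: Dict[str, List[str]],
                     default_stores: List[str]) -> List[str]:
    """Return the secondary stores a space would be duplicated to.

    space_to_stores: hypothetical mapping of space ID -> selected store IDs
    default_stores:  hypothetical default selection applied to new spaces
    """
    # A space that was configured explicitly keeps its own selection,
    # which may contain zero, one, or several stores.
    if space_id in space_to_stores:
        return list(space_to_stores[space_id])
    # A space with no explicit entry is treated as newly added and
    # picks up the default selection.
    return list(default_stores)


# Example configuration (all IDs are made up for illustration).
config = {"photos": ["store-2", "store-3"], "scratch": []}
default = ["store-2"]
```

With this configuration, "photos" resolves to two stores, "scratch" to none, and any space not listed in the mapping to the default store.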

Duplicate on Demand

Description:

The Duplicate On Demand service provides a simple way to duplicate content from one space to another. This service is primarily focused on allowing the duplication of content from the primary storage provider to a secondary provider. To begin, a source space is chosen, along with a store and space to which content will be duplicated. The service then performs a copy of all content and properties in the source space to the duplication space, creating the space if necessary. When the service has completed its work, a results file will be stored in the chosen space and a set of files (primarily logs) created as part of the process will be stored in the work space.
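The copy step described above can be sketched in Python using plain dictionaries as stand-ins for stores and spaces. This is a simplified model under stated assumptions, not the service's implementation: the `duplicate_space` function, the data layout, and the "success" status string are all hypothetical, and failures are not modelled.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory model:
#   store = {space_id: {content_id: (content_bytes, properties)}}
Space = Dict[str, Tuple[bytes, Dict[str, str]]]
Store = Dict[str, Space]


def duplicate_space(source_store: Store, source_space_id: str,
                    dest_store: Store, dest_space_id: str) -> List[Tuple[str, str]]:
    """Copy all content and properties from one space to another.

    Returns one (content_id, status) row per item, loosely mirroring
    the idea of a results file.
    """
    # Create the destination space if it does not exist yet.
    dest_space = dest_store.setdefault(dest_space_id, {})
    results = []
    for content_id, (data, props) in source_store[source_space_id].items():
        # Copy the properties so later edits to the source do not leak through.
        dest_space[content_id] = (data, dict(props))
        results.append((content_id, "success"))
    return results


primary: Store = {"reports": {"q1.pdf": (b"%PDF", {"mimetype": "application/pdf"})}}
secondary: Store = {}
rows = duplicate_space(primary, "reports", secondary, "reports-copy")
```

After the call, `secondary` contains a new "reports-copy" space holding the same content and properties as the source space.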

Configuration Options:

  1. Source Space: DuraCloud space where source files can be found
  2. Copy to this store: DuraCloud store to which content will be copied
  3. Copy to this space: DuraCloud space where content will be copied
  4. Standard vs. Advanced configuration
    1. Standard mode automatically sets up the service to be run
    2. Advanced mode allows the user to configure the number and type of servers that will be used to run the job
      1. Number of Server Instances: The number of servers to use to perform the duplication task.
      2. Type of Server: The type (size) of server used to perform the task. The larger the server, the faster the processing will occur. Larger servers also cost more than smaller servers to run. For more information, see the Amazon EC2 documentation.

Service Output
All outputs of this service are placed in the system space, x-service-out.

  1. duplicate-on-demand/duplicate-results-<date>.csv
    • Final report indicating which files were duplicated, as well as any failures encountered

Image Server

Description:

The Image Server provides a viewer for image files through use of the Djatoka image server. While this service is geared towards serving JPEG 2000 images, it supports multiple image file types by converting them to JPEG 2000 format on the fly.

...

  1. bitintegrity/fingerprints-<spaceId>-<date>.csv
    • Listing of hashes produced when running in from-space or from-list mode
  2. bitintegrity/fixity-report-<listingId-0>-vs-<listingId-1>-<date>.csv
    • Comparison report of two hash listings
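To make the idea of comparing two hash listings concrete, here is a minimal Python sketch. The row layout and the status labels (MATCH, MISMATCH, MISSING) are assumptions chosen for illustration; the actual columns of the fixity report are not specified here.

```python
from typing import Dict, List, Tuple


def compare_listings(listing_0: Dict[str, str],
                     listing_1: Dict[str, str]) -> List[Tuple[str, str, str, str]]:
    """Compare two {item_id: hash} listings item by item.

    Returns rows of (item_id, hash_0, hash_1, status), sorted by item ID.
    The status labels are hypothetical.
    """
    rows = []
    for item_id in sorted(set(listing_0) | set(listing_1)):
        hash_0 = listing_0.get(item_id, "")
        hash_1 = listing_1.get(item_id, "")
        if not hash_0 or not hash_1:
            status = "MISSING"      # item appears in only one listing
        elif hash_0 == hash_1:
            status = "MATCH"
        else:
            status = "MISMATCH"
        rows.append((item_id, hash_0, hash_1, status))
    return rows


report = compare_listings({"a.txt": "1111", "b.txt": "2222"},
                          {"a.txt": "1111", "b.txt": "9999", "c.txt": "3333"})
```

Each row of `report` could then be written out as one line of a CSV file.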

Bit Integrity Checker - Bulk

Description:

The Bulk Bit Integrity Checker provides a simple way to determine checksums (MD5s) for all content items in any particular space by leveraging an Amazon Hadoop cluster. This service is designed for large datasets (more than 10 GB).

Configuration Options:

  1. Space to verify: DuraCloud space where source files are stored
  2. Service Mode
    1. Verify integrity of a Space: Retrieves all items in a space, computes the checksum of each, and compares that value with the MD5 value available from the storage provider
    2. Verify integrity from an item list: Retrieves all items listed in the item list, computes the checksum of each, and compares that value with the MD5 value provided in the item list
      1. Space with input listing: The DuraCloud space in which the input listing file resides
      2. Input listing name: Name of the content item which contains the listing of items over which to run the service
  3. Standard vs. Advanced configuration
    1. Standard mode automatically sets up the service to be run
    2. Advanced mode allows the user to configure the number and type of servers that will be used to run the job
      1. Number of Server Instances: The number of servers to use to perform the integrity check.
      2. Type of Server: The type (size) of server used to perform the task. The larger the server, the faster the processing will occur. Larger servers also cost more than smaller servers to run. For more information, see the Amazon EC2 documentation.
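Both service modes come down to the same check: recompute the MD5 of each item's content and compare it with an expected value, taken either from the storage provider or from the item list. The Python sketch below shows that check on in-memory data using the standard `hashlib` module. The function names and data layout are hypothetical, and the sketch makes no attempt to reproduce the Hadoop-based processing the service uses.

```python
import hashlib
from typing import Dict, Iterable


def md5_of_stream(chunks: Iterable[bytes]) -> str:
    """Compute the hex MD5 of content delivered as a sequence of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def verify_items(content: Dict[str, Iterable[bytes]],
                 expected_md5: Dict[str, str]) -> Dict[str, bool]:
    """Return {item_id: True/False} for whether each recomputed MD5 matches.

    expected_md5 stands in for either the provider-reported checksums
    (space mode) or the checksums given in an item list (list mode).
    An item with no expected value is reported as not matching.
    """
    report = {}
    for item_id, chunks in content.items():
        report[item_id] = md5_of_stream(chunks) == expected_md5.get(item_id)
    return report


items = {"good.txt": [b"he", b"llo"], "bad.txt": [b"changed"]}
expected = {"good.txt": hashlib.md5(b"hello").hexdigest(),
            "bad.txt": hashlib.md5(b"original").hexdigest()}
status = verify_items(items, expected)
```

Reading content in chunks keeps memory use flat regardless of item size, which matters for the large datasets this service targets.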

Service Outputs
All outputs of this service are placed in the system space, x-service-out.

  1. bitIntegrity-bulk/bitIntegrity-report-<date>.csv
    • Final report with status of integrity check
  2. bitIntegrity-bulk/bitIntegrity-results.csv
    • Interim listing with hashes recalculated from content streams

CloudSync

Description:

The CloudSync service starts and runs the CloudSync application, which supports backing up content from a Fedora repository into DuraCloud and restoring it from there. For more information about CloudSync, please refer to the CloudSync documentation.