Description

Chronopolis tracks two types of storage. The first is the StorageRegion, which defines an area at a Chronopolis Node that can be used by Intake services. The second is the StagingStorage object, which records each use of a StorageRegion by an Intake service.

Storage Regions

At the moment Storage Regions are used primarily by Intake Services, which specify the region a Bag resides in during a deposit. Depending on the StorageRegion used, different replication information is supplied when Replications are created. This is useful when replicating from multiple servers and data centers.

Currently there are two types of Storage Regions: BAG and TOKEN. The separate TOKEN type allows the Ingest Server to continue writing tokens out to a filesystem for replication.

Adding and Updating StorageRegions

~~example and documentation~~

Staging Storage

Staging Storage objects are used primarily by the Intake Services to determine where data lives and how much space is used. In addition, once all Replications have completed, an Intake Service must clean up after itself and deactivate the Staging Storage object in the Ingest Server. Further usage of Staging Storage objects can be found in the Bag documentation.

API Documentation

Storage Region

A region where staged data may live

Field | Type | Description
id | Long | The id of a StorageRegion in the database
node | String | The namespace of the node which uses a StorageRegion
note | String | A note accompanying the StorageRegion to further identify it
capacity | Long | The amount of capacity which an Intake Service is allotted for a StorageRegion
dataType | DataType | The type of data to be stored on a StorageRegion
storageType | StorageType | The type of storage of the StorageRegion
replicationConfig | ReplicationConfig | The configuration information used when creating Replications for a StorageRegion

Replication Config

Field | Type | Description
region | Long | The StorageRegion associated with the ReplicationConfig
path | String | The path to use when creating a Replication
server | String | The fully qualified domain name of the server which clients will connect to
username | String | The user which a client should connect as; if left empty it defaults to the client's node namespace

Staging Storage

Staging information: where content is stored, along with fixity information for that data

Field | Type | Description
size | Long | The total size of the staged data
path | String | The relative path to the staged data (can point to either a file or a directory)
region | Long | ID of the StorageRegion which the data is staged on
active | Boolean | Boolean flag showing if the data is still staged
totalFiles | Long | The number of files staged
fixities | Set<Fixity> | Fixity values of the staged data

Fixity

Fixity information for validating staged content

Field | Type | Description
value | String | The computed digest
algorithm | String | The name of the algorithm used
createdAt | DateTime | The time of creation of the digest

DataType

An enumeration of the kinds of data which can be stored on a StorageRegion

  • BAG: The BagIt Bag
  • TOKEN: The ACE Token Store

StorageType

An enumeration of the storage architectures which Chronopolis supports

  • LOCAL: A local (ideally POSIX) filesystem
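The tables and enumerations above can be read as a small data model. The following is only a sketch of that model as Java records, not the classes Chronopolis itself uses: the field names and types are taken from the tables, while the use of records and of `Instant` in place of `DateTime` are assumptions made for illustration.

```java
import java.time.Instant;
import java.util.Set;

// Kinds of data a StorageRegion can hold.
enum DataType { BAG, TOKEN }

// Storage architectures; LOCAL is the only one listed above.
enum StorageType { LOCAL }

// Connection details used when a Replication is created for a region.
record ReplicationConfig(Long region, String path, String server, String username) {}

// A region at a node where staged data may live.
record StorageRegion(Long id, String node, String note, Long capacity,
                     DataType dataType, StorageType storageType,
                     ReplicationConfig replicationConfig) {}

// A digest recorded for staged content (Instant stands in for DateTime: assumption).
record Fixity(String value, String algorithm, Instant createdAt) {}

// One use of a StorageRegion by an Intake Service.
record StagingStorage(Long size, String path, Long region, Boolean active,
                      Long totalFiles, Set<Fixity> fixities) {}
```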

Ingest Usage

Storage Region

The StorageRegion is used in the Ingest Server mainly to track how much space is currently used on a given filesystem. Note that this is only an approximation, as other files may exist which Intake clients do not know about. In general an Intake client should not exceed the capacity defined in the StorageRegion, though enforcing that is left to the client's implementation.
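Since the capacity check is left to each Intake client, one possible way to approximate it is to sum the sizes of the active StagingStorage entries on a region and compare the total against the region's capacity. The sketch below assumes trimmed-down stand-ins for the documented entities; the helper names `usedOn` and `fits` are hypothetical and not part of any Chronopolis API.

```java
import java.util.List;

// Trimmed-down stand-ins for the documented entities (only the fields used here).
record StorageRegion(Long id, Long capacity) {}

record StagingStorage(Long size, Long region, Boolean active) {

    // Sum the sizes of active StagingStorage entries that live on the given region.
    static long usedOn(StorageRegion region, List<StagingStorage> stagings) {
        return stagings.stream()
                .filter(s -> s.active() && s.region().equals(region.id()))
                .mapToLong(StagingStorage::size)
                .sum();
    }

    // True when staging `incoming` more would stay within the region's capacity.
    static boolean fits(StorageRegion region, List<StagingStorage> stagings, long incoming) {
        return usedOn(region, stagings) + incoming <= region.capacity();
    }
}
```

Because other files may exist on the filesystem, a result from `fits` is only as accurate as the tracked StagingStorage entries.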

StorageRegions also contain Replication Configuration information used when distributing a Bag to different Nodes in Chronopolis. At the moment this is limited to rsync, and the resulting source location is expected to look like:

username@server:path/bag.depositor/bag.name

Note that if a username is not supplied in the Configuration the namespace of the Node will be used in its place.
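As an illustration of how such a location could be assembled, including the fallback to the Node's namespace, here is a minimal sketch. The record is a stand-in for the documented ReplicationConfig, and the method `rsyncSource` and its parameters are hypothetical names introduced for this example. Stripping a trailing slash from the path is also an assumption.

```java
// Stand-in for the documented ReplicationConfig (region id omitted for brevity).
record ReplicationConfig(String path, String server, String username) {

    // Build "username@server:path/depositor/name", falling back to the
    // replicating node's namespace when no username is configured.
    String rsyncSource(String nodeNamespace, String depositor, String bagName) {
        String user = (username == null || username.isBlank()) ? nodeNamespace : username;
        // Avoid a doubled slash if the configured path already ends with one.
        String base = path.endsWith("/") ? path.substring(0, path.length() - 1) : path;
        return user + "@" + server + ":" + base + "/" + depositor + "/" + bagName;
    }
}
```

With a path of `/staging/bags`, a server of `stage.example.org`, and no username, a node whose namespace is `node-a` would get `node-a@stage.example.org:/staging/bags/depositor-x/bag-1` for the bag `bag-1` of depositor `depositor-x`. All of these values are made up for the example.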

The Ingest Server also requires a TOKEN StorageRegion which it can write to, because it writes the TokenStores for all bags into that StorageRegion. This is configured in the application.yml with

chron.stage.tokens.posix.{id,path}
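Read as two keys, `chron.stage.tokens.posix.id` and `chron.stage.tokens.posix.path`, the setting might look like the fragment below. The values are placeholders. Treating `id` as the id of the StorageRegion and `path` as its location on the filesystem is an assumption based on the key names.

```yaml
chron:
  stage:
    tokens:
      posix:
        id: 2                   # placeholder: id of the TOKEN StorageRegion (assumed meaning)
        path: /staging/tokens   # placeholder: where TokenStores are written (assumed meaning)
```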

Staging Storage

The StagingStorage entity is used in the Ingest Server to track when a Bag has been staged for distribution. It contains a flag marking whether it is active, which can be used to determine if the space it uses on a filesystem may be freed. The Fixity information stored alongside the StagingStorage is used to validate any Replication which occurs. At the moment, the tagmanifest is used for a Bag and the TokenStorage is used for a Token.
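A replicating client could validate a transferred file by recomputing its digest and comparing it with the recorded Fixity value. The sketch below shows one way to do that with the JDK's `MessageDigest`. It assumes that `algorithm` holds a name `MessageDigest` accepts (such as `SHA-256`) and that `value` is a hex-encoded digest; neither is confirmed by this page. The `matches` method is a hypothetical helper.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.Instant;
import java.util.HexFormat;

// Stand-in for the documented Fixity entity (Instant in place of DateTime: assumption).
record Fixity(String value, String algorithm, Instant createdAt) {

    // Hash `content` with this Fixity's algorithm and compare against the recorded value.
    boolean matches(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance(algorithm);
            String computed = HexFormat.of().formatHex(digest.digest(content));
            return computed.equalsIgnoreCase(value);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unsupported digest algorithm: " + algorithm, e);
        }
    }
}
```

For a Bag, `content` would be the bytes of the tagmanifest. For large files, a streaming digest would be preferable to loading the whole file into memory.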

Intake Usage

Storage Region

During configuration of an Intake Service, a StorageRegion must be set so that the service can tell the Ingest Server which Region a Bag is staged in. This is set in the application.yml with

chron.stage.bags.posix.{id,path}

In the future more types of Regions may be supported, depending on the implementation needs.
