...
Bitstreams also have a 38-digit internal ID, different from the primary key ID of the bitstream table row. This is not visible or used outside of the bitstream storage manager. It is used to determine the exact location (relative to the relevant store directory) at which the bitstream is stored in traditional or SRB storage. The first three pairs of digits are the directory path that the bitstream is stored under. The bitstream is stored in a file with the internal ID as the filename.
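The mapping from internal ID to file location can be sketched as follows. This is a minimal illustration, not DSpace's own code: the function names `generate_internal_id` and `relative_path` are hypothetical, and the random generator is only a stand-in for however DSpace actually produces the ID.

```python
import posixpath
import secrets

ID_LENGTH = 38  # number of digits in the internal ID, per the text above


def generate_internal_id() -> str:
    """Return a random 38-digit decimal string (illustrative generator only)."""
    return "".join(secrets.choice("0123456789") for _ in range(ID_LENGTH))


def relative_path(internal_id: str) -> str:
    """Map an internal ID to its path relative to the store directory."""
    # The first three pairs of digits become three nested directories.
    directories = [internal_id[i:i + 2] for i in (0, 2, 4)]
    # The file itself is named after the full internal ID.
    return posixpath.join(*directories, internal_id)
```

Under these assumptions, an internal ID beginning with 123456 would be stored as 12/34/56/ followed by the full ID, relative to the store directory.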
...
- Using a randomly-generated 38-digit number means that the 'number space' is less cluttered than simply using the primary keys, which are allocated sequentially and are thus close together. This means that the bitstreams in the store are distributed around the directory structure, improving access efficiency.
- The internal ID is used as the filename partly to avoid requiring an extra lookup of the filename of the bitstream, and partly because bitstreams may be received from a variety of operating systems. The original name of a bitstream may be an illegal UNIX filename.
- When storing a bitstream, the BitstreamStorageService DOES set the following fields in the corresponding database table row:
- bitstream_id
- size
- checksum
- checksum_algorithm
- internal_id
- deleted
- store_number
...
- A database connection is created, separately from the currently active connection in the current DSpace context.
- A unique internal identifier (separate from the database primary key) is generated.
- The bitstream DB table row is created using this new connection, with the deleted column set to true.
- The new connection is committed, so the 'deleted' bitstream row is written to the database.
- The bitstream itself is stored in a file in the configured 'asset store directory', with a directory path and filename derived from the internal ID.
- The deleted flag in the bitstream row is set to false. This will occur (or not) as part of the current DSpace Context.
This means that should anything go wrong before, during or after the bitstream storage, only one of the following can be true:
- No bitstream table row was created, and no file was stored
- A bitstream table row with deleted=true was created, no file was stored
- A bitstream table row with deleted=true was created, and a file was stored
None of these affect the integrity of the data in the database or bitstream store.
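The sequence above can be sketched with two database connections. This is an illustration under stated assumptions, not the DSpace implementation: it uses SQLite and a two-column table as stand-ins, the name `store_bitstream` and the `fail_on_write` switch are hypothetical, and the file is written directly under the asset directory rather than under a path derived from the ID.

```python
import os
import sqlite3
import uuid

# Simplified stand-in for the bitstream table (assumption: only two columns).
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS bitstream "
    "(internal_id TEXT PRIMARY KEY, deleted INTEGER NOT NULL)"
)


def store_bitstream(db_path, context_conn, asset_dir, data, fail_on_write=False):
    """Store data so that a failure at any point leaves only a 'deleted' row."""
    internal_id = uuid.uuid4().hex  # stand-in for the internal identifier

    # A separate connection creates the row with deleted=1 and commits it
    # immediately, independently of the caller's transaction.
    side_conn = sqlite3.connect(db_path)
    try:
        side_conn.execute(
            "INSERT INTO bitstream (internal_id, deleted) VALUES (?, 1)",
            (internal_id,),
        )
        side_conn.commit()
    finally:
        side_conn.close()

    # The file is written only after the 'deleted' row is safely on disk.
    if fail_on_write:
        raise OSError("simulated storage failure")
    with open(os.path.join(asset_dir, internal_id), "wb") as handle:
        handle.write(data)

    # The flag is cleared on the caller's connection, so it takes effect
    # only if the caller's own transaction commits.
    context_conn.execute(
        "UPDATE bitstream SET deleted = 0 WHERE internal_id = ?",
        (internal_id,),
    )
    return internal_id
```

If the write fails, or the caller rolls back, the row stays at deleted=1 and is left for a later cleanup pass, which matches the three outcomes listed above.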
...
The above techniques mean that the bitstream storage manager is transaction-safe. Over time, the bitstream database table and file store may contain a number of 'deleted' bitstreams. The cleanup method of BitstreamStorageService goes through these deleted rows, and actually deletes them along with any corresponding files left in the storage. It only removes 'deleted' bitstreams that are more than one hour old, just in case cleanup is happening in the middle of a storage operation.
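The one-hour rule can be sketched as a simple filter. This is an illustration only: the row layout, the `last_modified` field, and the function name `rows_to_purge` are assumptions, since the text does not say which timestamp the real cleanup method compares against.

```python
from datetime import datetime, timedelta

# Grace period from the text above: 'deleted' entries newer than this are kept.
MIN_AGE = timedelta(hours=1)


def rows_to_purge(rows, now):
    """Return the rows that are flagged deleted and older than the grace period."""
    return [
        row
        for row in rows
        if row["deleted"] and now - row["last_modified"] > MIN_AGE
    ]
```

A row flagged deleted five minutes ago may belong to a storage operation that is still in progress, so the filter leaves it in place for a later run.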
...
Configuring the Bitstream Store
BitStores (aka assetstores) are configured with [dspace]/config/spring/api/bitstore.xml
Configuring Traditional Storage
By default, DSpace uses a traditional filesystem bitstore of [dspace]/assetstore/
To configure a traditional filesystem bitstore at a specific directory, configure the bitstore like this:
...
This would configure store number 0 named localStore, which is a DSBitStore (filesystem), at the filesystem path of ${dspace.dir}/assetstore (i.e. [dspace]/assetstore/).
It is also possible to use multiple local filesystems. In the below example, key #0 is localStore at ${dspace.dir}/assetstore, and key #1 is localStore2 at /data/assetstore2. Note that incoming is set to store "1", which in this case refers to localStore2. That means that any new files (bitstreams) uploaded to DSpace will be stored in localStore2, but some existing bitstreams may still exist in localStore.
```xml
<bean name="org.dspace.storage.bitstore.BitstreamStorageService" class="org.dspace.storage.bitstore.BitstreamStorageServiceImpl">
    <property name="incoming" value="1"/>
    <property name="stores">
        <map>
            <entry key="0" value-ref="localStore"/>
            <entry key="1" value-ref="localStore2"/>
        </map>
    </property>
</bean>

<bean name="localStore" class="org.dspace.storage.bitstore.DSBitStoreService" scope="singleton">
    <property name="baseDir" value="${dspace.dir}/assetstore"/>
</bean>

<bean name="localStore2" class="org.dspace.storage.bitstore.DSBitStoreService" scope="singleton">
    <property name="baseDir" value="/data/assetstore2"/>
</bean>
```
...
Configuring Amazon S3 Storage
To use Amazon S3 as a bitstore, add a bitstore entry s3Store, using S3BitStoreService, and configure it with awsAccessKey, awsSecretKey, and bucketName. NOTE: Before you can specify these settings, you will have to create an account in the Amazon AWS console, and create an IAM user with credentials and privileges to an existing S3 bucket.
```xml
<bean name="org.dspace.storage.bitstore.BitstreamStorageService" class="org.dspace.storage.bitstore.BitstreamStorageServiceImpl">
    <property name="incoming" value="1"/>
    <property name="stores">
        <map>
            <entry key="0" value-ref="localStore"/>
            <entry key="1" value-ref="s3Store"/>
        </map>
    </property>
</bean>

<bean name="localStore" class="org.dspace.storage.bitstore.DSBitStoreService" scope="singleton">
    <property name="baseDir" value="${dspace.dir}/assetstore"/>
</bean>

<bean name="s3Store" class="org.dspace.storage.bitstore.S3BitStoreService" scope="singleton">
    <!-- AWS Security credentials, with policies for specified bucket -->
    <property name="awsAccessKey" value=""/>
    <property name="awsSecretKey" value=""/>

    <!-- S3 bucket name to store assets in. example: longsight-dspace-auk -->
    <property name="bucketName" value=""/>

    <!-- AWS S3 Region to use: {us-east-1, us-west-1, eu-west-1, eu-central-1, ap-southeast-1, ... } -->
    <!-- Optional, sdk default is us-east-1 -->
    <property name="awsRegionName" value=""/>

    <!-- Subfolder to organize assets within the bucket, in case this bucket is shared -->
    <!-- Optional, default is root level of bucket -->
    <property name="subfolder" value=""/>
</bean>
```
...
The incoming property specifies which assetstore receives incoming assets (i.e. when new files are uploaded, they will be stored in the "incoming" assetstore). This defaults to store 0.
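How the stores map and the incoming property interact can be sketched as below. This is a simplified model rather than the service's real code: the class `StoreRegistry` and its method names are hypothetical, and the store values are plain strings standing in for the configured beans.

```python
class StoreRegistry:
    """Toy model of a numbered store map with one 'incoming' store."""

    def __init__(self, stores, incoming=0):
        # incoming defaults to store 0, as described in the text above.
        if incoming not in stores:
            raise KeyError(f"incoming store {incoming} is not configured")
        self.stores = stores
        self.incoming = incoming

    def store_for_new_bitstream(self):
        """New uploads always go to the incoming store."""
        return self.incoming, self.stores[self.incoming]

    def store_for_existing(self, store_number):
        """Existing bitstreams are read from the store recorded in their row."""
        return self.stores[store_number]
```

With incoming set to 1, new uploads would land in store 1, while a bitstream whose row records store_number 0 would still be read from store 0.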
Note: SRB (Storage Resource Broker) bitstore support was removed in DSpace 6.
Configuring S3 Storage
S3BitStore has parameters for awsAccessKey, awsSecretKey, bucketName, awsRegionName (optional), and subfolder (optional).
- awsAccessKey and awsSecretKey are created from the Amazon AWS console. You'll want to create an IAM user, and generate a Security Credential, which provides you the accessKey and secret. Since you need permission to use S3, you could give this IAM user a quick & dirty policy of AmazonS3FullAccess (for all S3 buckets that you own), or for finer-grained control, you can assign an IAM user certain permissions to certain resources, such as read/write to a specific subfolder within a specific S3 bucket.
- bucketName is a globally unique name that distinguishes your S3 bucket. It has to be unique among all other S3 users in the world.
- awsRegionName is the AWS region where the S3 data will be stored. The default is US East (us-east-1). Consider distance to primary users and pricing when choosing the region.
- subfolder is a folder within the S3 bucket in which the assets are organized. This is useful if you want to re-use a bucket for multiple purposes (bucketname/assets vs bucketname/backups) or DSpace instances (bucketname/XYZDSpace or bucketname/ABCDSpace or bucketname/ABCDSpaceProduction).
Migrate BitStores
There is a command line migration tool, bin/dspace bitstore-migrate, that moves all the assets within one bitstore to another bitstore.
```
/[dspace]/bin/dspace bitstore-migrate
usage: BitstoreMigrate
 -a,--source <arg>        Source assetstore store_number (to lose content). This is a number such as 0 or 1
 -b,--destination <arg>   Destination assetstore store_number (to gain content). This is a number such as 0 or 1.
 -d,--delete              Delete file from losing assetstore. (Default: Keep bitstream in old assetstore)
 -h,--help                Help
 -p,--print               Print out current assetstore information
 -s,--size <arg>          Batch commit size. (Default: 1, commit after each file transfer)

/[dspace]/bin/dspace bitstore-migrate -p
store[0] == DSBitStore, which has 2 bitstreams.
store[1] == S3BitStore, which has 2 bitstreams.
Incoming assetstore is store[1]

/[dspace]/bin/dspace bitstore-migrate -a 0 -b 1

/[dspace]/bin/dspace bitstore-migrate -p
store[0] == DSBitStore, which has 0 bitstreams.
store[1] == S3BitStore, which has 4 bitstreams.
Incoming assetstore is store[1]
```
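The effect of a migration can be modelled with a toy in-memory version. This is a sketch under assumptions, not the tool's implementation: stores are plain dictionaries, the functions `migrate` and `count_by_store` are hypothetical, and the per-store counts are assumed to come from each row's store_number, which would be consistent with the sample output showing 0 bitstreams in store[0] even though -d was not given.

```python
def count_by_store(rows, store_numbers):
    """Count bitstream rows per store, based on each row's store_number."""
    return {n: sum(1 for row in rows if row["store_number"] == n) for n in store_numbers}


def migrate(rows, stores, source, destination, delete_old=False, batch_size=1):
    """Copy every bitstream from source to destination; return the commit count."""
    commits = 0
    pending = 0
    for row in rows:
        if row["store_number"] != source:
            continue
        key = row["internal_id"]
        # Copy the content, then repoint the row at the destination store.
        stores[destination][key] = stores[source][key]
        if delete_old:
            del stores[source][key]  # models the -d option
        row["store_number"] = destination
        pending += 1
        if pending == batch_size:  # models the -s batch commit size
            commits += 1
            pending = 0
    if pending:
        commits += 1  # commit the final partial batch
    return commits
```

In this model, leaving delete_old at its default keeps the old copies in the source store while the rows point at the destination, so the source store reports no bitstreams after the run.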