
In this section, we explain the storage layer: the database structure, its maintenance, and the bitstream store and its configuration.


RDBMS / Database Structure

...

When using PostgreSQL, it's a good idea to 'vacuum' the database regularly to keep performance optimal. This is done with the vacuumdb command, which can be run from a 'cron' job, for example by adding this to the system crontab:

Code Block

# clean up the database nightly
40 2 * * * /usr/local/pgsql/bin/vacuumdb --analyze dspace > /dev/null 2>&1

The DSpace database can be backed up and restored using the usual methods, for example with pg_dump and psql. However, when restoring a database, you will need to perform these additional steps:

  • The fresh_install target loads the initial contents of the Dublin Core type and bitstream format registries, as well as two entries in the epersongroup table for the system anonymous and administrator groups. Before you restore a raw backup of your database, you will need to remove these, since they will already exist in your backup, possibly in modified form. For example, use:

    Code Block
    
    DELETE FROM dctyperegistry;
    DELETE FROM bitstreamformatregistry;
    DELETE FROM epersongroup;
    
  • After restoring a backup, you will need to reset the primary key generation sequences so that they do not produce already-used primary keys. Do this by executing the SQL in [dspace-source]/dspace/etc/update-sequences.sql, for example with:

    Code Block
    
    psql -U dspace -f [dspace-source]/dspace/etc/update-sequences.sql
    

Future updates of DSpace may involve minor changes to the database schema. Specific instructions on how to update the schema whilst keeping live data will be included. The current schema also contains a few currently unused database columns, to be used for extra functionality in future releases. These unused columns have been added in advance to minimize the effort required to upgrade.

Configuring the RDBMS Component

...

For example, a bitstream with the internal ID 12345678901234567890123456789012345678 is stored in the directory:

Code Block

(assetstore dir)/12/34/56/12345678901234567890123456789012345678
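The mapping from internal ID to storage path shown above can be sketched in a few lines. This is a minimal illustration in Python, not DSpace's own (Java) code; the helper name `bitstream_path` and the asset store directory `/dspace/assetstore` are hypothetical, and the rule that the first three pairs of digits become three directory levels is inferred from the example.

```python
from pathlib import PurePosixPath


def bitstream_path(assetstore_dir: str, internal_id: str) -> PurePosixPath:
    """Return the storage path for a bitstream, following the example layout.

    Hypothetical helper: the first three pairs of digits of the internal ID
    become three nested directories, and the full ID is the file name.
    """
    if len(internal_id) < 6:
        raise ValueError("internal ID too short to derive three directory levels")
    # Characters 0-1, 2-3 and 4-5 name the three directory levels.
    levels = [internal_id[i:i + 2] for i in (0, 2, 4)]
    return PurePosixPath(assetstore_dir, *levels, internal_id)


# "/dspace/assetstore" stands in for the configured asset store directory.
print(bitstream_path("/dspace/assetstore", "12345678901234567890123456789012345678"))
# prints /dspace/assetstore/12/34/56/12345678901234567890123456789012345678
```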

...

Similarly, when a bitstream is deleted for some reason, its deleted flag is set to true as part of the overall transaction, and the corresponding file in storage is not deleted.

Cleanup

The above techniques mean that the bitstream storage manager is transaction-safe. Over time, the bitstream database table and file store may contain a number of 'deleted' bitstreams. The cleanup method of BitstreamStorageManager goes through these deleted rows, and actually deletes them along with any corresponding files left in the storage. It only removes 'deleted' bitstreams that are more than one hour old, just in case cleanup is happening in the middle of a storage operation.
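The selection rule described above (only 'deleted' bitstreams older than one hour are removed) can be sketched as follows. This is an illustrative Python model, not the BitstreamStorageManager implementation: the `BitstreamRow` class, its `last_touched` timestamp field, and the function name `rows_to_purge` are assumptions made for the example, since this section does not say how the age of a bitstream is measured.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# The one-hour grace period mentioned in the text.
MIN_AGE = timedelta(hours=1)


@dataclass
class BitstreamRow:
    """Hypothetical stand-in for one row of the bitstream table."""
    internal_id: str
    deleted: bool
    last_touched: datetime  # assumption: some timestamp used to judge age


def rows_to_purge(rows, now):
    """Return rows flagged as deleted that are more than one hour old."""
    return [r for r in rows if r.deleted and now - r.last_touched > MIN_AGE]


now = datetime(2024, 1, 1, 12, 0, 0)
rows = [
    BitstreamRow("old-deleted", True, now - timedelta(hours=2)),
    BitstreamRow("fresh-deleted", True, now - timedelta(minutes=10)),
    BitstreamRow("live", False, now - timedelta(days=1)),
]
print([r.internal_id for r in rows_to_purge(rows, now)])
# prints ['old-deleted']
```

The recently deleted row is skipped because a storage operation might still be in progress for it; it would be picked up by a later cleanup run.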

...

Bitstream stores in the file system on the server are configured like this:

Code Block

assetstore.dir = [dspace]/assetstore

...

The above example specifies a single asset store.

Code Block

assetstore.dir = [dspace]/assetstore_0
assetstore.dir.1 = /mnt/other_filesystem/assetstore_1

...

By default, newly created bitstreams are put in asset store 0 (i.e. the one specified by the assetstore.dir property). This allows backwards compatibility with pre-DSpace 1.1 configurations. To change this, for example when asset store 0 is getting full, add a line to dspace.cfg like:

Code Block

assetstore.incoming = 1

Then restart DSpace (Tomcat). New bitstreams will be written to the asset store specified by assetstore.dir.1, which is /mnt/other_filesystem/assetstore_1 in the above example.
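How the assetstore.incoming setting selects a directory can be illustrated with a small lookup. This is a sketch in Python under stated assumptions, not DSpace's actual configuration code: the function name `incoming_assetstore_dir` is hypothetical, and the configuration is modelled as a plain dictionary of the property names used above.

```python
def incoming_assetstore_dir(config: dict) -> str:
    """Return the directory that new bitstreams would be written to.

    Hypothetical helper: store 0 is configured by `assetstore.dir`,
    store N (N > 0) by `assetstore.dir.N`, and a missing
    `assetstore.incoming` property means store 0.
    """
    number = int(config.get("assetstore.incoming", "0"))
    key = "assetstore.dir" if number == 0 else f"assetstore.dir.{number}"
    return config[key]


config = {
    "assetstore.dir": "[dspace]/assetstore_0",
    "assetstore.dir.1": "/mnt/other_filesystem/assetstore_1",
    "assetstore.incoming": "1",
}
print(incoming_assetstore_dir(config))
# prints /mnt/other_filesystem/assetstore_1
```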

...

For example, let's say asset store number 1 will refer to SRB. Then there will be a set of SRB account parameters like this:

Code Block

srb.host.1 = mysrbmcathost.myu.edu
srb.port.1 = 5544
srb.mcatzone.1 = mysrbzone
srb.mdasdomainname.1 = mysrbdomain
srb.defaultstorageresource.1 = mydefaultsrbresource
srb.username.1 = mysrbuser
srb.password.1 = mysrbpassword
srb.homedirectory.1 = /mysrbzone/home/mysrbuser.mysrbdomain
srb.parentdir.1 = mysrbdspaceassetstore
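The naming pattern above, where each SRB parameter ends in the asset store number, makes it straightforward to group the parameters by store. The following Python sketch shows one way to do that; the helper names `parse_properties` and `srb_params` are hypothetical, and the simple `key = value` parser is an assumption that ignores the finer points of Java properties syntax.

```python
def parse_properties(text: str) -> dict:
    """Parse simple `key = value` lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def srb_params(props: dict, store_number: int) -> dict:
    """Collect the `srb.<name>.<store_number>` entries, keyed by <name>."""
    prefix, suffix = "srb.", f".{store_number}"
    return {
        key[len(prefix):-len(suffix)]: value
        for key, value in props.items()
        if key.startswith(prefix) and key.endswith(suffix)
    }


sample = """
srb.host.1 = mysrbmcathost.myu.edu
srb.port.1 = 5544
srb.username.1 = mysrbuser
"""
print(srb_params(parse_properties(sample), 1))
# prints {'host': 'mysrbmcathost.myu.edu', 'port': '5544', 'username': 'mysrbuser'}
```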

...