In this section, we explain the storage layer: the database structure, maintenance, and the bitstream store and its configuration.
RDBMS / Database Structure
DSpace uses a relational database to store all information about the organization of content, metadata about the content, information about e-people and authorization, and the state of currently-running workflows. The DSpace system also uses the relational database in order to maintain indices that users can browse.
Warning: This database schema is not fully up to date with DSpace 3.0. Two additional tables that store item-level versioning information were added and are not currently represented in this diagram.
Most of the functionality that DSpace uses can be offered by any standard SQL database that supports transactions. Presently, the browse indices use some features specific to PostgreSQL and Oracle, so some modification to the code would be needed before DSpace would function fully with an alternative database back-end.
...
The database schema used by DSpace is created by SQL statements stored in a directory specific to each supported RDBMS platform:
...
- PostgreSQL schemas are in _[dspace-source]/dspace/etc/postgres/_
- Oracle schemas are in _[dspace-source]/dspace/etc/oracle/_
The SQL (DDL) statements to create the tables for the current release, starting with an empty database, are in _database_schema.sql_. The schema SQL file also creates the two e-person groups (_Anonymous_ and _Administrator_) that are required for the system to function properly.
...
Also in _[dspace-source]/dspace/etc/[database]_ are various SQL files called _database_schema_1x_1y_. These contain the SQL commands needed to update a live DSpace database from version 1._x_ to 1._y_. Note that this might not be the only part of an upgrade process; see Updating a DSpace Installation for details.
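As a rough illustration of how the upgrade scripts chain together, the following Python sketch lists the scripts you would apply, in order, to move between two 1.x releases. It assumes one script per consecutive minor version, named after the _database_schema_1x_1y_ pattern above with a _.sql_ extension added; check the actual file names in your _etc/[database]_ directory before relying on it. The function name upgrade_scripts is hypothetical.

```python
def upgrade_scripts(from_minor: int, to_minor: int) -> list[str]:
    """List the (hypothetical) upgrade scripts for 1.<from_minor> -> 1.<to_minor>.

    Assumes one script per consecutive minor-version step, named
    database_schema_1x_1y.sql, and single-digit minor versions only.
    """
    if not (0 <= from_minor < to_minor <= 9):
        raise ValueError("need 0 <= from_minor < to_minor <= 9")
    # Apply each step in turn: 1.4 -> 1.5, then 1.5 -> 1.6, and so on.
    return [
        f"database_schema_1{v}_1{v + 1}.sql"
        for v in range(from_minor, to_minor)
    ]
```

For example, upgrade_scripts(4, 6) yields the 1.4 to 1.5 script followed by the 1.5 to 1.6 script; skipping an intermediate script would leave the schema in an inconsistent state.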
The DSpace database code uses an SQL function getnextid to assign primary keys to newly created rows. This SQL function must be safe to use if several JVMs are accessing the database at once; for example, the Web UI might be creating new rows in the database at the same time as the batch item importer. The PostgreSQL-specific implementation of the function uses SEQUENCES for each table in order to create new IDs. If an alternative database back-end were to be used, the implementation of getnextid would need to be updated to work with that specific DBMS.
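To illustrate why getnextid must be safe under concurrent use, the Python sketch below models per-table sequences in memory: each table has its own counter, and a lock makes every allocation atomic so that two concurrent callers never receive the same ID. This is an analogy only. The class SequenceAllocator is hypothetical, and a Python lock only protects callers inside one process; across several JVMs it is the database sequence itself that provides this guarantee.

```python
import threading
from collections import defaultdict


class SequenceAllocator:
    """In-memory stand-in for per-table database sequences (illustration only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        # Table name -> last ID handed out for that table (starts at 0).
        self._last: defaultdict[str, int] = defaultdict(int)

    def getnextid(self, table: str) -> int:
        # The lock makes read-increment-return a single atomic step,
        # so concurrent callers can never be given the same ID.
        with self._lock:
            self._last[table] += 1
            return self._last[table]
```

Because each table has its own counter, IDs are unique per table rather than globally, which matches having one sequence per table.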
The etc directory in the source distribution contains two further SQL files. clean-database.sql contains the SQL necessary to completely clean out the database, so use with caution! The Ant target clean_database can be used to execute this. update-sequences.sql contains SQL to reset the primary key generation sequences to appropriate values. You'd need to do this if, for example, you're restoring a backup database dump which creates rows with specific primary keys already defined. In such a case, the sequences would allocate primary keys that were already used.
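The reason update-sequences.sql is needed can be shown with a small Python sketch. A restored dump contains rows with fixed primary keys, but a sequence that has not been reset may still hand out low values, so it would allocate keys that already exist. Resetting the sequence means making its next value one past the highest key in use. The helper names below are hypothetical, and they model the idea rather than the contents of the actual script.

```python
def needs_reset(sequence_next: int, existing_ids: set[int]) -> bool:
    """True if the sequence could hand out an ID that is already taken.

    The sequence will return sequence_next, sequence_next + 1, ...,
    so any existing ID at or above sequence_next is a future collision.
    """
    return any(i >= sequence_next for i in existing_ids)


def reset_value(existing_ids: set[int]) -> int:
    """Next ID the sequence should return: one past the highest ID in use."""
    return max(existing_ids, default=0) + 1
```

For rows restored with keys 1, 2, 3 and 7, a sequence still starting at 1 needs a reset, and after resetting it to hand out 8 next it no longer collides.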
Versions of the _*.sql_ files for Oracle are stored in _[dspace-source]/dspace/etc/oracle_. These need to be copied over their PostgreSQL counterparts in _[dspace-source]/dspace/etc_ prior to installation.
Maintenance and Backup
When using PostgreSQL, it's a good idea to perform regular 'vacuuming' of the database to optimize performance. This is performed by the vacuumdb command which can be executed via a 'cron' job, for example by putting this in the system crontab:
Code Block |
---|
# clean up the database nightly
40 2 * * * /usr/local/pgsql/bin/vacuumdb --analyze dspace > /dev/null 2>&1
|
The DSpace database can be backed up and restored using the usual methods, for example with pg_dump and psql. However, when restoring a database, you will need to perform these additional steps:
The fresh_install target loads up the initial contents of the Dublin Core type and bitstream format registries, as well as two entries in the epersongroup table for the system anonymous and administrator groups. Before you restore a raw backup of your database, you will need to remove these, since they will already exist in your backup, possibly having been modified. For example, use:
Code Block |
---|
DELETE FROM dctyperegistry;
DELETE FROM bitstreamformatregistry;
DELETE FROM epersongroup;
|
After restoring a backup, you will need to reset the primary key generation sequences so that they do not produce already-used primary keys. Do this by executing the SQL in _[dspace-source]/dspace/etc/update-sequences.sql_, for example with:
Code Block |
---|
psql -U dspace -f [dspace-source]/dspace/etc/update-sequences.sql
|
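Putting the restore steps together, the following Python sketch builds the psql commands in the order they need to run: remove the registry and group rows created by fresh_install, load the backup, then reset the sequences. It assumes a plain-SQL dump containing only table data (for example one made with pg_dump --data-only), since the tables already exist after fresh_install. The function name restore_plan and the dump file name are hypothetical.

```python
def restore_plan(db: str, user: str, dump_file: str, dspace_source: str) -> list[list[str]]:
    """Return the psql invocations for a restore, in execution order."""
    # Rows created by fresh_install that the backup will re-create.
    registry_cleanup = (
        "DELETE FROM dctyperegistry; "
        "DELETE FROM bitstreamformatregistry; "
        "DELETE FROM epersongroup;"
    )
    update_sequences = f"{dspace_source}/dspace/etc/update-sequences.sql"
    base = ["psql", "-U", user, "-d", db]
    return [
        base + ["-c", registry_cleanup],   # 1. clear conflicting rows
        base + ["-f", dump_file],          # 2. load the data-only dump
        base + ["-f", update_sequences],   # 3. reset key sequences
    ]
```

Each command could then be run in turn, for example with subprocess.run(cmd, check=True), so that a failing step stops the restore before the next one runs.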
Future updates of DSpace may involve minor changes to the database schema. Specific instructions on how to update the schema whilst keeping live data will be included. The current schema also contains a few currently unused database columns, to be used for extra functionality in future releases. These unused columns have been added in advance to minimize the effort required to upgrade.
Configuring the RDBMS Component
...
For example, a bitstream with the internal ID 12345678901234567890123456789012345678 is stored in the directory:
Code Block |
---|
(assetstore dir)/12/34/56/12345678901234567890123456789012345678
|
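The directory layout in this example can be expressed as a short Python sketch. It assumes, from this single example, that there are always three directory levels, each named after the next two characters of the internal ID, with the full ID used as the file name; bitstream_path is a hypothetical helper, not part of DSpace.

```python
import posixpath


def bitstream_path(assetstore_dir: str, internal_id: str) -> str:
    """Build the storage path for a bitstream from its internal ID.

    Three directory levels, each named after the next two characters
    of the ID, followed by the full ID as the file name.
    """
    if len(internal_id) < 6:
        raise ValueError("internal ID too short for three two-character levels")
    levels = [internal_id[i:i + 2] for i in range(0, 6, 2)]
    return posixpath.join(assetstore_dir, *levels, internal_id)
```

Splitting on the leading characters spreads files across many small directories instead of placing them all in one very large directory.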
...
Bitstream stores in the file system on the server are configured like this:
Code Block |
---|
assetstore.dir = [dspace]/assetstore
|
(Remember that _[dspace]_ is a placeholder for the actual name of your DSpace install directory.)
The above example specifies a single asset store.
Code Block |
---|
assetstore.dir = [dspace]/assetstore_0
assetstore.dir.1 = /mnt/other_filesystem/assetstore_1
|
...
By default, newly created bitstreams are put in asset store 0 (i.e. the one specified by the assetstore.dir property). This preserves backwards compatibility with pre-DSpace 1.1 configurations. To change this, for example when asset store 0 is getting full, add a line to dspace.cfg like:
Code Block |
---|
assetstore.incoming = 1
|
Then restart DSpace (Tomcat). New bitstreams will be written to the asset store specified by assetstore.dir.1, which is /mnt/other_filesystem/assetstore_1 in the above example.
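The rules for numbered asset stores and assetstore.incoming can be summarized in a Python sketch that reads dspace.cfg-style text and reports which directory new bitstreams would go to. It is a simplified key = value reader written for illustration, not DSpace's own configuration parser, and parse_assetstores is a hypothetical name. Placeholders such as [dspace] are returned verbatim.

```python
import re


def parse_assetstores(cfg_text: str) -> tuple[dict[int, str], int]:
    """Return ({store number: directory}, incoming store number)."""
    stores: dict[int, str] = {}
    incoming = 0  # default: asset store 0, i.e. assetstore.dir
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and non-property lines
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "assetstore.dir":
            stores[0] = value
        elif m := re.fullmatch(r"assetstore\.dir\.(\d+)", key):
            stores[int(m.group(1))] = value
        elif key == "assetstore.incoming":
            incoming = int(value)
    if incoming not in stores:
        raise ValueError(f"assetstore.incoming = {incoming} has no matching assetstore.dir")
    return stores, incoming
```

With the two-store configuration and assetstore.incoming = 1 shown above, the incoming store resolves to /mnt/other_filesystem/assetstore_1; without the assetstore.incoming line it falls back to store 0.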
...
For example, let's say asset store number 1 will refer to SRB. Then there will be a set of SRB account parameters like this:
Code Block |
---|
srb.host.1 = mysrbmcathost.myu.edu
srb.port.1 = 5544
srb.mcatzone.1 = mysrbzone
srb.mdasdomainname.1 = mysrbdomain
srb.defaultstorageresource.1 = mydefaultsrbresource
srb.username.1 = mysrbuser
srb.password.1 = mysrbpassword
srb.homedirectory.1 = /mysrbzone/home/mysrbuser.mysrbdomain
srb.parentdir.1 = mysrbdspaceassetstore
|
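Since every SRB parameter carries the asset store number as a suffix, a Python sketch can group the parameters by store and flag incomplete configurations. The parameter names come from the example above; treating all nine as mandatory is an assumption of this sketch, and srb_stores is a hypothetical helper rather than part of DSpace.

```python
import re
from collections import defaultdict

# Names taken from the example above; requiring all of them is an assumption.
SRB_KEYS = (
    "host", "port", "mcatzone", "mdasdomainname", "defaultstorageresource",
    "username", "password", "homedirectory", "parentdir",
)


def srb_stores(cfg_text: str) -> dict[int, dict[str, str]]:
    """Group srb.<name>.<N> properties by asset store number N."""
    stores: defaultdict[int, dict[str, str]] = defaultdict(dict)
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        m = re.fullmatch(r"srb\.([a-z]+)\.(\d+)", key)
        if m:
            stores[int(m.group(2))][m.group(1)] = value
    # Report any store that is missing one of the expected parameters.
    for number, params in stores.items():
        missing = [k for k in SRB_KEYS if k not in params]
        if missing:
            raise ValueError(f"SRB asset store {number} is missing: {', '.join(missing)}")
    return dict(stores)
```

For the example above, srb_stores would return a single entry, keyed 1, holding all nine parameters.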
...