DSpace uses a relational database to store all information about the organization of content, metadata about the content, information about e-people and authorization, and the state of currently-running workflows. The DSpace system also uses the relational database in order to maintain indices that users can browse.
DSpace 6 database schema (PostgresPostgreSQL). Right-click the image and choose "Save as" to save in full resolution. Instructions on updating this schema diagram are in How to update database schema diagram.
Most of the functionality that DSpace uses can be offered by any standard SQL database that supports transactions. However at this time, DSpace APIS use some features specific to PostgreSQL and Oracle, so some modification to the code would be needed before DSpace would function fully with an alternative database back-end.
org.dspace.storage.rdbms package provides access to an SQL database in a somewhat simpler form than using JDBC directly. The primary class is
DatabaseManager, which executes SQL queries and returns
- DSpace uses Hibernate ORM as the object relational mapping layer between the DSpace database and the DSpace code.
- The main Hibernate configuration can be found at
- Hibernate initialization is triggered via Spring (beans) defined
[dspace]/config/spring/api/core-hibernate.xml. This Spring configuration pulls in some settings from DSpace Configuration, namely all Database (db.*) settings defined there.
- All DSpace Object Classes provide a DAO (Data Access Object) implementation class that extends a GenericDAO interface defined in
org.dspace.core.GenericDAOclass. The default (abstract) implementation is in
- The DSpace Context object (
org.dspace.core.Context) provides access to the configured
org.dspace.core.DBConnection(Database Connection), which is HibernateDBConnection by default. The
org.dspace.core.HibernateDBConnectionclass provides access the the Hibernate Session interface (org.hibernate.Session) and its Transactions.
- Each Hibernate Session opens a single database connection when it is created, and holds onto it until the Session is closed. A Session may consist of one or more Transactions. Sessions are NOT thread-safe (so individual objects cannot be shared between threads).
- Hibernate will intelligently cache objects in the current Hibernate Session (on object access), allowing for optimized performance.
- DSpace provides methods on the Context object to specifically remove (
Context.uncacheEntity()) or reload (
Context.reloadEntity()) objects within Hibernate's Session cache.
- DSpace also provides special Context object "modes" to optimize Hibernate performance for read-only access (
Mode.READ_ONLY) or batch processing (
Mode.BATCH_EDIT). These modes can be specified when constructing a new Context object.
- The main Hibernate configuration can be found at
- DSpace uses FlywayDB to perform automated database initialization and upgrades
org.dspace.storage.rdbms.DatabaseUtilsclass manages all Flyway API calls, and executes the SQL migrations under the
org.dspace.storage.rdbms.sqlmigrationpackage and the Java migrations under the
- Once all database migrations have run, a series of Flyway Callbacks are triggered to initialize the (empty) database with required default content. For example, callbacks exist for adding default DSpace Groups (
GroupServiceInitializer), default Metadata & Format Registries (
DatabaseRegistryUpdater), and the default Site object (
SiteServiceInitializer). All Callbacks are under the
- While Flyway is automatically initialized and executed during
- startup, various Database Utilities are also available on the command line. These utilities allow you to manually trigger database upgrades or check the status of your database.
Most of the functionality that DSpace uses can be offered by any standard SQL database that supports transactions. However, at this time, DSpace only provides Flyway migration scripts for PostgreSQL and Oracle (and has only been tested with those database backends). Additional database backends should be possible, but would minimally require creating custom Flyway migration scripts for that database backend.
Maintenance and Backup
When using PostgreSQL, it's a good idea to perform regular 'vacuuming' of the database to optimize performance. By default, PostgreSQL performs automatic vacuuming on your behalf. However, if you have this feature disabled, then we recommend scheduling the
vacuumdb command to run on a regular basis.
After restoring a backup, you will need to reset the primary key generation sequences so that they do not produce already-used primary keys. Do this by executing the SQL in
[dspace]/etc/postgres/update-sequences.sql, for example with:
psql -U dspace -f [dspace]/etc/update-sequences.sql
The database manager is configured with the following properties in
The JDBC URL to use for accessing the database. This should not point to a connection pool, since DSpace already implements a connection pool.
JDBC driver class name. Since presently, DSpace uses PostgreSQL-specific features, this should be org.postgresql.Driver.
Username to use when accessing the database.
Corresponding password ot use when accessing the database.
Custom RDBMS tables,
columns or views
When at all possible, we recommend creating custom database tables or views within a separate schema from the DSpace database tables. Since the DSpace database is initialized and upgraded automatically using Flyway DB, the upgrade process may stumble or throw errors if you've directly modified the DSpace database schema, views or tables. Flyway itself assumes it has full control over the DSpace database schema, and it is not "smart" enough to know what to do when it encounters a locally customized database.