VIVO Documentation
Old Release
This documentation relates to an old version of VIVO, version 1.9.x. Looking for another version? See all documentation.
Essentially all of the data that you store in VIVO will be given to MySQL for storage. The actual location of this data depends on what system you have, and on how you install MySQL but you won’t need to know the location. You will access the data through VIVO, or occasionally through the MySQL client application.
MySQL 5.1 or higher, http://www.mysql.com
Create an empty MySQL database
Decide on a database name, username, and password. Log into your MySQL server and create a new database in MySQL that uses UTF-8 encoding . You will need these values for Step IV when you configure the deployment properties. At the MySQL command line you can create the database and user with these commands substituting your values for dbname , username , and password . Most of the time, the hostname will equal localhost .
CREATE DATABASE dbname CHARACTER SET utf8;
Grant access to a database user. For example:
GRANT ALL ON dbname.* TO 'username'@'hostname' IDENTIFIED BY 'password';
Keep track of the database name, username, and password.
Specify the JDBC URL of your database. Change the end of the URL to reflect your database name (if it is not "vivo").
Specify deployment properties
Instruction | Property | Value |
---|---|---|
Specify the JDBC URL of your database. Change the end of the URL to reflect your database name (if it is not "vivo"). | VitroConnection.DataSource.url | |
Change the username to match the authorized user you created in MySQL. | VitroConnection.DataSource.username | username set above |
Change the password to match the password you created in MySQL. | VitroConnection.DataSource.password | password set above |
Specify the maximum number of active connections in the database connection pool to support the anticipated number of concurrent page requests. | VitroConnection.DataSource.pool.maxActive | 40 |
Specify the maximum number of database connections that will be allowed to remain idle in the connection pool. Default is 25% of the maximum number of active connections. | VitroConnection.DataSource.pool.maxIdle | 10 |
Change the dbtype setting to use a database other than MySQL. Otherwise, leave this value unchanged. Possible values are DB2, derby, HSQLDB, H2, MySQL, Oracle, PostgreSQL, and SQLServer. Refer to http://openjena.org/wiki/SDB/Databases_Supported for additional information. | VitroConnection.DataSource.dbtype | MySQL |
Issues to be aware of
As indicated but not directly commented on above, a new installation of VIVO initially expects to be connected to an empty VIVO database that has been created with the character set UTF-8 so that the VIVO can read, store, and display characters encoded as Unicode
see alsoTomcat server.xml configuration to support non-Roman characters (UTF-8)
Tuning MySQL
From Stony Brook –
- this resulted in about a 3x speedup (especially for big ingests)
tmp_table_size
max_heap_table_size
key_buffer_size (needed because many of our queries include a group or sort)
Writing the MySQL transaction log
MySQL allows you to control its logging behavior, using the the innodb_flush_log_at_trx_commit
parameter. On some systems, changing the value of this parameter can dramatically improve performance.
Using the default setting, the log is written to the file buffer and the buffer is flushed to disk at the end of each transaction. This is necessary to insure full ACID compliance, but the overhead is substantial. Most of VIVO is not transaction-oriented: each statement is added or deleted in its own transaction. So the default setting means that a physical write to disk is required for each new RDF statement.
Setting innodb_flush_log_at_trx_commit
to 0 or 2 will greatly improve throughput, while adding a minimal level of risk to the data. Under some circumstances, with some settings, up to one second of transactions can be lost. Most VIVO installations will find this to be an acceptable level of risk.
setting | meaning | worst case risk |
---|---|---|
1 (default) | Write the log after each transaction. Flush to disk after each transaction. | If MySQL crashes, lose transactions in progress. On power failure or system crash, lose transactions in progress. |
2 | Write the log after each transaction. Flush to disk once per second. | If MySQL crashes, lose transactions in progress. On power failure or system crash, lose one second of transactions. |
0 | Write the log once per second. Flush to disk once per second. | If MySQL crashes, lose one second of transactions. On power failure or system crash, lose one second of transactions. |
This page provides full details regarding innodb_flush_log_at_trx_commit
: http://dev.mysql.com/doc/refman/5.1/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit
Setting the MySQL query cache size
Increasing the MySQL query cache size will likely translate into improved VIVO performance in that once large pages have been fetched once, they're typically quite a bit faster to load on later fetches.