Old Release
This documentation relates to an old version of DSpace, version 6.x. Looking for another version? See all documentation.
Support for DSpace 6 ended on July 1, 2023. See Support for DSpace 5 and 6 is ending in 2023
The software DSpace relies on does not come out of the box optimized for large repositories. Here are some tips to make it all run faster.
Review the number of DSpace webapps you have installed in Tomcat
By default, DSpace includes a number of web applications which all interact with the underlying DSpace data model. The DSpace web applications include: XMLUI, JSPUI, OAI, RDF, REST, SOLR, SWORD, and SWORDv2. The only required web application is SOLR as it is utilized by several of the other web applications (XMLUI, JSPUI and OAI). See the Installing DSpace documentation for more information about each of these web applications.
Any of the other web applications can be removed from your Tomcat if you have no plans to utilize that functionality. The fewer web applications you are running, the less memory you will require, as each application is allocated memory when started up by Tomcat.
Give Tomcat (DSpace UIs) More Memory
Give Tomcat More Java Heap Memory
Java Heap Memory Recommendations
At the time of writing, DSpace recommends giving Tomcat >= 512MB of Java Heap Memory to ensure optimal DSpace operation. However, most larger or highly active DSpace installations tend to allocate more, typically 1024MB to 2048MB of Java Heap Memory.
Performance tuning in Java basically boils down to memory. If you are seeing "java.lang.OutOfMemoryError: Java heap space" errors, this is a sure sign that Tomcat isn't being provided with enough Heap Memory.
Tomcat is especially memory hungry, and will benefit from being given lots of RAM. To set the amount of memory available to Tomcat, use either the JAVA_OPTS or CATALINA_OPTS environment variable, e.g.:
CATALINA_OPTS=-Xmx512m -Xms512m
OR
JAVA_OPTS=-Xmx512m -Xms512m
The above example sets the maximum Java Heap memory to 512MB.
Difference between JAVA_OPTS and CATALINA_OPTS
You can use either environment variable. JAVA_OPTS is also used by other Java programs (besides just Tomcat). CATALINA_OPTS is only used by Tomcat. So, if you only want to tweak the memory available to Tomcat, it is recommended that you use CATALINA_OPTS. If you set both CATALINA_OPTS and JAVA_OPTS, Tomcat will default to using the settings in CATALINA_OPTS.
If the machine is dedicated to DSpace, a decent rule of thumb is to give Tomcat half of the memory on your machine. At a minimum, you should give Tomcat >= 512MB of memory for optimal DSpace operation. (NOTE: As your DSpace instance gets larger in size, you may need to increase this number to the several GB range.) The latest guidance is to also set -Xms to the same value as -Xmx for server applications such as Tomcat.
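On a standard Tomcat installation, a common place for these options is a setenv.sh file, which Tomcat's startup scripts source automatically if it exists. A minimal sketch (the heap sizes below are illustrative; size them to your machine):

```shell
# Hypothetical $CATALINA_HOME/bin/setenv.sh -- sourced by catalina.sh at startup,
# so CATALINA_OPTS set here applies only to Tomcat, not to other Java programs.
# Illustrative values: -Xms is set equal to -Xmx, per the guidance above.
CATALINA_OPTS="-Xms2048m -Xmx2048m"
export CATALINA_OPTS
```

After editing the file, restart Tomcat for the new settings to take effect.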
Give Tomcat More Java PermGen Memory
Java PermGen Memory Recommendations
At the time of writing, DSpace recommends you should give Tomcat >= 128MB of PermGen Space to ensure optimal DSpace operation.
If you are seeing "java.lang.OutOfMemoryError: PermGen space" errors, this is a sure sign that Tomcat is running out of PermGen Memory. (More info on PermGen Space: http://blogs.sun.com/fkieviet/entry/classloader_leaks_the_dreaded_java)
To increase the amount of PermGen memory available to Tomcat (default=64MB), use either the JAVA_OPTS or CATALINA_OPTS environment variable, e.g.:
CATALINA_OPTS=-XX:MaxPermSize=128m
OR
JAVA_OPTS=-XX:MaxPermSize=128m
The above example sets the maximum PermGen memory to 128MB.
The note above on the difference between JAVA_OPTS and CATALINA_OPTS applies here as well.
Note that you can set both Tomcat's Heap space and PermGen space together, e.g.:
CATALINA_OPTS=-Xmx512m -Xms512m -XX:MaxPermSize=128m
On an Ubuntu machine (10.04) at least, the file /etc/default/tomcat6 appears to be the best place to put these environment variables.
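A sketch of what that file might contain (the sizes are illustrative, and the file is sourced as shell by the tomcat6 init script):

```shell
# /etc/default/tomcat6 (Ubuntu) -- sourced by the tomcat6 init script.
# Illustrative sizes; heap and PermGen settings can be combined in one variable.
JAVA_OPTS="-Xms512m -Xmx512m -XX:MaxPermSize=128m"
```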
Choosing the size of memory spaces allocated to DSpace
psi-probe is a webapp that can be deployed alongside DSpace and used to watch the memory usage of the other webapps running in the same instance of Tomcat (in our case, the DSpace webapps).
- Download the latest version of psi-probe from https://github.com/psi-probe/psi-probe
- Unzip probe.war into [dspace]/webapps/:
cd [dspace]/webapps/
unzip ~/probe-3.1.0.zip
unzip probe.war -d probe
- Add a Context element in Tomcat's configuration, and make it privileged (so that it can monitor the other webapps):
EITHER in $CATALINA_HOME/conf/server.xml:
<Context docBase="[dspace]/webapps/probe" privileged="true" path="/probe" />
OR in $CATALINA_HOME/conf/Catalina/localhost/probe.xml:
<Context docBase="[dspace]/webapps/probe" privileged="true" />
- Edit $CATALINA_HOME/conf/tomcat-users.xml to add a user for logging into psi-probe (see more in https://github.com/psi-probe/psi-probe/wiki/InstallationApacheTomcat):
<?xml version='1.0' encoding='utf-8'?>
<tomcat-users>
  <user username="admin" password="t0psecret" roles="manager" />
</tomcat-users>
- Restart Tomcat
- Open http://yourdspace.com:8080/probe/ (edit domain and port number as necessary) in your browser and use the username and password from tomcat-users.xml to log in.
- In the "System Information" tab, go to the "Memory utilization" menu. Note how much memory Tomcat is using upon startup and use a slightly higher value than that for the -Xms parameter (initial Java heap size). Watch how big the various memory spaces get over time (hours or days) as you run common DSpace tasks that put load on memory, including indexing, reindexing, importing items into the OAI index, etc. These maximum values will determine the -Xmx parameter (maximum Java heap size). Watching PS Perm Gen grow over time will let you choose the value for the -XX:MaxPermSize parameter.
Give the Command Line Tools More Memory
Give the Command Line Tools More Java Heap Memory
Similar to Tomcat, you may also need to give the DSpace Java-based command-line tools more Java Heap memory. If you are seeing "java.lang.OutOfMemoryError: Java heap space" errors when running a command-line tool, this is a sure sign that it isn't being provided with enough Heap Memory.
By default, DSpace only provides 256MB of maximum heap memory to its command-line tools.
If you'd like to provide more memory to command-line tools, you can do so via the JAVA_OPTS environment variable (which is used by the [dspace]/bin/dspace script). Again, it's the same syntax as above:
JAVA_OPTS=-Xmx512m -Xms512m
This is especially useful for big batch jobs, which may require additional memory.
You can also edit the [dspace]/bin/dspace script and add the environment variables to the script directly.
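For instance, a heavy reindex can be given a larger heap for just that one invocation, or for the rest of the shell session (the heap size below is illustrative; `[dspace]` is your DSpace installation directory):

```shell
# One-off override for a single heavy batch job (size illustrative):
JAVA_OPTS="-Xmx1024m -Xms1024m" [dspace]/bin/dspace index-discovery -b

# Or export it for the rest of the shell session:
export JAVA_OPTS="-Xmx1024m -Xms1024m"
[dspace]/bin/dspace index-discovery -b
```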
Give the Command Line Tools More Java PermGen Space Memory
Similar to Tomcat, you may also need to give the DSpace Java-based command-line tools more PermGen Space. If you are seeing "java.lang.OutOfMemoryError: PermGen space" errors when running a command-line tool, this is a sure sign that it isn't being provided with enough PermGen Space.
By default, Java only provides 64MB of maximum PermGen space.
If you'd like to provide more PermGen Space to command-line tools, you can do so via the JAVA_OPTS environment variable (which is used by the [dspace]/bin/dspace script). Again, it's the same syntax as above:
JAVA_OPTS=-XX:MaxPermSize=128m
This is especially useful for big batch jobs, which may require additional memory.
Note that you can set both Java's Heap space and PermGen space together, e.g.:
JAVA_OPTS=-Xmx512m -Xms512m -XX:MaxPermSize=128m
Give PostgreSQL Database More Memory
On many Linux distros, PostgreSQL comes out of the box with an incredibly conservative configuration: it uses only 8MB of memory! To put some more fire in its belly, edit the shared_buffers parameter in postgresql.conf. The memory usage is 8KB multiplied by this value. The advice in the Postgres docs is not to increase it above 1/3 of the memory on your machine.
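A sketch of the relevant postgresql.conf line, assuming a dedicated machine with 4GB of RAM (the value is illustrative; recent PostgreSQL versions also accept unit suffixes such as MB directly):

```
# postgresql.conf -- illustrative value for a machine with 4GB of RAM
shared_buffers = 65536        # 65536 x 8KB = 512MB, well under 1/3 of total RAM
```

Restart PostgreSQL after changing shared_buffers, as it is only read at server start.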
For More PostgreSQL Tips
For more hints/tips with PostgreSQL configurations and performance tuning, see also:
2 Comments
Drew Heles
The link to the atmire article on SOLR statistics performance tuning is dead, and I cannot find a copy elsewhere. A related resource exists at https://atmire.com/presentations/stats-webinar/Webinar-BEN-v0.9.pdf, but I don't think that is what is being described here. Does anyone know of a good replacement?
Bram Luyten (Atmire)
The content of that article is highly outdated, so I will remove the reference from the page. But in the unlikely event that you are still working with DSpace 1.6 or older, here are the contents of the original article.
==> (warning, commercial message:) If you have problems with SOLR performance on newer versions of DSpace, my Atmire colleagues and I would be happy to help out.
Increasing DSpace Performance
Submitted by Bram Luyten on Wed, 2011-12-21 17:08 => HIGHLY OUTDATED
High load DSpace installations are installations that generate large amounts of usage data. This means that there are many visits of the different item, collection and community pages and many downloads of bitstreams.
High load installations with large amounts of usage data will see slower statistics (solr) query execution, which causes the response time of the different statistics pages to increase or even time out. A second issue caused by high load installations is the high number of usage events written to the solr index. Because every usage event is committed separately, this results in a large number of commits, and each commit updates all of the indices in solr with the new documents, forcing the end user to wait for the page to load until this update has been completed. In order to address these issues, @mire has created a number of solr server optimization techniques using two features offered by the solr server:
The autocommit feature (DSpace 1.6)
The solr server autocommit feature allows the optimization of the storage of the different usage events. The out of the box installation of DSpace uses synchronous commits of the usage events: every usage event is committed individually, resulting in a large number of commits. The autocommit feature of the solr server enables asynchronous commits of the usage events, grouping them into larger batches that are committed at the same time. This technique reduces the required number of commits and decreases the load on the solr server. To configure the autocommit feature, the following customization must be made in the statistics core's solrconfig.xml:
<autoCommit>
<maxDocs>10000</maxDocs>
<maxTime>900000</maxTime>
</autoCommit>
Manually enabling SOLR autocommit like this only applies to DSpace 1.6 as this is already enabled by default in more recent releases.
The query warmup system
If the above solution is still insufficient for a quick response of your solr statistics, a more advanced and complicated solution can be applied. This optimization takes advantage of in-memory caches of important parts of your data, which implies that large amounts of memory are required.

The solr server warmup system is used to optimize solr query execution time. It is based on the fact that queries are cached by the Solr server based on the executed filter queries: filter queries that are often executed are kept in the Solr cache for a longer period of time. Therefore, automatically executing a number of important queries makes it possible to keep them warmed up, which results in fast execution times for these queries.

Queries to the Solr server contain two types of query parts: queries and filter queries. The filter queries in the statistics solution always contain the constraints regarding the date range and the query field (country, owning collection, ...). The results of these filter queries are cached by the Solr server. The query parts of a Solr query contain other constraints, such as the item identifier used to select the proper results from the filter query results. The results of these query parts are not cached by the Solr server. Therefore, it is sufficient to warm up queries for one item identifier in order to warm up the queries for all the others. Queries should be warmed up in two different situations:
Since the out of the box DSpace statistics display results for a timespan across the last 6 months, the warmup queries at server startup should be based on the current month. At the end of each month, it is important to warm up the queries for the next month to avoid response time issues during the first use of the statistics at the beginning of the new month.

At server startup, the Solr firstSearcher has to be enabled in the solrconfig.xml. The firstSearcher contains a number of queries that will be cached during Solr server startup. Because the queries in the firstSearcher should be adapted to the current month, @mire has created a solrconfig.template.xml file that contains the queries without the dates, and a solr-warmup script that replaces the current solrconfig.xml with a new version with the proper dates filled out. @mire recommends executing this script just before tomcat startup. The script may need to be updated to include the correct path to your solr config.

At the end of each month, the statistics for the next month must be cached as well. Therefore, @mire has created a second script for this purpose that must be executed at the end of each month.
@mire advises to schedule the execution of this script in crontab the last few hours of the last day of the month. It can be executed multiple times to ensure everything has been cached prior to the end of the month. The script may need to be updated to include the correct path to your solr config.
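A sketch of such a crontab schedule (the script name and path are hypothetical, since the actual script is supplied by @mire; running it on days 28-31 simply means it executes a few extra times near month end, which is harmless per the note above):

```
# m h dom mon dow  command
# Run at 21:00 and 23:00 on the last days of each month to warm up next month's queries
0 21,23 28-31 * *  /dspace/bin/solr-warmup-next-month.sh
```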