This documentation relates to an old version of DSpace, version 3.x.

This DSpace release is end-of-life and is no longer supported.

DSpace Log Converter

With the release of DSpace 1.6, a new statistics software component was added. DSpace's use of SOLR for statistics makes it possible to maintain a database of statistics. With this in mind, the question arises of how a site can make use of its older log files. The commands below convert existing log files and then import them into SOLR. This only needs to be done once.

The Log Converter program converts log files from dspace.log into an intermediate format that can be inserted into SOLR.

Command used:

[dspace]/bin/dspace stats-log-converter


Java class:


Arguments (short and long forms):


-i or -in

Input file

-o or -out

Output file

-m or -multiple

Adds a wildcard to the end of the input and output file names, so that all files matching dspace.log* are converted. (For example, the following files would be included because of this argument: dspace.log, dspace.log.1, dspace.log.2, dspace.log.3, etc.)

-n or -newformat

Use this option if the log files were created with DSpace 1.6

-v or -verbose

Display verbose output (helpful for debugging)

-h or -help
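
As a rough illustration of how the arguments above combine, the following shell sketch builds a converter command line for a single log file. The DSPACE_DIR variable (standing in for [dspace]), the helper name converter_cmd, and both file paths are assumptions made for this example and are not part of DSpace.

```shell
#!/bin/sh
# Assumption: DSPACE_DIR stands in for [dspace]; set it to your install path.
DSPACE_DIR="${DSPACE_DIR:-/dspace}"

# Hypothetical helper: print the converter command for one log file.
#   $1 = dspace.log file to read, $2 = intermediate file to write
converter_cmd() {
    printf '%s/bin/dspace stats-log-converter -i %s -o %s -v\n' \
        "$DSPACE_DIR" "$1" "$2"
}

# Print the command for review; both paths are examples only.
converter_cmd "$DSPACE_DIR/log/dspace.log" "/tmp/dspace.log.converted"
```

Printing the command rather than running it makes it easy to check the paths before anything is written; the printed line can then be pasted into a terminal.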


The following command loads the intermediate log files created by the converter described above into SOLR.

Command used:

[dspace]/bin/dspace stats-log-importer


Java class:


Arguments (short and long forms):


-i or --

Input file

-m or --

Adds a wildcard to the end of the input file name, so that all files matching dspace.log* are imported

-s or --

Skips the reverse DNS lookups that work out where a user is from. (The DNS lookup finds information about the host from its IP address, such as its geographical location. This can be slow, and does not work on a server that is not connected to the internet.)

-v or --

Display verbose output (helpful for debugging)

-l or --

For developers: allows you to import a log file from another system. Because the handles in that log will not exist locally, hits are added to random items in your local system instead.

-h or --
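
A minimal sketch of an importer invocation, using only the short option forms listed above. DSPACE_DIR (standing in for [dspace]), the helper name importer_cmd, its "skip-dns" keyword, and the file path are assumptions made for this example.

```shell
#!/bin/sh
# Assumption: DSPACE_DIR stands in for [dspace]; set it to your install path.
DSPACE_DIR="${DSPACE_DIR:-/dspace}"

# Hypothetical helper: print the importer command for one intermediate file.
#   $1 = intermediate file produced by the converter
#   $2 = optional keyword "skip-dns", which adds -s (no reverse DNS lookups)
importer_cmd() {
    cmd="$DSPACE_DIR/bin/dspace stats-log-importer -i $1"
    if [ "${2:-}" = "skip-dns" ]; then
        cmd="$cmd -s"
    fi
    printf '%s -v\n' "$cmd"
}

# Print the command for review; the path is an example only.
importer_cmd "/tmp/dspace.log.converted" skip-dns
```

Skipping the DNS lookups is shown here because it is the option most likely to matter on a server without internet access.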


Although the DSpace Log Converter applies basic spider filtering (googlebot, yahoo slurp, msnbot), this filtering is far from complete. After converting your old logs, please refer to Filtering and Pruning Spiders for spider removal operations.

Filtering and Pruning Spiders

Command used:

[dspace]/bin/dspace stats-util


Java class:


Arguments (short and long forms):


-u or -update-spider-files

Updates the spider IP files from the internet into [dspace]/config/spiders. Downloads the spider files identified in dspace.cfg under the property solr.spiderips.urls. See DSpace SOLR Statistics Configuration.

-f or -delete-spiders-by-flag

Delete Spiders in Solr By isBot Flag. Will prune out all records that have isBot:true

-i or -delete-spiders-by-ip

Delete Spiders in Solr By IP Address. Will prune out all records that have IPs that match spider IPs.

-m or -mark-spiders

Update isBot Flag in Solr. Marks any records currently stored in statistics that have IP addresses matched in spiders files

-h or -help

Calls up this brief help table at command line.


How these options are used is up to the user. If they want to keep spider entries in their repository, they can just mark them using "-m"; the entries will then be excluded from statistics queries when "solr.statistics.query.filter.isBot = true" is set in dspace.cfg.

If they want to keep spiders out of the Solr repository, they can use the "-i" option, and the matching records will be removed immediately.
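
The two approaches described above can be sketched as a pair of command sequences. This is an illustration only: DSPACE_DIR (standing in for [dspace]) and the helper name spider_cmds are hypothetical, and refreshing the spider IP files with "-u" before marking or pruning is an assumption about a sensible order, not a documented requirement.

```shell
#!/bin/sh
# Assumption: DSPACE_DIR stands in for [dspace]; set it to your install path.
DSPACE_DIR="${DSPACE_DIR:-/dspace}"

# Hypothetical helper: print the stats-util commands for one workflow.
#   $1 = "mark"  (keep spider records, set their isBot flag with -m)
#        "prune" (delete records whose IP matches a spider IP with -i)
spider_cmds() {
    case "$1" in
        mark)  flag=-m ;;
        prune) flag=-i ;;
        *)     echo "usage: spider_cmds mark|prune" >&2; return 1 ;;
    esac
    # Assumed first step: refresh the spider IP files.
    printf '%s/bin/dspace stats-util -u\n' "$DSPACE_DIR"
    printf '%s/bin/dspace stats-util %s\n' "$DSPACE_DIR" "$flag"
}

spider_cmds mark
```

Marking is the reversible choice, since the records stay in Solr; pruning deletes them.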

There are guards in place to control what can be defined as an IP range for a bot. In [dspace]/config/spiders, spider IP address ranges have to be at least 3 subnet sections in length (for example 123.123.123), and IP ranges can only be on the smallest subnet [ -]. Otherwise, loading that row will cause exceptions in the DSpace logs and that IP entry will be excluded.
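
To catch entries that are too short before DSpace loads them, a pre-check along the following lines could be run over a spider file. This is a hypothetical sketch and not DSpace's own parser: the helper name has_min_sections is invented for this example, and it only tests the "at least 3 sections" rule for plain dotted entries, not the range notation.

```shell
#!/bin/sh
# Hypothetical pre-check, not DSpace's own parser: accept an entry only if it
# is made of three or four dot-separated numeric sections.
has_min_sections() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]{1,3}(\.[0-9]{1,3}){2,3}$'
}

# Try the check on a few example entries.
for entry in 123.123.123 123.123.123.45 123.123; do
    if has_min_sections "$entry"; then
        echo "$entry: ok"
    else
        echo "$entry: too short, would be rejected"
    fi
done
```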

Routine SOLR Index Maintenance

Command used:

[dspace]/bin/dspace stats-util


Java class:


Arguments (short and long forms):


-o or -optimize

Run maintenance on the SOLR index. Recommended to run daily, to prevent your servlet container from running out of memory


Using this option is strongly recommended. Run this script daily (from crontab or your system's scheduler) to prevent your servlet container from running out of memory.
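
One way to schedule the daily run is a crontab entry. The sketch below prints such an entry; DSPACE_DIR (standing in for [dspace]), the helper name optimize_cron_entry, and the 03:00 run time are assumptions chosen for illustration.

```shell
#!/bin/sh
# Assumption: DSPACE_DIR stands in for [dspace]; set it to your install path.
DSPACE_DIR="${DSPACE_DIR:-/dspace}"

# Hypothetical helper: print a crontab entry that runs the optimizer daily.
#   $1 = hour (0-23), $2 = minute (0-59)
optimize_cron_entry() {
    printf '%s %s * * * %s/bin/dspace stats-util -o\n' "$2" "$1" "$DSPACE_DIR"
}

# Example: every day at 03:00. To install, append the line to your crontab:
#   ( crontab -l 2>/dev/null; optimize_cron_entry 3 0 ) | crontab -
optimize_cron_entry 3 0
```

The entry should be installed for the user account that runs DSpace so that the index files keep consistent ownership.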

Yearly SOLR Sharding

Command used:

[dspace]/bin/dspace stats-util


Java class:


Arguments (short and long forms):


-s or -shard-solr-index

Splits the data in the main core into a separate Solr core for each year, which improves Solr performance.


Using this option is strongly recommended. Run this script once a year, at the start of the new year (from crontab or your system's scheduler), to ensure that Solr performance stays solid.

This script splits the current Solr data core into one core for each year. Data cores are located in the [dspace.dir]/solr directory. These cores will NOT be defined in solr.xml, since they are loaded at run time. The run-time loading is handled by the static method located in the org.dspace.statistics.SolrLogger class. The cores are stored in the statisticYearCores list, and each time a query is made to Solr they are added as shards by the addAdditionalSolrYearCores method. The cores share the configuration of the original statistics core, so updating by using ant isn't an issue.

The actual sharding of the original Solr core into the year cores is done in the shardSolrIndex method of the org.dspace.statistics.SolrLogger class. The sharding starts by running a facet on the time to get the facets split by year. Once the years present in the logs are known, the main Solr data server is queried for all information on each year, and the results are downloaded as CSV files. When all data for one year has been downloaded, it is uploaded to the newly created core for that year using the update CSV handler. Once all data for one year has been uploaded, that data is removed from the main Solr core. (Done this way, a Solr crash does not force the process to start from scratch.)
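
The year-by-year split can be pictured with a small file-based analogy. The sketch below is not the SolrLogger implementation and does not talk to Solr; it only shows the grouping step on a CSV export. It assumes that the first column of each row is an ISO-8601 timestamp, and the helper name split_by_year and the year-YYYY.csv file naming are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: split a CSV export into one file per year.
#   $1 = input CSV whose first column starts with a four-digit year
#   $2 = output directory (created if missing)
split_by_year() {
    mkdir -p "$2" || return 1
    # Characters 1-4 of the first column are taken as the year; each row is
    # appended to that year's file, which is closed again after every write
    # to avoid keeping many files open.
    awk -F, -v out="$2" '{
        year = substr($1, 1, 4)
        file = out "/year-" year ".csv"
        print >> file
        close(file)
    }' "$1"
}

# Example usage (paths are placeholders):
#   split_by_year statistics-export.csv /tmp/year-cores
```

In the process described above, each per-year set of rows would be uploaded to its own core and only then removed from the main core, which is what makes the procedure safe to resume after a crash.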
