DSpace 1.6 and newer versions use Apache SOLR as the application underlying the statistics. SOLR enables performant searching of, and adding to, vast amounts of (usage) data.
Unlike previous versions, enabling statistics in DSpace does not require additional installation or customization. All the necessary software is included.
...
What exactly is being logged?
...
After the introduction of the SOLR Statistics logging in DSpace 1.6, every pageview and file download is logged in a dedicated SOLR statistics core.
DSpace 3.0 and newer
In addition to the existing logging of pageviews and downloads, DSpace 3.0 also logs workflow events and the search queries users enter in the DSpace search dialog.
Warning | |||
---|---|---|---|
| |||
In DSpace 7.x & 8.x, only usage statistics (pageviews, downloads) are logged. Search statistics and workflow reports (which were available in 6.x and below) are not yet supported. See their related tickets: https://github.com/DSpace/DSpace/issues/2880 and https://github.com/DSpace/DSpace/issues/2851
Due to the very recent addition of Discovery for search & faceted browsing in JSPUI, these search queries are not yet logged. Regular (non-discovery) search queries are being logged in JSP UI. |
Warning | ||
---|---|---|
| ||
Only workflow events initiated and executed by a physical user are logged. Automated workflow steps or ingest procedures are currently not logged by the workflow events logger. |
...
Code Block | ||
---|---|---|
| ||
<field name="workflowStep" type="string" indexed="true" stored="true" required="false" multiValued="true"/>
<field name="previousWorkflowStep" type="string" indexed="true" stored="true" required="false" multiValued="true"/>
<field name="owner" type="string" indexed="true" stored="true" required="false" multiValued="true"/>
<field name="submitter" type="integer" indexed="true" stored="true" required="false" />
<field name="actor" type="integer" indexed="true" stored="true" required="false" />
<field name="workflowItemId" type="integer" indexed="true" stored="true" required="false" /> |
Web User Interface Elements
Pageview and Download statistics
In the XMLUI, pageview and download statistics can be accessed from the lower end of the navigation menu. In the JSPUI, a view statistics button appears on the bottom of pages for which statistics are available.
If you are not seeing these links or buttons, it's likely that they are only enabled for administrators in your installation. Change the configuration parameter "authorization.admin.usage" in usage-statistics.cfg to false in order to make statistics visible for all repository visitors.
Disabling Tracking of Statistics
By default, Statistics are captured by the REST API for all visits (page hits) and downloads that occur via both the User Interface and the REST API.
Disabling statistical tracking currently must be done by modifying the backend's Spring configuration in [dspace]/config/spring/rest/event-service-listeners.xml. In that file, you must comment out the "SolrLoggerUsageEventListener" in order to disable all tracking:
Code Block | ||||
---|---|---|---|---|
| ||||
<beans>
...
<!-- Comment out this bean, as shown below, to disable all tracking of usage statistics in Solr -->
<!-- Inject the SolrLoggerUsageEventListener into the EventService -->
<!--
<bean class="org.dspace.statistics.SolrLoggerUsageEventListener">
<property name="eventService" ref="org.dspace.services.EventService"/>
</bean>
-->
</beans> |
After commenting out that bean, you will need to restart Tomcat.
NOTE: This only disables tracking statistics in Solr. The "Statistics" link will still appear in the header menu of the User Interface. However, you can limit its visibility so that it is only shown to administrative users. Update this configuration in your local.cfg or usage-statistics.cfg:
Code Block |
---|
# Limit access to Admins, Community/Collection Admins
usage-statistics.authorization.admin.usage = true |
Note | ||
---|---|---|
| ||
At this time, there is no flag to remove the "Statistics" menu link completely. See https://github.com/DSpace/DSpace/issues/9698 |
Web User Interface Elements
Pageview and Download statistics
In the UI, pageview and download statistics can be accessed from the "Statistics" navigation menu near the header. That statistics page is "context aware", so it will show the usage statistics for whatever page (site, Community, Collection) you are currently on.
If you are not seeing the menu, it's likely that it is only enabled for administrators in your installation. Change the configuration parameter "authorization.admin.usage" in usage-statistics.cfg to false in order to make statistics visible for all repository visitors.
Home page
Starting from the repository homepage, the statistics page displays the top 10 most popular items of the entire repository.
Community home page
The following statistics are available for the community home pages:
- Total visits of the current community home page
- Visits of the community home page over a timespan of the last 7 months
- Top 10 countries from where the visits originate
- Top 10 cities from where the visits originate
...
- Total visits of the item
- Total visits for the bitstreams attached to the item
- Visits of the item over a timespan of the last 7 months
- Top 10 countries from where the visits originate
- Top 10 cities from where the visits originate
Search Query Statistics
Warning | ||
---|---|---|
| ||
Search query statistics are only supported in DSpace 6.x and below at this time. The below screenshots and instructions are for 6.x and will need updating if this feature is ported to later versions of DSpace. See https://github.com/DSpace/DSpace/issues/2880 |
In the XMLUI, search query statistics can be accessed from the lower end of the navigation menu.
...
If you are using Discovery, note that clicking the facets also counts as a search, because clicking a facet sends a search query to the Discovery index.
Workflow Event Statistics
Warning | ||
---|---|---|
| ||
Workflow Event statistics are only supported in DSpace 6.x and below at this time. The below screenshots and instructions are for 6.x and will need updating if this feature is ported to later versions of DSpace. See https://github.com/DSpace/DSpace/issues/2851 |
In the XMLUI, search query statistics can be accessed from the lower end of the navigation menu.
...
The DSpace Statistics Implementation is a Client/Server architecture based on Solr for collecting usage events in the User Interface or REST API applications of DSpace. Solr must be installed separately from DSpace and runs as a separate application; an instance of Apache HttpClient is used to allow parallel requests to log statistics events into this Solr instance.
Configuration settings for Statistics
...
Property: | solr-statistics.server | |||||||
Example Values: | solr-statistics.server = http://127.0.0.1/solr/statistics | |||||||
Informational Note: | Is used by the SolrLogger Client class to connect to the Solr server over http and perform updates and queries. In most cases, this can (and should) be set to localhost (or 127.0.0.1).
Assuming you get an HTTP 200 OK response, then you should set | |||||||
Property: | solr-statistics.query.filter.bundles | |||||||
Example Value: | solr-statistics.query.filter.bundles = ORIGINAL | |||||||
Informational Note: | A comma-separated list of the bundles for which file statistics will be displayed. | |||||||
Property: | solr-statistics.query.filter.spiderIp | |||||||
Example Value: | solr-statistics.query.filter.spiderIp = false | |||||||
Informational Note: | If true, statistics queries will filter out spider IPs -- use with caution, as this often results in extremely long query strings. | |||||||
Property: | solr-statistics.query.filter.isBot | ||
Example Value: | solr-statistics.query.filter.isBot = true | ||
Informational Note: | If true, statistics queries will filter out events flagged with the "isBot" field. This is the recommended method of filtering spiders from statistics. | ||
Property: | solr-statistics.autoCommit | ||
Example Value: | solr-statistics.autoCommit = true | ||
Informational Note: | If true (default), then all view statistics will be committed to Solr whenever the next autoCommit is triggered. This is recommended behavior. If false, then view statistics will be committed to Solr immediately (i.e. via an explicit commit call). This setting is untested in Production scenarios, and is primarily used by automated integration tests (to verify that the statistics engine is working properly). | ||
Property: | solr-statistics.spiderips.urls | ||
Example Value: | solr-statistics.spiderips.urls =
Code Block |
---|
http://iplists.com/google.txt, \
http://iplists.com/inktomi.txt, \
http://iplists.com/lycos.txt, \
http://iplists.com/infoseek.txt, \
http://iplists.com/altavista.txt, \
http://iplists.com/excite.txt, \
http://iplists.com/misc.txt
|
| ||
Informational Note: | List of URLs to download spiders files into [dspace]/config/spiders. These files contain lists of known spider IPs and are utilized by the SolrLogger to flag usage events with an "isBot" field, or ignore them entirely.
The "stats-util" command can be used to force an update of spider files, regenerate "isBot" fields on indexed events, and delete spiders from the index. For usage, run:
Code Block |
---|
dspace stats-util -h
|
from your [dspace]/bin directory |
In the {dspace.dir}/config/modules/usage-statistics.cfg file, review the following fields. These fields can be edited in place, or overridden in your own local.cfg config file (see Configuration Reference).
Property: | usage-statistics.dbfile | |||
Example Value: | usage-statistics.dbfile = ${dspace.dir}/config/GeoLite2-City.mmdb | |||
Informational Note: | References the location of the installed GeoLite or DB-IP City "mmdb" database file. This file is utilized by the LocationUtils to calculate the location of client requests based on IP address. | |||
Property: | usage-statistics.resolver.timeout | |||
Example Value: | usage-statistics.resolver.timeout = 200 | |||
Informational Note: | Timeout in milliseconds for DNS resolution of origin hosts/IPs. Setting this value too high may result in solr exhausting your connection pool. | |||
Property: | useProxies (Set in dspace.cfg) | |||
Example Value: | useProxies = true | |||
Informational Note: | Causes Statistics logging to look for the X-Forwarded-For header in order to detect the IP addresses of clients that have accessed DSpace through a proxy service (e.g. Apache mod_proxy). [Note: This setting is found in the DSpace Logging section of dspace.cfg] | |||
Property: | usage-statistics.dbfile | |||
Example Value: | usage-statistics.dbfile = ${dspace.dir}/config/GeoLiteCity.dat | |||
Informational Note: | This property refers to the GeoLiteCity database file utilized by the LocationUtils to calculate the location of client requests based on IP address. During the Ant build process (both fresh_install and update), this file will be downloaded from http://www.maxmind.com/app/geolitecity if a new version has been published or if it is absent from your [dspace]/config directory. | |||
Property: | usage-statistics.authorization.admin.usage | |||
Example Value: | usage-statistics.authorization.admin.usage = true | |||
Informational Note: | When set to true, only general administrators and collection and community administrators are able to access the pageview and download statistics from the web user interface. As a result, the links to access statistics are hidden from users who are not logged in as administrators. Setting this property to "false" will display the links to access statistics to anyone, making them publicly available. | |||
Property: | usage-statistics.authorization.admin.search | |||
Example Value: | usage-statistics.authorization.admin.search = true | |||
Informational Note: | When set to true, only system, collection or community administrators are able to access statistics on search queries. ||||
Upgrade Process for Statistics
Example of rebuilding and redeploying DSpace (only if you have configured your distribution in this manner)
First, follow the traditional DSpace build process for updating:
Code Block |
---|
cd [dspace-source]/dspace
mvn package
cd [dspace-source]/dspace/target/dspace-installer
ant -Dconfig=[dspace]/config/dspace.cfg update
cp -R [dspace]/webapps/* [TOMCAT]/webapps
|
The last step is only used if you do not follow the recommended practice of configuring [dspace]/webapps as location for webapps in your servlet container (Tomcat, Resin or Jetty). If you only need to build the statistics, and don't make any changes to other web applications, you can replace the copy step above with:
Code Block |
---|
cp -R [dspace]/webapps/solr [TOMCAT]/webapps
|
Again, only if you are not mounting [dspace]/webapps directly into your Tomcat, Resin or Jetty host (the recommended practice)
Restart your webapps (Tomcat/Jetty/Resin)
Statistics differences between DSpace 1.7.x and 1.8.0
Displayed file statistics bundle configurable
In DSpace 1.6.x & 1.7.x the file download statistics were generated without regard to the bundle in which the file was located. In DSpace 1.8.0 it is possible to configure the bundles for which the file statistics are to be shown by using the query.filter.bundles property. If required the old file statistics can also be upgraded to include the bundle name so that the old file statistics are fixed.
Warning | ||
---|---|---|
| ||
Applying this change will involve dumping all the old file statistics into a file and re-uploading them. Therefore it is wise to create a backup of the {dspace.dir}/solr/statistics/data directory. It is best to create this backup when the Tomcat/Jetty/Resin server program isn't running. |
Once a backup has been made, start the Tomcat/Jetty/Resin server program.
The update script has one optional argument (-r). If given, the script not only updates the broken file statistics but also deletes file statistics for files that were removed from the system (if this option isn't active, these statistics will receive the "BITSTREAM_DELETED" bundle name).
Code Block |
---|
#The -r is optional
[dspace]/bin/dspace stats-util -b -r
|
Statistics differences between DSpace 1.6.x and 1.7.0
SOLR optimization added
If required, the solr server can be optimized by running
Code Block |
---|
{dspace.dir}/bin/stats-util -o
|
More information on how these solr server optimizations work can be found here: http://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations.
SOLR Autocommit
In DSpace 1.6.x, each Solr event was committed to the Solr server individually. For high-load DSpace installations, this resulted in a large number of small commits and a very high load on the Solr server.
This has been resolved in DSpace 1.7 by only committing usage events to the Solr server every 15 minutes. As a result, storage of a usage event may be delayed by up to 15 minutes. If required, this value can be altered by changing the maxTime property in the
Code Block |
---|
{dspace.dir}/solr/statistics/conf/solrconfig.xml
|
Web UI Statistics Modification (XMLUI Only)
Modifying the number of months, for which statistics are displayed
Modify line 205 in the StatisticsTransformer.java file
-6 is the default setting, displaying the past 6 months of statistics. Using a smaller number of months (for example, -3) displays fewer months.
Property: | usage-statistics.authorization.admin.workflow |
Example Value: | usage-statistics.authorization.admin.workflow = true |
Informational Note: | When set to true, only system, collection or community administrators are able to access statistics on workflow events. |
Property: | usage-statistics.logBots |
Example Value: | usage-statistics.logBots = true |
Informational Note: | When this property is set to false, and IP is detected as a spider, the event is not logged. |
Property: | usage-statistics.shardedByYear |
Example Value: | usage-statistics.shardedByYear = false |
Informational Note: | When set to "true", the DSpace statistics engine will look for additional Solr Shards (per year) when compiling all usage statistics. Therefore, if you are regularly running "stats-util -s" (as documented in the "Solr Sharding By Year" section of the "SOLR Statistics Maintenance" page), then you should set this to "true". By default, it is "false", which tells the statistics engine to only compile usage statistics based on what is found in the current Solr core. |
Pre-1.6 Statistics settings
Warning | ||
---|---|---|
| ||
Search query statistics are only supported in DSpace 6.x and below at this time. See https://github.com/DSpace/DSpace/issues/2852 |
Older versions of DSpace featured static reports generated from the log files. They still persist in DSpace today but are completely independent from the SOLR based statistics.
The following configuration parameters applicable to these reports can be found in dspace.cfg.
Code Block |
---|
###### Statistical Report Configuration Settings ######
# should the stats be publicly available? should be set to false if you only
# want administrators to access the stats, or you do not intend to generate
# any
report.public = false
# directory where live reports are stored
report.dir = ${dspace.dir}/reports/
|
These fields are not used by the new 1.6 Statistics; they relate only to the Statistics from previous DSpace releases.
Statistics Administration
Converting older DSpace logs into SOLR usage data
If you have upgraded from a previous version of DSpace, converting older log files ensures that you carry over older usage stats from before the upgrade.
Statistics Client Utility
The command line interface (CLI) scripts can be used to clean the usage database from additional spider traffic and other maintenance tasks. As of DSpace 3.0, a script has been added to split up the monolithic SOLR core into individual cores each containing a year of statistics.
Anonymizing Statistics
DSpace provides a commandline script (./dspace anonymize-statistics) which allows you to anonymize your statistics to better comply with GDPR and similar privacy regulations.
The script will anonymise the IP values by rewriting (‘masking’) the last part. This mask is configurable, both for ipv4 and ipv6 addresses.
- For IPv4 addresses, the last number will be replaced by the mask, defined by the configuration key ‘anonymise_statistics.ip_v4_mask’, which defaults to ‘254’. For example, 109.74.16.171 is rewritten as 109.74.16.254
- For IPv6 addresses, the last two numbers will be replaced by the mask, defined by the configuration key ‘anonymise_statistics.ip_v6_mask’, which defaults to ‘FFFF:FFFF’. For example, 2001:0db8:85a3:0000:0000:8a2e:0370:7334 is rewritten as 2001:0db8:85a3:0000:0000:8a2e:FFFF:FFFF
For each anonymised record, the DNS field is also replaced by “anonymised”.
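The masking rule described above can be illustrated with a minimal Python sketch. This is not the DSpace implementation; the function name `mask_ip` is hypothetical, and the sketch assumes IPv6 addresses are written out in full (eight groups, no "::" shorthand). The default masks mirror the documented defaults.

```python
def mask_ip(address: str, v4_mask: str = "254", v6_mask: str = "FFFF:FFFF") -> str:
    """Replace the trailing part of an IP address with a fixed mask.

    Hypothetical helper for illustration only, not part of DSpace.
    """
    if ":" in address:
        # IPv6: assumes the fully written-out, eight-group form.
        groups = address.split(":")
        # Replace the last two groups with the two groups of the mask.
        return ":".join(groups[:-2] + v6_mask.split(":"))
    # IPv4: replace the last of the four dotted numbers.
    octets = address.split(".")
    return ".".join(octets[:-1] + [v4_mask])


print(mask_ip("109.74.16.171"))
# prints 109.74.16.254
print(mask_ip("2001:0db8:85a3:0000:0000:8a2e:0370:7334"))
# prints 2001:0db8:85a3:0000:0000:8a2e:FFFF:FFFF
```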
Script options available:
- The program only processes records older than 90 days. This period can be altered with the config ‘anonymise_statistics.time_limit’ (expressed in days) in usage-statistics.cfg.
- "-s [sleep]" : The script takes an optional parameter ‘-s [sleep]’ (expressed in ms), which will make the Java thread sleep between the calls to Solr to reduce the load impact.
- "-t [threads]" : The Solr service commit mechanism is also optimised by adding multi-threading support. The script takes an optional parameter ‘-t [threads]’ to indicate how many threads the Solr service can use for this, if not given the thread count defaults to 2.
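If the script is launched from another program (for example, a scheduled maintenance job), the options above can be assembled as an argument list. The following Python sketch only builds the command line from the options documented here; the helper name `anonymize_command` and the default `./dspace` path are assumptions for illustration, and actually running the command (e.g. with `subprocess.run`) is left to the caller.

```python
import shlex


def anonymize_command(sleep_ms=None, threads=None, dspace_bin="./dspace"):
    """Build the argument list for the anonymize-statistics script.

    Hypothetical helper: only the documented -s and -t options are used.
    """
    command = [dspace_bin, "anonymize-statistics"]
    if sleep_ms is not None:
        # -s: sleep time (in ms) between calls to Solr
        command += ["-s", str(sleep_ms)]
    if threads is not None:
        # -t: number of threads the Solr service may use (script default is 2)
        command += ["-t", str(threads)]
    return command


print(shlex.join(anonymize_command(sleep_ms=100, threads=4)))
# prints ./dspace anonymize-statistics -s 100 -t 4
```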
Statistical records can also be anonymised the moment they are created. This feature can be enabled by setting the configuration parameter "anonymise_statistics.anonymise_on_log" to true in "usage-statistics.cfg". When this configuration property is not set, the feature is disabled by default.
Related: DatasetTimeGenerator Javadoc
Custom Reporting - Querying SOLR Directly
...
Code Block |
---|
http://localhost:8983/solr/statistics/select?indent=on&version=2.2&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=&facet=true&facet.field=epersonid&q=type:0 |
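A query URL of this kind can also be assembled programmatically, which avoids hand-encoding the parameters. The sketch below is a hedged illustration: the helper name `statistics_query_url` is hypothetical, and the base URL assumes Solr is listening on localhost port 8983 with a core named "statistics". It only builds the URL; sending the request is left to whichever HTTP client you prefer.

```python
from urllib.parse import urlencode


def statistics_query_url(base_url, query, facet_field, rows=10):
    """Build a Solr select URL that facets the matched events on one field.

    Hypothetical helper for illustration; parameter names are standard Solr
    query parameters.
    """
    params = {
        "q": query,                 # e.g. "type:0", as in the example above
        "start": 0,
        "rows": rows,
        "facet": "true",
        "facet.field": facet_field,
        "indent": "on",
    }
    # urlencode percent-encodes reserved characters such as ":".
    return f"{base_url}/select?{urlencode(params)}"


# Assumed location of the statistics core; adjust to your installation.
url = statistics_query_url("http://localhost:8983/solr/statistics", "type:0", "epersonid")
print(url)
```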
...
- Either install a copy of MaxMind's GeoLite City database (in MMDB format)
  - Installing MaxMind GeoLite2 is free. However, you must sign up for a (free) MaxMind account in order to obtain a license key to use the GeoLite2 database.
  - You will need to arrange regular downloads of the GeoLite2 database. MaxMind offers an updater tool (geoipupdate) to do the downloading/updating, and a number of Linux distributions package it (as geoipupdate). You will still need to configure your license key prior to usage. Use it before restarting DSpace, to get an up-to-date database.
  - Once the "GeoLite2-City.mmdb" database file is installed on your system, you will need to configure its location as the value of usage-statistics.dbfile in your local.cfg configuration file.
  - NOTE: This file is frequently updated by MaxMind.com, so you will need to refresh it regularly (ideally by scheduling the updater tool via a cron job or similar). As this is written, the database is updated monthly, and to be allowed to obtain it you need to agree to keep your copy updated.
- Or, you can alternatively use/install DB-IP's City Lite database (in MMDB format)
  - This database is also free to use, but does not require an account to download.
  - You will need to arrange regular downloads of the City Lite database. DB-IP offers an updater tool (dbip-update) to do the downloading/updating, but it requires PHP to run.
  - Once the "dbip-city-lite.mmdb" database file is installed on your system, you will need to configure its location as the value of usage-statistics.dbfile in your local.cfg configuration file.
  - NOTE: This file is frequently updated by DB-IP.com, so you will need to refresh it regularly (ideally by scheduling the updater tool via a cron job or similar). As this is written, the database is updated monthly with the latest available at https://db-ip.com/db/download/ip-to-city-lite
...