DSpace 1.6 and newer versions use Apache Solr as the application underlying the statistics. Solr enables efficient searching of, and adding to, vast amounts of (usage) data.
Unlike previous versions, enabling statistics in DSpace does not require additional installation or customization. All the necessary software is included.
What exactly is being logged?
...
The fields to be stored are defined in the file dspace/solr/statistics/conf/schema.xml.
The fields stored in a usage event by default are:
Code Block |
---|
<field name="type" type="integer" indexed="true" stored="true" required="true" />
<field name="id" type="integer" indexed="true" stored="true" required="true" />
<field name="ip" type="string" indexed="true" stored="true" required="false" />
<field name="time" type="date" indexed="true" stored="true" required="true" />
<field name="epersonid" type="integer" indexed="true" stored="true" required="false" />
<field name="continent" type="string" indexed="true" stored="true" required="false"/>
<field name="country" type="string" indexed="true" stored="true" required="false"/>
<field name="countryCode" type="string" indexed="true" stored="true" required="false"/>
<field name="city" type="string" indexed="true" stored="true" required="false"/>
<field name="longitude" type="float" indexed="true" stored="true" required="false"/>
<field name="latitude" type="float" indexed="true" stored="true" required="false"/>
<field name="owningComm" type="integer" indexed="true" stored="true" required="false" multiValued="true"/>
<field name="owningColl" type="integer" indexed="true" stored="true" required="false" multiValued="true"/>
<field name="owningItem" type="integer" indexed="true" stored="true" required="false" multiValued="true"/>
<field name="dns" type="string" indexed="true" stored="true" required="false"/>
<field name="userAgent" type="string" indexed="true" stored="true" required="false"/>
<field name="isBot" type="boolean" indexed="true" stored="true" required="false"/>
<field name="bundleName" type="string" indexed="true" stored="true" required="false" multiValued="true" />
|
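To make the field definitions easier to scan, the following is a minimal Python sketch that summarizes which fields are required and which are multi-valued. It is only an illustration: the `<fields>` wrapper element and the `summarize_fields` helper are assumptions introduced here so that the fragment parses on its own, and only three of the fields listed above are included.

```python
import xml.etree.ElementTree as ET

# Three field definitions copied from the listing above, wrapped in a
# hypothetical <fields> root so the fragment parses as standalone XML.
SCHEMA_FRAGMENT = """
<fields>
  <field name="type" type="integer" indexed="true" stored="true" required="true" />
  <field name="ip" type="string" indexed="true" stored="true" required="false" />
  <field name="owningComm" type="integer" indexed="true" stored="true" required="false" multiValued="true"/>
</fields>
"""


def summarize_fields(xml_text):
    """Return {field name: (type, required, multi_valued)} for each <field>."""
    root = ET.fromstring(xml_text)
    summary = {}
    for field in root.iter("field"):
        summary[field.get("name")] = (
            field.get("type"),
            field.get("required", "false") == "true",
            field.get("multiValued", "false") == "true",
        )
    return summary


if __name__ == "__main__":
    for name, (ftype, required, multi) in summarize_fields(SCHEMA_FRAGMENT).items():
        print(name, ftype,
              "required" if required else "optional",
              "multi-valued" if multi else "single-valued")
```

In this sketch, only type is required, and owningComm is the only multi-valued field of the three, matching the attributes in the listing.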
...
Property: | server | ||
Example Value: | server = http://127.0.0.1/solr/statistics | ||
Informational Note: | Is used by the SolrLogger Client class to connect to the Solr server over http and perform updates and queries. In most cases, this can (and should) be set to localhost (or 127.0.0.1).
Assuming you get an HTTP 200 OK response, then you should set | ||
Property: | spiderips.urls | ||
Example Value: | spiderips.urls =
| ||
Informational Note: | List of URLs to download spiders files into [dspace]/config/spiders. These files contain lists of known spider IPs and are utilized by the SolrLogger to flag usage events with an "isBot" field, or ignore them entirely.
from your [dspace]/bin directory | ||
Property: | dbfile | ||
Example Value: | dbfile = ${dspace.dir}/config/GeoLiteCity.dat | ||
Informational Note: | The following refers to the GeoLiteCity database file utilized by the LocationUtils to calculate the location of client requests based on IP address. During the Ant build process (both fresh_install and update) this file will be downloaded from http://www.maxmind.com/app/geolitecity if a new version has been published or it is absent from your [dspace]/config directory. | ||
Property: | resolver.timeout | ||
Example Value: | resolver.timeout = 200 | ||
Informational Note: | Timeout in milliseconds for DNS resolution of origin hosts/IPs. Setting this value too high may result in solr exhausting your connection pool. | ||
Property: | useProxies | ||
Example Value: | useProxies = true | ||
Informational Note: | Causes Statistics logging to look for the X-Forwarded-For header, so that the IP address of a client that reaches DSpace through a proxy service (e.g. the Apache mod_proxy) can be detected. [Note: This setting is found in the DSpace Logging section of dspace.cfg] | ||
Property: | statistics.item.authorization.admin | ||
Example Value: | statistics.item.authorization.admin = true | ||
Informational Note: | When set to true, only general administrators and collection and community administrators are able to access the statistics from the web user interface. As a result, the links to access statistics are hidden from users who are not logged in as an administrator. Setting this property to "false" will display the links to access statistics to anyone, making them publicly available. | ||
Property: | solr.statistics.logBots | ||
Example Value: | solr.statistics.logBots = true | ||
Informational Note: | When this property is set to false and an IP is detected as a spider, the event is not logged. | ||
Property: | solr.statistics.query.filter.spiderIp | ||
Example Value: | solr.statistics.query.filter.spiderIp = false | ||
Informational Note: | If true, statistics queries will filter out spider IPs -- use with caution, as this often results in extremely long query strings. | ||
Property: | solr.statistics.query.filter.isBot | ||
Example Value: | solr.statistics.query.filter.isBot = true | ||
Informational Note: | If true, statistics queries will filter out events flagged with the "isBot" field. This is the recommended method of filtering spiders from statistics. | ||
Property: | query.filter.bundles | ||
Example Value: | query.filter.bundles=ORIGINAL | ||
Informational Note: | A comma-separated list of the bundles for which file statistics will be displayed. | ||
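As a rough illustration of how the query-filter properties above could translate into Solr filter queries, here is a minimal Python sketch. It is not DSpace's actual implementation: the `parse_properties` and `build_filter_queries` helpers are hypothetical, and the exact filter strings DSpace sends are assumed. The sketch only relies on standard Solr query syntax (`-field:value` to exclude, `field:(A OR B)` to match any value) and on the `isBot` and `bundleName` fields from schema.xml.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def build_filter_queries(props):
    """Return a list of Solr fq strings derived from the settings (assumed mapping)."""
    filters = []
    # Exclude events flagged as bots when the isBot filter is enabled.
    if props.get("solr.statistics.query.filter.isBot", "false") == "true":
        filters.append("-isBot:true")
    # Restrict to the configured bundles (simplified: applied to every query here).
    raw = props.get("query.filter.bundles", "")
    bundles = [b.strip() for b in raw.split(",") if b.strip()]
    if bundles:
        filters.append("bundleName:(" + " OR ".join(bundles) + ")")
    return filters


SAMPLE_CONFIG = """
# values taken from the examples above
solr.statistics.query.filter.isBot = true
query.filter.bundles=ORIGINAL
"""

if __name__ == "__main__":
    print(build_filter_queries(parse_properties(SAMPLE_CONFIG)))
```

Filtering on the single `isBot` flag keeps the query short, which is why it is preferable to enumerating every spider IP in the query string.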
Upgrade Process for Statistics
...
First, follow the traditional DSpace build process for updating:
Code Block |
---|
cd [dspace-source]/dspace
mvn package
cd [dspace-source]/dspace/target/dspace-<version>-build.dir
ant -Dconfig=[dspace]/config/dspace.cfg update
cp -R [dspace]/webapps/* [TOMCAT]/webapps
|
The last step is only needed if you do not follow the recommended practice of configuring [dspace]/webapps as the location for webapps in your servlet container (Tomcat, Resin or Jetty). If you only need to build the statistics, and don't make any changes to other web applications, you can replace the copy step above with:
Code Block |
---|
cp -R [dspace]/webapps/solr [TOMCAT]/webapps
|
...
The following dspace.cfg fields are only applicable to the older statistics solution.
Code Block |
---|
###### Statistical Report Configuration Settings ######
# should the stats be publicly available? should be set to false if you only
# want administrators to access the stats, or you do not intend to generate
# any
report.public = false
# directory where live reports are stored
report.dir = ${dspace.dir}/reports/
|
...
Once a backup has been made, start the Tomcat/Jetty/Resin server program.
The update script has one optional flag. If given, the script will not only update the broken file statistics but also delete file statistics for files that were removed from the system (if this option isn't active, these statistics will receive the "BITSTREAM_DELETED" bundle name).
Code Block |
---|
#The -r is optional
[dspace]/bin/dspace stats-util -b -r
|
...
If required, the Solr server can be optimized by running:
Code Block |
---|
{dspace.dir}/bin/stats-util -o
|
...
In DSpace 1.6.x, each Solr event was committed to the Solr server individually. For high-load DSpace installations, this produced a large number of small commits and therefore a very high load on the Solr server.
This has been resolved in DSpace 1.7 by only committing usage events to the Solr server every 15 minutes. As a result, a usage event may be stored up to 15 minutes after it occurs. If required, this value can be altered by changing the maxTime property in the following file:
Code Block |
---|
{dspace.dir}/solr/statistics/conf/solrconfig.xml
|
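Solr's autoCommit maxTime is expressed in milliseconds, so the 15 minute interval corresponds to 900000. The following Python sketch shows the conversion and how the value could be read back from a configuration file. The XML fragment is a hypothetical, trimmed-down stand-in for the real solrconfig.xml, and both helper functions are introduced here for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment in the shape of a Solr autoCommit block; the
# real solrconfig.xml contains many more settings.
SOLRCONFIG_FRAGMENT = """
<config>
  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxTime>900000</maxTime>
    </autoCommit>
  </updateHandler>
</config>
"""


def minutes_to_max_time(minutes):
    """Convert a commit interval in minutes to a maxTime value in milliseconds."""
    return int(minutes * 60 * 1000)


def read_max_time_minutes(xml_text):
    """Return the configured autoCommit interval in minutes, or None if absent."""
    root = ET.fromstring(xml_text)
    node = root.find(".//autoCommit/maxTime")
    if node is None or not node.text:
        return None
    return int(node.text.strip()) / 60000


if __name__ == "__main__":
    print(minutes_to_max_time(15))
    print(read_max_time_minutes(SOLRCONFIG_FRAGMENT))
```

A shorter interval makes new usage events visible sooner at the cost of more frequent commits.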
...
Modify line 178 in the StatisticsTransformer.java file:
https://github.com/DSpace/DSpace/blob/dspace-1_8_x/dspace-xmlui/dspace-xmlui-api/src/main/java/org/dspace/app/xmlui/aspect/statistics/StatisticsTransformer.java#L178
The default setting is -6, which displays the past 6 months of statistics. Changing it to a smaller number of months (for example -3) displays fewer months.
...
Top downloaded items by a specific user
Query:
Code Block |
---|
http://localhost:8080/solr/statistics/select?indent=on&version=2.2&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=&facet=true&facet.field=epersonid&q=type:0
|
...
facet.field=epersonid — You want to group by epersonid, which is the user id.
type:0 — Interested in bitstreams only
Code Block |
---|
<lst name="facet_counts">
<lst name="facet_fields">
<lst name="epersonid">
<int name="66">1167</int>
<int name="117">251</int>
<int name="52">42</int>
<int name="19">36</int>
<int name="88">20</int>
<int name="112">18</int>
<int name="110">9</int>
<int name="96">0</int>
</lst>
</lst>
</lst>
|
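The query and response above can also be handled from a script. The sketch below is a minimal Python example that builds a reduced version of the facet query URL and turns the facet counts into a list sorted by number of events. The helper names are hypothetical, the URL keeps only the parameters needed for faceting, and no request is actually sent; the sample response is a shortened copy of the one shown above.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def build_facet_url(base_url, facet_field="epersonid", query="type:0", rows=10):
    """Build a Solr select URL that facets the matching events on one field."""
    params = [
        ("q", query),
        ("rows", rows),
        ("facet", "true"),
        ("facet.field", facet_field),
    ]
    return base_url + "/select?" + urlencode(params)


def parse_facet_counts(xml_text, facet_field="epersonid"):
    """Return [(facet value, count), ...] sorted by count, highest first."""
    root = ET.fromstring(xml_text)
    counts = []
    for lst in root.iter("lst"):
        if lst.get("name") == facet_field:
            for entry in lst.findall("int"):
                counts.append((entry.get("name"), int(entry.text)))
    return sorted(counts, key=lambda pair: pair[1], reverse=True)


# Shortened copy of the response fragment shown above.
SAMPLE_RESPONSE = """
<lst name="facet_counts">
  <lst name="facet_fields">
    <lst name="epersonid">
      <int name="66">1167</int>
      <int name="117">251</int>
      <int name="96">0</int>
    </lst>
  </lst>
</lst>
"""

if __name__ == "__main__":
    print(build_facet_url("http://localhost:8080/solr/statistics"))
    for eperson_id, downloads in parse_facet_counts(SAMPLE_RESPONSE):
        print(eperson_id, downloads)
```

Each pair is an epersonid together with the number of bitstream events recorded for that user.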
...
You have two options to install/update this file:
Attempt to re-run the automatic installer from your DSpace Source Directory ([dspace-source]). This will attempt to automatically download the database file, unzip it and install it into the proper location:
Code Block |
---|
ant update_geolite
|
- NOTE: If the location of the GeoLite Database file is known to have changed, you can also run this auto-installer by passing it the new URL of the GeoLite Database File:
ant -Dgeolite=[full-URL-of-geolite] update_geolite
- OR, you can manually install the file by performing these steps yourself:
- First, download the latest GeoLite Database file from http://geolite.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz
- Next, unzip that file to create a file named GeoLiteCity.dat
- Finally, move or copy that file to your DSpace installation, so that it is located at [dspace]/config/GeoLiteCity.dat.
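The unzip and copy steps of the manual installation can be scripted. The following Python sketch assumes the GeoLiteCity.dat.gz file has already been downloaded; the `install_geolite` helper and its arguments are hypothetical, and the download itself is deliberately left out.

```python
import gzip
import shutil
from pathlib import Path


def install_geolite(gz_path, dspace_dir):
    """Decompress a downloaded GeoLiteCity.dat.gz into [dspace]/config.

    gz_path    -- path to the already downloaded .gz file
    dspace_dir -- the DSpace installation directory ([dspace])
    Returns the path of the installed GeoLiteCity.dat file.
    """
    target = Path(dspace_dir) / "config" / "GeoLiteCity.dat"
    target.parent.mkdir(parents=True, exist_ok=True)
    # Stream the decompressed bytes so the whole file is never held in memory.
    with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

For example, `install_geolite("GeoLiteCity.dat.gz", "/dspace")` would write /dspace/config/GeoLiteCity.dat, where /dspace stands in for your own installation directory.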