DSpace Statistics
DSpace 1.6 and newer versions use the Apache SOLR application to underpin the statistics. SOLR enables performant searching of, and adding to, vast amounts of (usage) data. Unlike previous versions, enabling statistics in DSpace does not require additional installation or customization. All the necessary software is included.
What exactly is being logged?
Each time a page or file is requested, the request is logged. Logging happens on the server side and, unlike Google Analytics, does not require any JavaScript in the page to provide usage data.
The fields stored for each usage event are defined in the file dspace/solr/statistics/conf/schema.xml.
Some example fields that can be stored per usage event include:
Code Block |
---|
<field name="type" type="integer" indexed="true" stored="true" required="true" /> |
<field name="id" type="integer" indexed="true" stored="true" required="true" /> |
<field name="ip" type="string" indexed="true" stored="true" required="false" /> |
<field name="time" type="date" indexed="true" stored="true" required="true" /> |
<field name="epersonid" type="integer" indexed="true" stored="true" required="false" /> |
<field name="country" type="string" indexed="true" stored="true" required="false" /> |
<field name="city" type="string" indexed="true" stored="true" required="false" /> |
<field name="owningComm" type="integer" indexed="true" stored="true" required="false" multiValued="true" /> |
The combination of type and id determines which resource (either a page or a file download) has been requested.
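To make the role of the required fields concrete, here is a minimal sketch in Python. The field names and their required flags are taken from the schema excerpt above; the `missing_required` helper and the sample event values (including the numeric type code) are illustrative assumptions, not part of DSpace.

```python
# Field names and "required" flags mirror the schema.xml excerpt above.
# The helper itself is a hypothetical illustration, not DSpace code.
REQUIRED = {
    "type": True, "id": True, "ip": False, "time": True,
    "epersonid": False, "country": False, "city": False,
    "owningComm": False,
}


def missing_required(event):
    """Return the required schema fields that the event does not supply."""
    return sorted(name for name, required in REQUIRED.items()
                  if required and name not in event)


# Sample usage event; the type code 2 and all values are placeholders.
event = {"type": 2, "id": 123, "time": "2010-03-01T12:00:00Z",
         "ip": "192.0.2.10", "country": "BE"}
```

An event that lacks type, id or time would be rejected by a schema with these settings, while ip, country and the other optional fields may be absent.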
Web user interface for DSpace statistics
In the XMLUI, statistics can be accessed from the lower end of the navigation menu. In the JSPUI, a view statistics button appears at the bottom of pages for which statistics are available. If you are not seeing these links or buttons, it's likely that they are only enabled for administrators in your installation. Change the configuration parameter "statistics.item.authorization.admin" to false in order to make statistics visible for all repository visitors.
Home page
Starting from the repository homepage, the statistics page displays the top 10 most popular items of the entire repository.
Community home page
The following statistics are available for the community home pages:
- Total visits of the current community home page
- Visits of the community home page over a timespan of the last 7 months
- Top 10 countries from which the visits originate
- Top 10 cities from which the visits originate
Collection home page
The following statistics are available for the collection home pages:
- Total visits of the current collection home page
- Visits of the collection home page over a timespan of the last 7 months
- Top 10 countries from which the visits originate
- Top 10 cities from which the visits originate
Item home page
The following statistics are available for the item home pages:
- Total visits of the item
- Total visits for the bitstreams attached to the item
- Visits of the item over a timespan of the last 7 months
- Top 10 countries from which the visits originate
- Top 10 cities from which the visits originate
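Statistics such as the "top 10" lists above map naturally onto Solr facet queries. The sketch below builds the query string for such a request using standard Solr faceting parameters (facet, facet.field, facet.limit). The function name and the way the resource is selected by type and id are assumptions for illustration; the queries DSpace actually issues may differ.

```python
from urllib.parse import urlencode


def top_values_query(resource_type, resource_id, facet_field, limit=10):
    """Build a Solr query string that counts usage events per facet value.

    rows=0 suppresses the individual events, so only the facet counts
    (for example the most frequent countries) are returned.
    """
    params = [
        ("q", "type:%s AND id:%s" % (resource_type, resource_id)),
        ("rows", "0"),
        ("facet", "true"),
        ("facet.field", facet_field),
        ("facet.limit", str(limit)),
    ]
    return urlencode(params)


# Hypothetical example: top 10 countries for the resource with type 2, id 123.
query_string = top_values_query(2, 123, "country")
```

The same function with "city" as the facet field would cover the top 10 cities list.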
Usage Event Logging and Usage Statistics Gathering
The DSpace Statistics Implementation is a Client/Server architecture based on Solr for collecting usage events in the JSPUI and XMLUI user interface applications of DSpace. Solr runs as a separate web application, and an instance of Apache Http Client is utilized to allow parallel requests to log statistics events into this Solr instance.
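Because Solr is a separate web application, every usage event ultimately reaches it as an HTTP update request. As a rough illustration of what such a request carries, the sketch below serializes an event into Solr's XML update message format (`<add><doc><field .../></doc></add>`). DSpace itself does this through its Java Solr client; the `to_solr_add` helper is a hypothetical stand-in and sends nothing over the network.

```python
import xml.etree.ElementTree as ET


def to_solr_add(event):
    """Serialize one usage event as a Solr XML <add> update message."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in event.items():
        # Multi-valued fields (such as owningComm) repeat the field element.
        values = value if isinstance(value, list) else [value]
        for item in values:
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(item)
    return ET.tostring(add, encoding="unicode")


# Placeholder values for illustration only.
message = to_solr_add({"type": 2, "id": 123, "owningComm": [1, 5]})
```

Each event is an independent document, which is what allows several client threads to log events in parallel.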
Configuration settings for Statistics
In the dspace.cfg file, review the following fields to make sure they are uncommented:
Property: | solr.log.server |
Example Value: | solr.log.server = |
Informational Note: | Is used by the SolrLogger Client class to connect to the Solr server over http and perform updates and queries. In most cases, this can (and should) be set to localhost (or 127.0.0.1). Assuming a query against the Solr statistics core (a URL ending in "/select?q=*:*") returns an HTTP 200 OK response, you should set solr.log.server to the '/statistics' URL, for example 'http://127.0.0.1/solr/statistics' (essentially removing the "/select?q=*:*" query from the end of the responding URL). |
Property: | solr.spiderips.urls |
Example Value: | solr.spiderips.urls = |
Informational Note: | List of URLs to download spiders files into [dspace]/config/spiders. These files contain lists of known spider IPs and are utilized by the SolrLogger to flag usage events with an "isBot" field, or ignore them entirely. The statistics client utility can be used to force an update of spider files, regenerate "isBot" fields on indexed events, and delete spiders from the index. For usage information, run it from your [dspace]/bin directory. |
Property: | solr.dbfile |
Example Value: | solr.dbfile = ${dspace.dir}/config/GeoLiteCity.dat |
Informational Note: | Refers to the GeoLiteCity database file utilized by the LocationUtils to calculate the location of client requests based on IP address. During the Ant build process (both fresh_install and update) this file will be downloaded from [http://www.maxmind.com/app/geolitecity] if a new version has been published or it is absent from your [dspace]/config directory. |
Property: | solr.resolver.timeout |
Example Value: | solr.resolver.timeout = 200 |
Informational Note: | Timeout in milliseconds for DNS resolution of origin hosts/IPs. Setting this value too high may result in solr exhausting your connection pool. |
Property: | useProxies |
Example Value: | useProxies = true |
Informational Note: | Causes Statistics logging to look at the X-Forwarded-For header to detect the IP address of clients that access DSpace through a proxy service. [Note: This setting is found in the DSpace Logging section of dspace.cfg] |
Property: | statistics.item.authorization.admin |
Example Value: | statistics.item.authorization.admin = true |
Informational Note: | When set to true, only general administrators, collection administrators and community administrators are able to access the statistics from the web user interface. As a result, the links to access statistics are hidden from users who are not logged in as administrators. Setting this property to "false" will display the links to access statistics to anyone, making them publicly available. |
Property: | solr.statistics.logBots |
Example Value: | solr.statistics.logBots = true |
Informational Note: | When this property is set to false and an IP is detected as a spider, the event is not logged. (See the solr.statistics.query.filter.* properties for query filter options.) |
Property: | solr.statistics.query.filter.spiderIp |
Example Value: | solr.statistics.query.filter.spiderIp = false |
Informational Note: | If true, statistics queries will filter out spider IPs; use with caution, as this often results in extremely long query strings. |
Property: | solr.statistics.query.filter.isBot |
Example Value: | solr.statistics.query.filter.isBot = true |
Informational Note: | If true, statistics queries will filter out events flagged with the "isBot" field. This is the recommended method of filtering spiders from statistics. |
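The effect of several of the settings above can be sketched as small Python functions. Everything here is an illustrative assumption rather than DSpace's actual implementation: the function names are hypothetical, spider detection is reduced to an exact IP match, the first X-Forwarded-For entry is assumed to be the original client, and the filter strings merely follow standard Solr query syntax.

```python
def core_url_from_select(select_url):
    """solr.log.server: strip the '/select?...' query from a responding URL."""
    base, sep, _query = select_url.partition("/select")
    if not sep:
        raise ValueError("not a Solr select URL: " + select_url)
    return base


def client_ip(remote_addr, headers, use_proxies=True):
    """useProxies: prefer the proxy-supplied client address when enabled.

    Assumption: the first X-Forwarded-For entry is the original client.
    """
    forwarded = headers.get("X-Forwarded-For", "")
    if use_proxies and forwarded.strip():
        return forwarded.split(",")[0].strip()
    return remote_addr


def prepare_event(event, spider_ips, log_bots=True):
    """solr.statistics.logBots: drop spider events or flag them with isBot."""
    is_bot = event.get("ip") in spider_ips
    if is_bot and not log_bots:
        return None  # the event is not logged at all
    return {**event, "isBot": is_bot}


def bot_filters(filter_is_bot=True, filter_spider_ip=False, spider_ips=()):
    """solr.statistics.query.filter.*: build filters that exclude spiders."""
    filters = []
    if filter_is_bot:
        filters.append("-isBot:true")
    if filter_spider_ip and spider_ips:
        # One clause per known spider IP: this is why the query gets long.
        filters.append("-ip:(" + " OR ".join(sorted(spider_ips)) + ")")
    return filters
```

The last function shows why filtering on the "isBot" flag is the cheaper option: it adds one short clause, whereas filtering by spider IP adds one term for every known spider address.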
Upgrade Process for Statistics
Example of rebuilding and redeploying DSpace (only if you have configured your distribution in this manner)
First, follow the traditional DSpace build process for updating:
Code Block |
---|
cd [dspace-source]/dspace |
mvn package |
cd [dspace-source]/dspace/target/dspace-<version>-build.dir |
ant -Dconfig=[dspace]/config/dspace.cfg update |
cp -R [dspace]/webapps/* [TOMCAT]/webapps |
The last step is only needed if you are not mounting [dspace]/webapps directly into your Tomcat, Resin or Jetty host (the recommended practice). If you only need to build the statistics, and don't make any changes to other web applications, you can replace the copy step above with:
Code Block |
---|
cp -R [dspace]/webapps/solr [TOMCAT]/webapps |
Again, only if you are not mounting [dspace]/webapps directly into your Tomcat, Resin or Jetty host (the recommended practice).
Restart your webapps (Tomcat/Jetty/Resin).
Older settings that are not currently utilized in the reports
Are the following dspace.cfg fields still used by the new 1.6 Statistics? If not, we need to either document this well or remove them altogether:
Code Block |
---|
###### Statistical Report Configuration Settings ###### |
# should the stats be publicly available? should be set to false if you only |
# want administrators to access the stats, or you do not intend to generate |
# any |
report.public = false |
# directory where live reports are stored |
report.dir = ${dspace.dir}/reports/ |
These fields are not used by the new 1.6 Statistics; they relate only to the Statistics from previous DSpace releases.
Statistics Administration
Converting older DSpace logs into SOLR usage data
If you have upgraded from a previous version of DSpace, converting older log files ensures that you carry over older usage stats from before the upgrade.
Statistics Client Utility
The command line interface (CLI) scripts can be used to clean additional spider traffic from the usage database and to perform other maintenance tasks.
Statistics differences between DSpace 1.6.x and 1.7.0
SOLR optimization added
If required, the solr server can be optimized by running
Code Block |
---|
{dspace.dir}/bin/stats-util -o |
More information on how these solr server optimizations work can be found here: http://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations
SOLR Autocommit
In DSpace 1.6.x, each solr event was committed to the solr server individually. For high-load DSpace installations, this produced a very large number of small commits and, in turn, a very high load on the solr server. This has been resolved in DSpace 1.7 by committing usage events to the solr server only every 15 minutes. As a result, a usage event may take up to 15 minutes to be stored. If required, this value can be altered by changing the maxTime property in {dspace.dir}/solr/statistics/conf/solrconfig.xml.
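Solr expresses the autocommit maxTime in milliseconds, so a 15 minute interval corresponds to 900000. The sketch below reads that value from an illustrative solrconfig.xml fragment and converts it to minutes. The fragment is an assumption about what the relevant section looks like, reduced to the autoCommit element; the file shipped with DSpace contains many more settings and may be laid out differently.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only: 900000 ms is the 15 minutes described above.
SOLRCONFIG_FRAGMENT = """
<config>
  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxTime>900000</maxTime>
    </autoCommit>
  </updateHandler>
</config>
"""


def autocommit_minutes(xml_text):
    """Return the autocommit interval, converted from milliseconds to minutes."""
    root = ET.fromstring(xml_text)
    max_time_ms = int(root.findtext("./updateHandler/autoCommit/maxTime"))
    return max_time_ms / 60000
```

Lowering maxTime shortens the delay before events become visible in the statistics, at the cost of more frequent commits on the solr server.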