Date & Time
- August 9th 15:00 UTC/GMT - 11:00 ET
This call is a Community Forum call: Sharing best practices and challenges in the use of existing DSpace features
We will use the international conference call dial-in. Please follow directions below.
- U.S.A/Canada toll free: 866-740-1260, participant code: 2257295
- International toll free: http://www.readytalk.com/intl
- Use the above link and enter 2257295 and the country you are calling from to get your country's toll-free dial-in number
- Once on the call, enter participant code 2257295
Community Forum Call: DSpace Statistics
Sharing best practices, challenges, and questions
- DSpace statistics
- interpreting statistics
- improving robot filtering & assessing robot traffic
- exchanging which types of reports are being used for which purposes
Preparing for the call
Bring your questions/comments you would like to discuss to the call, or add them to the comments of this meeting page.
If you can join the call, or are willing to comment on the topics submitted via the meeting page, please add your name, institution, and repository URL to the Call Attendees section below.
History of DSpace statistics
The first DSpace statistics, now often referred to as the DSpace legacy statistics, were based on DSpace logs. This system does not identify traffic originating from bots, let alone filter it out, so using these statistics is strongly discouraged. Without robot filtering the results are biased and cannot be interpreted reliably.
The current DSpace usage statistics, introduced in DSpace version 1.6, are based on SOLR.
After that release, further improvements and alternatives to the standard DSpace statistics were developed on the initiative of several universities, institutions, and third-party service providers.
An alternative to the DSpace statistics is Google Analytics. Although it is an interesting tool for some use cases, it has limitations. First, Google Analytics is a black box: it is unknown what robot filtering it applies, so you have to assume the filtering works properly. Second, Google Analytics doesn't know DSpace's internal structure. It isn't familiar with the hierarchy of repository, communities, collections and items. As a result, it cannot create statistics on an aggregated level (e.g. the total item page views of all items in a collection).
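To illustrate what an aggregated statistic means here, the following is a minimal sketch in Python of rolling item page views up to the collection level. The item identifiers, collection names, and view events are made-up sample data, and the function name is hypothetical; this is not how DSpace or Google Analytics computes anything, only an illustration of why knowledge of the item-to-collection hierarchy is needed.

```python
from collections import Counter

# Hypothetical sample data: item identifier -> owning collection.
ITEM_TO_COLLECTION = {
    "item-1": "collection-A",
    "item-2": "collection-A",
    "item-3": "collection-B",
}

# Hypothetical page-view events, one item identifier per recorded view.
PAGE_VIEWS = ["item-1", "item-2", "item-1", "item-3", "item-1"]


def views_per_collection(page_views, item_to_collection):
    """Sum item page views per owning collection.

    Views for items whose collection is unknown are skipped, which is
    the situation a tool without the hierarchy is in for every item.
    """
    totals = Counter()
    for item_id in page_views:
        collection = item_to_collection.get(item_id)
        if collection is not None:
            totals[collection] += 1
    return dict(totals)


if __name__ == "__main__":
    print(views_per_collection(PAGE_VIEWS, ITEM_TO_COLLECTION))
```

Without the item-to-collection mapping, the same events can only be counted per page, not per collection or community.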
Future of DSpace statistics
In the new User Interface it would be beneficial to query SOLR through the centralized DSpace API instead of directly through SOLR's REST API. This would make it possible to replace SOLR with another system, should a better data source arise. In the meantime, people developing on the DSpace statistics layer could more easily contribute their work to the community, as it would also be built upon this central DSpace API.
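The idea of hiding the data source behind a central API can be sketched as follows. This is a minimal Python illustration, not an existing DSpace interface: `StatisticsBackend`, `InMemoryBackend`, and `StatisticsService` are hypothetical names, and the in-memory backend only stands in for SOLR or any other store.

```python
from typing import Dict, Protocol


class StatisticsBackend(Protocol):
    """Hypothetical storage interface; not an existing DSpace API."""

    def record_view(self, item_id: str) -> None: ...

    def count_views(self, item_id: str) -> int: ...


class InMemoryBackend:
    """Stand-in backend that keeps counts in a dictionary."""

    def __init__(self) -> None:
        self._views: Dict[str, int] = {}

    def record_view(self, item_id: str) -> None:
        self._views[item_id] = self._views.get(item_id, 0) + 1

    def count_views(self, item_id: str) -> int:
        return self._views.get(item_id, 0)


class StatisticsService:
    """Callers use this service and never talk to the backend directly."""

    def __init__(self, backend: StatisticsBackend) -> None:
        self._backend = backend

    def item_viewed(self, item_id: str) -> None:
        self._backend.record_view(item_id)

    def total_views(self, item_id: str) -> int:
        return self._backend.count_views(item_id)


if __name__ == "__main__":
    service = StatisticsService(InMemoryBackend())
    service.item_viewed("item-1")
    service.item_viewed("item-1")
    print(service.total_views("item-1"))
```

Because the user interface and any contributed reports would depend only on the service, swapping the backend would not require changes to them.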
Some institutions noticed performance issues caused by the overhead created by SOLR. Harvard University has solved this issue by relying on web server logs. These logs are already being written and therefore do not put additional load on DSpace. Another solution, by a third-party service provider, was to use Elasticsearch instead of SOLR, which appeared to perform well.
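As a rough illustration of the web server log approach, the sketch below counts successful page requests from an access log while skipping probable robots. It is not Harvard's implementation. It assumes logs in the common "combined" format, assumes pages of interest live under `/handle/`, and uses a deliberately naive user-agent check; the sample log lines are invented.

```python
import re
from collections import Counter

# Combined Log Format:
# ip ident user [time] "request" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)

# Naive, assumed markers; real robot filtering needs a maintained list.
BOT_MARKERS = ("bot", "crawler", "spider")


def is_probable_bot(user_agent):
    """Return True if the user agent contains a known robot marker."""
    agent = user_agent.lower()
    return any(marker in agent for marker in BOT_MARKERS)


def count_handle_views(lines):
    """Count successful GET requests to /handle/ paths from non-robots."""
    views = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            continue  # skip lines that are not in the expected format
        if match["method"] != "GET" or match["status"] != "200":
            continue
        if is_probable_bot(match["agent"]):
            continue
        if match["path"].startswith("/handle/"):
            views[match["path"]] += 1
    return dict(views)


# Invented sample lines: one human view, one robot view, one 404.
SAMPLE_LOG = [
    '192.0.2.1 - - [01/Jan/2020:15:00:01 +0000] "GET /handle/123456789/1 HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '192.0.2.2 - - [01/Jan/2020:15:00:02 +0000] "GET /handle/123456789/1 HTTP/1.1" 200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.3 - - [01/Jan/2020:15:00:03 +0000] "GET /handle/123456789/2 HTTP/1.1" 404 312 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

if __name__ == "__main__":
    print(count_handle_views(SAMPLE_LOG))
```

Since the processing happens on log files that already exist, it can run on another machine or at quiet hours without touching the repository itself.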
There are some opportunities to reduce the overhead created by SOLR. For example, SOLR does not have to run on the same server as DSpace; it can run on a separate server. Another way of reducing the load is to shard the SOLR core (for example per year). One third-party service uses a SOLR caching mechanism to limit the load SOLR puts on DSpace, so that there should be no noticeable difference in performance.
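The per-year sharding idea can be sketched as follows. This minimal Python example only shows the routing logic: each usage event goes to a shard named after its year, and a date-range query needs to consult only the shards for the years it covers. The shard naming scheme (`statistics-<year>`) and the event structure are assumptions for illustration, and no SOLR calls are made.

```python
from collections import defaultdict
from datetime import datetime


def shard_name(timestamp, prefix="statistics"):
    """Return the (assumed) shard name for an event timestamp."""
    return f"{prefix}-{timestamp.year}"


def partition_by_year(events):
    """Group usage events by the yearly shard they belong to."""
    shards = defaultdict(list)
    for event in events:
        shards[shard_name(event["time"])].append(event)
    return dict(shards)


def shards_for_range(start, end, prefix="statistics"):
    """List the shards a query over [start, end] has to consult."""
    return [f"{prefix}-{year}" for year in range(start.year, end.year + 1)]


if __name__ == "__main__":
    # Invented sample events.
    events = [
        {"item": "item-1", "time": datetime(2014, 11, 3)},
        {"item": "item-2", "time": datetime(2015, 2, 17)},
        {"item": "item-1", "time": datetime(2015, 8, 9)},
    ]
    for name, shard_events in partition_by_year(events).items():
        print(name, len(shard_events))
    print(shards_for_range(datetime(2015, 1, 1), datetime(2015, 12, 31)))
```

A report covering a single year then reads one smaller index instead of the full history, which is where the reduction in load would come from.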
Up to now, the name DCAT, the 'DSpace Community Advisory Team', has sounded rather formal. This may discourage people from joining the conversations. For that reason there will be meetings called 'Community Forum calls'. We hope this name indicates that the call is open to the entire community.
Discussion topics for the next DCAT calls are already listed on the DCAT meeting notes page. Next month's topic of interest will be the DSpace standard Data model and DSpace-CRIS.
Call Attendees
- Maureen Walsh - The Ohio State University
- Bram Luyten (Atmire)
- Ignace Deroost - Atmire
- David Corbly - University of Oklahoma
- Mariya Maistrovskaya - University of Toronto
- Terrence W Brady - Georgetown University
- Valerie Collins - University of Minnesota
- Marianne Reed - University of Kansas
- Monica Rivero - Rice University
- Iryna Kuchma - EIFL
- Peter Dietz - Longsight
- Elias Tzoc - Miami University
- Daniel Draper - Colorado State University
- Pauline Ward - University of Edinburgh
- Filipe Furtado - University of Minho
- Susan Borda - Montana State University
- Felicity Dykas - University of Missouri–Columbia
- Joseph Greene - University College Dublin
- Irene Berry - Naval Postgraduate School