DSpace now includes statistics functionality. For up-to-date documentation, see the DSpace Statistics page in the official DSpace documentation.

Introduction

At the DSpace User Group Meeting 2005 in Cambridge there was some interest in developing statistics reporting for DSpace to a greater degree.

Initially DSpace came with a basic log file analyser which aggregated the logged actions and produced a basic text report of system activity. Since then there have been a number of developments in statistics for DSpace, each with a different focus or methodology. Following discussions in Cambridge, there seemed to be sufficient interest in statistics development that we propose working together on a more advanced package or module to handle statistics, and possibly logging, in DSpace. We would like the design process to happen in public, and to use this wiki page as the main point of contact for development.

It would seem sensible to take ideas and code from these and any other systems that people have developed, and to combine them into something that meets people's needs.

Notes from Cambridge

During the statistics BOF session in Cambridge, a number of issues and thoughts came up regarding the challenges and requirements of DSpace statistics.

We noted that there were at least 3 different types of statistics that may be interesting to people:

  1. Activity statistics - file downloads, user logins, search requests, etc.
  2. Archive statistics - number of items, number of types of items, etc.
  3. Administrative statistics - how long items spend in submission/workflow, etc.

The discussion noted the number of different approaches already in use (as given above). The main common need raised was for statistics on file downloads, item views and other usage (that is, category 1 statistics). The possibility of having a table or tables containing the raw activity data was discussed, as well as how this would be populated, what the performance implications would be, and how to exclude accesses by robots. Also raised was the possibility of opening up such data for OAI harvesting, which would allow cross-instance analysis of usage.
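As a rough illustration of the raw activity table and robot filtering discussed here, the following Java sketch counts downloads per item while skipping rows whose user agent looks like a robot. Everything in it is an assumption made for illustration: the `ActivityRow` fields, the `DownloadCounter` class, the action name "download" and the short list of robot markers are hypothetical and not part of DSpace.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical row of a raw activity table; the field names are assumptions.
final class ActivityRow {
    final String itemHandle;
    final String action;     // e.g. "download" or "view"
    final String userAgent;  // as sent by the client; may be null

    ActivityRow(String itemHandle, String action, String userAgent) {
        this.itemHandle = itemHandle;
        this.action = action;
        this.userAgent = userAgent;
    }
}

final class DownloadCounter {
    // Illustrative markers only; a real robot list would be far longer.
    private static final List<String> ROBOT_MARKERS = List.of("bot", "crawler", "spider");

    private DownloadCounter() {}

    // Treats a missing or empty user agent as a robot (an assumption).
    static boolean isRobot(String userAgent) {
        if (userAgent == null || userAgent.trim().isEmpty()) {
            return true;
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        return ROBOT_MARKERS.stream().anyMatch(ua::contains);
    }

    // Counts non-robot downloads per item, in first-seen order.
    static Map<String, Integer> downloadsPerItem(List<ActivityRow> rows) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (ActivityRow row : rows) {
            if ("download".equals(row.action) && !isRobot(row.userAgent)) {
                counts.merge(row.itemHandle, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Filtering at query time, as here, keeps the raw data intact so the robot list can be revised later. Filtering at insert time would keep the table smaller but loses that option.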

Also discussed was the possibility of a more structured approach to logging events (into the db) that would allow simpler querying of logged data (the current methods, as listed above, requiring regexp-based logic either in the aggregation or at the querying stages).
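To make the contrast concrete, here is a minimal Java sketch of what regexp-based extraction involves compared with holding an event as structured fields. The line layout `<user>:<action>:<key>=<target>` is made up for this example and is not the actual DSpace log format. `LoggedEvent` and `LogLineParser` are likewise hypothetical names.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical structured event: one field per attribute, so a query can
// filter on action or target directly instead of re-parsing text.
final class LoggedEvent {
    final String user;
    final String action;
    final String target;

    LoggedEvent(String user, String action, String target) {
        this.user = user;
        this.action = action;
        this.target = target;
    }
}

final class LogLineParser {
    // Made-up layout for illustration: "<user>:<action>:<key>=<target>"
    private static final Pattern LINE =
            Pattern.compile("^([^:]+):([^:]+):[^=]+=(.+)$");

    private LogLineParser() {}

    // Returns the parsed event, or empty if the line does not fit the layout.
    static Optional<LoggedEvent> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new LoggedEvent(m.group(1), m.group(2), m.group(3)));
    }
}
```

If events were written to the database as structured rows in the first place, the parsing step and its failure case would disappear, and a query would only need to compare field values.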

Development Discussion

There are a number of points to consider before getting too deep into development. These are listed below; please add any comments, ideas or additional issues.

Log4j

Should we be thinking about replacing log4j, overloading/extending it, or leaving it as it is?

In cases where log4j is writing to the same file from different VMs, it would be advisable to have a socket-based logging service. – MarkDiggory 08:31, 3 November 2006 (EST)

Combining and Improving Existing Statistics Packages

Is there a way we can address all 3 of the above statistics types in one package?

Achieving Modularity

We should make this package totally modular, in part as an experiment in modularisation of DSpace.

Can we use the new Plug-in Manager for configuring the Statistics module?

Database Solution

If we replace log4j, and opt for a database and filesystem solution, what are the issues we may encounter?

An alternative would be to keep log4j and use the solution at http://www.dankomannhaupt.de/projects/index.html – MarkDiggory 08:43, 3 November 2006 (EST)

Related Thoughts and Design Issues

StatisticsProposalOne - Some design thoughts by RichardJones

StatisticsAndLog4jIdeas - some ideas about leveraging log4j functionality by LiamLynch

StatisticsFurtherSpeculations - some further ideas by RichardJones about implementation of logging, based on LiamLynch's above ideas.

AdministrativeStatistics - some thoughts about how certain administrative statistics might be handled by RichardJones

ReportGeneration - yet more semi-coherent thoughts about statistical analysis by RichardJones

Interested Parties

Please add yourself here if you are interested in this work, indicate what level of involvement you would like, and perhaps describe a little about your interests/requirements.

Please add your thoughts, links to any stats work you may have done, design questions/solutions, requirements and so forth to this page.