DSpace now includes statistics functionality; up-to-date documentation for it can be found in the official DSpace documentation, under DSpace Statistics.
At the DSpace User Group Meeting 2005 in Cambridge there was some interest in developing statistics reporting for DSpace to a greater degree.
Initially DSpace came with a basic log file analyser which aggregated the logged actions and produced a basic text report of system activity. Since then there have been a number of developments in statistics for DSpace, each with a different focus or methodology. Following discussions in Cambridge, there seemed to be sufficient interest in stats development that we propose to work together on a more advanced package or module to handle stats, and possibly logging, in DSpace. We would like the design process to happen in public, with this wiki page as the main point of contact for development.
It would seem sensible to draw on ideas and code from these and any other systems that people have developed, and to combine them into something that meets the community's needs.
During the statistics BOF session in Cambridge, a number of issues and thoughts came up regarding the challenges and requirements of DSpace statistics.
We noted that there were at least 3 different types of statistics that may be interesting to people:
The discussion noted the number of different approaches already in use (as given above). The main common need raised was for statistics on file downloads, item views and other usage (that is, category 1 stats). The possibility of having a table or tables containing the raw activity data was discussed, along with how this would be populated, what the performance implications would be, and how to eliminate erroneous robot accesses. Also raised was the possibility of opening up such data for OAI harvesting, which would allow cross-instance analysis of usage.
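As a rough illustration of the category 1 case, the sketch below counts file downloads per item from raw activity records while skipping accesses that look like robots. The record fields (`action`, `item_id`, `user_agent`) and the list of robot markers are assumptions made for illustration only; they are not taken from any existing DSpace table or from the approaches discussed at the BOF.

```python
from collections import Counter

# Hypothetical substrings that suggest a robot; a real list would need curating.
ROBOT_MARKERS = ("bot", "crawler", "spider")


def is_robot(user_agent):
    """Return True if the user agent string contains a known robot marker."""
    ua = (user_agent or "").lower()  # a missing user agent is treated as non-robot
    return any(marker in ua for marker in ROBOT_MARKERS)


def download_counts(events):
    """Count downloads per item, ignoring accesses that look like robots.

    `events` is an iterable of dicts with the assumed keys
    'action', 'item_id' and 'user_agent'.
    """
    counts = Counter()
    for event in events:
        if event["action"] == "download" and not is_robot(event["user_agent"]):
            counts[event["item_id"]] += 1
    return counts
```

Matching on user agent substrings is only one possible heuristic; filtering by IP address or by request rate would slot into `is_robot` in the same way.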
Also discussed was the possibility of a more structured approach to logging events (into the db), which would allow simpler querying of logged data. The current methods, as listed above, require regexp-based logic either at the aggregation stage or at the querying stage.
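To make the contrast concrete, here is a minimal sketch of the two styles: recovering fields from a free-text log line with a regular expression, versus writing each event as a row that can be queried directly. The log line format, the `usage_event` table and its columns are all hypothetical, and SQLite is used only so that the example is self-contained; none of this reflects an actual DSpace schema.

```python
import re
import sqlite3

# Hypothetical log line, for illustration only; not DSpace's actual log format.
LOG_LINE = "2005-07-07 10:15:02 INFO action=view_item item_id=123 ip=192.0.2.10"


def parse_with_regex(line):
    """Regexp-based approach: recover (action, item_id, ip) from free text."""
    m = re.search(r"action=(\w+) item_id=(\d+) ip=(\S+)", line)
    return (m.group(1), int(m.group(2)), m.group(3)) if m else None


def open_event_store():
    """Create an in-memory database with an assumed usage_event table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE usage_event (ts TEXT, action TEXT, item_id INTEGER, ip TEXT)"
    )
    return conn


def log_event(conn, ts, action, item_id, ip):
    """Structured approach: store each event as one row with typed columns."""
    conn.execute(
        "INSERT INTO usage_event VALUES (?, ?, ?, ?)", (ts, action, item_id, ip)
    )


def count_by_item(conn, action):
    """Return {item_id: number of events} for one action, with no text parsing."""
    rows = conn.execute(
        "SELECT item_id, COUNT(*) FROM usage_event "
        "WHERE action = ? GROUP BY item_id ORDER BY item_id",
        (action,),
    ).fetchall()
    return dict(rows)
```

With the structured form, a new report is a new query rather than a new pattern to maintain, at the cost of a database write for every logged event.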
There are a number of points to consider before getting too deep into development. These are listed below; please add any comments, ideas or additional issues.
Should we be thinking about replacing log4j, overloading/extending it, or leaving it as it is?
In cases where log4j is writing to the same file from different VMs, it would be advisable to have a socket-based logging service. – MarkDiggory 08:31, 3 November 2006 (EST)
Is there a way we can address all 3 of the above statistics types in one package?
We should make this package totally modular, in part as an experiment in modularisation of DSpace.
Can we use the new Plug-in Manager for configuring the Statistics module?
If we replace log4j, and opt for a database and filesystem solution, what are the issues we may encounter?
An alternative would be to keep log4j and use the solution at http://www.dankomannhaupt.de/projects/index.html – MarkDiggory 08:43, 3 November 2006 (EST)
StatisticsProposalOne - Some design thoughts by RichardJones
StatisticsAndLog4jIdeas - some ideas about leveraging log4j functionality by LiamLynch
StatisticsFurtherSpeculations - some further ideas by RichardJones about implementation of logging, based on LiamLynch's above ideas.
AdministrativeStatistics - some thoughts about how certain administrative statistics might be handled by RichardJones
ReportGeneration - yet more semi-coherent thoughts about statistical analysis by RichardJones
Please add yourself here if you are interested in this work, indicate what level of involvement you would like, and perhaps describe a little about your interests/requirements.
Please add your thoughts, links to any stats work you may have done, design questions/solutions, requirements and so forth to this page.