...

I've begun putting in place a series of modules and core changes to DSpace to support the inclusion of external statistics and reporting systems. Much of this work comes from code donations from @MIRE and represents components of @MIRE products which we feel will improve the health of the DSpace ecosystem once they are exposed. Most specifically, by exposing and donating these components, we seek to show "how" modularity needs to be modelled and approached, not only in the future DSpace+2.0 but more immediately now in 1.6. --Mark Diggory 04:25, 3 July 2009 (EDT)

The breakdown of the projects is as follows:

...

  • In my opinion (Tim Donohue), there are really two main types of practical statistics that may be of interest to different individuals using a repository:
    • Page Views (i.e. individual visits to individual splash pages within a repository) - From experiences at U of Illinois, most of our individual faculty members or departments don't really care how many people out on the web visit their item "splash page". These Page Views are, however, still usually of interest to Repository Administrators, as they can help them determine how users are using the site and what they are visiting. Reports about Page Views could be generated by something like Google Analytics (or AWStats), as that is what both do quite well – Tim Donohue 10:00, 29 May 2009 (CDT)
    • Downloads (i.e. download counts of individual files in the repository) - From experiences at U of Illinois, this is what our faculty/staff/departments really want to see from a repository. They are very interested in reports of actual downloads over time, including "Top 10" lists of downloads at Community/Collection levels over different time frames (e.g. "Top 10 Downloads in ___ Community in last month/year/overall"). These download reports would be much more difficult to generate using something like Google Analytics, as Google Analytics has no concept of the hierarchical structure of a given repository (i.e. Community / Collection / Item / Bitstream) – Tim Donohue 10:00, 29 May 2009 (CDT)
  • I agree with Tim, and would only add this: we have had numerous requests to collect and present information about referring sites (in other words, whether the link to the DSpace item or bitstream is coming from Google, Google Scholar, Yahoo, or wherever), so we've begun to capture that information. Jim Ottaviani 13:00, 29 May 2009 (EDT)
    • Just to back up what Jim has said, U of Illinois has also had numerous requests for information about where downloads are coming from. This includes wanting to know whether they come from Google, a direct blog link, etc. (as Jim mentioned), but also where in the world they come from (on-campus, from the USA, from elsewhere in the world, etc.). Again, the requests we've heard all have to do with generating these reports based on downloads, not on "page views". --Tim Donohue 12:30, 29 May 2009 (CDT)
  • Yes, agreed. It's always the downloads that people are after, but once you have an implementation that supports downloads, adding hooks for page views is trivial, and the door is open to many other reports as well. --Mark Diggory 14:12, 9 June 2009 (EDT)
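To make the referring-site idea concrete, here is a minimal Java sketch (Java because DSpace itself is written in Java) that buckets a download's HTTP Referer header into coarse sources such as Google, Google Scholar and Yahoo. The class name `ReferrerClassifier` and its host-matching rules are assumptions for illustration, not part of DSpace; a geographic breakdown (on-campus, USA, elsewhere) would additionally need an IP-to-location lookup, which is not shown.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

/**
 * Hypothetical helper that maps the Referer of a download to a coarse source
 * label for reporting. The host patterns are illustrative assumptions, not an
 * exhaustive or authoritative list.
 */
final class ReferrerClassifier {

    private ReferrerClassifier() {}

    static String classify(String referer) {
        if (referer == null || referer.isEmpty()) {
            return "Direct / unknown";          // no Referer header was sent
        }
        String host;
        try {
            host = new URI(referer.trim()).getHost();
        } catch (URISyntaxException e) {
            return "Unparseable";               // malformed header value
        }
        if (host == null) {
            return "Unparseable";               // e.g. a relative reference
        }
        host = host.toLowerCase(Locale.ROOT);
        // Check the more specific Google Scholar host before generic Google hosts.
        if (host.startsWith("scholar.google.")) {
            return "Google Scholar";
        }
        if (host.startsWith("google.") || host.contains(".google.")) {
            return "Google";
        }
        if (host.equals("yahoo.com") || host.endsWith(".yahoo.com")) {
            return "Yahoo";
        }
        return "Other: " + host;
    }
}
```

For example, `ReferrerClassifier.classify("https://scholar.google.com/scholar?q=dspace")` returns `"Google Scholar"`, and a missing header falls into the direct/unknown bucket. Google Scholar is checked first because its hostname would otherwise match the broader Google rule.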

COUNTER compliant usage statistics

...

  • I believe Google Analytics can't track direct bitstream downloads very accurately. Can anyone confirm or discuss further? --Kshepherd 17:21, 28 May 2009 (EDT)
  • I am not a fan of using Google Analytics, as it makes us dependent on an external service for statistics collection. --Mark Diggory 17:25, 28 May 2009 (EDT)
  • The great thing about GA Stats is that many institutions already have a backlog of GA Stats history, and that they generally implement the "counting" in the same way (a script in the footer of the page). Important drawbacks are that it doesn't count bitstreams that are downloaded directly without visiting the web pages (for example, direct downloads via Google), and that it's impossible to aggregate data from Google Analytics along the repository hierarchy (most popular items per community, collection, ...). It would be great if the ultimate solution allowed us to "draw in" Google Analytics data and let the repository administrator decide whether to visualize GA stats data or internally measured data. GA stats data may also prove useful when people want to compare their figures with institutions running another platform (EPrints or Fedora), provided those institutions also run GA stats. --Bluyten 06:14, 29 May 2009 (EDT)
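The per-community aggregation that GA cannot do becomes straightforward once usage events are stored locally together with their place in the hierarchy. Below is a minimal sketch, assuming download events have already been resolved to item IDs and that a hypothetical `itemToCommunity` map records which community each item belongs to; real DSpace items can sit in several collections and communities, which this sketch ignores. `TopDownloads` is an illustrative name, not DSpace code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch of a "Top N downloads per community" report over locally stored
 * download events. Assumes each item maps to exactly one community.
 */
final class TopDownloads {

    private TopDownloads() {}

    static Map<String, List<String>> topItemsPerCommunity(
            List<String> downloadedItemIds,
            Map<String, String> itemToCommunity,
            int n) {
        // community -> (item -> download count)
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (String itemId : downloadedItemIds) {
            String community = itemToCommunity.get(itemId);
            if (community == null) {
                continue; // item not mapped to any community; skip it
            }
            counts.computeIfAbsent(community, c -> new HashMap<>())
                  .merge(itemId, 1, Integer::sum);
        }

        Map<String, List<String>> result = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
            List<Map.Entry<String, Integer>> ranked = new ArrayList<>(e.getValue().entrySet());
            // Highest count first; ties broken by item ID so the output is deterministic.
            ranked.sort((a, b) -> {
                int byCount = Integer.compare(b.getValue(), a.getValue());
                return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
            });
            List<String> top = new ArrayList<>();
            for (int i = 0; i < Math.min(n, ranked.size()); i++) {
                top.add(ranked.get(i).getKey());
            }
            result.put(e.getKey(), top);
        }
        return result;
    }
}
```

With `n = 10` this yields a "Top 10 Downloads per Community" list; restricting the input events to a date window gives the last-month or last-year variants.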

...

  • When a data service is available to query against for statistics, producing reports will no longer depend on "offline" or "cron" jobs. We should be striving for a new approach that uses existing reporting tools and datastores to query against. Certainly NOT Perl scripting. --Mark Diggory 17:41, 28 May 2009 (EDT)
  • Fair comment, but how scalable are these new approaches? Can we feel safe running ad hoc live queries when we have more than 1,000,000 usage events stored? The time taken to generate a single report is the only thing stopping my own stats systems from being real-time. --Kshepherd 17:51, 28 May 2009 (EDT)
    • I feel that our stats engine is up to this challenge. --Mark Diggory 14:13, 9 June 2009 (EDT)
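One common way to keep report queries fast as the event store grows past a million rows is to maintain pre-aggregated counters alongside the raw events. The sketch below illustrates that general technique only; it is not a description of how the @MIRE statistics engine works, and the class `MonthlyRollup` is hypothetical. Each event increments a counter keyed by item and month, so a report's cost depends on the number of months requested rather than the number of stored events.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative incremental rollup. In practice these counters would live in a
 * database or search index rather than an in-memory map.
 */
final class MonthlyRollup {

    // item ID -> (month -> event count)
    private final Map<String, Map<YearMonth, Long>> counts = new HashMap<>();

    /** Called once per usage event, e.g. from a logging hook. */
    void increment(String itemId, LocalDate day) {
        counts.computeIfAbsent(itemId, k -> new HashMap<>())
              .merge(YearMonth.from(day), 1L, Long::sum);
    }

    /** Total events for an item between two months, inclusive. */
    long total(String itemId, YearMonth from, YearMonth to) {
        Map<YearMonth, Long> perMonth = counts.get(itemId);
        if (perMonth == null) {
            return 0L;
        }
        long sum = 0L;
        // Reads one counter per month in the range, independent of raw event volume.
        for (YearMonth m = from; !m.isAfter(to); m = m.plusMonths(1)) {
            sum += perMonth.getOrDefault(m, 0L);
        }
        return sum;
    }
}
```

The trade-off is that only questions anticipated by the counter keys (here item and month) are cheap; genuinely ad hoc queries still have to fall back to the raw events.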

A Case for Google Analytics

...

The importance of "hooks" for specific data collection points, and of a statistics logging backend, should not be undervalued. We have an opportunity here to gather critical data that can be used to facilitate a richer browse, search and viewing experience. Handing it off to GA and suggesting it's outside the scope of "Repository Concerns" is a bit heavy-handed IMO. --Mark Diggory 14:06, 9 June 2009 (EDT)
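To illustrate what such hooks might look like, here is a minimal sketch of listener-style dispatch: the application fires a record at each collection point, and every registered backend (a local statistics store, a log writer, or even a forwarder to GA) receives it. All names here (`UsageHooks`, `UsageRecord`, `UsageListener`, the `Action` values) are hypothetical and are not the DSpace API.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical hook mechanism for usage data collection points. */
final class UsageHooks {

    enum Action { VIEW_ITEM, DOWNLOAD_BITSTREAM, SEARCH }

    /** Immutable description of one usage event. */
    static final class UsageRecord {
        final Action action;
        final String objectId;   // item/bitstream ID, or query text for SEARCH
        final String referer;    // may be null

        UsageRecord(Action action, String objectId, String referer) {
            this.action = action;
            this.objectId = objectId;
            this.referer = referer;
        }
    }

    interface UsageListener {
        void onUsage(UsageRecord record);
    }

    // Copy-on-write so listeners can be registered while events are being fired.
    private final List<UsageListener> listeners = new CopyOnWriteArrayList<>();

    void register(UsageListener listener) {
        listeners.add(listener);
    }

    /** Notifies every listener; a failing listener does not block the others. */
    void fire(UsageRecord record) {
        for (UsageListener l : listeners) {
            try {
                l.onUsage(record);
            } catch (RuntimeException e) {
                // A real implementation would log this; the sketch just ignores it.
            }
        }
    }
}
```

The design choice is to isolate backends from each other and from the request: an exception thrown by one listener is swallowed, so it neither stops the remaining listeners nor breaks the page the user is viewing.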

Also consider that many IR administrators are probably expecting this add-on to work with the local data they've already collected: people who have been requesting statistics have been sitting on years' worth of logs, but do not necessarily have any accurate historical data in GA. (I realise that the logs aren't in the UsageEvent format we want to use now anyway, but converting them is not a big deal.) I think there are likely more of these cases than IRs with years of GA data but no saved logs. --Kshepherd 22:02, 9 June 2009 (EDT)
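As a rough illustration of such a conversion, the sketch below parses one log line into a usage-event-like record with a regular expression. The line layout it expects is an assumed, simplified shape (timestamp, level, logger, then `user:session_id=<id>:ip_addr=<ip>:<action>:<params>`); verify it against your own DSpace logs before relying on it. `LegacyLogConverter` and its fields are hypothetical names, not the UsageEvent API.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch of converting one legacy log line into a usage-event record.
 * ASSUMED line format:
 *   "<date> <time> <LEVEL> <logger> @ <user>:session_id=<id>:ip_addr=<ip>:<action>:<params>"
 */
final class LegacyLogConverter {

    private LegacyLogConverter() {}

    /** Minimal converted record; field names are illustrative. */
    static final class Converted {
        final String timestamp;
        final String ip;
        final String action;
        final String params;

        Converted(String timestamp, String ip, String action, String params) {
            this.timestamp = timestamp;
            this.ip = ip;
            this.action = action;
            this.params = params;
        }
    }

    // Groups: 1 = "date time", 2 = IP, 3 = action, 4 = remaining parameters.
    // IPv6 addresses (which contain colons) will not match this pattern.
    private static final Pattern LINE = Pattern.compile(
            "^(\\S+ \\S+)\\s+\\S+\\s+\\S+ @ [^:]*:session_id=[^:]*:ip_addr=([^:]+):([^:]+):(.*)$");

    static Optional<Converted> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return Optional.empty();   // not a usage line in the assumed format
        }
        return Optional.of(new Converted(m.group(1), m.group(2), m.group(3), m.group(4)));
    }
}
```

Lines that do not match return an empty `Optional`, so a batch converter can count and report skipped lines rather than failing partway through years of logs.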

...