Date & Time

  • Main Call Tuesday, October 14 15:00 UTC/GMT - 11:00 ET
  • Satellite Call Wednesday, October 15 07:00 UTC/GMT - 03:00 ET

What is the difference between the "main call" and satellite call? 

Dial-in

We will use the international conference call dial-in. Please follow directions below.

  • U.S.A/Canada toll free: 866-740-1260, participant code: 2257295
  • International toll free: http://www.readytalk.com/intl
    • Use the above link and enter 2257295 and the country you are calling from to get your country's toll-free dial-in number
    • Once on the call, enter participant code 2257295

2014 Meeting objectives

From August until December 2014, the monthly DCAT meetings are centered around defining, refining and prioritizing DSpace use cases.

These use cases are expected to have an important impact on the medium and long term roadmap of DSpace, starting with DSpace 6 in 2015.

October Meeting Agenda: Statistics/Metrics use cases

During the October meeting we will be discussing statistics/metrics use cases.

 For each of the needs that emerge, we will try to qualify those needs as:

  • Supported: the use case is addressed, and the bulk of the configuration associated with it (if any) can happen through the UI.
  • Partially supported: there is room for improvement in the support for the use case. It also covers the cases where specific server configuration or small customizations to the code are required in order to properly support the use case.
  • Unsupported: if at all possible with DSpace, addressing the use case requires substantial modifications to the DSpace source code.

We would rather cover more use cases than dig deeply into a limited number. This is why, after devoting about 5 minutes to explaining and discussing each use case, we will ask the call participants for their institution's or personal priority. We hope to cover at least 10 use cases during the call.

Read more about certain use cases that were already identified: Use Cases

The best way to participate and contribute

If you have some time to spare to prepare for this meeting, it would be great if you could briefly list the most important administrative use cases for you or your institution, especially if they fall in the category unsupported.

  1. Sign up for an account on this wiki and log in.
  2. Put your use cases in the comment section of this page. 
  3. Join either the main call or satellite call and tell us about your use cases

 

Discussed use cases

Call Attendees (main+satellite)

  • Bram Luyten (@mire) - @mire
  • Maureen Walsh - Ohio State University
  • Mark Woods - Indiana University 

  • Jeanette Hatherill - University of Ottawa

  • Felicity Dykas - University of Missouri

  • Terry Brady - Georgetown University 

  • Kathleen Schweitzberger - University of Missouri

  • Pauline Ward - University of Edinburgh

  • Valorie Hollister - DuraSpace

  • Elin Stangeland - University of Oslo
  • Emilio Lorenzo - Arvo Consultores

9 Comments

  1. A few use cases here:

    Split out internal vs external traffic (currently unsupported) 

    As a repository administrator, I want to be able to differentiate between institutional traffic and "outside" traffic.

    Split out national vs international traffic (currently unsupported) 

    As a repository administrator, I want to be able to differentiate between national traffic and the sum of all international traffic to the repository.

    Better ways to deal with robots and abusive traffic

    This includes prevention (keeping robot traffic from being logged as real hits, or blocking it altogether), fast remediation (easily removing certain traffic in bulk) and analysis of the traffic sources/agents. An interesting development here is that the COUNTER Robots working group is currently thinking about flagging a limited top percentage of IPs/hosts that generate the most traffic to a repository as "suspicious".
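As a rough illustration of the "top percentage" idea mentioned above, here is a minimal Python sketch. It is only an assumption about how such flagging might work, not the COUNTER working group's method: the function name, the `top_fraction` parameter and the sample addresses are all hypothetical.

```python
from collections import Counter
import math

def flag_suspicious_ips(hit_ips, top_fraction=0.01):
    """Return the IPs in the top `top_fraction` of distinct IPs by hit count.

    Hypothetical helper: at least one IP is flagged whenever there are hits.
    """
    counts = Counter(hit_ips)
    if not counts:
        return set()
    n_flagged = max(1, math.ceil(len(counts) * top_fraction))
    return {ip for ip, _ in counts.most_common(n_flagged)}

# Made-up sample: one address generates far more hits than the others.
hits = ["192.0.2.1"] * 50 + ["192.0.2.2"] * 3 + ["192.0.2.3"] * 2
suspicious = flag_suspicious_ips(hits, top_fraction=0.3)
print(suspicious)
```

Flagged addresses would still need human review before any hits are removed, since a busy campus proxy could end up in the same list as a robot.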

  2. Use cases developed by Terrence W Brady and Kate Dohe at Georgetown University.

    Item Statistics

    • As a collection owner, I would like to view how often an item has been included in search results (currently unsupported).
    • As a collection owner, I would like to see item view statistics and bitstream view statistics. Currently, the "file visit" counts appear to be radically higher than the item view counts. (Are thumbnail views triggering these counts?) (current bug)
    • As a collection owner, I would like to see the distribution of how a user arrived at an item in addition to the item counts: Google search, Google Scholar, repository full text search, repository facet search, repository browse, repository related item, external link (currently unsupported)
    • As a collection owner, I would like to have the option to display item usage statistics on public-facing item pages by modifying theme properties. (partially supported)

    Collection and Community Statistics

    • As a collection owner viewing usage statistics, I would like to clearly understand whether a statistic applies to a specific object or to an object and all of its descendants. (currently unsupported)
    • As a collection owner viewing community statistics, I would like to understand the cumulative usage across the community (sub-community usage, collection usage, item usage, bitstream usage).  I would like to see a list of the top collections and items contributing to the cumulative usage totals (currently unsupported)
    • As a collection owner viewing collection statistics, I would like to understand the cumulative usage across the collection (item usage, bitstream usage).  I would like to see a list of the top items contributing to the cumulative usage totals (currently unsupported)
    • As a collection owner viewing usage statistics in Google Analytics, I would like to be able to access cumulative statistics across various levels of the repository hierarchy (which cannot be inferred from existing URLs). (currently unsupported)

    Search Statistics

    • As a collection administrator viewing search statistics, I would like to distinguish facet searches from user-initiated full text search (currently unsupported)

    • As a collection administrator viewing search statistics, I would like to configure the number of searches to display (? supported)

    • As a collection administrator viewing search statistics, I would like to view "page views per search" but they are currently reporting very low numbers (due to facet searches?) (partially supported)

    General Needs for Usage Statistics

    • As a collection owner viewing usage statistics, I would like to be able to export statistics for use in other analytic and visualization software. (currently unsupported)
    • As a repository administrator at an institution that does not track usage by specific users, I would like to have the option to suppress user login information from statistics system although it could be useful to track authenticated vs unauthenticated or faculty vs student access (currently unsupported) 

    • As a collection owner viewing usage statistics, I would like to holistically view statistics collection by collection or community by community in order to identify spikes and trends in access (currently unsupported)

    • As a collection owner viewing usage statistics, I would like to configure date ranges for a report  (by day, week, month, quarter) (currently unsupported)

    • As a collection owner viewing usage statistics, I would like to filter usage statistics by referrer domain (currently unsupported)

       

     

     

  3. If there is time I would also love to discuss the expected behaviour of statistics in following cases. Each of the cases is followed by a proposal:

    Aggregated views on statistics

    • Item pageviews and related bitstream downloads are displayed in both owning and mapping collections
    • On any aggregated view, it's ensured that the same views or downloads are not counted twice
    • Side effect: if you look at community level stats, and you have items that appear in multiple collections in that community, the numbers will not add up. To be more precise: the sum of the aggregated collection views/downloads for the items will be HIGHER than the view on the community level
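To make the side effect above concrete, here is a minimal Python sketch. The item names, collection names and view counts are invented for illustration, and the data structures are assumptions; they do not reflect how DSpace actually stores or queries statistics.

```python
# Hypothetical data: each item lists every collection it appears in
# (its owning collection plus any collections it is mapped into).
item_collections = {
    "item-1": {"coll-A"},
    "item-2": {"coll-A", "coll-B"},  # owned by coll-A, mapped into coll-B
}
item_views = {"item-1": 10, "item-2": 5}

def collection_views(collection):
    """Views shown on one collection: owned and mapped items both count."""
    return sum(
        views for item, views in item_views.items()
        if collection in item_collections[item]
    )

def community_views(collections):
    """Views shown on the community: each item is counted only once."""
    wanted = set(collections)
    return sum(
        views for item, views in item_views.items()
        if item_collections[item] & wanted
    )

sum_of_collections = collection_views("coll-A") + collection_views("coll-B")
community_total = community_views(["coll-A", "coll-B"])
print(sum_of_collections, community_total)  # 20 15
```

The collection totals add up to 20 while the community shows 15, because the 5 views of the mapped item appear in both collections but only once at the community level.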

    Moving an item

    • If an item is moved, all of its download and pageview stats are moved as well into the new collection/community. Because stats are linked to items, they "disappear" from the original community/collection stats due to the move.

    Versioning an item

    • The default view on stats for a versioned item should be the aggregated view on stats of all versions of that item.
    • (Moving to a different community/collection is supported as part of creation of a new version) When a new version of an item is moved to a different collection, only the new hits/pageviews on the new version will be aggregated in the new collection's/community's statistics. 

    Deleting an item (expunge)

    • Pageview and download counts are deleted for this item as well
    • Strong objection against deleting any usage that has happened. Even though the item itself gets deleted, the fact that the usage happened doesn't go away by deleting an item. Labelling/flagging usage as related to deleted items would be much preferable to deleting it as a whole.

    Withdrawing an item

    • Pageview and download counts are being kept on the aggregated levels (collection/community/entire repository)

     

     

  4. Is it useful to keep thumbnail hits in the bitstream download counts? Or is this pollution of the stats?

    1. This must be related with the problem of identifying the "main" bitstream

      a) excluding license, cc-license, xml, thumbnails and text bundles. OK

      b) what about other "customer" bundles, apart from Original? Should they be considered?

      c) how to identify/split/count traffic to bitstreams in the Original bundle? primary bitstreams? (also related to bitstream versions, as you pointed out)

    2. Reading the two related Jira issues again in detail, it became clear that even though thumbnail downloads are being logged, they are currently NOT included in the overview of bitstream downloads.

      The overview of bitstream downloads only takes into account ORIGINAL bundle files. So disabling/deactivating the logging of thumbnail downloads should not affect the view on your download counts.
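The behaviour described above (thumbnail downloads are logged but left out of the overview) can be sketched in Python as follows. This is an illustration only: the event fields `item` and `bundle` and the function name are hypothetical and are not taken from the DSpace statistics schema.

```python
def count_downloads(events, counted_bundles=("ORIGINAL",)):
    """Count downloads per item, ignoring bundles that are not listed.

    `events` is a hypothetical list of logged download events, each a dict
    with an "item" and a "bundle" key.
    """
    counts = {}
    for event in events:
        if event["bundle"] in counted_bundles:
            counts[event["item"]] = counts.get(event["item"], 0) + 1
    return counts

# Made-up log: thumbnail and license hits are present but not counted.
logged_events = [
    {"item": "item-1", "bundle": "ORIGINAL"},
    {"item": "item-1", "bundle": "THUMBNAIL"},
    {"item": "item-1", "bundle": "ORIGINAL"},
    {"item": "item-2", "bundle": "THUMBNAIL"},
    {"item": "item-2", "bundle": "LICENSE"},
]
download_counts = count_downloads(logged_events)
print(download_counts)  # {'item-1': 2}
```

Passing a longer `counted_bundles` tuple is one way the question about other bundles could be explored without changing what gets logged.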

  5. Oops, just to clarify: we'd find it helpful to have a handy monthly figure for the number of new registered users (if that doesn't already exist) as well as the number of new depositors. Thanks very much.

  6. All of the above focus specifically on usage statistics for content (download, page views, etc.).  In addition to these, I need to be able to produce system/productivity related metrics:

    1. Number of items deposited (this month, or over a specific date range)
    2. Number of items deposited compared with previous months (broken down into month-by-month data)
    3. Number of approvals for the open archive
    4. Number of approvals compared with previous months (or over a specific date range)
    5. Number of approved items under embargo (temporary and permanent)

    Archival counts per month and total number of archived items are already supported, though this does not allow for flexible date ranges (the need for which is noted above), but I don't know about the rest.
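The month-by-month counts over a flexible date range requested above could be sketched in Python like this. The function name and the sample deposit dates are assumptions made for illustration; how the dates would be retrieved from DSpace is left open.

```python
from collections import Counter
from datetime import date

def deposits_per_month(deposit_dates, start, end):
    """Count deposits per calendar month within [start, end], inclusive."""
    in_range = (d for d in deposit_dates if start <= d <= end)
    per_month = Counter(d.strftime("%Y-%m") for d in in_range)
    return dict(sorted(per_month.items()))

# Made-up deposit dates for illustration.
deposits = [
    date(2014, 7, 30),
    date(2014, 8, 2),
    date(2014, 8, 19),
    date(2014, 9, 5),
    date(2014, 10, 1),
]
report = deposits_per_month(deposits, date(2014, 8, 1), date(2014, 9, 30))
print(report)  # {'2014-08': 2, '2014-09': 1}
```

The same grouping would apply to approvals or embargoed items, given a list of the relevant dates.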