February 24 LD4L Workshop breakout session: Usage Data

Facilitator: Paul Deschner

Usage data sources
  • OCR-ed bibliographies and page rank
  • ILL usage
  • Yahoo circ logs
  • Web analytics (e.g., DPLA UI analytics, esp. contextual granularity)
  • Search terms as form of usage; also as compared to other usage data
  • Entities extracted from queries, not simply literal queries themselves
  • How often a link is traversed; how many times your link has been reconciled in a triple store
  • Browsed materials
  • Citations; also citation networks as compared to other usage data
  • Course-book lists across institutions
StackScore
  • Makes data muddy
  • Too many metrics mixed together; need to separate out the metrics
  • Common metrics needed across institutions
  • Computational transparency important: metrics and algorithms
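
The StackScore points ask for separate metrics and a transparent computation. A minimal sketch follows. It assumes hypothetical per-item counts (circulation, ILL requests, course-list appearances) and weights chosen purely for illustration. It is not StackScore's actual formula. Each component is returned next to the combined score so the calculation stays inspectable and the metrics can still be compared on their own.

```python
from dataclasses import dataclass, field

# Hypothetical metric names and weights, chosen only for illustration;
# these are not StackScore's actual inputs or formula.
WEIGHTS = {"circulation": 0.5, "ill_requests": 0.3, "course_lists": 0.2}


@dataclass
class UsageRecord:
    item_id: str
    metrics: dict = field(default_factory=dict)  # metric name -> raw count


def collection_maxima(records):
    """Largest raw value of each metric across the collection (for normalizing)."""
    maxima = {}
    for rec in records:
        for name, value in rec.metrics.items():
            maxima[name] = max(maxima.get(name, 0), value)
    return maxima


def transparent_score(record, maxima, weights=WEIGHTS):
    """Return (combined score, per-metric weighted components).

    Each metric is scaled to 0..1 against the collection maximum, then
    weighted. The components are returned alongside the total so the
    calculation can be inspected and the metrics compared separately.
    """
    components = {}
    for name, weight in weights.items():
        raw = record.metrics.get(name, 0)
        top = maxima.get(name, 0)
        components[name] = weight * (raw / top if top else 0.0)
    return sum(components.values()), components
```

Publishing the weights and the per-metric components, rather than only the total, is one way to address both the "too many metrics mixed together" concern and the call for common, comparable metrics across institutions.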

Use cases
  • Keeping tabs on popularity of colleagues’ publications
  • Usage data as a diagnostic tool for targeted collections: if heavily invested-in parts of a collection are not being used, that could prompt an exhibition to raise awareness of them
  • Scholars doing research on other scholars' research and publications
  • Look at when items were used: what was checked out in last week, month, year, etc.
  • Link traversals and other link metrics could be sent to the link’s publisher
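
Looking at when items were used comes down to counting events inside trailing time windows. A minimal sketch follows. It assumes a hypothetical event format of (item_id, checkout_time) pairs and approximates a month as 30 days and a year as 365 days.

```python
from collections import Counter
from datetime import datetime, timedelta

# Trailing windows; "month" and "year" are approximations (assumption).
WINDOWS = {
    "week": timedelta(days=7),
    "month": timedelta(days=30),
    "year": timedelta(days=365),
}


def usage_by_window(events, now):
    """Count checkouts per item within each trailing window ending at `now`.

    events: iterable of (item_id, checkout_time) pairs (hypothetical format).
    Returns {window name: Counter(item_id -> count)}.
    """
    counts = {name: Counter() for name in WINDOWS}
    for item_id, when in events:
        age = now - when
        for name, span in WINDOWS.items():
            # Ignore future-dated events; count anything inside the window.
            if timedelta(0) <= age <= span:
                counts[name][item_id] += 1
    return counts
```

Because the windows are nested, an item checked out yesterday is counted in the week, month, and year tallies alike.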
Privacy
  • Opt-in option for users willing to share their usage data
  • University of Huddersfield (England): more liberal approach to data exposure, including access to clustering (users who borrowed this also borrowed that) and to usage by academic course and school
  • IP-based web stats inherently less risky than personal ID-based circulation data
  • Anonymization tools important
  • Clustering can be dangerous for privacy
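
As one example of an anonymization step, patron identifiers can be replaced with a keyed hash before circulation data is shared. This is pseudonymization rather than full anonymization: the same patron still maps to the same token, so clustering over the shared data can still expose individual borrowing patterns. The sketch uses Python's standard `hmac` module with SHA-256. The `patron_id`/`item_id` field names are hypothetical.

```python
import hashlib
import hmac
import secrets


def pseudonymize(patron_id: str, key: bytes) -> str:
    """Replace a patron ID with an HMAC-SHA256 token.

    The same ID and key always yield the same token, so usage can still be
    aggregated, but the token cannot be linked back to the ID without the key.
    """
    return hmac.new(key, patron_id.encode("utf-8"), hashlib.sha256).hexdigest()


def strip_identifiers(loans, key):
    """loans: iterable of dicts with hypothetical 'patron_id' and 'item_id' fields.

    Returns new records that carry only the token and the item ID.
    """
    return [
        {"patron": pseudonymize(loan["patron_id"], key), "item_id": loan["item_id"]}
        for loan in loans
    ]


# Keep the key secret; rotating it breaks linkability between data releases.
key = secrets.token_bytes(32)
```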
Long tail issue generally and at own institution
  • Options: random selection out of tail for exposure, subject-filtered selection
  • Important that UI expose long-tail possibilities prominently, above the page-fold
  • Usage data from other institutions and ILL balances out local institution’s biases
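
The two long-tail options above (random selection out of the tail, and subject-filtered selection) can be sketched together. The sketch assumes the tail is everything outside the `head_size` most-used items. `head_size`, `k`, and the `id`/`uses`/`subject` fields are hypothetical choices for illustration, not a definition from the session.

```python
import random


def long_tail(items, head_size):
    """Return items outside the `head_size` most-used ones.

    items: list of dicts with hypothetical 'id', 'uses', 'subject' keys.
    """
    ranked = sorted(items, key=lambda it: it["uses"], reverse=True)
    return ranked[head_size:]


def sample_tail(items, head_size, k, subject=None, rng=random):
    """Randomly pick up to k tail items, optionally limited to one subject."""
    tail = long_tail(items, head_size)
    if subject is not None:
        tail = [it for it in tail if it["subject"] == subject]
    return rng.sample(tail, min(k, len(tail)))
```

Passing a seeded `random.Random` as `rng` makes a selection reproducible for testing. The default draws fresh picks on each call, which suits rotating exposure in a UI.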
Negative usage data at local institution
  • Important to see what users are looking for but local institution doesn’t have
  • What doesn’t circulate in-house but is available via ILL
  • What isn’t read at Columbia but at Yale
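
The three kinds of negative usage data above reduce to set differences over item identifiers. A minimal sketch follows. It assumes each input is a set of normalized identifiers (for example, OCLC numbers) and that search logs have already been matched to identifiers. The parameter names and the interpretation of each category are assumptions.

```python
def negative_usage(local_holdings, local_circulated, searched,
                   ill_requests, peer_circulated):
    """Compare local usage against demand signals (all hypothetical sets of IDs).

    local_holdings:   items the local institution owns
    local_circulated: local items that circulated in the period
    searched:         items users looked for (matched from search logs)
    ill_requests:     items local users requested via ILL
    peer_circulated:  items that circulated at a peer institution
    """
    return {
        # Users looked for these, but the institution doesn't have them.
        "sought_not_held": searched - local_holdings,
        # Not available in-house, so obtained through ILL instead.
        "borrowed_via_ill": ill_requests - local_holdings,
        # Read at the peer (e.g., Yale) but not locally (e.g., Columbia).
        "read_at_peer_not_here": peer_circulated - local_circulated,
    }
```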
Usage data runs risk of becoming prescriptive
  • Blandness of collections when everyone acquires most popular items