We used various scripts to analyze different data sources.
- We used client-side AJAX queries to retrieve the first 10,000 hubs from LOC, then navigated to their related works and instances to determine how many LOC Hubs provide two or more instances with ISBNs or LCCNs.
- https://github.com/LD4P/blacklight-cornell/blob/bang/app/assets/javascripts/bang/evalHub.js
- This code counts how many hub-to-hub relationships provide LCCNs or ISBNs. To avoid throttling, the code queried 500 hubs at a time. Each time we loaded the page that ran the code, we set the starting hub number, effectively paging through the first 10,000 hubs returned from LOC.
- Hubs were retrieved using this call: "https://id.loc.gov/search/?q=cs:http://id.loc.gov/resources/hubs&count=" + this.sampleSize + start + "&format=json", where the sample size and the starting hub number could be specified.
- --parse button--
- Related view: https://github.com/LD4P/blacklight-cornell/blob/bang/app/views/bang/eval_hubs/index.erb
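The batching described above can be illustrated with a minimal sketch. It is not the code in evalHub.js. The function names are hypothetical, and it assumes that the starting position is sent as a `start` query parameter and that result positions are 1-based; neither detail is confirmed by the call quoted above.

```javascript
// Base search URL, taken from the call quoted above.
const HUB_SEARCH_BASE =
  "https://id.loc.gov/search/?q=cs:http://id.loc.gov/resources/hubs";

// Build the search URL for one batch of hubs.
// Assumption: the starting position is passed as a "start" query parameter.
function buildHubSearchUrl(sampleSize, startIndex) {
  return (
    HUB_SEARCH_BASE +
    "&count=" + sampleSize +
    "&start=" + startIndex +
    "&format=json"
  );
}

// List the batch URLs needed to cover `total` hubs, `sampleSize` at a time.
// Assumption: result positions are 1-based.
function buildBatchUrls(total, sampleSize) {
  const urls = [];
  for (let start = 1; start <= total; start += sampleSize) {
    urls.push(buildHubSearchUrl(sampleSize, start));
  }
  return urls;
}

// 10,000 hubs at 500 per request gives 20 batches.
const batchUrls = buildBatchUrls(10000, 500);
console.log(batchUrls.length);
```

In the analysis itself the starting hub number was set by hand on each page load rather than looped over, so a list like this is only a convenient way to see which requests cover the first 10,000 hubs.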
- https://github.com/LD4P/blacklight-cornell/blob/bang/app/assets/javascripts/bang/evalHubAggregation.js
- This code retrieves the unique ISBNs for every hub that has more than one work. It uses the same sample size and paging approach as the code above.
- Related view: https://github.com/LD4P/blacklight-cornell/blob/bang/app/views/bang/eval_hubs/same_hub.erb
- https://github.com/LD4P/blacklight-cornell/blob/bang/app/controllers/bang/eval_hubs_controller.rb
- https://github.com/LD4P/blacklight-cornell/blob/bang/app/assets/javascripts/bang/evalHub.js
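The aggregation step described above (unique ISBNs for each hub with more than one work) might look roughly like the sketch below. It is not the code in evalHubAggregation.js. The input shape (`uri`, `works`, `instances`, `isbns`) and the function name are hypothetical, and the sample values are placeholders rather than real LOC data.

```javascript
// Collect the unique ISBNs for every hub that has more than one work.
// Assumption: each hub has already been resolved into a plain object of the
// form { uri, works: [{ instances: [{ isbns: [...] }] }] }.
function uniqueIsbnsByHub(hubs) {
  const result = new Map();
  for (const hub of hubs) {
    // Skip hubs with zero or one work.
    if (hub.works.length <= 1) continue;
    const isbns = new Set();
    for (const work of hub.works) {
      for (const instance of work.instances) {
        for (const isbn of instance.isbns) {
          isbns.add(isbn); // The Set drops ISBNs repeated across instances.
        }
      }
    }
    result.set(hub.uri, [...isbns]);
  }
  return result;
}

// Placeholder data: "hub-a" has two works, "hub-b" has only one.
const sampleHubs = [
  {
    uri: "hub-a",
    works: [
      { instances: [{ isbns: ["isbn-1", "isbn-2"] }] },
      { instances: [{ isbns: ["isbn-2"] }] },
    ],
  },
  {
    uri: "hub-b",
    works: [{ instances: [{ isbns: ["isbn-3"] }] }],
  },
];

const isbnsByHub = uniqueIsbnsByHub(sampleHubs);
console.log(isbnsByHub);
```

A `Set` per hub keeps the deduplication local to that hub, so an ISBN shared by two different hubs would still be reported under both.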
- PCC data analysis