
We used various scripts to analyze different data sources. 

LOC Hub Analysis

  • We used client-side AJAX queries to retrieve the first 10,000 hubs from LOC and then navigated to related works and instances to analyze how many LOC Hubs provide two or more instances with ISBNs or LCCNs.
  • We wrote scripts to further analyze these groupings of hubs to see how many catalog matches we could get.
    • Finding catalog matches for LCCN sets grouped under LOC Hubs
      • This file (HubSetsLccn.csv) lists an LOC Hub on each line followed by a list of LCCNs from instances that fall under that hub.
      • A script (processlccn.rb) reads in this file and then generates the file (lccnhubonlyfirst), which lists the LCCN rows that matched at least two catalog items and ends with a summary. (The output says "ISBN" but the values are in fact LCCNs, because the same code was copied from the ISBN analysis.)
    • Finding catalog matches for LCCN sets grouped under LOC Hub to Hub relationships
      • Each line in the file (prophublccnsets.csv) lists the name of the relationship (e.g. "hasTranslation") that links two different hubs, followed by the LCCNs that fall under those hubs. 
      • A script (processrellccn.rb) reads in this file and then generates the file (lcchubrels). The output starts with a list of the property and LCCN groups that resulted in at least two catalog matches (e.g. "hasTranslation : 2017328875,92911176,93910013"), followed by a summary of the total number of rows and LCCNs in the original file and the number of matching rows/LCCNs. The file also lists the hub relationship and LCCN groupings from the original CSV file that resulted in exactly one match in the catalog. This last piece of information was used for our POD analysis.
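
The grouping step described above can be sketched roughly as follows. This is not the code from processlccn.rb or processrellccn.rb; it is a minimal illustration that assumes each CSV row is a label (a hub or a relationship name) followed by comma-separated LCCNs, and it replaces the real catalog lookup with a hypothetical in-memory Set of known LCCNs. The method name summarize_lccn_sets and the result keys are invented for this example.

```ruby
require "csv"
require "set"

# Sketch only: groups rows by how many of their LCCNs appear in the catalog.
# csv_text      - rows of "label,lccn,lccn,..." (assumed layout)
# catalog_lccns - hypothetical stand-in for a catalog lookup (a Set of LCCNs)
def summarize_lccn_sets(csv_text, catalog_lccns)
  total_rows = 0
  total_lccns = 0
  multi_match = []   # rows with at least two catalog matches
  single_match = []  # rows with exactly one catalog match

  CSV.parse(csv_text) do |row|
    next if row.empty? # blank lines are parsed as empty rows

    label = row.first
    lccns = row.drop(1).compact.map(&:strip).reject(&:empty?)
    total_rows += 1
    total_lccns += lccns.size

    found = lccns.select { |lccn| catalog_lccns.include?(lccn) }
    if found.size >= 2
      multi_match << [label, found]
    elsif found.size == 1
      single_match << [label, found]
    end
  end

  {
    total_rows: total_rows,
    total_lccns: total_lccns,
    multi_match: multi_match,
    single_match: single_match
  }
end
```

The same method could be applied to either input file, since both are described as a label followed by LCCNs. Rows in single_match correspond to the "exactly one match" groupings mentioned above.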

PCC data analysis
