Versions Compared

...

  • Analysis of ISBNs aggregated under the same Opus
    • Our first pass at queries had taken too long, so we broke the process up into separate portions. 
      • First, we queried the PCC data using Dave's Fuseki server (or a copy of the data on our own Fuseki server) to retrieve a list of all Opera that had at least two works with instances with ISBNs.  The query we used is captured here. Executing this query resulted in the following list of Opera URIs.  
      • This script takes the list of Opera URIs and executes SPARQL queries to retrieve the ISBNs of any instances that correspond to different work URIs aggregated by that Opus.  Running the script produces a file where each line has the sets of ISBNs corresponding to an Opus.  (Note that the Fuseki SPARQL URL is not included in the script, so running it requires replacing that part of the code with the URL of the Fuseki SPARQL endpoint you wish to query.)
      • Another script then takes the file with the ISBN groups and checks which of these groups result in at least two catalog matches.  The script outputs the ISBN groups that result in matches, along with a summary (i.e. total number of rows, total number of matches, etc.), the ISBN groups that resulted in only one match, and those that didn't result in a match at all.  The output is captured here.
  • Analysis of LCCNs aggregated under the same Opus
    • Similar to our ISBN analysis, we first queried the PCC data to generate a list of all Opera that have at least two works with instances with LCCNs.  The query is captured here and the results here.
    • The same script used above to execute SPARQL queries is also used to query this list of Opera and get the LCCNs grouped under each Opus.  The line used for LCCNs is commented out at the bottom of the code.  For LCCNs, this script outputs the following file, where each line has a set of LCCNs grouped under the same Opus.
    • This script analyzes these LCCN groups to see which have more than one catalog match and lists the groups that resulted in a match along with a summary of total rows processed and the number of matches.  The output file is here and also contains the rows that resulted in only one catalog match and those that didn't result in any matches.
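
The catalog-matching step described above for both ISBN and LCCN groups can be sketched roughly as follows. This is not the script linked from this page, only a minimal illustration of the same idea. It assumes that each line of the group file is a comma-separated list of identifiers, and that catalog lookup is available as a function that returns True when an identifier is found. The names parse_groups, classify_groups, summarize and in_catalog are all hypothetical, and the identifiers in the demo are made-up placeholders.

```python
from typing import Callable, Dict, Iterable, List


def parse_groups(lines: Iterable[str]) -> List[List[str]]:
    """Turn each non-empty line into a list of identifiers.

    Assumption: identifiers on a line are comma-separated.
    """
    groups = []
    for line in lines:
        ids = [part.strip() for part in line.split(",") if part.strip()]
        if ids:
            groups.append(ids)
    return groups


def classify_groups(
    groups: List[List[str]],
    in_catalog: Callable[[str], bool],
) -> Dict[str, List[List[str]]]:
    """Sort groups by how many of their identifiers match the catalog.

    in_catalog is a stand-in for whatever catalog lookup is used;
    it is a hypothetical hook, not a real API.
    """
    result: Dict[str, List[List[str]]] = {"multiple": [], "single": [], "none": []}
    for ids in groups:
        matched = [identifier for identifier in ids if in_catalog(identifier)]
        if len(matched) >= 2:
            result["multiple"].append(matched)   # at least two catalog matches
        elif len(matched) == 1:
            result["single"].append(matched)     # only one catalog match
        else:
            result["none"].append(ids)           # no catalog match at all
    return result


def summarize(result: Dict[str, List[List[str]]]) -> Dict[str, int]:
    """Count the rows in each category plus the total rows processed."""
    counts = {name: len(rows) for name, rows in result.items()}
    counts["total_rows"] = sum(counts.values())
    return counts


# Toy demo with placeholder identifiers and an in-memory "catalog".
toy_catalog = {"id-1", "id-2", "id-3"}
toy_lines = ["id-1, id-2", "id-3, id-9", "id-8, id-7"]
toy_result = classify_groups(parse_groups(toy_lines), lambda i: i in toy_catalog)
toy_summary = summarize(toy_result)
```

Passing the lookup in as a function keeps the grouping logic the same for ISBNs and LCCNs; only the lookup (and the input file) would differ between the two analyses.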

POD data analysis

Fuseki UI