LD4L Data Archive

These RDF files were produced during the LD4L project. They are available for download as described below.

Converter output

The MARC records from each library were converted to BIBFRAME 1.0 RDF by the Library of Congress marc2bibframe converter. LD4L's bib2lod converter was then used to produce RDF in the LD4L data model. The result is RDF in the N-Triples format. These dumps are available:

cornell.ld4l.full-catalog.2016-03-17.tar.gz -- Converted triples for the Cornell catalog -- 13GB
harvard.ld4l.full-catalog-1.2016-03-24.tar.gz -- Converted triples for the Harvard catalog (1 of 4) -- 5.4GB
harvard.ld4l.full-catalog-2.2016-03-22.tar.gz -- Converted triples for the Harvard catalog (2 of 4) -- 5.2GB
harvard.ld4l.full-catalog-3.2016-03-21.tar.gz -- Converted triples for the Harvard catalog (3 of 4) -- 6.3GB
harvard.ld4l.full-catalog-4.2016-03-22.tar.gz -- Converted triples for the Harvard catalog (4 of 4) -- 5.4GB
stanford.ld4l.full-catalog-1.2016-03-23.tar.gz -- Converted triples for the Stanford catalog (1 of 4) -- 3.4GB
stanford.ld4l.full-catalog-2.2016-03-22.tar.gz -- Converted triples for the Stanford catalog (2 of 4) -- 3.4GB
stanford.ld4l.full-catalog-3.2016-03-21.tar.gz -- Converted triples for the Stanford catalog (3 of 4) -- 4.0GB
stanford.ld4l.full-catalog-4.2016-03-22.tar.gz -- Converted triples for the Stanford catalog (4 of 4) -- 4.8GB
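
Because the dumps are large, it may be more practical to stream them than to unpack them first. The following is a minimal sketch in Python, not part of the LD4L tooling. It assumes that each archive contains one or more regular files holding UTF-8 N-Triples, one triple per line; the internal layout of the archives is not described on this page, and the function names are hypothetical.

```python
import tarfile


def iter_ntriples_lines(archive_path):
    """Yield each non-empty, non-comment line from every regular file in a tar archive.

    Assumption: the archive members are N-Triples files encoded as UTF-8.
    Mode "r:*" lets tarfile detect the compression, so the same function
    handles both .tar.gz and plain .tar files.
    """
    with tarfile.open(archive_path, mode="r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories and other special entries
            stream = archive.extractfile(member)
            for raw in stream:
                line = raw.decode("utf-8").strip()
                if line and not line.startswith("#"):
                    yield line


def count_triples(archive_path):
    """Count the triple lines in an archive without extracting it to disk."""
    return sum(1 for _ in iter_ntriples_lines(archive_path))
```

For example, `count_triples("cornell.ld4l.full-catalog.2016-03-17.tar.gz")` would read the Cornell dump one line at a time, so memory use stays small regardless of the archive size.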

Usage data

StackScore usage data is available for the Cornell and Harvard holdings. The scores appear as annotations on the individual bib_ids. Each file contains the usage data for the corresponding, similarly named file of converter output. Data is in N-Triples format.

These data files are available:

2016-03-17_cornell_ld4l_full_catalog_anno.tar -- Usage data for the Cornell catalog -- 478MB
harvard.ld4l.full-catalog-1.2016-03-24_anno.tar -- Usage data for the Harvard catalog (1 of 4) -- 286MB
harvard.ld4l.full-catalog-2.2016-03-22_anno.tar -- Usage data for the Harvard catalog (2 of 4) -- 272MB
harvard.ld4l.full-catalog-3.2016-03-21_anno.tar -- Usage data for the Harvard catalog (3 of 4) -- 296MB
harvard.ld4l.full-catalog-4.2016-03-22_anno.tar -- Usage data for the Harvard catalog (4 of 4) -- 239MB

Additional triples

Additional triples were created to supplement the converter output. They add Work IDs to the Works and create links across institutions between corresponding Works and between corresponding Instances.

A concordance file was created, associating all known OCLC numbers with their corresponding Work IDs. This file was made with data extracted from a recent Research snapshot of WorldCat, and is structured as follows:

Column 1: every OCLC number found in a record, from both the 001 and 019 fields
Column 2: the current OCLC number for the record, from the 001 field
Column 3: the current Work ID associated with the record

Fields are tab-delimited. For example:

100000569	100000569	49300684
100000668	100000668	83546218
100000767	100000767	83546282
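
A file in this layout can be read into a lookup table with a few lines of Python. This is only a sketch: the function name is hypothetical, and the sample rows are the three shown above rather than the real concordance file.

```python
import csv
import io


def load_concordance(lines):
    """Map every OCLC number (column 1) to its Work ID (column 3).

    `lines` is any iterable of tab-delimited text lines, such as an open file.
    Column 2 (the current OCLC number) is not needed for this lookup.
    """
    work_ids = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        oclc_number, work_id = row[0], row[2]
        work_ids[oclc_number] = work_id
    return work_ids


# The three sample rows from this page, standing in for the real file.
sample = (
    "100000569\t100000569\t49300684\n"
    "100000668\t100000668\t83546218\n"
    "100000767\t100000767\t83546282\n"
)
concordance = load_concordance(io.StringIO(sample))
print(concordance["100000668"])  # prints 83546218
```

Keying on column 1 means that superseded OCLC numbers from the 019 field resolve to the same Work ID as the current number.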

Using this concordance file, each Work was assigned a Work ID based on the OCLC number of its Instances. For example:

<http://draft.ld4l.org/cornell/n556b336629626fa2> 
    <http://www.w3.org/2000/01/rdf-schema#seeAlso> 
        <http://worldcat.org/entity/work/id/57063107> .
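
A triple of this shape could be generated as in the sketch below. Several things here are assumptions: the function name, the example Work URI, and the rule of using the first Instance OCLC number that appears in the concordance. This page does not say how a Work with Instances pointing to different Work IDs was handled.

```python
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
WORLDCAT_WORK_PREFIX = "http://worldcat.org/entity/work/id/"


def work_id_triple(work_uri, instance_oclc_numbers, concordance):
    """Return an N-Triples line linking a Work to its WorldCat Work ID.

    Assumption: the first OCLC number found in the concordance decides
    the Work ID. Returns None when no OCLC number is known.
    """
    for oclc_number in instance_oclc_numbers:
        work_id = concordance.get(oclc_number)
        if work_id is not None:
            return f"<{work_uri}> <{RDFS_SEE_ALSO}> <{WORLDCAT_WORK_PREFIX}{work_id}> ."
    return None


# Hypothetical Work URI; the OCLC number and Work ID come from the
# concordance sample rows on this page.
lookup = {"100000569": "49300684"}
triple = work_id_triple(
    "http://draft.ld4l.org/cornell/example-work",
    ["999", "100000569"],
    lookup,
)
print(triple)
```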

Although the data from the three institutions were stored in three separate triple stores, owl:sameAs statements were created where possible to link matching Works or matching Instances in the separate collections.

Instances with matching OCLC identifiers were linked with owl:sameAs, as were Works with matching Work IDs.
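
The matching step can be sketched as a grouping on the shared key (an OCLC identifier for Instances, a Work ID for Works). The code below is an illustration and not the project's actual linking code. The URIs and keys are made up, and emitting one statement per cross-institution pair, in one direction only, is an assumption.

```python
from collections import defaultdict
from itertools import combinations

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triples(records):
    """Build owl:sameAs N-Triples lines for resources that share a key.

    `records` is an iterable of (key, institution, uri) tuples. Only pairs
    from different institutions are linked.
    """
    by_key = defaultdict(list)
    for key, institution, uri in records:
        by_key[key].append((institution, uri))

    triples = []
    for members in by_key.values():
        for (inst_a, uri_a), (inst_b, uri_b) in combinations(members, 2):
            if inst_a != inst_b:
                triples.append(f"<{uri_a}> <{OWL_SAME_AS}> <{uri_b}> .")
    return triples


# Hypothetical Instances: two share an OCLC identifier, one does not.
records = [
    ("12345", "cornell", "http://draft.ld4l.org/cornell/instance-a"),
    ("12345", "harvard", "http://draft.ld4l.org/harvard/instance-b"),
    ("67890", "stanford", "http://draft.ld4l.org/stanford/instance-c"),
]
links = same_as_triples(records)
for line in links:
    print(line)
```

Since owl:sameAs is symmetric, a reasoner can infer the reverse statement, but a store without inference may need both directions written out explicitly.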

These files are available:

cornell_additional_triples.tar.gz -- Additional triples for the Cornell catalog -- 228MB
harvard_additional_triples.tar.gz -- Additional triples for the Harvard catalog -- 217MB
stanford_additional_triples.tar.gz -- Additional triples for the Stanford catalog -- 135MB

Search

The project developed an experimental search service based on the converted data above. The application code is available on GitHub (https://github.com/ld4l/ld4l_blacklight_search) and is built on Blacklight. Blacklight is a Rails application backed by a Solr search engine. The structure of the search index is determined by both the Solr schema and the Blacklight catalog controller script.
