These RDF files were produced during the LD4L project. They are all available for download at http://draft.ld4l.org/downloads/.
RDF files are available for download as described below.
Converter output
The MARC records from each library were converted to BIBFRAME 1.0 RDF by the Library of Congress marc2bibframe converter. LD4L's bib2lod converter was then used to produce RDF in the LD4L data model. The result is RDF in the N-Triples format. These dumps are available:
- cornell.ld4l.full-catalog.2016-03-17.tar.gz -- Converted triples for the Cornell catalog -- 13GB
- harvard.ld4l.full-catalog-1.2016-03-24.tar.gz -- Converted triples for the Harvard catalog (1 of 4) -- 5.4GB
- harvard.ld4l.full-catalog-2.2016-03-22.tar.gz -- Converted triples for the Harvard catalog (2 of 4) -- 5.2GB
- harvard.ld4l.full-catalog-3.2016-03-21.tar.gz -- Converted triples for the Harvard catalog (3 of 4) -- 6.3GB
- harvard.ld4l.full-catalog-4.2016-03-22.tar.gz -- Converted triples for the Harvard catalog (4 of 4) -- 5.4GB
- stanford.ld4l.full-catalog-1.2016-03-23.tar.gz -- Converted triples for the Stanford catalog (1 of 4) -- 3.4GB
- stanford.ld4l.full-catalog-2.2016-03-22.tar.gz -- Converted triples for the Stanford catalog (2 of 4) -- 3.4GB
- stanford.ld4l.full-catalog-3.2016-03-21.tar.gz -- Converted triples for the Stanford catalog (3 of 4) -- 4.0GB
- stanford.ld4l.full-catalog-4.2016-03-22.tar.gz -- Converted triples for the Stanford catalog (4 of 4) -- 4.8GB
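After extraction, each archive yields N-Triples files, one triple per line. As a minimal sketch of working with such a dump, the following counts distinct subject URIs in an N-Triples stream; the parsing here is illustrative and is not part of the LD4L tooling, and the in-memory gzip buffer stands in for the multi-gigabyte files listed above:

```python
import gzip
import io

def count_subjects(lines):
    """Count distinct subject URIs in an N-Triples stream.

    N-Triples puts one triple per line: <subject> <predicate> <object> .
    This naive whitespace split handles URI subjects only; blank lines
    and comments are skipped.
    """
    subjects = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        first = line.split(None, 1)[0]
        if first.startswith("<") and first.endswith(">"):
            subjects.add(first)
    return len(subjects)

# Simulate a tiny gzipped dump in memory (real dumps are the
# .tar.gz archives listed above).
sample = (
    '<http://draft.ld4l.org/cornell/a> <http://example.org/p> "x" .\n'
    '<http://draft.ld4l.org/cornell/a> <http://example.org/q> "y" .\n'
    '<http://draft.ld4l.org/cornell/b> <http://example.org/p> "z" .\n'
)
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    gz.write(sample.encode("utf-8"))
buf.seek(0)
with gzip.open(buf, "rt", encoding="utf-8") as fh:
    print(count_subjects(fh))  # → 2
```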
...
These data files are available:
- 2016-03-17_cornell_ld4l_full_catalog_anno.tar -- Usage data for the Cornell catalog -- 478MB
- harvard.ld4l.full-catalog-1.2016-03-24_anno.tar -- Usage data for the Harvard catalog (1 of 4) -- 286MB
- harvard.ld4l.full-catalog-2.2016-03-22_anno.tar -- Usage data for the Harvard catalog (2 of 4) -- 272MB
- harvard.ld4l.full-catalog-3.2016-03-21_anno.tar -- Usage data for the Harvard catalog (3 of 4) -- 296MB
- harvard.ld4l.full-catalog-4.2016-03-22_anno.tar -- Usage data for the Harvard catalog (4 of 4) -- 239MB
...
These files are available:
- cornell_additional_triples.tar.gz -- Additional triples for the Cornell catalog -- 228MB
- harvard_additional_triples.tar.gz -- Additional triples for the Harvard catalog -- 217MB
- stanford_additional_triples.tar.gz -- Additional triples for the Stanford catalog -- 135MB
...
Search
The linked data at draft.ld4l.org is served by a Sinatra application, reading from a MySQL database. The database looks like this:
mysql> use ld4l;
Database changed
mysql> show tables;
+----------------+
| Tables_in_ld4l |
+----------------+
| lod |
+----------------+
mysql> describe lod;
+-------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| uri | varchar(200) | NO | PRI | NULL | |
| rdf | mediumblob | NO | | NULL | |
+-------+--------------+------+-----+---------+-------+
'uri' corresponds to the URI of the requested linked data:
mysql> select uri from lod where uri like 'http%' limit 5;
+---------------------------------------------------------------------+
| uri |
+---------------------------------------------------------------------+
| http://draft.ld4l.org/ |
| http://draft.ld4l.org/cornell |
| http://draft.ld4l.org/cornell/n000000d1-1ab5-4fc0-a33f-2fb4204eddb4 |
| http://draft.ld4l.org/cornell/n000000e4-e1cd-4fbb-872c-01456a8a8396 |
| http://draft.ld4l.org/cornell/n00000118-f2b5-4005-b20e-65e09af06a2a |
+---------------------------------------------------------------------+
'rdf' is the data that will be served, in Turtle format, zipped. As such, it is not readable until unzipped:
mysql> select substring(rdf, 1, 70) from lod where uri = "http://draft.ld4l.org/";
+------------------------------------------------------------------------+
| substring(rdf, 1, 70) |
+------------------------------------------------------------------------+
F?&D}??(I?R<p?st?6D?(?P?Dj3t?.?V
?N$!ϥsM?G??
+------------------------------------------------------------------------+ |
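A client reading the table directly must therefore decompress the blob before parsing it. The exact compression scheme the LD4L application uses is not stated here; assuming ordinary gzip, the round trip looks like this sketch:

```python
import gzip

def store(turtle_text):
    """Compress a Turtle document the way the 'rdf' column stores it.

    gzip is an assumption for illustration; the SQL dumps themselves
    are authoritative on the real scheme.
    """
    return gzip.compress(turtle_text.encode("utf-8"))

def serve(blob):
    """Decompress a stored blob back into readable Turtle."""
    return gzip.decompress(blob).decode("utf-8")

turtle = '<http://draft.ld4l.org/> a <http://www.w3.org/2002/07/owl#Thing> .'
blob = store(turtle)
print(blob[:10])    # unreadable bytes, like the substring() output above
print(serve(blob))  # the original Turtle
```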
These dumps are available:
- schema.sql -- Re-creates the LOD table -- 2KB
- combined-lod-complete_2016-04-06.sql -- The full MySQL database, containing linked data for all three institutions -- 220GB
- cornell-lod-complete_2016-04-01.sql -- Linked data for just the Cornell catalog -- 63GB
- harvard-lod-complete_2016-04-02.sql -- Linked data for just the Harvard catalog -- 92GB
- stanford-lod-complete_2016-03-31.sql -- Linked data for just the Stanford catalog -- 67GB
Solr index capture
The project developed an experimental search service, at search.ld4l.org, based on the converted data above. The application code is available from GitHub (https://github.com/ld4l/ld4l_blacklight_search) and is built on Blacklight. Blacklight is a Rails app that includes a Solr search engine.
...
The structure of the search index is determined both by the Solr schema and by the Blacklight catalog controller script.
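Blacklight queries Solr over Solr's standard HTTP search API. As a hedged sketch of the kind of request involved, the following builds a simple keyword-search URL; the host, core name, and parameters here are generic Solr conventions, not taken from the LD4L configuration:

```python
from urllib.parse import urlencode

# Assumed local Solr host and core name, for illustration only.
SOLR_BASE = "http://localhost:8983/solr/blacklight-core/select"

def solr_query_url(q, rows=10):
    """Build a Solr select URL for a simple keyword search.

    'q' is the query string, 'rows' the page size, and 'wt=json'
    asks Solr for a JSON response.
    """
    params = {"q": q, "rows": rows, "wt": "json"}
    return SOLR_BASE + "?" + urlencode(params)

print(solr_query_url("moby dick"))
# → http://localhost:8983/solr/blacklight-core/select?q=moby+dick&rows=10&wt=json
```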
These dumps are available:
- solr-data-complete_2016-04-14.tar.gz -- The full search index, containing data for all three institutions -- 103GB
Triple-store captures
The triple-stores used were instances of Virtuoso Open Source 7. More specifically, the Virtuoso instances were built from the develop branch of this source:
...
...
...
...
These dumps capture the data directories of the three triple-stores:
...