These RDF files were produced during the LD4L project. They are all available from the server at http://draft.ld4l.org/downloads/.
RDF files are available for download as described below.
Converter output
The MARC records from each library were converted to BIBFRAME 1.0 RDF by the Library of Congress marc2bibframe converter. LD4L's bib2lod converter was then used to produce RDF in the LD4L data model. The result is RDF in N-Triples format. These dumps are available:
- cornell.ld4l.full-catalog.2016-03-17.tar.gz -- Converted triples for the Cornell catalog -- 13GB
- harvard.ld4l.full-catalog-1.2016-03-24.tar.gz -- Converted triples for the Harvard catalog (1 of 4) -- 5.4GB
- harvard.ld4l.full-catalog-2.2016-03-22.tar.gz -- Converted triples for the Harvard catalog (2 of 4) -- 5.2GB
- harvard.ld4l.full-catalog-3.2016-03-21.tar.gz -- Converted triples for the Harvard catalog (3 of 4) -- 6.3GB
- harvard.ld4l.full-catalog-4.2016-03-22.tar.gz -- Converted triples for the Harvard catalog (4 of 4) -- 5.4GB
- stanford.ld4l.full-catalog-1.2016-03-23.tar.gz -- Converted triples for the Stanford catalog (1 of 4) -- 3.4GB
- stanford.ld4l.full-catalog-2.2016-03-22.tar.gz -- Converted triples for the Stanford catalog (2 of 4) -- 3.4GB
- stanford.ld4l.full-catalog-3.2016-03-21.tar.gz -- Converted triples for the Stanford catalog (3 of 4) -- 4.0GB
- stanford.ld4l.full-catalog-4.2016-03-22.tar.gz -- Converted triples for the Stanford catalog (4 of 4) -- 4.8GB
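As a rough illustration of what the N-Triples dumps contain, the sketch below parses a single triple line. The subject URI is copied from an example elsewhere on this page, but the predicate and object IRIs are hypothetical, not taken from an actual dump.

```python
import re

# Minimal parser for the common N-Triples case where subject, predicate,
# and object are all IRIs. The example line is illustrative only.
LINE = ('<http://draft.ld4l.org/cornell/n000000d1-1ab5-4fc0-a33f-2fb4204eddb4> '
        '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> '
        '<http://bibframe.org/vocab/Work> .')

def parse_iri_triple(line):
    """Split one N-Triples line into (subject, predicate, object) IRIs."""
    m = re.match(r'<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s+\.', line)
    if m is None:
        raise ValueError('not a simple IRI triple: %r' % line)
    return m.groups()

s, p, o = parse_iri_triple(LINE)
print(p)
```

Real tooling would use an RDF library rather than a regex, but the line-per-triple shape of N-Triples is what makes these multi-gigabyte dumps easy to split and stream.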
...
These data files are available:
- 2016-03-17_cornell_ld4l_full_catalog_anno.tar -- Usage data for the Cornell catalog -- 478MB
- harvard.ld4l.full-catalog-1.2016-03-24_anno.tar -- Usage data for the Harvard catalog (1 of 4) -- 286MB
- harvard.ld4l.full-catalog-2.2016-03-22_anno.tar -- Usage data for the Harvard catalog (2 of 4) -- 272MB
- harvard.ld4l.full-catalog-3.2016-03-21_anno.tar -- Usage data for the Harvard catalog (3 of 4) -- 296MB
- harvard.ld4l.full-catalog-4.2016-03-22_anno.tar -- Usage data for the Harvard catalog (4 of 4) -- 239MB
...
These files are available:
- cornell_additional_triples.tar.gz -- Additional triples for the Cornell catalog -- 228MB
- harvard_additional_triples.tar.gz -- Additional triples for the Harvard catalog -- 217MB
- stanford_additional_triples.tar.gz -- Additional triples for the Stanford catalog -- 135MB
Linked data blobs
The linked data at draft.ld4l.org is served by a Sinatra application, reading from a MySQL database. The database looks like this:
mysql> use ld4l;
Database changed
mysql> show tables;
+----------------+
| Tables_in_ld4l |
+----------------+
| lod |
+----------------+
mysql> describe lod;
+-------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| uri | varchar(200) | NO | PRI | NULL | |
| rdf | mediumblob | NO | | NULL | |
+-------+--------------+------+-----+---------+-------+
'uri' corresponds to the URI of the requested linked data:
mysql> select uri from lod where uri like 'http%' limit 5;
+---------------------------------------------------------------------+
| uri |
+---------------------------------------------------------------------+
| http://draft.ld4l.org/ |
| http://draft.ld4l.org/cornell |
| http://draft.ld4l.org/cornell/n000000d1-1ab5-4fc0-a33f-2fb4204eddb4 |
| http://draft.ld4l.org/cornell/n000000e4-e1cd-4fbb-872c-01456a8a8396 |
| http://draft.ld4l.org/cornell/n00000118-f2b5-4005-b20e-65e09af06a2a |
+---------------------------------------------------------------------+
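The serving pattern is a plain key-value lookup on that table. The sketch below reproduces it with SQLite standing in for MySQL (column types translated accordingly); the row content is made up for illustration.

```python
import sqlite3

# Stand-in for the MySQL 'lod' table: uri is the primary key,
# rdf holds the compressed payload as a blob.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE lod (uri TEXT PRIMARY KEY, rdf BLOB NOT NULL)')
conn.execute('INSERT INTO lod VALUES (?, ?)',
             ('http://draft.ld4l.org/', b'\x1f\x8b placeholder bytes'))

# A Sinatra-style handler would look the requested URI up like this:
row = conn.execute('SELECT rdf FROM lod WHERE uri = ?',
                   ('http://draft.ld4l.org/',)).fetchone()
print(row is not None)
```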
'rdf' is the data that will be served, in Turtle format, zipped. As such, it is not readable until unzipped:
mysql> select substring(rdf, 1, 70) from lod where uri = "http://draft.ld4l.org/";
+------------------------------------------------------------------------+
| substring(rdf, 1, 70) |
+------------------------------------------------------------------------+
F?&D}??(I?R<p?st?6D?(?P?Dj3t?.?V
?N$!ϥsM?G??
+------------------------------------------------------------------------+
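The page only says the Turtle is "zipped"; the sketch below assumes gzip compression and shows the round trip a serving application would perform, with a made-up one-line Turtle payload.

```python
import gzip

# A tiny stand-in for the Turtle a blob might hold (illustrative only).
turtle = b'@prefix bf: <http://bibframe.org/vocab/> .\n'

blob = gzip.compress(turtle)       # what the database would store
print(blob[:2])                    # gzip magic bytes, b'\x1f\x8b'

served = gzip.decompress(blob)     # what the app serves after unzipping
print(served == turtle)
```

This is why the raw `substring(rdf, ...)` query above prints unreadable bytes: the column holds the compressed form, not the Turtle text.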
These dumps are available:
- schema.sql -- Re-creates the LOD table -- 2KB
- combined-lod-complete_2016-04-06.sql -- The full MySQL database, containing linked data for all three institutions -- 220GB
- cornell-lod-complete_2016-04-01.sql -- Linked data for just the Cornell catalog -- 63GB
- harvard-lod-complete_2016-04-02.sql -- Linked data for just the Harvard catalog -- 92GB
- stanford-lod-complete_2016-03-31.sql -- Linked data for just the Stanford catalog -- 67GB
Solr index capture
Search
The project developed an experimental search service based on the converted data above. The application code is available from GitHub (https://github.com/ld4l/ld4l_blacklight_search). The application at search.ld4l.org is built on Blacklight, a Rails application backed by a Solr search engine.
...
The structure of the search index is determined both by the Solr schema and by the Blacklight catalog controller.
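To make the Blacklight/Solr relationship concrete, the sketch below builds the kind of HTTP select query Blacklight issues against Solr. The host, core name, and field names are hypothetical; the real fields are defined by the project's Solr schema and catalog controller.

```python
from urllib.parse import urlencode

# Hypothetical query parameters in Solr's standard select syntax.
params = {
    'q': 'linked data',            # user's search terms
    'fq': 'institution:cornell',   # hypothetical facet filter field
    'rows': 10,
    'wt': 'json',
}
url = ('http://localhost:8983/solr/blacklight-core/select?'
       + urlencode(params))
print(url)
```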
These dumps are available:
- solr-data-complete_2016-04-14.tar.gz -- The full search index, containing data for all three institutions -- 103GB
Triple-store captures
The triple-stores used were instances of Virtuoso Open Source 7, taken from the develop branch. More specifically, the Virtuoso instances were built from this source:
$ git remote -v
origin git://github.com/openlink/virtuoso-opensource.git (fetch)
origin git://github.com/openlink/virtuoso-opensource.git (push)
$ git status
On branch develop/7
Your branch is up-to-date with 'origin/develop/7'.
nothing to commit, working directory clean
$ git log -1
commit ea51ed3b81a43250ed2e3cfa77ee6e0116388b4b
Merge: 74a23e7 8ee2cfe
Author: VOS Maintainer
Date: Mon Mar 7 13:44:06 2016 +0100
Merge branch 'develop/6' into develop/7
...
These dumps capture the data directories of the three triple-stores:
...