Considerable effort during the first year of the grant was spent architecting a MARC converter as a robust, well-tested, community-extensible tool for converting traditional cataloging formats to RDF and linked data. A multi-phase strategy for “deduping” local URIs was designed to ensure that during bulk conversion only one local URI would be minted for each unique entity. Given the decision of many LD4L Labs and LD4P projects to use the bibliotek-o extension ontology rather than pure BIBFRAME 2.0, the converter was written to target BIBFRAME with the bibliotek-o extension ontology. The initial milestone for the converter, demonstrated in April 2017, was to generate bibliotek-o RDF for the “minimum viable MARC record,” consisting of the smallest number of fields constituting a valid MARC record. The Harvard team extended the initial framework to create two specialized converters that transform Harvard Film Archive XML records (hfa2lod) and Federal Geographic Data Committee (FGDC) records (fgdc2lod) to Linked Open Data. This extension work is described in FGDC & HFA to BIBFRAME Conversion.

Converter framework code: https://github.com/ld4l-labs/bib2lod
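
The deduping idea can be illustrated with a minimal sketch: each unique entity, keyed here by type plus a normalized label, maps to exactly one locally minted URI. The key construction, class names, and namespace below are illustrative assumptions, not the actual bib2lod dedup phases or key rules.

```scala
import java.util.UUID
import scala.collection.mutable

// Illustrative sketch only: bib2lod's actual dedup phases and key rules differ.
// Each unique entity (keyed by type plus a normalized label) is assigned
// exactly one locally minted URI during bulk conversion.
object LocalUriMinter {
  private val minted = mutable.Map.empty[(String, String), String]

  private def normalize(label: String): String =
    label.trim.toLowerCase.replaceAll("\\s+", " ")

  /** Return the existing local URI for this entity, or mint a new one. */
  def uriFor(entityType: String, label: String, baseUri: String): String =
    minted.getOrElseUpdate(
      (entityType, normalize(label)),
      baseUri + UUID.randomUUID().toString
    )
}

// Example: repeated references to the same agent reuse one URI.
object DedupDemo extends App {
  val base = "http://example.org/entity/" // hypothetical local namespace
  val a = LocalUriMinter.uriFor("Agent", "Austen, Jane", base)
  val b = LocalUriMinter.uriFor("Agent", "austen,  jane ", base)
  println(a == b) // true: only one local URI minted for the entity
}
```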

To complement the early converter development, and to provide an objective approach for assessing other MARC → BIBFRAME conversion tools, Stanford developed a validator framework that runs an automated test suite against the output of any given BIBFRAME converter. Later development of the bib2lod converter (above) and the LC / Index Data marc2bibframe2 converter included built-in validation tests. This reduced the need for a general-purpose validator framework, so further development of a more comprehensive set of validation tests was not pursued.

Validator framework: https://github.com/ld4l-labs/marc2rdf-validator
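
As an illustration of the kind of automated check such a framework runs, the hedged sketch below uses Apache Jena to execute a SPARQL ASK test against converter output. The file name, prefix, and the specific assertion are hypothetical examples, not tests taken from marc2rdf-validator.

```scala
import org.apache.jena.riot.RDFDataMgr
import org.apache.jena.query.{QueryExecutionFactory, QueryFactory}

// Hedged sketch: one automated check run against converter output.
// The file name and assertion are illustrative, not the actual test suite.
object BibframeOutputCheck extends App {
  // Load the RDF produced by a MARC -> BIBFRAME converter run.
  val model = RDFDataMgr.loadModel("converter-output.ttl")

  // Assert that every bf:Instance is linked to some bf:Work.
  val ask = QueryFactory.create(
    """PREFIX bf: <http://id.loc.gov/ontologies/bibframe/>
      |ASK {
      |  ?instance a bf:Instance .
      |  FILTER NOT EXISTS { ?instance bf:instanceOf ?work }
      |}""".stripMargin)

  val orphanInstancesExist =
    QueryExecutionFactory.create(ask, model).execAsk()

  if (orphanInstancesExist)
    println("FAIL: found bf:Instance with no bf:instanceOf link")
  else
    println("PASS: all instances are linked to a work")
}
```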

In a complementary effort, and to support its Tracer Bullet workflows in the LD4P project, Stanford developed a MARC → BIBFRAME conversion pipeline that took MARC record output from its ILS, converted it with the Library of Congress BIBFRAME converter script, and loaded the result into a Stanford-local triple store. A second phase of this work built a more robust conversion pipeline using reactive programming, employing a Kafka- and Spark-based, high-throughput, low-latency converter.

YouTube video: Reactive pipeline demo

Supporting code: Scala/Kafka/Spark Linked Data Pipeline (github repository)
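
The general shape of such a pipeline can be sketched with Spark Structured Streaming reading MARC records from a Kafka topic, converting them, and writing the RDF out for loading into a triple store. The topic name, the convert() stub, and the output paths below are assumptions for illustration, not the Stanford pipeline's actual configuration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Hedged sketch of a Kafka + Spark streaming conversion stage.
// Topic names, the conversion stub, and output paths are illustrative
// assumptions, not the actual Stanford pipeline.
object MarcToBibframeStream extends App {
  val spark = SparkSession.builder()
    .appName("marc-to-bibframe-stream")
    .getOrCreate()

  // Placeholder for a call out to a MARC -> BIBFRAME converter.
  val convertToBibframe = udf((marcXml: String) =>
    s"<!-- converted RDF for record of length ${marcXml.length} -->")

  // Read MARC records published to a Kafka topic.
  val marcStream = spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "marc-records")
    .load()
    .selectExpr("CAST(value AS STRING) AS marcXml")

  // Convert each record and write the RDF out for downstream loading.
  val query = marcStream
    .withColumn("rdf", convertToBibframe(col("marcXml")))
    .select(col("rdf").alias("value"))
    .writeStream
    .format("text")
    .option("path", "/tmp/bibframe-output")
    .option("checkpointLocation", "/tmp/bibframe-checkpoints")
    .start()

  query.awaitTermination()
}
```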
