...

Collaboration, both internal and external, will be key in this phase of LD4P. The partners will collaborate on the development of the cloud environment and Blacklight; the Library of Congress and the Program for Cooperative Cataloging (PCC) will collaborate with the project through training in the use of the BIBFRAME Editor; Harvard will foster a close relationship with the PCC in the development of policy decisions; and the core institutions will collaborate with Wikidata in publishing, linking, and enriching linked data through the Wikimedian-in-Residence program.

Background/Context and Rationale

LD4P (Phases 1 and 2) is focused on the transition of basic Technical Services workflows from their current infrastructure built upon MARC to linked open data and the Web.  The importance of the transition away from MARC to linked data can be elucidated by two questions.

First, why not continue with MARC? MARC was a revolution in its day.  It allowed data from library card catalogs to be encoded in machine readable form, enabling the catalog cards to be reproducible on the computer screen and the data to be exchanged freely among libraries.  It is a fifty-year-old technology, however, originally designed for magnetic tape-based computers, and now only understood by library systems.  In addition, the MARC formats are semantically inexpressive and have isolated libraries from the development of the Web.

And second, why linked data?  It is apparent that library patrons have preferred searching for information on the Web for quite some time.  By integrating library data into the Web in a semantic way, our patrons can find well-formed library data there as well as in library catalogs.  By taking advantage of the semantic web, library patrons can directly benefit from other important data sources on the Web.  A third advantage is that the Web is an international environment.  By shifting to linked data, libraries worldwide can take advantage of the bibliographic and authoritative data many national libraries create and make available now as linked data.  And last, the Web is a continually evolving environment.  Without a doubt, linked data will evolve into some other standard with time. But in order to move along with this evolution, libraries will need to make that first important step in the transition to a Web environment.

Current Work: Linked Data for Production Phase 1 and LD4L Labs

 

In Linked Data for Production Phase 1, the partners proposed the development of a communal work environment based in linked data; the strengthening and expansion of the BIBFRAME ontology to cover the multiple formats (e.g., books, music, maps) that libraries must catalog; the tools needed to perform the work itself; and the development of lightweight workflows (Tracer Bullets) to prove that the transition to linked data was both possible and practical. In the LD4L-Labs work, the partners piloted the development of an editing tool to support cataloging using BIBFRAME and variations, together with selected extension ontologies; the integration of linked data authority lookup and management into cataloging; and the use of linked data for discovery and visualization.

 

Communal Work Environment: The partners were fortunate to benefit from the development of a communal work environment in support of the project, called SHARE-VDE, or the SHARE-Virtual Discovery Environment (http://share-vde.org/sharevde/). The SHARE-Virtual Discovery Environment project is built on a partnership between Casalini Libri and a software development company called @Cult, along with the cooperation of academic library partners such as the University of California Berkeley, Stanford, and Duke. The environment includes a semantically enhanced MARC to BIBFRAME converter that allows not only for the simple conversion of MARC fields to BIBFRAME but also for the extraction of additional free-text data from MARC fields (such as role) and its conversion to linkable identifiers. It also includes an advanced reconciliation system that goes beyond simple text-string matching to equate two entities: it gathers multiple points of information about an entity (such as birth date, co-editors, and institutional affiliation) so that matches can be made with confidence even if the text strings are not identical. The availability of this environment will promote the reuse of metadata in the second phase of Linked Data for Production.
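
The idea behind multi-point reconciliation can be illustrated with a minimal sketch. This is not the SHARE-VDE implementation, whose internals are not described here; the Entity fields, the integer weights, and the threshold of 50 are all hypothetical values chosen only to show how several independent pieces of evidence can confirm a match when name strings differ, and can withhold a match when only the name strings agree.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Entity:
    """A person description; all fields are illustrative assumptions."""
    name: str
    birth_year: Optional[int] = None
    affiliation: Optional[str] = None
    collaborators: FrozenSet[str] = frozenset()


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so trivial differences are ignored."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def match_score(a: Entity, b: Entity) -> int:
    """Sum hypothetical integer weights for each agreeing piece of evidence."""
    score = 0
    if normalize(a.name) == normalize(b.name):
        score += 40  # identical name string (assumed weight)
    if a.birth_year is not None and a.birth_year == b.birth_year:
        score += 30  # same birth year (assumed weight)
    if a.affiliation and b.affiliation and \
            normalize(a.affiliation) == normalize(b.affiliation):
        score += 15  # same institution (assumed weight)
    if a.collaborators & b.collaborators:
        score += 15  # at least one shared co-editor (assumed weight)
    return score


def same_entity(a: Entity, b: Entity, threshold: int = 50) -> bool:
    """Treat two descriptions as one entity when evidence meets the threshold."""
    return match_score(a, b) >= threshold


# Two descriptions whose name strings differ but whose other evidence agrees.
inverted = Entity("Smith, John A.", 1950, "Stanford University",
                  frozenset({"Doe, Jane"}))
direct = Entity("John Smith", 1950, "Stanford University",
                frozenset({"Doe, Jane"}))
# A different person who happens to share the same name string.
namesake = Entity("John Smith", 1982, "Duke University")

print(same_entity(inverted, direct))   # True: 30 + 15 + 15 = 60
print(same_entity(direct, namesake))   # False: the name alone gives only 40
```

In this sketch a matching name string alone is never sufficient, which mirrors the point made above: confidence comes from the accumulation of evidence rather than from any single string comparison.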

 

BIBFRAME Ontology: The Library of Congress (LC) has been very open to working with LD4P on the refinement and expansion of the BIBFRAME ontology over the past year. The partners proposed a number of changes to BIBFRAME, some of which have been incorporated in the current version, and some of which may be adopted in future versions. In addition, the partners have made extensions to BIBFRAME in the areas of performed music, rare books, art, and cartographic materials.  These extensions to BIBFRAME will allow libraries to catalog all materials passing through their traditional workflows.

Tooling: Tooling will be key to any linked data transition, and one of the most important tools will be an “editor” to both create and edit data.  LD4P and LD4L Labs are currently experimenting with three linked-data editors. The first is the LC BIBFRAME editor.  This editor is receiving a thorough shake-down as LC prepares to train over seventy staff to use it in linked data creation.  The second is a new editor being developed at the Biomedical Informatics Research Lab at Stanford called CEDAR.

 

Last, during the LD4L-Labs grant, the partners began exploring how to extend and customize Vitro (the platform behind VIVO, see https://github.com/vivo-project/Vitro) to serve as a cataloging editor (called VitroLib). VitroLib can support editing using BIBFRAME as well as extension and variation ontologies. While the partners do not expect to directly use the VitroLib editing environment in LD4P Phase 2, the user feedback from catalogers in the use of all three editors in such areas as UI development, transition to a recordless environment, and the use of multiple ontologies will significantly inform the creation of the planned communal work environment.

 

Workflows: In LD4P Phase 1, Stanford focused on the conversion of four key workflows to a linked data strategy: two related to the traditional ILS (copy cataloging and original cataloging) and two to the digital repository (deposit of a single item through self-deposit and the deposit of a collection of items through bulk loading). The ILS workflows have been established in a lightweight fashion and tested, and we are now ready to expand them both in depth and in participants for the next phase of LD4P.  The expansion of these workflows will be the next critical development, as they cover the predominant resources libraries must handle in their day-to-day production.

...

 

 

Rationale

 

The focus of the second phase of LD4P is implementation.  Building upon the expertise, structure, and workflows developed during the first phase of LD4P and LD4L-Labs, the four partners (Cornell, Harvard, Stanford, and the University of Iowa) will implement a prototype environment, from metadata acquisition/creation through to discovery.  An important enhancement in this phase will be collaborating with the Program for Cooperative Cataloging (PCC) and the Library of Congress to expand the number of libraries moving to implementation of linked data.  Sub-grants for committed libraries will help them defray transition costs.

 

The choice of working with the PCC was deliberate.  Within the United States, we work within the concept of a virtual, distributed “national library” for the creation of high-level metadata.  The PCC provides the community with a forum for the development of policy and training programs for member libraries.  The full buy-in of the PCC, along with their ability to provide training and support, will be key to expanding the transition to linked data from the core libraries within LD4P to the broader academic library community.

 

Discovery will also be a key development in LD4P Phase 2. By focusing on linked-data enhancements to current discovery systems such as Blacklight, LD4P hopes to take immediate advantage of linked data for library patrons through such developments as the addition of knowledge panels, authority-based browse, and semantic search (see Work Package 4: Discovery).

 

...

Activities