This page provides additional details about the integration of information from Wikidata, DBPedia, and the Library of Congress (LOC) linked data service into the Cornell production catalog. We began this work as part of the LD4P2 and LD4P3 discovery explorations of knowledge panels and author and subject pages. After multiple discussions with our library user representatives, and after revisions based on their feedback, we coordinated with the Discovery and Access team, the group responsible for our catalog implementation, to move forward with this integration.
Examples and screenshots
On the page for this Middlemarch item, the user can click on "author info" to open up the knowledge panel for George Eliot.
Clicking "view full info" leads to the author page for George Eliot, as shown below.
Users can also reach this author page through our author browse list here:
Similarly, users can reach the subject page for a heading from the subject browse list, as shown below: clicking the "subject info" button next to the heading "Russo-Japanese War" takes the user to the subject page. We are not currently adding separate subject info buttons to the item view page.
- Knowledge panels
  - We now include an image and short description from Wikidata.
- Author and subject browse pages:
  - We now include:
    - An image from Wikidata
    - A description from DBPedia (or a description from Wikidata if the former is not available)
    - Additional Wikidata information for people, such as citizenship, education, and pseudonyms
    - The LOC classification number, which is used to generate a link to our call number browse
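Two of the display rules in the list above are small enough to sketch: the DBPedia description takes precedence, with the Wikidata description as a fallback, and the LOC classification number becomes a link into call number browse. The sketch below is illustrative only; the browse base URL and its query parameter names (`q`, `browse_type`) are hypothetical placeholders, not the catalog's actual route.

```python
from urllib.parse import urlencode

# Hypothetical base URL and parameters for call number browse; the real
# catalog route is not documented here.
CALLNUMBER_BROWSE_URL = "https://catalog.example.edu/browse"


def pick_description(dbpedia_description, wikidata_description):
    """Prefer the DBPedia description; fall back to the Wikidata one."""
    for text in (dbpedia_description, wikidata_description):
        if text and text.strip():
            return text.strip()
    return None  # neither source had a usable description


def callnumber_browse_link(classification):
    """Turn an LOC classification number into a call number browse link."""
    if not classification or not classification.strip():
        return None
    query = urlencode({"q": classification.strip(), "browse_type": "call_number"})
    return f"{CALLNUMBER_BROWSE_URL}?{query}"
```

Treating blank strings as missing keeps an empty DBPedia value from hiding a usable Wikidata description.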
All external data on these pages is retrieved through dynamic lookups against external sources. We also check our configuration to determine which headings and which data should display information from external sources (this process is described in greater detail below). Currently, the author knowledge panel and the author and subject pages rely on a lookup of the authorized heading string against the id.loc.gov suggest service. This lookup returns a URI, which we then use both to retrieve LOC information, such as the classification number, and to query Wikidata. DBPedia queries use a combination of the Wikidata QID and label searches to retrieve descriptions. Additionally, we have requested that LOC identifiers or URIs be included in the catalog Solr indices; in the future, we hope to use that information directly instead of relying on string lookups to obtain LOC URIs.
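The lookup chain described in the previous paragraph (heading string, then LOC URI, then Wikidata, then DBPedia) can be sketched as a few request builders and a response parser. This is a hedged sketch, not Cornell's code: the suggest endpoint path, the OpenSearch-style response shape, and the SPARQL patterns are assumptions. The Wikidata properties (P244 LOC authority ID, P18 image, P27 citizenship, P69 educated at, P742 pseudonym) are standard ones chosen to match the fields listed above. The label-based DBPedia search is not shown.

```python
from urllib.parse import urlencode

# Assumed endpoints: the id.loc.gov suggest service for LC Name Authorities
# (subject headings would use the corresponding subjects endpoint) and the
# public Wikidata SPARQL endpoint.
LOC_SUGGEST_URL = "https://id.loc.gov/authorities/names/suggest/"
WIKIDATA_SPARQL_URL = "https://query.wikidata.org/sparql"


def suggest_url(heading):
    """Build the suggest request for an authorized heading string."""
    return f"{LOC_SUGGEST_URL}?{urlencode({'q': heading})}"


def uri_for_heading(heading, suggest_response):
    """Return the LOC URI whose label exactly matches the heading.

    Assumes an OpenSearch-style response: [query, [labels], [notes], [uris]].
    """
    _, labels, _, uris = suggest_response
    for label, uri in zip(labels, uris):
        if label == heading:
            return uri
    return None


def local_name(uri):
    """Last path segment of a URI, e.g. .../names/n123 -> n123."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


def wikidata_query(loc_id):
    """SPARQL for the Wikidata item whose LOC authority ID (P244) matches.

    Relies on the endpoint's built-in prefixes (wdt:, schema:).
    """
    if not loc_id.isalnum():  # LOC local names are alphanumeric; blocks injection
        raise ValueError(f"unexpected LOC id: {loc_id!r}")
    return f"""SELECT ?item ?image ?citizenship ?school ?pseudonym ?description WHERE {{
  ?item wdt:P244 "{loc_id}" .
  OPTIONAL {{ ?item wdt:P18 ?image }}
  OPTIONAL {{ ?item wdt:P27 ?citizenship }}
  OPTIONAL {{ ?item wdt:P69 ?school }}
  OPTIONAL {{ ?item wdt:P742 ?pseudonym }}
  OPTIONAL {{ ?item schema:description ?description FILTER(LANG(?description) = "en") }}
}}"""


def wikidata_request_url(loc_id):
    """GET URL asking the Wikidata endpoint for JSON results."""
    params = {"query": wikidata_query(loc_id), "format": "json"}
    return f"{WIKIDATA_SPARQL_URL}?{urlencode(params)}"


def dbpedia_query(qid):
    """SPARQL for an English DBPedia abstract linked to a Wikidata QID.

    Assumes DBPedia links resources to Wikidata via owl:sameAs.
    """
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"unexpected QID: {qid!r}")
    return f"""SELECT ?abstract WHERE {{
  ?resource owl:sameAs <http://www.wikidata.org/entity/{qid}> ;
            dbo:abstract ?abstract .
  FILTER(LANG(?abstract) = "en")
}}"""
```

In practice each URL would be fetched (for example with `urllib.request.urlopen` and `json.load`). Citizenship and education come back as Wikidata item URIs, so a real implementation would also need their labels.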
The image below shows how information is retrieved for the heading for George Eliot. Blue arrows represent queries or JSON retrieval. The gray boxes contain the string heading or the identifiers. For readability, the image shows the local names of the URIs rather than the full URIs. The white boxes with black borders beneath the URIs list the properties we retrieve for each URI.
When we don't want to show specific data
We implemented a mechanism for excluding external information from display for specific authors or subjects. In a YAML file, we can specify, for a given author or subject heading, whether to hide all external data or only certain properties.
For example, if we do not want to display the image and description for the heading "Eliot, George, 1819-1880" but still want to display the remaining properties we retrieve from Wikidata, such as citizenship and place of education, we would include the following in the YAML file:
If we did not want to show any of the information we retrieve from Wikidata or DBPedia for this specific heading, we would instead write the following in the YAML file:
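A minimal sketch of how such exclusions might be applied once the YAML is loaded. The structure is an assumption: the key name `exclude`, the value `all`, and the second heading are hypothetical, and the settings are written as a Python dict (the shape a YAML loader would produce) so the example needs no third-party parser.

```python
# Hypothetical exclusion settings, i.e. what a YAML file like this might load
# into (key names are illustrative only):
#
#   "Eliot, George, 1819-1880":
#     exclude: [image, description]
#   "Example heading":          # hypothetical heading
#     exclude: all
EXCLUSIONS = {
    "Eliot, George, 1819-1880": {"exclude": ["image", "description"]},
    "Example heading": {"exclude": "all"},
}


def visible_external_data(heading, external_data, exclusions=EXCLUSIONS):
    """Return only the external properties not configured as hidden."""
    rule = exclusions.get(heading, {}).get("exclude", [])
    if rule == "all":
        return {}  # hide everything from Wikidata and DBPedia
    return {key: value for key, value in external_data.items() if key not in rule}
```

Headings not listed in the configuration pass through unchanged, so the file only needs entries for exceptions.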
What happens when external data services are not available
If the external lookups or APIs are not available, we aim to ensure that the knowledge panel and the author and subject pages still load, displaying the information from our own browse indices.
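One way to get this graceful degradation is to treat every external failure as "no external data" and render from the local index alone. In this sketch, `fetch_external` is a hypothetical callable standing in for the LOC, Wikidata, and DBPedia lookups, and `local_info` stands in for what the browse index already provides; the page-dict layout is illustrative.

```python
def build_author_page(local_info, fetch_external):
    """Combine local browse-index data with external data when available.

    `local_info` (heading, total works by/about, ...) always renders;
    `fetch_external` is a hypothetical callable returning a dict of
    external properties for the heading.
    """
    try:
        external = fetch_external(local_info["heading"])
    except (OSError, ValueError):
        # OSError covers URLError and timeouts; ValueError covers
        # unparseable JSON. Either way, fall back to local data only.
        external = {}
    return {**local_info, "external": external}
```

In practice the fetch would also pass a short `timeout` to `urllib.request.urlopen`, so a slow service degrades the page instead of stalling it.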
We also updated aspects of the design of both the author knowledge panel and the author and subject pages. We incorporated a library holdings search into the author and subject pages so that users can more easily find related information and resources. Our author and subject browse indices already included the "total works by" and "total works about" information. To ensure that the library holdings search results reflected works both by and about the author, we used an advanced search with a boolean OR, matching records where the heading appears either as the author or as a subject (using the same author and subject search fields that are used for the "total works by" and "total works about" links).
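The boolean OR search can be illustrated in Solr's standard query syntax. This is a sketch under assumptions: the field names `author_browse` and `subject_browse` are hypothetical stand-ins for whatever fields back the "total works by" and "total works about" links, and the catalog builds its query through its advanced search rather than this function.

```python
def solr_phrase(value):
    """Quote a heading as a Solr phrase, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def works_by_or_about(heading, author_field="author_browse", subject_field="subject_browse"):
    """Match records where the heading is the author OR a subject.

    Field names are hypothetical placeholders for the real index fields.
    """
    phrase = solr_phrase(heading)
    return f"{author_field}:{phrase} OR {subject_field}:{phrase}"
```

Quoting the whole heading as a phrase keeps the commas and dates in a heading like "Eliot, George, 1819-1880" from being parsed as separate terms.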
Related sections of code include:
- Knowledge panel
- Author and subject pages
- Configuration for what not to display for specific headings
- Related additions to routes.rb
Additional information about specific code changes can be found in the Discovery and Access documentation for this work here.
Thanks and acknowledgements
Special thanks to Tim Worrall, our lead developer on this work. Many thanks as well to our library catalog user representatives and to the entire Discovery and Access team. Thanks to Frances Webb who provided insight into our Solr indices and to Melissa Wallace who provided feedback around design.
On the LD4P3 front, thanks also to our discovery team (Discovery On the Ground), including but not limited to: Astrid Usong for UX work and contributions, Greg Delisle for server infrastructure, Steven Folsom and Jason Kovari for metadata feedback, Tim Worrall for development and design work, Michelle Futornick for contributing to discussions, Hilary Thorsen for her query pointers, and Dave Eichmann for his data indexing work and providing data query support. Thanks also to Lynette Rayle for her support as a colleague and friend. Thanks to Kevin Ford for answering our LOC data questions. As always, thanks to our LD4P3 project directors: Jason Kovari, Simeon Warner, Tom Cramer, Dave Eichmann, Phil Schreur.