At the beginning of the DASH! phase, we reviewed subject heading and author heading information.  An important step in exploring linked data integration is assessing where there may be points of intersection between the catalog and other linked data sources and how extensive these intersections may be.  Here is a link to a very informal jamboard trying to combine pieces of information.  This document provides a more targeted analysis.

Subject browse index

Description

Number

Solr query or filter

Authorized headings

1,720,035

authority=true

Authorized headings which are used directly for items

773,829

authority=true, mainEntry=true

Types and related counts for authorized headings

"Personal Name": 665,346

"Topical Term": 447,011,

"Corporate Name":327,014

"Geographic Name":179,784

"Work": 88,640

"Event": 6,868

"Genre/Form Term": 5,372

"Chronological Term":0

authority=true,

facet.field=headingTypeDesc

Types and related counts for authorized headings where headings are the main entry

"Topical Term":271,105

"Personal Name":246,717

"Corporate Name":124,621

"Geographic Name":98,274

"Work":29,505

"Genre/Form Term":1,976

"Event":1,631

"Chronological Term":0

authority=true,

mainEntiry=true,

facet.field=headingTypeDesc,

Author browse index

Description

Number

Solr query or filter

Authorized headings

4,999,524

authority=true

Authorized headings which are used directly for items

2,597,883

authority=true, mainEntry=true

Types and related counts for authorized headings

"Personal Name": 4,015,649

"Corporate Name": 851,940

"Event": 131,422

"Work": 247

"Topical Term": 175

"Geographic Name": 91

authority=true,

facet.field=headingTypeDesc

Types and related counts for authorized headings where headings are the main entry

"Personal Name": 2,183,578

"Corporate Name":353,640

"Event":60,380

"Topical Term":160

"Work":73

"Geographic Name":52

authority=true,

mainEntiry=true,

facet.field=headingTypeDesc,


In addition, we wanted to review how many catalog records may relate to a temporal or geographic subject heading.

DescriptionNumber and filters
Catalog records with geographic subject heading

3,220,876 (using field subject_geo_filing)


Catalog records with temporal subject heading959,900 (using field subject_era_filing)


(* Note the numbers above were retrieved 02/03/2022)

For our data sources, we wished to employ LCSH, LCNAF, Wikidata and PeriodO data that matched LCSH  headings. The following results were obtained by running queries against U of Iowa's Fuseki server which uses data downloads from LCSH.

  • LCSH temporal components
    • 17,561 subject headings with temporal component
    • 11,553 distinct temporal components (i.e. of type <http://www.loc.gov/mads/rdf/v1#Temporal> )
      • 34 distinct temporal components which are URIs and not blank nodes
  • LCSH geographic components
    • 87,542 subject headings with geographic component
    • 114,410 distinct geographic components (i.e. of type <http://www.loc.gov/mads/rdf/v1#Geographic>)
      • 48,677 distinct geographic components which are URIs and not blank nodes


Querying wikidata, we find the following matches to LC identifiers:

  • 1,340,603 distinct Wikidata entities which use an LOC identifier.  (We used the property P244).
    • 51,864 Wikidata entities pointing to an LOC identifier starting with "sh" (i.e. subject heading)
    • 1,289,158 Wikidata entities pointing to an LOC identifier starting with "n" (i.e. name headings)
  • 1,342,872 distinct LOC identifiers designated for Wikidata entities (i.e. this many unique identifiers are found in the object position for P244
    • 51,824 LOC identifiers which start with "sh" i.e. subject headings
    • 1,290,993 LOC identifiers which start with "n" i.e. name headings

(*Note wikidata results were retrieved 2/3/2022)


We also used LCSH mappings from PeriodO which contain 1478 total mappings (as of 2/3/2022).

  • No labels