...
Bibliographic information is information about books as opposed to the books themselves. A book's title, its cover image, and its ISBN are all bibliographic information--the text of the book is not. Bibliographic information flows into the circulation manager and metadata wrangler from a variety of sources, mainly OPDS feeds and proprietary APIs. We keep track of all this information and where it came from, and when necessary we weigh it, sort it, and boil it down into a small amount of information that can be used by other parts of the system.
DataSource
A DataSource is some external entity that puts data into the system. This data generally falls into two categories:
...
- Provide many Editions
- Provide many Equivalencies
- Provide many Hyperlinks
- Provide many Resources
- Provide many Classifications
- Provide many CustomLists
- Grant access to many LicensePools
- Provide many LicensePoolDeliveryMechanisms
- Generate many CoverageRecords
- Have many associated Credentials
- Have one IntegrationClient
Identifier
An Identifier provides a way to uniquely refer to a particular book. Common types of Identifier include ISBNs and proprietary IDs such as Overdrive or Bibliotheca IDs.
...
- Have many Classifications representing how the book would be shelved in a bookstore or library. (See the classification subsystem.)
- Have many Measurements of quantities like quality and popularity. (See the measurement subsystem.)
- Have many Hyperlinks to associated files such as cover images or descriptions. (See the linked resources subsystem.)
- Participate in many Equivalencies.
- Serve as the primary_identifier for multiple Editions.
- Serve as the identifier for many LicensePools, through Collection.
- Be associated with one Work, through Edition.
Equivalency
An Equivalency is an assertion made by a DataSource that two different Identifiers refer to the same book.
- The strength of the Equivalency is a number from -1 to 1 indicating how much we trust the assertion. When Overdrive says that an Overdrive ID is equivalent to an ISBN, we give that Equivalency a strength of 1, because Overdrive got the ISBN from the publisher and assigned the Overdrive ID itself. When OCLC says that two ISBNs represent the same book, we give it a lower strength, because OCLC is frequently wrong about this. A negative strength means that the DataSource is pretty sure two Identifiers represent different books.
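As a rough illustration of how strength might be used, the sketch below models an Equivalency as a small record and filters assertions by a minimum strength. The field names (input_id, output_id), the 0.5 threshold, the 0.3 and -0.8 values, and the identifier strings are all assumptions made for this example, not the system's actual schema or tuning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Equivalency:
    """One data source's claim that two identifiers name the same book."""
    data_source: str
    input_id: str    # hypothetical field name
    output_id: str   # hypothetical field name
    strength: float  # -1 (surely different books) to 1 (surely the same book)

    def __post_init__(self):
        if not -1.0 <= self.strength <= 1.0:
            raise ValueError("strength must be between -1 and 1")


def trusted_equivalents(identifier, equivalencies, threshold=0.5):
    """Return identifiers claimed equivalent to `identifier` with at least
    `threshold` strength. The threshold is an illustrative assumption."""
    found = set()
    for eq in equivalencies:
        if eq.strength < threshold:
            continue
        if eq.input_id == identifier:
            found.add(eq.output_id)
        elif eq.output_id == identifier:
            found.add(eq.input_id)
    return found


# Made-up identifiers and strengths, for illustration only.
claims = [
    Equivalency("Overdrive", "overdrive:abc", "isbn:9780000000001", 1.0),
    Equivalency("OCLC", "isbn:9780000000001", "isbn:9780000000002", 0.3),
    Equivalency("OCLC", "isbn:9780000000001", "isbn:9780000000003", -0.8),
]
print(trusted_equivalents("isbn:9780000000001", claims))
```

With the default threshold, only the Overdrive assertion survives; lowering the threshold admits the weaker OCLC claim, while the negative one is never treated as an equivalence.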
Edition
An Edition is a collection of information about a book from a particular data source. Like most items in the "bibliographic metadata" section, it represents an opinion. If different data sources give conflicting information about a book, that's fine -- everyone has their opinion. When this happens, we create multiple Editions and we sort it out later, when it's time to make the presentation edition.
...
- Has one DataSource. This is the data source whose opinions are recorded in the Edition.
- Has one Identifier, the primary_identifier. This identifies the book the data source is talking about.
- Contains basic metadata -- title, series, language, publisher, medium -- for that book.
- May have one or more Contributors, through Contribution.
- May be the presentation edition for a specific Work. The presentation edition is a synthetic Edition created by the system. We look over a bunch of Editions which are all (supposedly) talking about the same book, and consolidate them into a new Edition containing the best or most trusted metadata.
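One way to picture the consolidation step is a field-by-field merge that prefers the most trusted source with a non-empty value. This is only a sketch: the EditionSketch class, the TRUST ranking, the synthetic data source label, and the sample metadata are hypothetical, and the real selection logic may weigh sources quite differently.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class EditionSketch:
    """One data source's opinion about a book's basic metadata."""
    data_source: str
    title: Optional[str] = None
    series: Optional[str] = None
    language: Optional[str] = None
    publisher: Optional[str] = None
    medium: Optional[str] = None


# Hypothetical trust ranking: a higher number means a more trusted source.
TRUST = {"Library Staff": 3, "Overdrive": 2, "OCLC": 1}


def make_presentation_edition(editions):
    """Build a synthetic edition, taking each field from the most trusted
    edition that actually has a value for it."""
    ranked = sorted(editions, key=lambda e: TRUST.get(e.data_source, 0), reverse=True)
    presentation = EditionSketch(data_source="presentation (synthetic)")
    for f in fields(EditionSketch):
        if f.name == "data_source":
            continue
        for edition in ranked:
            value = getattr(edition, f.name)
            if value is not None:
                setattr(presentation, f.name, value)
                break
    return presentation


# Placeholder metadata, not real catalog records.
editions = [
    EditionSketch("OCLC", title="Example Title", publisher="Example Press", language="eng"),
    EditionSketch("Overdrive", title="Example Title (Unabridged)", medium="Audio"),
]
best = make_presentation_edition(editions)
print(best.title, best.publisher, best.medium)
```

Here the title comes from the more trusted source, while the publisher and language fall back to the only source that supplied them.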
The contributor subsystem
This system basically tracks who wrote which book. There are two classes in this subsystem: Contributor and Contribution.
Contributor
A Contributor is a human being or a corporate entity who is credited with work on some Edition. The credit itself is kept in a Contribution, which ties a Contributor to an Edition.
...
- Contains basic biographical information about a person or corporation. Most notably, it has both a display_name such as "Octavia Butler", the name that would go on the front of a book, and a sort_name such as "Butler, Octavia", the name that would go in a card catalog.
Contribution
A Contribution:
- Links a Contributor to an Edition.
- Contains a role describing the work the Contributor did on the Edition. Common roles include author, editor, translator, illustrator, and narrator.
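The relationship between the two classes can be sketched as follows. The classes here are simplified stand-ins (an edition is reduced to a title string), and default_sort_name is a deliberately naive helper invented for this example; real personal and corporate names need far more careful handling.

```python
from dataclasses import dataclass


@dataclass
class Contributor:
    display_name: str  # e.g. as printed on the front of a book
    sort_name: str     # e.g. as filed in a card catalog


@dataclass
class Contribution:
    contributor: Contributor
    edition_title: str  # stand-in for a link to an Edition
    role: str           # e.g. "author", "editor", "narrator"


def default_sort_name(display_name):
    """Naive 'Last, First' guess. Hypothetical helper for illustration."""
    parts = display_name.split()
    if len(parts) < 2:
        return display_name
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


def names_for_role(contributions, role):
    """List display names of everyone credited with `role`."""
    return [c.contributor.display_name for c in contributions if c.role == role]


butler = Contributor("Octavia Butler", default_sort_name("Octavia Butler"))
reader = Contributor("Example Narrator", default_sort_name("Example Narrator"))
credits = [
    Contribution(butler, "Example Edition", "author"),
    Contribution(reader, "Example Edition", "narrator"),
]
print(butler.sort_name)
print(names_for_role(credits, "author"))
```

Keeping the role on the Contribution rather than on the Contributor lets the same person be an author on one Edition and an editor on another.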
The classification subsystem
This system tracks how a book might be classified in a card catalog or shelved in a bookstore. There are two classes in this subsystem: Subject and Classification.
Subject
A Subject represents a classification that someone might give a book. Subject handles a variety of classification schemes: Dewey Decimal, LCC, LCSH, BISAC, proprietary systems like Overdrive's, and free-form tags, among others. Four pieces of information might be derived from the Subject, and will be stored with the Subject if possible:
- Genre ("Billionaire Romance" is a type of romance)
- Fiction/nonfiction status ("Science Fiction" is always fiction)
- Target audience ("Young Adult Fantasy" is always YA)
- Target age ("Picture books" are generally for very young children, not 12-year-olds.)
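To make the four derived fields concrete, here is a minimal sketch that fills them in from a subject's name using keyword rules. The SubjectSketch class, the RULES table, the audience label, and the (0, 5) age range are assumptions made up for this example; the actual classifiers are scheme-specific and considerably more involved.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SubjectSketch:
    scheme: str  # e.g. "BISAC", "tag" (illustrative labels)
    name: str
    genre: Optional[str] = None
    fiction: Optional[bool] = None
    audience: Optional[str] = None
    target_age: Optional[Tuple[int, int]] = None


# Hypothetical keyword rules mirroring the four examples in the list above.
RULES = [
    ("romance", {"genre": "Romance"}),
    ("science fiction", {"fiction": True}),
    ("young adult", {"audience": "Young Adult"}),
    ("picture book", {"target_age": (0, 5)}),  # the age range is an assumption
]


def derive(subject):
    """Store whatever can be derived from the subject's name; fields that
    cannot be derived stay None."""
    lowered = subject.name.lower()
    for keyword, facts in RULES:
        if keyword in lowered:
            for attr, value in facts.items():
                if getattr(subject, attr) is None:
                    setattr(subject, attr, value)
    return subject


romance = derive(SubjectSketch("tag", "Billionaire Romance"))
ya = derive(SubjectSketch("tag", "Young Adult Fantasy"))
print(romance.genre, romance.fiction, ya.audience)
```

Fields that no rule touches remain None, which reflects the "if possible" qualifier: a Subject only stores what can actually be derived from it.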
Classification
A Classification is someone's opinion that a book should be filed under a certain Subject.
...
- Links a Subject to an Identifier.
- Has an associated DataSource -- this tracks whose opinion it is.
- Has an associated weight representing how certain we are that the book should be filed under this subject. The higher the number, the more certain we are. If OCLC says that a single library has filed a certain book under "Whales", we'll record that information but give it a low weight. If OCLC says that ten thousand libraries have filed this book under "Whales", then it's probably about whales.
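The effect of weight can be sketched by totalling the weights each subject has received for one identifier and picking the heaviest. Summing weights is an assumption made for this illustration, as are the class name, the identifier string, and the sample numbers; the system may combine weights in another way.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ClassificationSketch:
    identifier: str   # which book the opinion is about
    subject: str      # stand-in for a link to a Subject
    data_source: str  # whose opinion it is
    weight: int       # higher means more certain


def subject_weights(classifications, identifier):
    """Total the weight each subject has received for one identifier."""
    totals = defaultdict(int)
    for c in classifications:
        if c.identifier == identifier:
            totals[c.subject] += c.weight
    return dict(totals)


def most_likely_subject(classifications, identifier):
    """Return the subject with the greatest total weight, or None."""
    totals = subject_weights(classifications, identifier)
    return max(totals, key=totals.get) if totals else None


# Made-up opinions about a single hypothetical identifier.
opinions = [
    ClassificationSketch("isbn:9780000000001", "Whales", "OCLC", 10000),
    ClassificationSketch("isbn:9780000000001", "Cooking", "OCLC", 1),
]
print(subject_weights(opinions, "isbn:9780000000001"))
print(most_likely_subject(opinions, "isbn:9780000000001"))
```

The single-library opinion is still recorded, but it is outweighed by the opinion backed by many libraries.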
Genre
There are many different data sources which use many different classification schemes for the same books. Rather than expose this chaos to patrons, we have defined about 150 Genres, corresponding to the sections of a large bookstore or branch library: "Romance", "Biography", and so on.
...