...
Looking at the whole data model at once can be overwhelming, so we'll consider it as a few smaller, simpler systems:
- Bibliographic metadata
- Licensing
- Works
- Custom lists
- Libraries
- Patrons
- Site configuration
- Background processes
These systems overlap around a few key classes, mainly DataSource, Identifier, LicensePool, and Work.
...
For the sake of simplicity, this document will talk about "books", but the rules are the same for audiobooks and other forms of content.
Bibliographic metadata
Bibliographic information is information about books, as opposed to the books themselves. A book's title, its cover image, and its ISBN are all bibliographic information; the text of the book is not. Bibliographic information flows into the circulation manager and metadata wrangler from a variety of sources, mainly OPDS feeds and proprietary APIs. We keep track of all this information and where it came from, and when necessary we weigh it, sort it, and boil it down into a small amount of information that can be used by other parts of the system.
DataSource
A DataSource is some external entity that puts data into the system. This data generally falls into two categories:
- Bibliographic information about a book, such as its title or cover image. This goes into the bibliographic metadata subsystem.
- Licensing information which can be used to serve actual copies of the book to library patrons. This goes into the licensing subsystem.
Some examples of DataSources:
- Overdrive, Bibliotheca, and Axis 360 license commercially published ebooks to libraries for delivery to patrons. They also provide bibliographic information about the books they license.
- Standard Ebooks provides bibliographic information about books, as well as free copies of the books themselves.
- OCLC and Content Cafe provide bibliographic information about books, but have no way of giving access to the actual books.
- VIAF provides information about the people who write books, but very little about the books themselves.
- The New York Times knows the ISBNs of the books on its best-seller lists, but not much more.
...
Identifier
An Identifier provides a way to uniquely refer to a particular book. Common types of Identifier include ISBNs and proprietary IDs such as Overdrive or Bibliotheca IDs.
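To make the idea concrete, here is a minimal Python sketch, not the circulation manager's actual class: an Identifier is treated as a (type, identifier) pair, and a hypothetical in-memory `IdentifierRegistry` hands back the same object whenever the same pair is requested, so every part of the system refers to a given book in the same way. The class and method names, the "Overdrive ID" type label, and the identifier values are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identifier:
    """A (type, identifier) pair that names one book, e.g. ("ISBN", "9780000000002")."""

    type: str
    identifier: str


class IdentifierRegistry:
    """Hypothetical in-memory stand-in for a table of unique Identifiers."""

    def __init__(self) -> None:
        self._by_key: dict[tuple[str, str], Identifier] = {}

    def get_or_create(self, id_type: str, value: str) -> tuple[Identifier, bool]:
        """Return (identifier, created); created is True only for a new pair."""
        key = (id_type, value)
        existing = self._by_key.get(key)
        if existing is not None:
            return existing, False
        new = Identifier(id_type, value)
        self._by_key[key] = new
        return new, True


registry = IdentifierRegistry()
isbn, created = registry.get_or_create("ISBN", "9780000000002")
same, created_again = registry.get_or_create("ISBN", "9780000000002")
overdrive_id, _ = registry.get_or_create("Overdrive ID", "hypothetical-123")
```

Asking twice for the same ISBN returns the very same object, while a proprietary ID for the same book is a separate Identifier.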
...
Equivalency
...
Edition
...
The contributor subsystem
This system basically tracks who wrote which book. There are two classes in this subsystem: Contributor and Contribution.
...
A Contributor is a human being or a corporate entity who is credited with work on some Edition. The credit itself is kept in a Contribution, which ties a Contributor to an Edition.
A Contributor contains basic biographical information about a person or corporation. Most notably, it has both a ...
Contribution
A Contribution is a piece of information contributed ...
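A minimal Python sketch of how the two classes relate, under stated assumptions: a Contribution is modeled as a join object holding one Contributor, one Edition, and a role string. The `role` and `display_name` fields and the `credit` helper are hypothetical, chosen only to show that the same Edition can credit several Contributors in different capacities.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Contributor:
    display_name: str  # hypothetical field name for how the credit is shown


@dataclass(eq=False)
class Edition:
    title: str
    contributions: list[Contribution] = field(default_factory=list)

    def contributors(self, role: Optional[str] = None) -> list[Contributor]:
        """Contributors credited on this Edition, optionally filtered by role."""
        return [
            c.contributor
            for c in self.contributions
            if role is None or c.role == role
        ]


@dataclass(eq=False)
class Contribution:
    """Ties one Contributor to one Edition."""

    contributor: Contributor
    edition: Edition = field(repr=False)  # kept out of repr() to avoid a reference loop
    role: str = "Author"  # assumption: roles such as "Author" or "Illustrator"


def credit(contributor: Contributor, edition: Edition, role: str) -> Contribution:
    """Hypothetical helper: record that `contributor` worked on `edition` as `role`."""
    contribution = Contribution(contributor, edition, role)
    edition.contributions.append(contribution)
    return contribution


huck_finn = Edition("Adventures of Huckleberry Finn")
credit(Contributor("Mark Twain"), huck_finn, "Author")
credit(Contributor("E. W. Kemble"), huck_finn, "Illustrator")
```

Keeping the role on the Contribution rather than on the Contributor lets one person be an author on one Edition and an illustrator or translator on another.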
The classification subsystem
...
A Subject represents a classification that someone might give a book. Subject handles a variety of classification schemes: Dewey Decimal, LCC, LCSH, BISAC, proprietary systems like Overdrive's, and free-form tags, among others.
Four pieces of information might be derived from the ...
Classification
A Classification is someone's opinion that a book should be filed under a certain Subject.
...
Genre
There are many different data sources which use many different classification schemes for the same books. Rather than expose this chaos to patrons, we have defined about 150 Genres, corresponding to the sections of a large bookstore or branch library: "Romance", "Biography", and so on.
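One way to picture many Classifications boiling down to a Genre is a lookup table plus a vote count. The sketch below is an assumption, not the circulation manager's real classifier: `SUBJECT_TO_GENRE` is a hypothetical three-entry mapping (the real rules are not described in this document), unmapped Subjects such as an unrecognized free-form tag are ignored, and each Classification counts as one vote.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    scheme: str      # e.g. "BISAC", "DDC", or "tag" for free-form tags
    identifier: str  # the heading, code, or tag within that scheme


@dataclass(frozen=True)
class Classification:
    """Someone's opinion that a book belongs under a Subject."""

    subject: Subject
    data_source: str  # who offered the opinion


# Hypothetical mapping from Subjects to Genres.
SUBJECT_TO_GENRE = {
    Subject("BISAC", "FICTION / Romance / General"): "Romance",
    Subject("tag", "love story"): "Romance",
    Subject("BISAC", "BIOGRAPHY & AUTOBIOGRAPHY / General"): "Biography",
}


def genre_votes(classifications) -> Counter:
    """Count one vote per Classification whose Subject maps to a Genre."""
    votes = Counter()
    for classification in classifications:
        genre = SUBJECT_TO_GENRE.get(classification.subject)
        if genre is not None:
            votes[genre] += 1
    return votes


opinions = [
    Classification(Subject("BISAC", "FICTION / Romance / General"), "Overdrive"),
    Classification(Subject("tag", "love story"), "Library staff"),
    Classification(Subject("tag", "beach read"), "Library staff"),  # unmapped, ignored
]
votes = genre_votes(opinions)
```

Two different schemes ("BISAC" and a free-form tag) end up agreeing on the same patron-facing Genre, which is the point of the Genre layer.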
...
Measurement
A Measurement is a numeric value associated with an Identifier. It represents some quality that distinguishes one book from others. The most useful measurements are popularity (a popular book is read/accessed/purchased/accessioned more often) and rating (a highly rated book is considered to be of high quality).
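If different sources report the same quantity on different scales, their values have to be made comparable before they can be combined. The sketch below assumes each Measurement records its source's scale (`scale_min` and `scale_max` are hypothetical fields), rescales values onto 0.0-1.0, and takes a plain average across sources; the real system may weigh sources differently.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class Measurement:
    identifier: str    # stand-in for an Identifier
    data_source: str
    quantity: str      # e.g. "rating" or "popularity"
    value: float
    scale_min: float   # hypothetical: the low end of this source's scale
    scale_max: float   # hypothetical: the high end of this source's scale

    def normalized(self) -> float:
        """Rescale the raw value onto 0.0-1.0."""
        return (self.value - self.scale_min) / (self.scale_max - self.scale_min)


def combined(measurements, identifier: str, quantity: str) -> Optional[float]:
    """Average the normalized values for one book and quantity, or None if there are none."""
    values = [
        m.normalized()
        for m in measurements
        if m.identifier == identifier and m.quantity == quantity
    ]
    return mean(values) if values else None


measurements = [
    Measurement("isbn:placeholder", "Source A", "rating", 4.0, 1.0, 5.0),      # 4 of 5 stars
    Measurement("isbn:placeholder", "Source B", "rating", 50.0, 0.0, 100.0),   # 50 out of 100
]
rating = combined(measurements, "isbn:placeholder", "rating")
```

Here the two ratings normalize to 0.75 and 0.5, so the combined rating is 0.625.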
The linked resources subsystem
This system keeps track of external resources associated with a book. An "external resource" can be pretty much anything, but these are the most common types of resources we track:
...
A Hyperlink represents a connection between an Identifier and a Resource. It contains two extra pieces of information about the link: ...
Resource
A Resource represents a document found somewhere on the Internet, probably either a cover image or a free book. It has a url, and that's basically it: everything about the document itself is kept in Representation.
Representation
A Representation is a local cache of a Resource. It represents our attempt to actually download a Resource and records what happened when we tried.
If everything went well, the ...
Circulation managers don't usually ... create ...
An image ...
Putting it all together
Here's how the whole subsystem works together. Let's say one of our data sources claims that the URL http://example.org/covers/my-book.png is a cover image for the ISBN "97812345678". We want to represent this fact in our system.
- We'd create an Identifier for the ISBN "97812345678".
- We'd create a Resource for http://example.org/covers/my-book.png.
- We'd create a Hyperlink with the rel "http://opds-spec.org/image", for "cover image". The .data_source of this Hyperlink would be set to the DataSource that made the original claim.
- We don't have to actually download http://example.org/covers/my-book.png, but if we do decide to download it, the binary image will be stored as a Representation. If there's a problem and we can't complete the download, that fact will be stored in the Representation instead.
- If we download the image and everything goes well, we may also decide to create a thumbnail out of it. This would be stored as a second Representation, and its .thumbnail_of would point to the original, full-size Representation.
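The five steps above can be sketched in Python as follows. This is a simplified model, not the real ORM classes: the dataclasses keep only the fields the walkthrough mentions (`url`, `rel`, `data_source`, `thumbnail_of`) plus a hypothetical `fetch_exception` field for recording a failed download, and the `fetch` and `shrink` callables stand in for real HTTP and image-processing code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

COVER_IMAGE_REL = "http://opds-spec.org/image"


@dataclass(frozen=True)
class DataSource:
    name: str


@dataclass(frozen=True)
class Identifier:
    type: str
    identifier: str


@dataclass
class Resource:
    url: str


@dataclass
class Hyperlink:
    identifier: Identifier
    resource: Resource
    rel: str
    data_source: DataSource  # who made the claim


@dataclass
class Representation:
    resource: Optional[Resource]
    content: Optional[bytes] = None
    fetch_exception: Optional[str] = None  # hypothetical: why a download failed
    thumbnail_of: Optional[Representation] = None


def record_cover_claim(source: DataSource, isbn: str, url: str) -> Hyperlink:
    """Steps 1-3: the Identifier, the Resource, and the Hyperlink joining them."""
    return Hyperlink(Identifier("ISBN", isbn), Resource(url), COVER_IMAGE_REL, source)


def download(resource: Resource, fetch: Callable[[str], bytes]) -> Representation:
    """Step 4: store either the downloaded bytes or the reason the download failed."""
    try:
        return Representation(resource, content=fetch(resource.url))
    except OSError as error:
        return Representation(resource, fetch_exception=str(error))


def thumbnail(original: Representation, shrink: Callable[[bytes], bytes]) -> Representation:
    """Step 5: a second Representation pointing back at the full-size one.

    In this sketch the thumbnail is not tied to a Resource of its own.
    """
    return Representation(None, content=shrink(original.content), thumbnail_of=original)


def unreachable(url: str) -> bytes:
    raise ConnectionError(f"could not reach {url}")


source = DataSource("Example Source")
link = record_cover_claim(source, "97812345678", "http://example.org/covers/my-book.png")
full_size = download(link.resource, lambda url: b"fake png bytes")
failed = download(link.resource, unreachable)
thumb = thumbnail(full_size, lambda data: data[:4])
```

A failed download still produces a Representation, so the system remembers that it tried and what went wrong.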
ResourceTransformation
A ResourceTransformation represents a change that was made to one Resource to generate another Resource.
Currently it's used in the circulation manager's "cover image upload" feature. You can upload a background image (the original ...). Theoretically, thumbnailing could also be handled as a ...
Licensing
Collection
A Collection represents a set of books that are made available through one set of credentials.
...
- is associated with an Identifier, representing how the vendor identifies the book.
- is associated with a DataSource, representing the vendor who provides the book.
- belongs to one Collection.
- has one presentation edition, containing the most complete set of metadata available for the book.
- can have many Loans, Holds, Annotations, and Complaints.
- can have many CirculationEvents.
- should have at least one DeliveryMechanism, through LicensePoolDeliveryMechanism.
- has a RightsStatus, through LicensePoolDeliveryMechanism.
DeliveryMechanism and LicensePoolDeliveryMechanism
A DeliveryMechanism describes what format a book is actually available in. There are two parts to a DeliveryMechanism: 1) the DRM scheme implemented by the distributor, if any, and 2) the format of the book (EPUB, PDF, audiobook manifest, Kindle, and so on).
LicensePoolDeliveryMechanism is a three-way join table: a record of a promise by a vendor (identified by a DataSource) to deliver copies of a book (identified by an Identifier) in a specific format (identified by a DeliveryMechanism).
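Because LicensePoolDeliveryMechanism is a three-way join, it can be modeled as an immutable triple; keeping the triples in a set means a vendor repeating the same promise doesn't create a duplicate. This is an illustrative sketch: the field names, the media types, the "Adobe DRM" label, and the identifier strings are placeholders, not values taken from the real system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DeliveryMechanism:
    content_type: str          # the book's format, e.g. "application/epub+zip"
    drm_scheme: Optional[str]  # the distributor's DRM scheme, or None for no DRM


@dataclass(frozen=True)
class LicensePoolDeliveryMechanism:
    """One promise: this vendor will deliver this book in this format."""

    data_source: str
    identifier: str
    delivery_mechanism: DeliveryMechanism


def formats_for(promises, data_source: str, identifier: str) -> set:
    """Every DeliveryMechanism a vendor has promised for one book."""
    return {
        p.delivery_mechanism
        for p in promises
        if p.data_source == data_source and p.identifier == identifier
    }


DRM_EPUB = DeliveryMechanism("application/epub+zip", "Adobe DRM")
DRM_PDF = DeliveryMechanism("application/pdf", "Adobe DRM")
OPEN_EPUB = DeliveryMechanism("application/epub+zip", None)

promises = {
    LicensePoolDeliveryMechanism("Overdrive", "overdrive:placeholder-1", DRM_EPUB),
    LicensePoolDeliveryMechanism("Overdrive", "overdrive:placeholder-1", DRM_PDF),
    # The same promise made twice collapses into one set entry.
    LicensePoolDeliveryMechanism("Overdrive", "overdrive:placeholder-1", DRM_EPUB),
    LicensePoolDeliveryMechanism("Standard Ebooks", "url:placeholder-2", OPEN_EPUB),
}
```

Note that DRM_EPUB and OPEN_EPUB share a format but differ in DRM scheme, so they count as distinct delivery mechanisms.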
RightsStatus
A RightsStatus represents the terms under which a book is being made available to patrons. The most common varieties of RightsStatus are 1) in copyright, 2) public domain, and 3) a Creative Commons license. "In copyright" implies that a book is being made available to patrons by virtue of a licensing agreement between the library and the vendor. The other RightsStatus values imply that a book is being made available to library patrons on the same terms as it would be to the general public.
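The distinction drawn in the last two sentences fits in a few lines. In this hypothetical sketch the many Creative Commons licenses are collapsed into a single member, and `requires_license_agreement` is an illustrative name, not part of the real RightsStatus class.

```python
from enum import Enum


class RightsStatus(Enum):
    IN_COPYRIGHT = "in copyright"
    PUBLIC_DOMAIN = "public domain"
    CREATIVE_COMMONS = "creative commons"  # simplification: stands for any CC license

    @property
    def requires_license_agreement(self) -> bool:
        """True if patrons get the book through a library-vendor licensing agreement.

        Every other status means the book is offered to patrons on the same
        terms as to the general public.
        """
        return self is RightsStatus.IN_COPYRIGHT
```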
Complaint
Patrons may lodge one or more Complaints against a specific LicensePool. Complaints indicate problems with specific books. For example, a Patron can lodge a Complaint stating that a book is incorrectly categorized or described, or that there is a problem with checking it out, reading, or returning it.
CirculationEvent
A CirculationEvent is a record of something happening to a LicensePool. A CirculationEvent is created when an event takes place within the circulation manager (e.g. a work is checked out or placed on hold), when we notice that an event happened on the distributor's side (such as licenses for a book being added or removed), or when a client app reports an event (e.g. a book having been opened).
...
A Work represents a book in general, as opposed to one specific edition of a book, or a specific licensing agreement to deliver copies of a book.
About Works:
- May have copies scattered across many ...
- May have many Editions, but derives its presentation metadata from one particular Edition, which is known as its “presentation edition.” This special ...
- Stores information that has been aggregated from multiple sources and summarized: ...
- May be referenced by multiple ...
- May participate in many ...
...
Library
A library represents some organization that serves a distinct set of patrons.
...
- one or more Collections.
- one or more CustomLists.
- one or more Lanes, each of which may have many CachedFeeds.
- one or more Admins.
Admin
Admins are people such as librarians who have access to the admin interface (via accounts in the circulation manager). An Admin is associated with a particular Library through AdminRole. An Admin may have more than one AdminRole. The AdminRoles are:
- Librarian
- SitewideLibrarian
- LibraryManager
- SitewideLibraryManager
- SystemAdmin
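The role names suggest that some roles are scoped to one library while the "Sitewide" roles and SystemAdmin apply everywhere. The sketch below encodes that reading as an assumption: a library-scoped AdminRole must name a Library, a sitewide one must not, and the hypothetical `can_act_as_librarian` check treats any role on the given library, or any sitewide role, as sufficient. The real permission rules may differ.

```python
from dataclasses import dataclass, field
from typing import Optional

LIBRARY_ROLES = {"Librarian", "LibraryManager"}
SITEWIDE_ROLES = {"SitewideLibrarian", "SitewideLibraryManager", "SystemAdmin"}


@dataclass(frozen=True)
class AdminRole:
    role: str
    library: Optional[str] = None  # assumption: None for sitewide roles

    def __post_init__(self) -> None:
        if self.role not in LIBRARY_ROLES | SITEWIDE_ROLES:
            raise ValueError(f"unknown role: {self.role!r}")
        if self.role in LIBRARY_ROLES and self.library is None:
            raise ValueError(f"{self.role} must be tied to a library")
        if self.role in SITEWIDE_ROLES and self.library is not None:
            raise ValueError(f"{self.role} is sitewide and takes no library")


@dataclass
class Admin:
    email: str
    roles: set = field(default_factory=set)  # a set of AdminRole objects

    def can_act_as_librarian(self, library: str) -> bool:
        """Hypothetical check: any role on this library, or any sitewide role."""
        return any(
            r.role in SITEWIDE_ROLES or r.library == library for r in self.roles
        )


alice = Admin("alice@example.org", {AdminRole("Librarian", "Springfield")})
bob = Admin("bob@example.org", {AdminRole("SitewideLibrarian")})
```

Validating the role and library together at construction time keeps a library-specific role from silently being treated as sitewide.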
Lane
A library groups its books together using Lanes. A Lane may group books by any combination of these criteria:
...
A Lane may have many CachedFeeds.
CachedFeed
A CachedFeed is a pregenerated OPDS document that's stored in the database to serve future client requests. If a CachedFeed can be used, it greatly improves patron-visible response time.
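The trade-off is freshness versus speed: a stored feed is fast to serve but can go stale. Below is a minimal sketch of one common policy, serving a stored feed until it reaches a maximum age and regenerating it after that. The `FeedCache` class, the `max_age` policy, and the `variant` key (standing in for whatever distinguishes a Lane's many CachedFeeds) are assumptions, not the circulation manager's actual caching rules.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class CachedFeed:
    lane: str
    variant: str          # hypothetical: distinguishes a lane's several feeds
    content: str          # the pregenerated OPDS document
    generated_at: float   # seconds since the epoch


class FeedCache:
    """Hypothetical in-memory stand-in for stored CachedFeeds."""

    def __init__(self, max_age: float, clock: Callable[[], float] = time.time) -> None:
        self.max_age = max_age
        self.clock = clock
        self.feeds: dict[tuple[str, str], CachedFeed] = {}
        self.regenerations = 0

    def fetch(self, lane: str, variant: str, generate: Callable[[str, str], str]) -> str:
        now = self.clock()
        cached = self.feeds.get((lane, variant))
        if cached is not None and now - cached.generated_at < self.max_age:
            return cached.content  # fast path: reuse the stored document
        content = generate(lane, variant)  # slow path: build the feed from scratch
        self.feeds[(lane, variant)] = CachedFeed(lane, variant, content, now)
        self.regenerations += 1
        return content


# A fake clock makes the example deterministic.
now = [1000.0]
cache = FeedCache(max_age=600, clock=lambda: now[0])


def build(lane: str, variant: str) -> str:
    return f"<feed>{lane}/{variant} at {now[0]}</feed>"


first = cache.fetch("Romance", "page-1", build)
now[0] += 60
second = cache.fetch("Romance", "page-1", build)  # still fresh: served from the cache
now[0] += 600
third = cache.fetch("Romance", "page-1", build)   # too old: regenerated
```

Passing the clock in as a parameter lets the expiry logic be exercised without waiting in real time.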
...
ExternalIntegration
An ExternalIntegration contains the configuration for connecting to a third-party API. Commonly used third-party APIs include the metadata wrangler, DataSources that require protocols, authentication services, storage services, and search providers.
ConfigurationSetting
A ConfigurationSetting holds information about an extra piece of site configuration. A ConfigurationSetting may be associated with an ExternalIntegration, a Library, both, or neither.
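Since a ConfigurationSetting can belong to an ExternalIntegration, a Library, both, or neither, a lookup has to decide which one wins when several apply. The sketch below assumes the most specific setting wins (integration and library, then integration only, then library only, then sitewide); both that precedence and the `SettingStore` class are hypothetical, and the setting keys and values are placeholders.

```python
from typing import Optional


class SettingStore:
    """Hypothetical stand-in for ConfigurationSetting rows.

    Each value is keyed by (integration, library, key); integration and
    library may each be None, covering all four possible associations.
    """

    def __init__(self) -> None:
        self._values: dict[tuple[Optional[str], Optional[str], str], str] = {}

    def set(self, key: str, value: str,
            integration: Optional[str] = None, library: Optional[str] = None) -> None:
        self._values[(integration, library, key)] = value

    def lookup(self, key: str,
               integration: Optional[str] = None,
               library: Optional[str] = None) -> Optional[str]:
        """Return the most specific value for `key`, or None if nothing is set.

        Assumed precedence: integration+library, integration only,
        library only, then sitewide (neither).
        """
        scopes = ((integration, library), (integration, None), (None, library), (None, None))
        for scope in scopes:
            value = self._values.get((*scope, key))
            if value is not None:
                return value
        return None


store = SettingStore()
store.set("base_url", "https://circulation.example.org")          # sitewide
store.set("username", "shared-account", integration="Overdrive")  # one integration
store.set("username", "branch-account", integration="Overdrive", library="Springfield")
```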
Background processes
- A Timestamp provides a record of when a Monitor was run.
- A CoverageRecord provides a record of any processes that have been performed on a book (referred to via its Identifier).
- A WorkCoverageRecord provides a record of any processes that have been performed on a Work (similar to what CoverageRecord does for Identifiers).
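A background process can use these records to find the books it still needs to work on. In this minimal sketch, a record is keyed by identifier, data source, and a hypothetical `operation` label, and `needing_coverage` returns the identifiers with no matching record. The names, identifiers, and data source strings are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoverageRecord:
    identifier: str   # stand-in for the Identifier that was processed
    data_source: str  # the source the process worked with
    operation: str    # hypothetical label for which process ran


def needing_coverage(identifiers, records, data_source: str, operation: str) -> list:
    """Identifiers with no record for this (data_source, operation), in input order."""
    covered = {
        r.identifier
        for r in records
        if r.data_source == data_source and r.operation == operation
    }
    return [i for i in identifiers if i not in covered]


records = {
    CoverageRecord("isbn:1", "Metadata Wrangler", "sync"),
    CoverageRecord("isbn:2", "OCLC", "lookup"),
}
todo = needing_coverage(["isbn:1", "isbn:2", "isbn:3"], records, "Metadata Wrangler", "sync")
```

A record from a different source or operation (like the OCLC lookup on "isbn:2") doesn't count as coverage for this process.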
...