
...

Bibliographic information is information about books as opposed to the books themselves. A book's title, its cover image, and its ISBN are all bibliographic information--the text of the book is not. Bibliographic information flows into the circulation manager and metadata wrangler from a variety of sources, mainly OPDS feeds and proprietary APIs. We keep track of all this information and where it came from, and when necessary we weigh it, sort it, and boil it down into a small amount of information that can be used by other parts of the system.

DataSource

A DataSource is some external entity that puts data into the system. This data generally falls into two categories:

  • Bibliographic information about a book, such as its title or cover image. This goes into the bibliographic metadata subsystem.
  • Licensing information which can be used to serve actual copies of the book to library patrons. This goes into the licensing subsystem.

Some examples of DataSources:

  • Overdrive, Bibliotheca, and Axis 360 license commercially published ebooks to libraries for delivery to patrons. They also provide bibliographic information about the books they license.
  • Standard Ebooks provides bibliographic information about books, as well as free copies of the books themselves.
  • OCLC and Content Cafe provide bibliographic information about books, but have no way of giving access to the actual books.
  • VIAF provides information about the people who write books, but very little about the books themselves.
  • The New York Times knows the ISBNs of the books on its best-seller lists, but not much more.
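The two categories above can be sketched as a minimal Python model. This is an illustration under stated assumptions, not the system's actual schema. The `DataSource` class, its `offers_licenses` flag, and the `subsystems_fed_by` helper are hypothetical names. Standard Ebooks is counted as feeding the licensing subsystem because it supplies free copies of books.

```python
from dataclasses import dataclass

# Hypothetical, simplified model of a DataSource. The `offers_licenses`
# flag is an illustrative assumption, not taken from this page.
@dataclass(frozen=True)
class DataSource:
    name: str
    offers_licenses: bool  # True if the source can help serve actual copies

# The examples above, expressed as DataSource records.
SOURCES = [
    DataSource("Overdrive", offers_licenses=True),
    DataSource("Bibliotheca", offers_licenses=True),
    DataSource("Axis 360", offers_licenses=True),
    DataSource("Standard Ebooks", offers_licenses=True),  # free copies count here
    DataSource("OCLC", offers_licenses=False),
    DataSource("Content Cafe", offers_licenses=False),
    DataSource("VIAF", offers_licenses=False),
    DataSource("The New York Times", offers_licenses=False),
]

def subsystems_fed_by(source: DataSource) -> list[str]:
    """Every source feeds bibliographic metadata; some also feed licensing."""
    subsystems = ["bibliographic metadata"]
    if source.offers_licenses:
        subsystems.append("licensing")
    return subsystems
```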

A DataSource may also:

  • Provide many Editions
  • Provide many Equivalencies
  • Provide many Hyperlinks
  • Provide many Resources
  • Provide many Classifications
  • Provide many CustomLists
  • Grant access to many LicensePools
  • Provide many LicensePoolDeliveryMechanisms
  • Generate many CoverageRecords
  • Have many associated Credentials
  • Have one IntegrationClient


Identifier

An Identifier provides a way to uniquely refer to a particular book. Common types of Identifier include ISBNs and proprietary IDs such as Overdrive or Bibliotheca IDs.
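A minimal sketch of what "uniquely refer to" implies, assuming an Identifier is keyed by its type together with its value. Under that assumption, an ISBN and an Overdrive ID that happen to share a string are still different Identifiers. `IdentifierRegistry` and `get_or_create` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Identifier:
    type: str   # e.g. "ISBN" or "Overdrive ID" (labels are illustrative)
    value: str  # the identifier string itself

class IdentifierRegistry:
    """Hypothetical get-or-create store: at most one Identifier per (type, value)."""

    def __init__(self) -> None:
        self._by_key: dict[tuple[str, str], Identifier] = {}

    def get_or_create(self, type_: str, value: str) -> tuple[Identifier, bool]:
        """Return (identifier, created); created is True only the first time."""
        key = (type_, value)
        existing = self._by_key.get(key)
        if existing is not None:
            return existing, False
        identifier = Identifier(type_, value)
        self._by_key[key] = identifier
        return identifier, True
```

Asking twice for ISBN "97812345678" returns the same object, while an Overdrive ID with the same string comes back as a separate Identifier.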

An Identifier may:

  • Have many Classifications representing how the book would be shelved in a bookstore or library. (See the classification subsystem.)
  • Have many Measurements of quantities like quality and popularity. (See the measurement subsystem.)
  • Have many Hyperlinks to associated files such as cover images or descriptions. (See the linked resources subsystem.)
  • Participate in many Equivalencies.
  • Serve as the primary_identifier for multiple Editions.
  • Serve as the identifier for many LicensePools, through Collection.
  • Be associated with one Work, through Edition.

...

The linked resources subsystem


This system keeps track of external resources associated with a book. An "external resource" can be pretty much anything, but these are the most common types of resources we track:

...

A Hyperlink represents a connection between an Identifier and a Resource.

It contains two extra pieces of information about the link:

  • A DataSource -- who provided this link?
  • rel -- what is the relationship between the Identifier and the Resource? "There's a link" is very vague; this is more specific. Different rel values are defined for a cover image, a thumbnail image, a review, a description, a copy of the actual book, and so on.
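To illustrate how the DataSource and rel narrow down a link, here is a hypothetical sketch in which Identifiers and Resources are reduced to plain strings. The rel strings `"cover-image"` and `"description"` are placeholders, not the values the system actually defines, and `resources_for` is an illustrative helper.

```python
from dataclasses import dataclass

# Placeholder rel values; the real system defines its own rel strings.
COVER_IMAGE = "cover-image"
DESCRIPTION = "description"

@dataclass(frozen=True)
class Hyperlink:
    identifier: str    # stand-in for an Identifier, e.g. "ISBN 97812345678"
    data_source: str   # who provided this link?
    rel: str           # how the Resource relates to the Identifier
    resource_url: str  # stand-in for the Resource (just its URL)

def resources_for(links: list[Hyperlink], identifier: str, rel: str) -> list[tuple[str, str]]:
    """Return (resource URL, data source) for each link to `identifier` with this rel."""
    return [
        (link.resource_url, link.data_source)
        for link in links
        if link.identifier == identifier and link.rel == rel
    ]
```

Two sources can each claim a different cover for the same book. Keeping the DataSource on each link is what lets those claims be weighed against each other later.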




Resource

A Resource represents a document found somewhere on the Internet -- probably either a cover image or a free book. It has a URL, and that's basically it -- everything about the document itself is kept in Representation.

  • A Resource that's an image may be chosen by an Edition as the best available cover image for a given book.
  • A Resource that's a textual description may be chosen by a Work as the best description for a given book.
  • A LicensePoolDeliveryMechanism for an open-access book will point to a Resource that represents the book itself.



Representation

A Representation is a local cache of a Resource. It represents our attempt to actually download a Resource and records what happened when we tried.

If everything went well, the Representation will contain a file--binary, text, HTML, or image. Otherwise, the Representation will contain information about what went wrong -- maybe the server was down or something.
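The "record what happened" behavior can be sketched as follows. The sketch assumes a pluggable fetcher function so that it does not touch the network. The field names (`status_code`, `media_type`, `content`, `fetch_exception`) and the `fetch_representation` helper are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Representation:
    url: str
    status_code: Optional[int] = None
    media_type: Optional[str] = None
    content: Optional[bytes] = None
    fetch_exception: Optional[str] = None  # what went wrong, if anything

# A fetcher takes a URL and returns (status_code, media_type, body); it may raise.
Fetcher = Callable[[str], tuple[int, str, bytes]]

def fetch_representation(url: str, fetch: Fetcher) -> Representation:
    """Try to download `url` and record the outcome, success or failure."""
    rep = Representation(url=url)
    try:
        status, media_type, body = fetch(url)
    except Exception as e:  # e.g. the server was down
        rep.fetch_exception = f"{type(e).__name__}: {e}"
        return rep
    rep.status_code = status
    if 200 <= status < 300:
        rep.media_type, rep.content = media_type, body
    else:
        rep.fetch_exception = f"Got status code {status}"
    return rep
```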

Circulation managers don't usually create Representations -- they rely on the metadata wrangler to do that.

An image Representation that's a thumbnail of another image Representation is connected to its original through .thumbnail_of.
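A minimal sketch of the `.thumbnail_of` link. The reverse `thumbnails` list and the `make_thumbnail` helper are assumptions added for illustration. `eq=False` makes Representations compare by identity, which avoids recursing through the two-way link.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(eq=False)  # compare by identity; avoids recursing through the two-way link
class Representation:
    url: str
    media_type: str
    # The original image this one is a thumbnail of (None for originals).
    thumbnail_of: Optional["Representation"] = field(default=None, repr=False)
    # Assumed reverse direction: thumbnails made from this image.
    thumbnails: list["Representation"] = field(default_factory=list, repr=False)

def make_thumbnail(original: Representation, url: str, media_type: str) -> Representation:
    """Hypothetical helper: record `url` as a thumbnail of `original`."""
    thumb = Representation(url, media_type, thumbnail_of=original)
    original.thumbnails.append(thumb)  # keep both directions in sync
    return thumb
```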



Putting it all together

Here's how the whole subsystem works together. Let's say one of our data sources claims that the URL http://example.org/covers/my-book.png is a cover image for the ISBN "97812345678". We want to represent this fact in our system.
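This scenario can be sketched with an in-memory stand-in for the database. The sketch assumes one Identifier per (type, value) pair and one Resource per URL. `Store`, `record_link`, the rel string `"cover-image"`, and the data source name are hypothetical. Recording the same claim twice reuses the existing records instead of creating duplicates.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Identifier:
    type: str
    value: str

@dataclass(frozen=True)
class Resource:
    url: str

@dataclass(frozen=True)
class Hyperlink:
    identifier: Identifier
    data_source: str
    rel: str
    resource: Resource

@dataclass
class Store:
    """Hypothetical in-memory stand-in for the database tables."""
    identifiers: dict = field(default_factory=dict)
    resources: dict = field(default_factory=dict)
    hyperlinks: set = field(default_factory=set)

    def identifier(self, type_: str, value: str) -> Identifier:
        # Reuse the existing Identifier for this (type, value), if any.
        return self.identifiers.setdefault((type_, value), Identifier(type_, value))

    def resource(self, url: str) -> Resource:
        # One Resource per URL.
        return self.resources.setdefault(url, Resource(url))

    def record_link(self, source: str, id_type: str, id_value: str, rel: str, url: str) -> Hyperlink:
        link = Hyperlink(self.identifier(id_type, id_value), source, rel, self.resource(url))
        self.hyperlinks.add(link)  # a set, so repeating the same claim is a no-op
        return link
```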

...

A ResourceTransformation represents a change that was made to one Resource to generate another Resource.

Currently it's used in the circulation manager's "cover image upload" feature. You can upload a background image (the original Resource) and paste the title and author onto it (a ResourceTransformation which results in a second Resource).
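A hypothetical sketch of the bookkeeping behind this feature. It does not render any image. It only records which Resource was derived from which, storing the title and author as the transformation's settings. The `settings` field and the helper names are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Resource:
    url: str

@dataclass
class ResourceTransformation:
    original: Resource     # e.g. the uploaded background image
    derivative: Resource   # the Resource produced from it
    settings: dict         # parameters of the change (assumed field)

def paste_title_and_author(original: Resource, title: str, author: str, new_url: str) -> ResourceTransformation:
    """Hypothetical stand-in for cover upload: records the derivation, renders nothing."""
    return ResourceTransformation(original, Resource(new_url), {"title": title, "author": author})

def trace_back(resource: Resource, transformations: list[ResourceTransformation]) -> Resource:
    """Follow transformations backwards to the Resource everything started from."""
    by_derivative = {t.derivative: t for t in transformations}
    while resource in by_derivative:
        resource = by_derivative[resource].original
    return resource
```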

Theoretically, thumbnailing could also be handled as a ResourceTransformation, but it's probably not worth making this change.


Licensing

...

Work

A Work:

May have copies scattered across many LicensePools.

May have many Editions, but derives its presentation metadata from one particular Edition, which is known as its “presentation edition.” This special Edition represents the best available bibliographic metadata for the book.

Stores information that has been aggregated from multiple sources and summarized:

  • Subject matter classification (aggregated from Classifications)
  • Intended audience (aggregated from Classifications)
  • Fiction/nonfiction status (aggregated from Classifications)
  • Popularity (aggregated from Measurements)
  • The best available summary (aggregated from Resources)
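As a much-simplified illustration of "aggregated and summarized," the sketch below picks fiction status and audience by weighted vote over Classifications and averages popularity Measurements. This is a stand-in technique chosen for clarity, not the system's actual algorithm. The `weight` field, the audience labels, and `summarize` are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

@dataclass(frozen=True)
class Classification:
    fiction: bool   # does this classification say the book is fiction?
    audience: str   # e.g. "Adult", "Young Adult" (labels are illustrative)
    weight: float   # how much we trust this classification (assumed)

def summarize(classifications: list[Classification], popularity: list[float]) -> dict:
    """Weighted vote for fiction status and audience; mean of popularity values."""
    fiction_votes: dict = defaultdict(float)
    audience_votes: dict = defaultdict(float)
    for c in classifications:
        fiction_votes[c.fiction] += c.weight
        audience_votes[c.audience] += c.weight
    return {
        "fiction": max(fiction_votes, key=fiction_votes.get) if fiction_votes else None,
        "audience": max(audience_votes, key=audience_votes.get) if audience_votes else None,
        "popularity": mean(popularity) if popularity else None,
    }
```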

May be referenced by multiple CustomListEntries and/or CachedFeeds.

May participate in many WorkGenre assignments. WorkGenre is a simple join table that tracks the assignment of Works to Genres.
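A join table like WorkGenre can be illustrated with Python's built-in `sqlite3` module. The table and column names here are hypothetical and simplified. Each row of `workgenres` records one assignment of a Work to a Genre, so a Work can have many Genres and a Genre can have many Works.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE works (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE genres (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    -- The join table: one row per (Work, Genre) assignment.
    CREATE TABLE workgenres (
        work_id INTEGER NOT NULL REFERENCES works(id),
        genre_id INTEGER NOT NULL REFERENCES genres(id),
        PRIMARY KEY (work_id, genre_id)
    );
    INSERT INTO works (id, title) VALUES (1, 'My Book'), (2, 'Another Book');
    INSERT INTO genres (id, name) VALUES (1, 'Mystery'), (2, 'Romance');
    INSERT INTO workgenres (work_id, genre_id) VALUES (1, 1), (1, 2), (2, 1);
""")

def genres_for(work_id: int) -> list[str]:
    """Names of every Genre assigned to the given Work."""
    rows = conn.execute(
        """SELECT g.name FROM genres g
           JOIN workgenres wg ON wg.genre_id = g.id
           WHERE wg.work_id = ? ORDER BY g.name""",
        (work_id,),
    )
    return [name for (name,) in rows]

def works_in(genre: str) -> list[str]:
    """Titles of every Work assigned to the named Genre."""
    rows = conn.execute(
        """SELECT w.title FROM works w
           JOIN workgenres wg ON wg.work_id = w.id
           JOIN genres g ON g.id = wg.genre_id
           WHERE g.name = ? ORDER BY w.title""",
        (genre,),
    )
    return [title for (title,) in rows]
```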


...

Libraries

Library

A library represents some organization that serves a distinct set of patrons.

...

  • One or more Collections.
  • One or more CustomLists.
  • One or more Lanes, each of which may have many CachedFeeds.
  • One or more Admins.

Admin

Admins are people such as librarians who have access to the admin interface (via accounts in the circulation manager). An Admin is associated with a particular Library through AdminRole. An Admin may have more than one AdminRole. The AdminRoles are:

  • Librarian
  • SitewideLibrarian
  • LibraryManager
  • SitewideLibraryManager
  • SystemAdmin
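A minimal sketch of an Admin holding several AdminRoles. The `AdminRole` fields and the `roles_of` helper are hypothetical. Storing `None` as the library for sitewide roles is an assumption suggested by the role names, not something this page states.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Role(Enum):
    LIBRARIAN = "Librarian"
    SITEWIDE_LIBRARIAN = "SitewideLibrarian"
    LIBRARY_MANAGER = "LibraryManager"
    SITEWIDE_LIBRARY_MANAGER = "SitewideLibraryManager"
    SYSTEM_ADMIN = "SystemAdmin"

@dataclass(frozen=True)
class AdminRole:
    admin_email: str        # identifies the Admin (hypothetical key)
    role: Role
    library: Optional[str]  # None for roles not tied to one Library (assumption)

def roles_of(admin_email: str, assignments: list[AdminRole]) -> dict[Optional[str], set[Role]]:
    """Group one Admin's roles by the Library each role applies to."""
    grouped: dict[Optional[str], set[Role]] = {}
    for a in assignments:
        if a.admin_email == admin_email:
            grouped.setdefault(a.library, set()).add(a.role)
    return grouped
```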

Lane

A library groups its books together using Lanes. A Lane may group books by any combination of these criteria:

...

A Lane may have many CachedFeeds.

CachedFeed

A CachedFeed is a pregenerated OPDS document that's stored in the database to serve future client requests. If a CachedFeed can be used, it greatly improves patron-visible response time.
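A sketch of the "can this CachedFeed be used?" decision, assuming a simple maximum-age rule. This page doesn't describe how the real system decides freshness, so `max_age`, the cache key, and `get_feed` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class CachedFeed:
    content: str         # the pregenerated OPDS document
    generated_at: float  # when it was generated, in seconds

def get_feed(cache: dict, key: str, generate: Callable[[], str], now: float, max_age: float) -> str:
    """Serve a cached feed if it is fresh enough; otherwise regenerate and store it."""
    cached = cache.get(key)
    if cached is not None and now - cached.generated_at <= max_age:
        return cached.content  # fast path: no feed generation needed
    content = generate()       # slow path: build the document from scratch
    cache[key] = CachedFeed(content, now)
    return content
```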

...