2020-06-08 Meeting: Introductions and Brainstorming

Meeting Notes


ACTION ITEMS:

Agenda

Introductions
Review of Working Group Charter and Logistics
Brainstorming
Review documents completed prior to this meeting: "Current Approaches to Providing Search APIs" and "Needs of Consumers and Applications"

Meeting Materials

Notes

Introductions

Name, organization, and 1-2 sentences on what you hope to get from the group. Keep introductions short so they do not take up most of the meeting. Please share the key aspects that draw you to participate in the group.

E. Lynette Rayle (Cornell) - Two pain points: 1) differing methods for accessing authorities, each requiring one-off coding; 2) getting back enough information to display results to end users in a meaningful way.
Justin Littman (Stanford) - Desire to code as little as possible and be able to access authorities.
Christine Fernsebner Eslao (Harvard) - Continue with work begun in LD4P2. Would like to see various lookups standardized with a goal of better usability. Would like to see reconciliation automated.
Nancy Fallgren (NIH) - Hoping to create more authorities. Want to make more available through RDF. Want them to be usable.
Tiziana Possemato (Casalini) - Make connections with other groups and be on positive terms with others. May invite a technician to join when needed.
Kirk Hess (OCLC) - Previously worked in research with LOC, on high availability and throughput. Looking to bring that LOC experience and new connections at OCLC.
Jeremy Nelson (Stanford) - Works on Sinopia Editor. Hopes to minimize code and interested in seeing how this supports a diverse set of authorities.
Steven Folsom (Cornell) - Coordinator of metadata design. As a consumer, wants any tool that creates metadata to be able to tie into all the sources needed, with data that looks similar and is easier to co-index.
Kevin Ford (LOC) - Has served in roles as a producer, publisher, and consumer.
John Graybeal (Bioportal) - Consumer (Cedar) and producer (Bioportal); Cedar is a consumer of Bioportal.
John Chapman (OCLC) - Entity management infrastructure project; producer and consumer; possible implications for VIAF or other projects.
Rob Sanderson (currently independent, Yale in September) - Vocabulary management (currently working on Arches); interested in integrations that are not system-specific.
Lydia Pintscher (Wikimedia Deutschland) - Product management for Wikidata. Here to address questions about Wikipedia.

Aspects of interest generally fall into these categories:

Publishers want to make consumption easier
Consumers want meaningful display and search of data
Software developers and engineers want lighter/more maintainable code.

Working Group Charter and Logistics

Reference: Working Group Charter

What does it mean for linked data to be an "approach" rather than a part of a system?
- Having a REST API return linked open data (LOD)
Sanderson: Are we concerned with write, update, delete, etc.?
- Rayle: Those are part of the larger picture, but for the purposes of this group's deliverables, we are focused on search and return of data. Everything is fair game in brainstorming, and a subsequent working group is possible.
Sanderson: Previous attempts at standardizing search APIs have failed. We need to be very specific about details.
- Rayle: Add links to outside documentation under "References".

Logistics:

access to Slack channel
access to Wiki
note takers

Brainstorming

Some terms used in the brainstorming:

Term	Description
Reconciliation	Things-to-Things - The process of identifying that two things represented by different URIs are actually the same thing.
Entity Resolution	Strings-to-Things - The process of identifying that a String Label is the label for a thing identified by a URI.
Caching	Storing local copies of data, in one of two senses: ¹ a full cache of all data; ² a cache of a single label or small pieces of data about a single term.
Accuracy	¹ The ability of an API to return relevant data; ² the ability of a user to select a term from multiple similar terms given a set of search results.

All topics related to accessing authoritative data are in scope, including topics that are not directly related to search APIs. Such topics will be considered at the tail end of the working group if there is time, or for a new working group if there is enough interest. So let all your thoughts flow.

Caching ¹
- Downloads - cache management synchronization
Caching ²
- Notification of updates when entity descriptions change, or at least ability to search by dates/types of changes
- Deprecations - mechanism to state that this term is replaced by another term - how do end users know that the term is no longer valid
Reconciliation
- entity reconciliation
Accuracy ¹
- identify that you have the authority you want (e.g., got the right John Smith)
- How do humans choose between two similar but distinct entities?
Accuracy ¹
- the right information to search and display; need an easy way to define it
- moving to linked data - which attributes to include for each entity
- May need more context sometimes
- Rank ordering so that the best results are displayed first
- Listed alphabetically
- Option for left anchored search
- Which labels to display when multiple labels - across languages and scripts and kinds of names
- Can users personalize how the data comes back?
API approach
- API - focus first on retrieval by REST before search and browse
- browsing with context when you know what you are looking for and have a good amount of time; catalogers know what they are looking for
- searching to discover when you don't know what you are looking for
- One service doesn't fit all needs. Suggest, Search, Browse - each serve different needs
Data Related
- Which data elements are unique or intended to be unique?
- Which data elements are intended for end users (such as library users browsing a collection) and which are intended primarily for internal use?
- Suggest the SKOS ontology (to avoid spending years discussing modeling)
- Suggest JSON-LD for format
Dealing with errors
- more consistent and granular error reporting to determine the source of the error
Other
- indexing - connections of relationships between entities
- suggest 2.0 document
- Learning from users
Use cases
Versioning of authorities, and how it is surfaced through the API
Learning how something has changed and knowing what to do in response
Rank ordering - being able to manage and choose the context of your query, where context is provenance, community recommendations, relationships, etc.
Presence or absence of connections to other entities; making choices based on interconnectedness and fullness of data
What is the impact of deprecations on APIs?
reconciliation - reconcile more than just adjunct works and references to outside data and across languages
concerns about sameAs connections between data that may not be accurate
reconciliation - why isn't this just the OpenRefine reconciliation API? It is already implemented and well understood
- responsiveness for autocomplete
enable local authorities to participate by implementing the API - allow a local authority to define a narrower term under a broader one
- discovery of (and enrollment in) new authorities
handling different data models (the organization of the returned data) and different syntaxes
server-side vs. client-side pagination
pagination: expanding the set of results; turning off pagination to get all results
standards-based

Review documents completed prior to this meeting

Reference: Current Approaches to Providing Search APIs

Reference: Needs of Consumers and Applications

Attendees

Kevin Ford (LOC)
Kirk Hess (OCLC)
Rick Bennett (OCLC)
John Chapman (OCLC)
Nancy Fallgren (NIH)
John Graybeal (Bioportal)
Tiziana Possemato (Casalini)
Lydia Pintscher (Wikidata)
Hetty van Zutphen (ISNI)
Steven Folsom (Cornell)
Christine Fernsebner Eslao (Harvard)
Jeremy Nelson (Stanford)
Justin Littman (Stanford)
E. Lynette Rayle (Cornell)
Rob Sanderson (independent)

Absent

Nate Trail (LOC)
Jens Ohlig (Wikidata)
