facilitator: Steven Folsom
- Requires a mix of automated and manual methods
- Need tools to do this, e.g. present user with automated matches and allow them to make changes (this could then be used to tune the algorithm)
- There's a potential to open this up to communities beyond library professionals (crowd-sourcing/niche-sourcing)
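
A minimal sketch of the mixed automated/manual workflow noted above: score candidate matches, accept only the confident ones automatically, queue the rest for a person, and use reviewer decisions to adjust the cutoff. The function names, the thresholds, and the use of `difflib` string similarity are illustrative assumptions, not a method discussed in the session.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in the range 0..1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def triage(label, candidates, accept_at=0.95, review_at=0.6):
    """Return (status, best_candidate, score) for one incoming string.

    The thresholds are hypothetical defaults, meant to be tuned.
    """
    scored = sorted(((similarity(label, c), c) for c in candidates), reverse=True)
    if not scored:
        return ("no_match", None, 0.0)
    score, best = scored[0]
    if score >= accept_at:
        return ("auto_accept", best, score)
    if score >= review_at:
        return ("needs_review", best, score)  # shown to a person to confirm or change
    return ("no_match", None, score)


def tune_accept_threshold(decisions):
    """Suggest a new auto-accept cutoff from reviewer feedback.

    decisions: list of (score, confirmed_by_reviewer) pairs.
    Returns the lowest confirmed score that sits above every rejected
    score, or None if the feedback does not support any cutoff.
    """
    floor = max((s for s, ok in decisions if not ok), default=0.0)
    above = [s for s, ok in decisions if ok and s > floor]
    return min(above) if above else None
```

For example, `triage("Twain, M.", ["Twain, Mark"])` lands in the review queue rather than being accepted automatically, which is where a review interface would present it for correction.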
DPLA: placename resolution
- matching against Geonames
- staff discomfort
- lack of subject expertise for aggregated data
- use entire record as context for resolution
- points vs. shapes in geo entity resolution
- crowdsourcing opportunity?
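
One way to read "use entire record as context for resolution" is sketched below: when a placename has several gazetteer candidates, prefer the one whose surrounding terms also appear elsewhere in the record, and leave ties for a person. The gazetteer here is a tiny in-memory stand-in with made-up identifiers; it is not the Geonames API or its data model, and all names are hypothetical.

```python
import re

# Hypothetical gazetteer entries; the ids are placeholders, not Geonames ids.
GAZETTEER = [
    {"id": "example:1", "name": "Springfield", "context": {"illinois", "sangamon"}},
    {"id": "example:2", "name": "Springfield", "context": {"massachusetts", "hampden"}},
]


def tokens(text):
    """Lowercased alphabetic tokens of a string, as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def resolve_place(placename, record):
    """Pick the candidate best supported by the rest of the record.

    record: dict of field name -> text.
    Returns a gazetteer id, or None when there is no candidate or the
    top candidates tie (ambiguous, so left for human review).
    """
    record_tokens = tokens(" ".join(record.values()))
    candidates = [g for g in GAZETTEER if g["name"].lower() == placename.lower()]
    scored = sorted(
        ((len(g["context"] & record_tokens), g["id"]) for g in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    if not scored:
        return None
    if len(scored) > 1 and scored[0][0] == scored[1][0]:
        return None
    return scored[0][1]
```

A record with no supporting context returns `None`, which is the kind of case that could be routed to staff or a crowdsourcing queue.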
OCLC: several passes through data, information from multiple sources (ISNI, VIAF, etc.)
- need public feedback for last 20%
- refine algorithms based on crowdsourcing feedback
- machine transformation and confidence rating – mark output as machine-generated, with date
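
A sketch of what marking a match as machine-generated, with a confidence rating and date, could look like as a data structure. The class and field names are assumptions made for illustration; they do not come from OCLC or any existing schema. The original string is kept next to the resolved URI so the match can be re-checked later.

```python
from dataclasses import dataclass, asdict
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class MachineAssertion:
    source_string: str      # the string as it appeared in the record
    entity_uri: str         # the entity the algorithm matched it to
    confidence: float       # 0..1, as reported by the matching tool
    generated_by: str       # name/version of the tool (hypothetical field)
    generated_on: str       # ISO 8601 date the match was made
    machine_generated: bool = True


def record_match(source_string, entity_uri, confidence, tool, on: Optional[date] = None):
    """Build a dated, machine-flagged assertion; reject impossible confidences."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    stamp = (on or date.today()).isoformat()
    return MachineAssertion(source_string, entity_uri, confidence, tool, stamp)
```

`asdict(record_match(...))` gives a plain dictionary that could be stored with the record or shown in a feedback interface.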
strings --> things
- need string info in perpetuity
- accuracy, testability of ambiguity
- places ... think maps ...
- dates ... map interface
libraries divide and conquer entity cataloging
- human mediation
- less human mediation
- hybrid models – e.g., obit project
- akin to OCR post-processing
- page rank algorithm
- BIBFRAME converter – work accuracy?
- from metadata – how structured is it?
- lots of text – algorithms better
how to motivate users to take tools/data for a spin?
what if we had no metadata and started only with full text?
- solutions – would be awesome
parsing MARC to find translators and roles
- roles as strings should be things
- requires human review
- resolve ambiguity in identity, roles, contributions
- predicates restrict detail
- e.g., performer vs. violinist
- simple problems, or too complex and requiring experts?
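
A rough sketch of the MARC case above: look at name fields, turn role strings into role URIs ("strings should be things"), and set aside anything unrecognized for human review. Fields are modeled as plain dictionaries rather than with a MARC library, subfields are assumed non-repeating for brevity, and the role lookup table is a small hand-made sample. The `trl` and `prf` codes are MARC relator codes for translator and performer; everything else here is a hypothetical simplification.

```python
import re

RELATOR_BASE = "http://id.loc.gov/vocabulary/relators/"

# Sample of role strings seen in $e / $4, mapped to relator codes (not exhaustive).
ROLE_STRINGS = {"translator": "trl", "tr": "trl", "trl": "trl",
                "performer": "prf", "prf": "prf"}


def normalize_role(value):
    """Map a role string such as 'tr.' to a relator URI, or None if unknown."""
    key = re.sub(r"[^a-z]", "", value.lower())
    code = ROLE_STRINGS.get(key)
    return RELATOR_BASE + code if code else None


def find_translators(fields):
    """Return (translator_names, needs_review) from simplified 100/700 fields.

    fields: list of {"tag": str, "subfields": {code: value}}.
    needs_review holds (name, role_strings) pairs with unrecognized roles.
    """
    found, review = [], []
    for field in fields:
        if field["tag"] not in {"100", "700"}:
            continue
        subfields = field["subfields"]
        name = subfields.get("a", "").rstrip(" ,")
        roles = [subfields[code] for code in ("4", "e") if code in subfields]
        uris = {normalize_role(r) for r in roles}
        if RELATOR_BASE + "trl" in uris:
            found.append(name)
        elif None in uris:
            review.append((name, roles))
    return found, review
```

A lookup like this only resolves the role; deciding which person a name refers to, or whether "performer" should really be "violinist", still falls to the review step.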
UCSD – mix of auto & manual review
CERL – name, spelling & disambiguation
HBS – URIs provided by authority vendor
Create local authority record/URI for strings with no authority?
Feed into LC or OCLC for needed authorities?
Improve cataloging tools with type-ahead entity resolution
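
As an illustration of type-ahead entity resolution in a cataloging tool, the sketch below does a prefix lookup over a sorted list of (label, URI) pairs, so the cataloger picks an entity rather than typing a string. The class name and the `example.org` URIs are placeholders; a real tool would query an authority service instead of an in-memory list.

```python
from bisect import bisect_left


class TypeAhead:
    """Prefix suggestions over (label, uri) pairs held in memory."""

    def __init__(self, entries):
        # Sort on the lowercased label so matching is case-insensitive.
        self._entries = sorted((label.lower(), label, uri) for label, uri in entries)
        self._keys = [entry[0] for entry in self._entries]

    def suggest(self, prefix, limit=5):
        """Return up to `limit` (label, uri) pairs whose label starts with prefix."""
        p = prefix.lower()
        start = bisect_left(self._keys, p)  # first key >= prefix
        results = []
        for key, label, uri in self._entries[start:]:
            if not key.startswith(p) or len(results) >= limit:
                break
            results.append((label, uri))
        return results
```

Because each suggestion carries its URI, the selected value is already a "thing" at the point of entry.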