Date: 18 Mar 2022

Attendees: Huda, Jason, Simeon, Greg

Regrets: Steven, Lynette

Discovery (WP3)

https://github.com/LD4P/discovery/projects/2 for issues etc.
Draft of a discovery plan: https://docs.google.com/document/d/1zKYW7FQVVNvyd0XjjW0qWznX9PC3jbmOE6Kz_yygPjs/edit?usp=sharing
Research: how to go from knowledge graph to an index
- Research decision points, Use cases
BANG! (Bibliographic Aspects Newly GUI'd)
- Jamboard link
- Expect to include Works. Need to do something beyond what we already have live from the OCLC concordance data.
- References/bibliography list (beginning)
- BANG! preliminary design-ish/data questions link
- 2022-03-04
  - ** Huda will add notes re: 10K sample
    - Sampling method: Using the first 10,000 hubs returned from the id.loc.gov search API.
    - Overarching question: Can hubs provide relationships between catalog items?
    - Analysis: For every hub that has > 1 work, get ISBN groups. For groups which have > 1 ISBN, record how many times querying the Cornell catalog with those ISBNs provides > 1 catalog result.
    - Results:
      - Total: 202 sets of ISBNs where each set has > 1 ISBN, comprising 750 unique ISBNs in total.
      - Catalog matches: 26 ISBN groups resulted in > 1 catalog result. These catalog-matching ISBN groups comprise 130 unique ISBNs.
  - Scripting hub-to-hub analysis, uni-directional. Almost done with process of getting ISBNs back... and will then query against catalog to allow us to say something like "out of 10K, there were ## ISBN groupings one can find using translation property of which ## yielded cataloging hit"
  - Since doing so many ajax queries, thinking about doing visualization
  - Need to go to LC Hub analysis – to look at LCCNs in addition to ISBNs
  - Hoping to be done with analysis in 2 weeks
  - Lots of presentation prep
  - To do: look at OCLC WorkId relationships to compare / identify groupings and results (in 2015 we had N number of clusters over the entire catalog).
- 2022-03-11
  - Had 200-300 people attend presentation on Wednesday, lots of questions (people fascinated by the infrastructure and our ability to make changes, FOLIO interest, role of cataloger in Wikidata, fact vs contestable information, questions about mechanics)
  - Doing another presentation today (6th since January!), links on LD4 presentation page
  - BANG! work: Used 10,000 hub sample to look at relationships between hubs. Have added preliminary tables to data sources doc/report
    - Total: 110 ISBN sets related via hub-to-hub relationships, 22 result in > 1 catalog match (i.e. at least 2 catalog records related b/c two hubs are related)
      - 110 ISBN sets cover 381 ISBNs, 22 matching sets have 78 total ISBNs
      - By relationship: hasTranslation accounted for 92 of the total 110 ISBN sets, relatedTo accounted for 18
  - BANG! preliminary design-ish/data questions link
- 2022-03-18
  - BANG!: Finished LCCN analysis for Hubs. Used the same 10,000 sample method as before
    - For aggregations by hub (i.e. multiple LCCNs all under the same hub)
      - 497 sets with > 1 LCCN where the hub has > 1 work, covering a total of 1840 LCCNs. Matches against our Cornell catalog: 38 sets of LCCNs, comprising a total of 284 LCCNs.
    - For relationships where two hubs are related via property
      - 277 sets where two hubs are related and each set has > 1 LCCN, comprising a total of 674 LCCNs.
        - 224 sets related via hasTranslation, 53 related via relatedTo
      - Matches: 33 sets comprising 123 LCCNs in total.
        - 26 sets related via hasTranslation, 7 sets related via relatedTo
    - We seem to be getting more useful results from LC Hubs than SVDE/PCC data
    - Steven thinking about what these numbers suggest for catalog services; also want to compare with what we have from the old OCLC concordance file
  - Huda asked Michelle about doing analysis on Sinopia data. The API has a metrics system that provides info on which facilities are being used
  - We note possibilities for querying POD data in the VuFind instance that will be set up for the BD (or a local instance, were we to build one). For example, we could filter the data and build an ISBN index of modest size
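The set counts reported in the BANG! notes above (identifier sets with > 1 member, and how many of those yield > 1 catalog result) could be reproduced along the following lines. This is only a rough sketch, not the script actually used: the function name, the `catalog_lookup` callable, and the toy identifiers and record IDs are all hypothetical, and it assumes a set "matches" when its identifiers together return more than one distinct catalog record.

```python
from typing import Callable, Dict, Iterable, List, Set


def summarize_groups(
    groups: Iterable[Iterable[str]],
    catalog_lookup: Callable[[str], Iterable[str]],
) -> Dict[str, int]:
    """Count identifier sets (ISBNs or LCCNs) and their catalog matches.

    groups: one collection of identifiers per hub (or per hub-to-hub relationship).
    catalog_lookup: hypothetical stand-in for a catalog query; returns the
        catalog record IDs found for a single identifier.
    """
    # Keep only sets with more than one distinct identifier.
    multi: List[Set[str]] = [set(g) for g in groups if len(set(g)) > 1]

    matching: List[Set[str]] = []
    for group in multi:
        records: Set[str] = set()
        for identifier in group:
            records.update(catalog_lookup(identifier))
        # Assumption: a set counts as a match when its identifiers
        # together return more than one distinct catalog record.
        if len(records) > 1:
            matching.append(group)

    return {
        "sets": len(multi),
        "identifiers": len(set().union(*multi)),
        "matching_sets": len(matching),
        "matching_identifiers": len(set().union(*matching)),
    }


# Toy data only: placeholder identifiers and record IDs, not real results.
toy_groups = [
    {"isbn-a", "isbn-b"},
    {"isbn-c"},
    {"isbn-d", "isbn-e", "isbn-f"},
]
toy_catalog = {"isbn-a": ["rec1"], "isbn-b": ["rec2"], "isbn-d": ["rec3"]}

summary = summarize_groups(toy_groups, lambda i: toy_catalog.get(i, []))
print(summary)
```

The same function would apply to the hub-to-hub relationship sets by passing one identifier set per related hub pair, and to LCCNs by swapping the lookup.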
DAG Calls
- 2022-03-18: Good discussion of archival uses on 3/15. Much interest in "who did what? when?" type questions and linked data that might support this. Looking at demos for concrete examples
- 2022-03-11:
  Archives discovery will be presented on 3/15
Document started re: Comments, Questions and Suggestions offered during the myriad of presentations provided. Huda will add link here

...

QA Sinopia Collaboration
- 2022-03-04
  - No meeting with Stanford this week.
  - Homosaurus - Dave has it in the triple store. Steven defined context and validations. Lynette configured QA. Still getting graph read error, so I don't think the index has been built yet. I have a message out to Dave. Once that is done, it should just work. Final step is to configure in Sinopia
  - Dave still plans to try using LOC or Getty activity streams to update the cache. This is proof of concept. It may prove insufficient as none of the feeds include patches. But makes for a good exploration.
  - Dave still needs to fix total_number_found to get pagination working again in Sinopia. Once that is done, it will just feed through to Sinopia without any additional work.
  - End of March, Huda and I will be meeting with OCLC to discuss the API and how it fits with our work.
- 2022-03-18
  - Lynette on Samvera Valkyrization sprint
Best Practices for Authoritative Data working group (focus on Change Management)
- 2022-03-04
  - Meeting on Monday to review EMM Change Document API and Notifications Example
  - Updates in the recommendations document include steps for producers to create an activity stream for all 3 use cases. Will be looking for feedback on that at the next meeting.
  - I'm working to incorporate feedback given so far.
  - There is still a question about date handling and what we want to recommend. Options are endTime, startTime, published, updated. Once this is resolved, it will be fairly easy to expand the notifications examples out to the partial and full cache examples.
  - Feedback wanted via GH Issues
- 2022-03-18
  - Lynette on Samvera Valkyrization sprint
Containerization
- 2022-03-04 Dave, Greg, and Lynette met to plan next steps for containerizing the cache.
- 2022-03-18:
  - Nothing new on cache containerization
  - Fixed cost tagging of our LD4P AWS resources. Will now try spinning up templates with new tags
  - AWS resources being used:
    - QA was scaled up and load testing looks good
    - What will we require for Dave? Have spun up tests but don't know eventual needs for caches; haven't yet looked at resources for cache build
  - Dave is exploring putting the war file in a container.
  - Dave handed some Lucene index files to host; he will get prototype working with his war file and those index files; will take from there.
  - Work on templates? Possibly some work can be done ahead of time.
  - Longer-term concern: generating the indexes... not containerizing the cache