Date: 04 Mar 2022

Attendees: Huda, Jason, Lynette, Steven

Regrets: Simeon

Discovery (WP3)

https://github.com/LD4P/discovery/projects/2 for issues etc.
Draft of a discovery plan: https://docs.google.com/document/d/1zKYW7FQVVNvyd0XjjW0qWznX9PC3jbmOE6Kz_yygPjs/edit?usp=sharing
Research: how to go from knowledge graph to an index
- Research decision points, Use cases
BANG! (Bibliographic Aspects Newly GUI'd)
- Jamboard link
- Expect to include Works. Need to do something beyond what we already have live from the OCLC concordance data.
- 2022-02-18
  - References/bibliography list (beginning)
    - Plan: Continue reviewing and looking at references
  - Huda added a "feeling lucky" page to the Fuseki UI, which picks random statements to show so people can start exploring. Huda needs to share it with Michelle (Stanford) and ask Jim Hahn (Penn) whether he is interested.
    - Plan: Add click functionality on the graph to generate a side panel with details for that node/entity. May be easier to understand than spaghetti expansion
    - Steven mentioned classes/predicates summary to Dave
  - Hubs analysis: Huda changed approach after discussing with Steven what we're trying to explore with this analysis.
    - Main question for ShareVDE and Hubs aggregation: Can this data yield relationships between/groupings of works that yield related items in the catalog? We want to extract sets of ISBNs grouped under an aggregation or under a property between works, and then see if any of those sets yield at least two catalog matches (i.e. translate into relationships between items in the catalog)
    - Approach for ShareVDE: Evaluate how many opera there are with at least two works and at least two related ISBNs, and how many of these ISBN groups yield at least two catalog matches. (Only one catalog match means that, if we were on that item page, we would not have any related items to see using this data).
      - Steven notes that a link via Hub from an ISBN we hold to one we don't hold is a possible ILL use case
    - Now for Hubs: Get a sample of hubs from LOC search, changing the start parameter to page through the list to work around LOC-side throttling. For each hub, see if an ISBN set can be generated. For each set with > 1 ISBN, determine if there are > 1 catalog matches.
      - Plan: Continue to do so. Current results: For 4000 hubs from LOC, there are 87 sets of ISBNs (each with > 1 ISBN) where the hub has > 1 work, totaling 367 unique ISBNs. Of these, 12 ISBN sets yield at least two catalog items, for a total of 73 ISBNs.
- 2022-03-04
  - ** Huda will add notes re: 10K sample
    - Sampling method: Using the first 10,000 hubs returned from the id.loc.gov search API.
    - Overarching question: Can hubs provide relationships between catalog items?
    - Analysis: For every hub that has > 1 work, get ISBN groups. For groups which have > 1 ISBN, record how many times querying the Cornell catalog with those ISBNs provides > 1 catalog result.
    - Results:
      - Total: 202 sets of ISBNs where each set has > 1 ISBN, comprising 750 unique ISBNs in total.
      - Catalog matches: 26 ISBN groups resulted in > 1 catalog result. These catalog-matching ISBN groups comprise 130 unique ISBNs.
  - Scripting hub-to-hub analysis, uni-directional. Almost done with the process of getting ISBNs back... and will then query against the catalog to allow us to say something like "out of 10K, there were ## ISBN groupings one can find using the translation property, of which ## yielded a catalog hit"
  - Since this involves so many AJAX queries, thinking about doing a visualization
  - Need to go to LC Hub analysis – to look at LCCNs in addition to ISBNs
  - Hoping to be done with analysis in 2 weeks
  - Lots of presentation prep
  - To do: look at OCLC WorkId relationships to compare / identify groupings and results (in 2015 we had N number of clusters over the entire catalog).
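The counting logic of the hub analysis described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the script used for the analysis: the input shape (hub id mapped to works mapped to ISBNs), the function names, and the `lookup` stand-in for a catalog query are all hypothetical, and fetching hubs from id.loc.gov or querying the Cornell catalog is not shown.

```python
from typing import Callable, Dict, Set

# Hypothetical input shape: hub id -> {work id -> set of ISBNs}.
Hubs = Dict[str, Dict[str, Set[str]]]


def isbn_sets_for_hubs(hubs: Hubs) -> Dict[str, Set[str]]:
    """Keep hubs with > 1 work whose combined ISBN set has > 1 ISBN."""
    result: Dict[str, Set[str]] = {}
    for hub_id, works in hubs.items():
        if len(works) <= 1:
            continue  # a hub with a single work cannot relate works
        isbns = set().union(*works.values())
        if len(isbns) > 1:
            result[hub_id] = isbns
    return result


def sets_with_catalog_matches(
    isbn_sets: Dict[str, Set[str]],
    lookup: Callable[[str], Set[str]],
) -> Dict[str, Set[str]]:
    """Keep ISBN sets whose ISBNs resolve to > 1 distinct catalog record.

    `lookup` is an assumed stand-in for a catalog query: ISBN -> record ids.
    """
    matched: Dict[str, Set[str]] = {}
    for hub_id, isbns in isbn_sets.items():
        records: Set[str] = set()
        for isbn in isbns:
            records |= lookup(isbn)
        if len(records) > 1:
            matched[hub_id] = isbns
    return matched


def unique_isbn_count(isbn_sets: Dict[str, Set[str]]) -> int:
    """Number of distinct ISBNs across all sets."""
    return len(set().union(*isbn_sets.values())) if isbn_sets else 0


# Made-up sample data, for illustration only.
sample_hubs: Hubs = {
    "hub1": {"w1": {"111", "222"}, "w2": {"333"}},  # kept, two catalog records
    "hub2": {"w3": {"444", "555"}},                 # single work, skipped
    "hub3": {"w4": {"666"}, "w5": {"666"}},         # only one ISBN, skipped
    "hub4": {"w6": {"777"}, "w7": {"888"}},         # kept, one catalog record
}
sample_catalog = {"111": {"bibA"}, "333": {"bibB"}, "777": {"bibC"}}

isbn_sets = isbn_sets_for_hubs(sample_hubs)
matched = sets_with_catalog_matches(
    isbn_sets, lambda isbn: sample_catalog.get(isbn, set())
)
```

A set with only one catalog record (like `hub4` in the sample) is dropped because, from that item's page, there would be no related item to show.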
DAG Calls
- 2022-02-18
  - Had first crossover call with WAG group focusing on DASH! and usability testing. Some questions to follow up on, including a report that our page selects one citizenship statement where Wikipedia has multiple statements, which might be a bug
  - Next week will talk more about this work and also about getting and using feedback from user reps etc.
- 2022-03-04:
  - PCC Sinopia Affinity Group call last week (overview of examples from both LD4P2 and LD4P3). (Notes/Recording). Next week have CORE LD Interest Group & Catalog Form & Function interest group
  - Archives discovery will be presented on 3/15
  - LD4 Steering sent out questionnaire to Affinity Groups asking what support they can provide... Huda and Astrid will respond on behalf of DAG
Document started re: Comments, Questions and Suggestions offered during the myriad of presentations provided. Huda will add link here

Linked-Data Authority Support (WP2)

Qa Sinopia Collaboration
- 2022-03-04
  - No meeting with Stanford this week.
  - Homosaurus - Dave has it in the triple store. Steven defined context and validations. Lynette configured QA. Still getting a graph read error, so I don't think the index has been built yet. I have a message out to Dave. Once that is done, it should just work. Final step is to configure it in Sinopia
  - Dave still plans to try using LOC or Getty activity streams to update the cache. This is proof of concept. It may prove insufficient as none of the feeds include patches. But makes for a good exploration.
  - Dave still needs to fix total_number_found to get pagination working again in Sinopia. Once that is done, it will just feed through to Sinopia without any additional work.
  - End of March, Huda and I will be meeting with OCLC to discuss the API and how it fits with our work.
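As a rough illustration of the activity-stream proof of concept mentioned above, the sketch below applies a list of activities to an in-memory cache. It is an assumption-laden example, not Dave's implementation: the `type`, `object`, and `id` keys follow the Activity Streams 2.0 vocabulary, but the exact activity types used by the LOC or Getty feeds are assumed here, and `fetch` is a hypothetical callback. Because the feeds carry no patches, each create or update is handled by re-fetching the whole entity.

```python
from typing import Any, Callable, Dict, Iterable


def apply_activities(
    cache: Dict[str, Any],
    activities: Iterable[Dict[str, Any]],
    fetch: Callable[[str], Any],
) -> Dict[str, Any]:
    """Apply activities (assumed to be ordered oldest first) to the cache.

    `fetch` is a hypothetical callback that retrieves the current
    representation of an entity by URI.
    """
    for activity in activities:
        uri = activity["object"]["id"]
        kind = activity["type"]
        if kind in ("Create", "Update"):
            # No patch is available, so replace the cached entity wholesale.
            cache[uri] = fetch(uri)
        elif kind == "Delete":
            cache.pop(uri, None)
        # Other activity types are ignored in this sketch.
    return cache


# Made-up example data.
example_cache = {"ex:a": "old a", "ex:b": "old b"}
example_activities = [
    {"type": "Update", "object": {"id": "ex:a"}},
    {"type": "Delete", "object": {"id": "ex:b"}},
    {"type": "Create", "object": {"id": "ex:c"}},
]
updated_cache = apply_activities(
    example_cache, example_activities, lambda uri: f"fresh {uri}"
)
```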
Best Practices for Authoritative Data working group (focus on Change Management)
- 2022-03-04
  - Meeting on Monday to review EMM Change Document API and Notifications Example
  - Updates in the recommendations document include steps for producers to create an activity stream for all 3 use cases. Will be looking for feedback on that at the next meeting.
  - I'm working to incorporate feedback given so far.
  - There is still a question about date handling and what we want to recommend. Options are endTime, startTime, published, updated. Once this is resolved, it will be fairly easy to expand the notifications examples out to the partial and full cache examples.
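To make the open date-handling question concrete, here is a minimal sketch of how a consumer might pick a timestamp from an activity and select the activities newer than its last sync. The preference order among `endTime`, `updated`, `published`, and `startTime` is purely an assumption for illustration (the working group has not settled on a recommendation), as are the function names and the choice to treat timestamps without an offset as UTC.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional, Sequence

# Assumed preference order, for illustration only.
DATE_FIELDS = ("endTime", "updated", "published", "startTime")


def activity_time(
    activity: Dict[str, Any], fields: Sequence[str] = DATE_FIELDS
) -> Optional[datetime]:
    """Return the first timestamp present on the activity, or None."""
    for name in fields:
        value = activity.get(name)
        if value:
            # datetime.fromisoformat does not accept a trailing "Z" before
            # Python 3.11, so normalize it to an explicit offset.
            parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
            if parsed.tzinfo is None:
                # Assumption: timestamps without an offset are UTC.
                parsed = parsed.replace(tzinfo=timezone.utc)
            return parsed
    return None


def newer_than(
    activities: Sequence[Dict[str, Any]], last_sync: datetime
) -> List[Dict[str, Any]]:
    """Keep activities whose chosen timestamp is after last_sync."""
    selected = []
    for activity in activities:
        when = activity_time(activity)
        if when is not None and when > last_sync:
            selected.append(activity)
    return selected


# Made-up example data.
example_stream = [
    {"id": "1", "endTime": "2022-03-01T00:00:00Z"},
    {"id": "2", "published": "2022-03-03T12:00:00Z"},
    {"id": "3"},  # no date at all, so it is skipped
]
recent = newer_than(example_stream, datetime(2022, 3, 2, tzinfo=timezone.utc))
```

Whichever property is recommended, using the same one consistently on the producer and consumer sides matters more than the particular choice, since a consumer comparing against a different field could miss or reprocess activities.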
Containerization
- 2022-03-04
  - Dave, Greg, and Lynette met to plan next steps for containerizing the cache.
  - Dave is exploring putting the war file in a container.

Other Topics.

Upcoming meetings/presentations

Next Meeting(s), anyone out?: