Date:

Attendees:  Huda, Jason, Greg, Steve, Simeon

Regrets:  Lynette

Discovery (WP3)

  • https://github.com/LD4P/discovery/projects/2 for issues etc. 
  • Draft of a discovery plan: https://docs.google.com/document/d/1zKYW7FQVVNvyd0XjjW0qWznX9PC3jbmOE6Kz_yygPjs/edit?usp=sharing
  • Research: how to go from knowledge graph to an index
  • BANG! (Bibliographic Aspects Newly GUI'd)
    • Jamboard link
    • Expect to include Works. Need to do something beyond what we already have live from the OCLC concordance data.
    • References/bibliography list (beginning)
    • BANG! preliminary design-ish/data questions link
    • 2022-04-01
      • Huda looked at POD to find matched in 
    • 2022-04-08
      • Steven is thinking about design ideas based on cluster/work data and what different systems do with this sort of information. Working on a slide deck, including ideas from the commercial sector.
      • Sinopia prod data: Used the Sinopia API to download JSON. Each resource is represented as a JSON object whose data portion is JSON-LD. Extracted just the JSON-LD for the resources (3208 total) and loaded it into a Fuseki dataset. Began querying for work-to-work relationships. There are multiple work-to-work relationships, but the number dwindles to only four when restricted to works that also have instances.
        • Next step: Do the same thing for Sinopia staging data.
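The extraction and filtering steps described above can be sketched in Python. This is a minimal sketch, not the script that was actually used. It assumes each Sinopia API record stores its JSON-LD as a list of node objects under a `data` key, that BIBFRAME `bf:instanceOf` links instances to works, and that `bf:relatedTo` stands in for whichever work-to-work predicates were really queried. The helper names are hypothetical.

```python
import json
from typing import Iterable

BF = "http://id.loc.gov/ontologies/bibframe/"
INSTANCE_OF = BF + "instanceOf"  # instance -> work
RELATED_TO = BF + "relatedTo"    # assumed stand-in for work-to-work predicates


def extract_jsonld(records: Iterable[dict]) -> list[dict]:
    """Collect the JSON-LD node objects from each Sinopia record.

    Assumes (hypothetically) that each record keeps its JSON-LD as a list
    under the "data" key; records without it are skipped.
    """
    nodes: list[dict] = []
    for record in records:
        nodes.extend(record.get("data", []))
    return nodes


def to_loadable_document(nodes: list[dict]) -> str:
    """Wrap all nodes in one @graph so a single file can be loaded into Fuseki."""
    return json.dumps({"@graph": nodes})


def related_works_with_instances(
    triples: Iterable[tuple[str, str, str]],
) -> list[tuple[str, str]]:
    """Keep work-to-work pairs where both works have at least one instance."""
    triples = list(triples)
    works_with_instances = {o for _, p, o in triples if p == INSTANCE_OF}
    return [
        (s, o)
        for s, p, o in triples
        if p == RELATED_TO and s in works_with_instances and o in works_with_instances
    ]
```

Against the Fuseki dataset the same filter would normally be a SPARQL query; the Python version only makes explicit why the count shrinks once both ends of a relationship must have instances.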
      • OCLC Work Ids
        • Remembered we have an index! Used workid_facet to query the LD4P3 copy of the Cornell catalog to get all facet values and counts.
          • 884,868 Cornell records that have a work id value
          • Total number of work ids being used: 513,078
          • Number of work ids with only one match: 317,970
          • Number of work ids with two or more matches: 195,108
            • Compare to the LOC Hub ISBN analyses. Extrapolating to total number of hubs: 6000
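The work id counts above can be reproduced with a Solr facet query that returns every `workid_facet` value together with its record count. This is a minimal sketch assuming a standard Solr `select` handler; the base URL is hypothetical. `rows=0` skips the documents, `facet.limit=-1` lifts Solr's default cap of 100 values, and the summary function assumes Solr's default flat facet list (`[value, count, value, count, ...]`).

```python
from urllib.parse import urlencode

# Hypothetical base URL for the LD4P3 copy of the Cornell catalog index.
SOLR_SELECT = "http://localhost:8983/solr/ld4p3/select"


def workid_facet_url(base: str = SOLR_SELECT) -> str:
    """Build a query returning every workid_facet value with its record count."""
    params = {
        "q": "*:*",
        "rows": 0,                      # facets only, no documents
        "facet": "true",
        "facet.field": "workid_facet",
        "facet.limit": -1,              # all values, not just the top 100
        "facet.mincount": 1,            # skip values with no matching records
        "wt": "json",
    }
    return f"{base}?{urlencode(params)}"


def summarize_facet(flat: list) -> dict:
    """Summarize Solr's flat facet list [value, count, value, count, ...]."""
    counts = flat[1::2]
    return {
        "work_ids": len(counts),
        "single_match": sum(1 for c in counts if c == 1),
        "multiple_matches": sum(1 for c in counts if c >= 2),
    }
```

The record total (884,868) cannot be derived from the facet counts alone if a record can carry more than one work id. A separate query restricted to records that have the field, reading `numFound`, would give it. The two match buckets should always add up to the number of distinct ids: 317,970 + 195,108 = 513,078.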
      • Proof of concept for the IMDB - Wikidata - Catalog example: can we use Wikidata to get the IMDB URL and then pull data from that IMDB URL to supplement the information on the catalog page?
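One way to prototype the IMDB lookup is to ask the Wikidata Query Service for an item's IMDb ID (property P345) and build the IMDB URL from it. This is a minimal sketch that assumes the catalog record already carries a Wikidata QID. The function names and the User-Agent string are hypothetical. The network call is kept separate so the query and URL logic can be checked offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

WDQS = "https://query.wikidata.org/sparql"


def imdb_query(qid: str) -> str:
    """SPARQL asking Wikidata for the IMDb ID (P345) of one item."""
    return f"SELECT ?imdb WHERE {{ wd:{qid} wdt:P345 ?imdb . }}"


def imdb_url(imdb_id: str) -> str:
    """Map an IMDb ID to its URL: tt... are titles, nm... are people."""
    if imdb_id.startswith("tt"):
        return f"https://www.imdb.com/title/{imdb_id}/"
    if imdb_id.startswith("nm"):
        return f"https://www.imdb.com/name/{imdb_id}/"
    raise ValueError(f"unsupported IMDb ID prefix: {imdb_id}")


def imdb_urls_from_results(results: dict) -> list[str]:
    """Pull IMDb URLs out of a SPARQL JSON results document."""
    return [imdb_url(b["imdb"]["value"]) for b in results["results"]["bindings"]]


def lookup_imdb_urls(qid: str) -> list[str]:
    """Query WDQS over the network for the given QID."""
    req = Request(
        f"{WDQS}?{urlencode({'query': imdb_query(qid)})}",
        headers={
            "Accept": "application/sparql-results+json",
            "User-Agent": "imdb-wikidata-poc/0.1 (hypothetical contact)",
        },
    )
    with urlopen(req) as resp:
        return imdb_urls_from_results(json.load(resp))
```

Only title and person IDs are handled here. Other IMDb ID types raise an error rather than produce a guessed URL.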
  • DAG Calls
    • 2022-04-08 - Three folks are coming to talk about various topics and to seed discussions about what worked and what didn't. We have also contracted U Ghent.
    • 2022-03-25 - Planning an open office hour next week on the Knowledge Panel white paper and group planning. On April 12, plan to talk about the challenges and opportunities of doing new discovery and LD work at their institutions, including how folks at institutions with fewer resources can do things, or what they need.
  • Started a document collecting the comments, questions, and suggestions offered during the many presentations given. Huda will add the link here.
    • Huda has some extra notes to add
    • Can perhaps refine this into topics/themes or clarification questions

Linked-Data Authority Support (WP2)

  • Qa Sinopia Collaboration
    • 2022-04-01
      • Two new direct access authorities (i.e., Bibbi and the Norwegian thesaurus on genre and form) are ready to release in support of the Norwegian project that is using Sinopia.
      • Homosaurus: Dave says he plans to index it today. I'm hoping to include it in a release with the Norwegian authorities later today.
      • Dave still needs to fix total_number_found to get pagination working again in Sinopia. Once that is done, the fix will feed through to Sinopia without any additional work.
      • Met with OCLC last Friday.  They were mostly interested in our use cases and pain points.
    • 2022-04-08 - Lynette out
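The total_number_found fix matters because a client can only work out how many result pages exist if the search response reports the full hit count. This is a minimal sketch of that arithmetic. It assumes an offset/limit style request with 1-based offsets; that convention and the helper name are hypothetical and not taken from QA's documented API.

```python
import math


def page_offsets(total_number_found: int, page_size: int) -> list[int]:
    """Return the 1-based start offset of every results page.

    The 1-based offset convention is an assumption. If total_number_found
    is missing or wrong, this list (and so the pager) cannot be built.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    pages = math.ceil(total_number_found / page_size)
    return [1 + i * page_size for i in range(pages)]
```

For example, 25 hits at 10 per page give three pages, starting at offsets 1, 11, and 21.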
  • Best Practices for Authoritative Data working group (focus on Change Management) 
    • 2022-04-01
      • Met with Stanford devs to discuss updating cached labels in Sinopia. We discussed why this is desirable, a bit about the entire ecosystem (e.g. Sinopia editing, conversion to MARC, generation of the discovery index), and where all this is going in the long term. One concern is that the current index for Sinopia doesn't store individual field values, so it would be hard to locate labels. Dave described a project where he indexes JSON-LD in Elasticsearch so that fields are available for search. From that, Jeremy proposed that Airflow could be a solution for processing activity streams and updating labels. All of this feels like future planning, since this work is not currently on the Sinopia work plan and very few authorities have activity streams at this time. (notes)
      • Met with David Newberry from Getty to discuss their activity streams. We looked at how they compare to LOC's and how Dave E. can process them for the cache. The document you land on for a given activity is very large, with lots of external references, which will make it harder to process when updating the cache. (notes)
      • Wrote up the process for consuming LOC in the EMM Change Document API, based on the discussion with Kevin at LOC.
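Catching up on an activity stream usually means starting at the newest page and walking back through older pages until reaching activities that were already processed. This is a minimal sketch assuming the feed follows the Activity Streams 2.0 paging model. It further assumes each page is an `OrderedCollectionPage` whose `orderedItems` run oldest-first, that each activity has an ISO 8601 `endTime`, and that the link to the next-older page sits under a key called `prev` here. Which link a given feed (LOC, Getty, EMM) actually uses for "older" should be checked against that feed. The HTTP fetch is passed in as a function so the walk can be tested offline. The function name and cutoff logic are hypothetical.

```python
from datetime import datetime
from typing import Callable, Iterator, Optional


def activities_since(
    first_url: str,
    fetch: Callable[[str], dict],
    cutoff: datetime,
    older_link: str = "prev",
) -> Iterator[dict]:
    """Yield activities newer than `cutoff`, newest first.

    Assumes items on each page are oldest-first with an ISO 8601 `endTime`,
    and that the next-older page is linked under `older_link`. `cutoff`
    must be timezone-aware.
    """
    url: Optional[str] = first_url
    while url:
        page = fetch(url)
        for activity in reversed(page.get("orderedItems", [])):
            # Normalize a trailing "Z" so fromisoformat works before Python 3.11.
            ended = datetime.fromisoformat(activity["endTime"].replace("Z", "+00:00"))
            if ended <= cutoff:
                return  # everything older was handled in a previous run
            yield activity
        link = page.get(older_link)
        # AS2 links may be a bare URL string or an object with an "id".
        url = link["id"] if isinstance(link, dict) else link
```

Because activities come back newest first, a consumer that wants to apply them chronologically would collect them and reverse the list before updating anything.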
      • Next regular full group meeting is Monday.
    • 2022-04-08
      • Had a meeting earlier in the week. Reflected on the different styles of activity streams (Getty and LC): similar complexities, but different means of consuming them. Discussed the difficulty of dealing with deletes.
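The difficulty with deletes shows up as soon as activities are applied to a label cache. Within one batch, an entity may be updated and then deleted, or deleted and then re-created, so the order of application matters. This is a minimal sketch that applies only the latest activity per entity. It assumes each activity has a `type` of `Create`, `Update`, or `Delete` (as in Activity Streams 2.0), an `object` with an `id`, and an `endTime`. `fetch_label` is a hypothetical stand-in for dereferencing the entity, which is the expensive step for large documents like Getty's.

```python
from typing import Callable, Iterable


def apply_activities(
    cache: dict[str, str],
    activities: Iterable[dict],
    fetch_label: Callable[[str], str],
) -> dict[str, str]:
    """Update a URI -> label cache from a batch of activities.

    Only the most recent activity per entity is applied, so an Update
    followed by a Delete removes the entry, and a Delete followed by a
    Create re-adds it with a freshly fetched label.
    """
    latest: dict[str, dict] = {}
    # Sorting the raw strings assumes every endTime uses the same ISO 8601 format and zone.
    for activity in sorted(activities, key=lambda a: a["endTime"]):
        latest[activity["object"]["id"]] = activity  # later activities win

    for uri, activity in latest.items():
        if activity["type"] == "Delete":
            cache.pop(uri, None)
        elif activity["type"] in ("Create", "Update"):
            cache[uri] = fetch_label(uri)
    return cache
```

Collapsing to the latest activity per entity also means each entity is dereferenced at most once per batch, which limits the cost of large activity documents.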
  • Containerization 
    • 2022-04-01
      • Containerizing the QaServer
        • -int and -stg images are built with the env variables that support customization. (LD4P/qa_server_container)
        • Created the cul-it/qa_server_aws_deploy repo to hold our customizations. It is currently just a fork of the LD4P/qa_server_aws_deploy repo. The plan is to put the set of authority files we support, and the footer customizations, in this repo. Explored a GitHub Action that will copy these customizations to our S3 bucket, where they should be picked up automatically. This provides a way for us to update authorities over time without having to copy them to S3 manually.
        • Lifecycle questions we are still working on:
          • How do we update our local deploy when the image changes? This is proving tricky. There is a way to have the actions that build the images announce the change using repository_dispatch, but so far it looks like the only place that can process that event is the LD4P/qa_server_container repo, using a webhook. What we want is for all the downstream repos to receive a message that they can process in a GitHub Action to update their deploys.
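The copy step that such a GitHub Action would perform can be sketched in Python with boto3. This is a minimal sketch assuming a hypothetical repo layout (an `authorities/` directory and a `footer/` directory) that is mirrored under the same paths in a bucket. The real layout and bucket name are not recorded here. The key mapping is kept separate from the upload so it can be checked without AWS credentials.

```python
from pathlib import Path

# Hypothetical directories in cul-it/qa_server_aws_deploy that hold customizations.
CUSTOMIZATION_DIRS = ("authorities", "footer")


def planned_uploads(repo_root: Path, prefix: str = "") -> list[tuple[Path, str]]:
    """Map each customization file to the S3 key it should be copied to."""
    uploads = []
    for dirname in CUSTOMIZATION_DIRS:
        base = repo_root / dirname
        if not base.is_dir():
            continue
        for path in sorted(base.rglob("*")):
            if path.is_file():
                key = prefix + path.relative_to(repo_root).as_posix()
                uploads.append((path, key))
    return uploads


def upload(repo_root: Path, bucket: str, prefix: str = "") -> None:
    """Copy every planned file to S3 (requires boto3 and AWS credentials)."""
    import boto3  # imported here so planning works without boto3 installed

    s3 = boto3.client("s3")
    for path, key in planned_uploads(repo_root, prefix):
        s3.upload_file(str(path), bucket, key)
```

Inside the workflow itself, `aws s3 sync` from the AWS CLI could do the same copy in a single step. The Python version just makes the file-to-key mapping explicit.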
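One possible way around the single-receiver problem is to have the image-building workflow send a `repository_dispatch` event to each downstream repo directly, using GitHub's REST endpoint `POST /repos/{owner}/{repo}/dispatches`. Each downstream repo would then react through an `on: repository_dispatch` trigger in its own workflow. This is a minimal sketch assuming a token that is allowed to trigger workflows in those repos; a repo's default `GITHUB_TOKEN` is generally scoped to that repo alone. The downstream repo list, the event name, and the payload fields are all hypothetical.

```python
import json
from urllib.request import Request, urlopen

# Hypothetical list of repos that deploy the QA server image.
DOWNSTREAM_REPOS = ["cul-it/qa_server_aws_deploy"]
EVENT_TYPE = "qa-image-updated"  # hypothetical event name


def dispatch_request(repo: str, token: str, image_tag: str) -> Request:
    """Build a repository_dispatch request announcing a new image."""
    body = {"event_type": EVENT_TYPE, "client_payload": {"image_tag": image_tag}}
    return Request(
        f"https://api.github.com/repos/{repo}/dispatches",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def announce(token: str, image_tag: str) -> None:
    """Send the event to every downstream repo; GitHub replies 204 No Content."""
    for repo in DOWNSTREAM_REPOS:
        with urlopen(dispatch_request(repo, token, image_tag)) as resp:
            if resp.status != 204:
                raise RuntimeError(f"dispatch to {repo} returned {resp.status}")
```

The trade-off is that the upstream workflow has to know the list of downstream repos, rather than each downstream repo subscribing on its own.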
      • Containerizing the Cache Indices
    • 2022-04-08
      • Lynette and Robbie out this week
      • Greg pushed tagging changes through CloudFormation, which required 19 new permissions. Also worked on GitHub Actions to deal with the separation of template and container repositories - have this

Other Topics

  • Sinolio - Sinopia-FOLIO
    • 2021-12-17 - Work Cycle finished, sprint video out
  • OCLC Linked Data / Entities Advisory Group
    • 2021-12-10 OCLC presented at the Big Heads meeting this week; in testing
  • PCC 
    • 2021-01-21 Definitions and non-RDA final report to POCO (hopefully) to be submitted next week
    • 2022-01-14 Nothing new to report.
  • Authorities in FOLIO
    • 2022-03-25 Some transitions in the team. Useful meeting with Jenn, Frances, Nick, and Darcy to decide what needs to be provided to build the queue. Mockups look good and allow filtering on types of change (new, deleted, updated). Indexing requirements for data maintenance are quite different from those for discovery.

...