
Date:

Attendees:  Huda, Jason, Greg, Lynette, Simeon

Regrets:  Steven

Discovery (WP3)

  • https://github.com/LD4P/discovery/projects/2 for issues etc. 
  • Draft of a discovery plan: https://docs.google.com/document/d/1zKYW7FQVVNvyd0XjjW0qWznX9PC3jbmOE6Kz_yygPjs/edit?usp=sharing
  • Research: how to go from knowledge graph to an index
  • BANG! (Bibliographic Aspects Newly GUI'd)
    • Jamboard link
    • Expect to include Works. Need to do something beyond what we already have live from the OCLC concordance data.
    • References/bibliography list (beginning)
    • BANG! preliminary design-ish/data questions link
    • 2022-03-18
        • BANG!: Finished the LCCN analysis for Hubs, using the same 10,000 sample method as before.
        • For aggregations by hub (i.e. multiple LCCNs all under the same hub)
          • 497 sets with > 1 LCCN where the hub has > 1 work, covering a total of 1840 LCCNs. Matches to our Cornell catalog: 38 sets comprising a total of 284 LCCNs.
        • For relationships where two hubs are related via property
          • 277 sets where two hubs are related and each set has > 1 LCCN, comprising a total of 674 LCCNs.
            • 224 sets related via hasTranslation, 53 related through relatedTo
          • Matches: 33 sets comprising 123 LCCNs in total.
            • 26 sets related via hasTranslation, 7 sets related via relatedTo
        • We seem to be getting more useful results from LC Hubs than SVDE/PCC data
        • Steven is thinking about what these numbers suggest for catalog services; we also want to compare them with what we have from the old OCLC concordance file.
      • Huda asked Michelle about doing analysis on Sinopia data. The API has a metrics system that provides information on which facilities are being used.
      • We note possibilities for querying POD data in the VuFind instance that will be set up for BD (or in a local instance, were we to build one). We could, for example, filter the data and build an ISBN index of modest size.
      • 2022-04-01
      • Huda looked at POD to find matches in
      • 2022-03-25
      • Huda has looked at finding related works by ISBN in other IPLC libraries using POD data, thinking about how this could be tied to easier BD request functionality
      • Huda is tidying up and writing up the scripts from the analysis work ...
      • Steven has asked Jeff Mixter what we can find from OCLC about versions of same thing across their very broad data
      • Huda hasn't looked at the Sinopia API yet. Nancy has commented that the Instance template is the most used.
      • PLAN: Wrap up the analysis next week and Huda/Steven will give a run-through at the meeting, then look at the literature and decide what to implement, build a prototype in May, and test in June.
      • Inventory of Work-like features in Discovery: SF will add an inventory of Requests features.
  • DAG Calls
    • 2022-03-25 Planning an open office hour next week on the Knowledge Panel white paper and group planning. On April 12, plan to talk about the challenges and opportunities of doing new discovery and LD work at their institutions, including how folks at institutions with fewer resources can do things, or what they need.
  • Document started to collect comments, questions, and suggestions offered during the many presentations given. Huda will add the link here.
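
The hub-aggregation counts from the 2022-03-18 LCCN analysis could be reproduced along these lines. This is a minimal sketch, not the script used for the analysis: the (hub, work, LCCN) row shape, the `HubRow`/`aggregate_by_hub` names, and the sample values are hypothetical, and the notes don't define what counts as a catalog "match", so the rule below (at least two of a set's LCCNs held locally) is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set


class HubRow(NamedTuple):
    """One hypothetical (hub, work, LCCN) row from the sample."""
    hub: str
    work: str
    lccn: str


def aggregate_by_hub(rows: Iterable[HubRow], catalog_lccns: Set[str]) -> Dict[str, int]:
    """Count hub-level LCCN sets and how many of them match the local catalog."""
    works: Dict[str, Set[str]] = defaultdict(set)  # hub -> distinct works
    lccns: Dict[str, Set[str]] = defaultdict(set)  # hub -> distinct LCCNs
    for row in rows:
        works[row.hub].add(row.work)
        lccns[row.hub].add(row.lccn)

    # A "set" is a hub that gathers more than one work and more than one LCCN.
    sets = [lccns[h] for h in lccns if len(works[h]) > 1 and len(lccns[h]) > 1]

    # Assumed match rule: at least two of the set's LCCNs are held locally.
    matched = [s for s in sets if len(s & catalog_lccns) > 1]

    return {
        "sets": len(sets),
        "lccns": sum(len(s) for s in sets),
        "matched_sets": len(matched),
        "matched_lccns": sum(len(s) for s in matched),
    }


rows = [
    HubRow("hub1", "work1", "lccn-a"),
    HubRow("hub1", "work2", "lccn-b"),
    HubRow("hub1", "work2", "lccn-c"),
    HubRow("hub2", "work3", "lccn-d"),  # single work under this hub: not a set
]
print(aggregate_by_hub(rows, {"lccn-a", "lccn-b"}))
# {'sets': 1, 'lccns': 3, 'matched_sets': 1, 'matched_lccns': 3}
```

Counting distinct works and distinct LCCNs per hub separately keeps the two "> 1" conditions from the notes independent of each other.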
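
The idea of filtering POD data down to a modest ISBN index, then looking for the same ISBN at other IPLC libraries, might look roughly like this. It is a sketch under stated assumptions: records are taken to be already extracted from POD as hypothetical (library, record_id, isbn) tuples, and the function names and sample values are illustrative. ISBN-10s are converted to ISBN-13 (prefix "978", recompute the mod-10 check digit) so both forms land on the same key.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple


def to_isbn13(raw: str) -> Optional[str]:
    """Normalize an ISBN-10 or ISBN-13 string to ISBN-13, or return None if malformed."""
    chars = re.sub(r"[^0-9Xx]", "", raw).upper()
    if len(chars) == 13 and chars.isdigit():
        return chars
    if len(chars) == 10 and chars[:9].isdigit():
        # ISBN-13 = "978" + first nine ISBN-10 digits + a new mod-10 check digit.
        core = "978" + chars[:9]
        total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
        return core + str((10 - total % 10) % 10)
    return None


def build_isbn_index(records: Iterable[Tuple[str, str, str]]) -> Dict[str, Set[Tuple[str, str]]]:
    """Map each normalized ISBN to the (library, record_id) pairs that carry it."""
    index: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for library, record_id, isbn in records:
        key = to_isbn13(isbn)
        if key is not None:
            index[key].add((library, record_id))
    return dict(index)


def held_elsewhere(index: Dict[str, Set[Tuple[str, str]]], isbn: str, local_library: str) -> Set[str]:
    """Libraries other than local_library that hold a record with this ISBN."""
    key = to_isbn13(isbn)
    if key is None:
        return set()
    return {lib for lib, _ in index.get(key, set()) if lib != local_library}


# Hypothetical sample records.
records = [
    ("cornell", "c1", "0-306-40615-2"),
    ("columbia", "x9", "978-0-306-40615-7"),
    ("princeton", "p3", "not an isbn"),
]
index = build_isbn_index(records)
print(held_elsewhere(index, "0306406152", "cornell"))  # {'columbia'}
```

Malformed ISBNs are simply dropped, which keeps the index small; a real POD extract would likely need more cleanup than this.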

...

  • Qa Sinopia Collaboration
    • 2022-03-25 / 2022-04-01
      • Two new direct access authorities (i.e. Bibbi and Norwegian thesaurus on genre and form) are ready to release in support of the Norwegian project that is using Sinopia. 
      • homosaurus - Dave says he plans to index today.  I'm hoping to include it in a release with the Norwegian authorities later today
      • No meeting with Stanford this week.  Much of the Stanford team had conflicts.
      • Met with Greg and Dave on Containerization.  More on that below.
      • homosaurus - No change.  Dave has in triple store.  Steven defined context and validations. Lynette configured QA.  Waiting on Dave to index and put up the search API endpoint.
      • Dave still needs to fix total_number_found to get pagination working again in Sinopia. Once that is done, it will just feed through to Sinopia without any additional work.
      • Meeting with OCLC this afternoon to discuss collaboration and our usage.
      • National Library of Norway has requested some additional authorities; one has a dataset, the other has an API.
      • Met with OCLC last Friday. They were mostly interested in our use cases and pain points.
  • Best Practices for Authoritative Data working group (focus on Change Management) 
    • 2022-03-25 / 2022-04-01
      • Met with Kevin at LOC to talk through the LOC activity stream and how Dave can use it to do an incremental update.  There is a path forward for this.  (notes)
      • Scheduled to meet next Tues afternoon with Getty to look at how it can be used for incremental update.
      • Scheduled to meet next Thurs afternoon with Sinopia team to strategize how to update cached labels in the Sinopia app.
      • Met with Stanford devs to discuss updating cached labels in Sinopia. We discussed why this is desirable, a bit about the entire ecosystem (e.g. Sinopia editing, conversion to MARC, generation of the discovery index), and where all this is going in the long term. One concern is that the current index for Sinopia doesn't store individual field values, so it would be hard to locate labels. Dave described a project where he indexes JSON-LD in Elasticsearch such that fields are available for search. From that, Jeremy proposed that Airflow could be a solution for processing activity streams and updating labels. All of this feels like future planning, since this work is not currently on the Sinopia work plan and very few authorities have activity streams at this time. (notes)
      • Met with David Newberry from Getty to discuss their activity streams.  We were looking at how they compared to LOC and how Dave E. can process them for the cache.  The document you land on for a given activity is very large with lots of external references.  This is going to be harder to process for updating the cache.  (notes)
      • Wrote up the process for consuming LOC in the Consumer Processing section of the EMM Change Document API, based on the results of these meetings.
      • There is still a question about date handling and what we want to recommend. Options are endTime, startTime, published, and updated.
      • Discussion with Kevin at LOC.
      • Next regular full group meeting is Monday.
  • Containerization 
    • 2022-04-01
      • Containerizing the QaServer
        • -int and -stg images are built with the env variables that support customization. (LD4P/qa_server_container)
        • Created the cul-it/qa_server_aws_deploy repo to hold our customizations. It is currently just a fork of the LD4P/qa_server_aws_deploy repo. The plan is to put the set of authority files we support, and the footer customizations, in this repo. Explored a GitHub Action that will copy these customizations to our S3 bucket, where they should be picked up automatically. This provides a way for us to update authorities over time without having to manually copy them to S3.
        • Lifecycle questions still working on...
          • How to update our local deploy when the image changes? This is proving tricky. There is a way to have the actions that build the images announce the change using repository_dispatch, but so far it looks like the only place that can process that is the LD4P/qa_server_container repo, using a webhook. What we want is for all the downstream repos to be able to receive a message that they can process in a GitHub Action to update their deploys.
      • Containerizing the Cache Indices
    • 2022-03-25
      • Worked on QaServerContainer support for customizing Qa and QaServer (PR #56).  Used environment variables to set values in their respective initializers.  Will need to copy translations from S3 which hold some app specific text overrides for the footer.  Would be nice to have these as environment variables too, but translations can't use environment variables.  Greg and I need to work through several things. 
        • how to add the new env variables to the deploy
        • how to update the deploy when the image changes (the templates were used for the initial deploy)
        • how will -int, -stg, and -prod be supported?  currently only -prod is supported.
        • Greg and Lynette will work together to see what it takes to move the container version into production
      • Dave, Greg, and Lynette met for a status update and next steps for containerizing the cache. Talked through the current workflow for QaServer. Greg will take a first stab at setting up the cache search API endpoints in a Tomcat container for 2 authorities. Greg has hosted indexes in S3 for Dave so he can point the cache server at them. Dave hasn't had time to work on it, so Greg is now trying to do this and has WAR files from Dave.
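
To make the open date-handling question for activity streams concrete, here is a minimal sketch of how a consumer might pick one date per activity and keep only changes since its last sync. endTime, startTime, published, and updated are the options listed in the notes (all are Activity Streams 2.0 properties); the preference order, the UTC default for offset-less timestamps, and the sample activities are hypothetical, since the group has not chosen a recommendation.

```python
from datetime import datetime, timezone
from typing import Iterable, List, Optional

# Hypothetical preference order: the group has not yet decided which of these to recommend.
DATE_PREFERENCE = ("endTime", "published", "updated", "startTime")


def activity_time(activity: dict) -> Optional[datetime]:
    """Return the first date present among the preferred Activity Streams properties."""
    for prop in DATE_PREFERENCE:
        value = activity.get(prop)
        if value:
            # fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
            parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
            # Treat timestamps without an offset as UTC (an assumption).
            return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)
    return None


def changes_since(activities: Iterable[dict], last_sync: datetime) -> List[dict]:
    """Activities dated after last_sync (timezone-aware), oldest first, for an incremental cache update."""
    dated = [(activity_time(a), a) for a in activities]
    newer = [(t, a) for t, a in dated if t is not None and t > last_sync]
    newer.sort(key=lambda pair: pair[0])
    return [a for _, a in newer]


# Hypothetical activities.
activities = [
    {"type": "Update", "object": "a", "endTime": "2022-03-30T10:00:00Z"},
    {"type": "Create", "object": "b", "published": "2022-03-20T00:00:00Z"},
    {"type": "Delete", "object": "c", "updated": "2022-03-29T00:00:00"},
    {"type": "Update", "object": "d"},  # no date at all: skipped
]
for act in changes_since(activities, datetime(2022, 3, 25, tzinfo=timezone.utc)):
    print(act["type"], act["object"])
```

Whichever property is recommended, the consumer logic stays the same; only the preference tuple changes, which is one reason to settle the recommendation before consumers are written.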
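
On the repository_dispatch lifecycle question: as we understand GitHub's REST API, a repository_dispatch event is delivered to whichever repository the request is POSTed to (`/repos/{owner}/{repo}/dispatches`), so the image-building action could notify each downstream deploy repo directly rather than relying on a webhook in LD4P/qa_server_container. A sketch, assuming a token with write access to each target repo; the event name, payload, and function names are hypothetical, and cul-it/qa_server_aws_deploy is the only repo taken from the notes.

```python
import json
import urllib.request
from typing import List

# Downstream deploy repos to notify; extend this list as other institutions fork the deploy repo.
DOWNSTREAM_REPOS = ["cul-it/qa_server_aws_deploy"]


def build_dispatch(repo: str, token: str, image_tag: str) -> urllib.request.Request:
    """Build a GitHub REST API repository_dispatch request for one downstream repo."""
    body = {
        "event_type": "qa-server-image-updated",  # hypothetical event name
        "client_payload": {"image_tag": image_tag},
    }
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{repo}/dispatches",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def notify_downstream(token: str, image_tag: str, send: bool = False) -> List[urllib.request.Request]:
    """Build one dispatch per downstream repo; only hit the network when send=True."""
    reqs = [build_dispatch(repo, token, image_tag) for repo in DOWNSTREAM_REPOS]
    if send:
        for req in reqs:
            urllib.request.urlopen(req)  # GitHub answers 204 No Content on success
    return reqs


for req in notify_downstream("dummy-token", "v1.2.3"):
    print(req.get_method(), req.full_url)
```

Each downstream repo would then need a workflow triggered by `on: repository_dispatch` with `types: [qa-server-image-updated]` that redeploys using the tag in `client_payload`.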

Other Topics

  • Sinolio - Sinopia-FOLIO
    • 2021-12-17 - Work Cycle finished, sprint video out
  • OCLC Linked Data / Entities Advisory Group
    • 2021-12-10 OCLC presented at the Big Heads meeting this week; in testing
  • PCC 
    • 2021-01-21 Definitions and non-RDA final report to POCO (hopefully) to be submitted next week
    • 2022-01-14 Nothing new to report.
  • Authorities in FOLIO
    • 2022-03-25 Some transitions in the team. Useful meeting with Jenn, Frances, Nick, and Darcy to decide what needs to be provided to build the queue. Mockups look good and allow filtering on types of change (new, deleted, updated). Indexing requirements are quite different for data maintenance vs. discovery.

...