Date: 

Attendees:  Greg, Tim, Lynette, Steven

Regrets:  Huda

Last meeting: 2021-02-26 Cornell LD4P3 Meeting notes

...

  • https://github.com/LD4P/discovery/projects/2 for issues etc. 
  • Draft of a discovery plan: https://docs.google.com/document/d/1zKYW7FQVVNvyd0XjjW0qWznX9PC3jbmOE6Kz_yygPjs/edit?usp=sharing
  • Strand 1: production piece 
    • Production requirements and functionality – Production decision points
    • Discogs data use - in production since January 2021
      • ACTION (well after the March D&A sprint) - Tim to follow up on implementation and look at the data from tracking use of the Discogs data. There is an issue for starting data collection after the sprint by integrating the data collection into the sprint work
      • ACTION (during March D&A sprint): Tim Worrall will raise usability testing for D&A queue (don't carry forward in notes as now outside of LD4P)
  • Strand 2: research: how to go from knowledge graph to an index
    • Research decision points, Use cases 
    • First goal: DASH! dashboard (full page for entity) that extends on the idea of an embedded knowledge panel, aim to have functional prototype for end of year
    • DASH! (Displaying Authorities Seamlessly Here)
      • Dashboard design meeting kickoff notes - will also try to understand what our data will support or connections to other data sources
      • https://docs.google.com/document/d/1PgQi3xobsPhr9DUHU_YGeimL1OjNiiTdkiNWb36r3Gg/edit
      • 2021-01-29: Huda working to get scripts in place to populate the index; bringing in PeriodO info; focused on locations with Wikidata URIs for consistency. Subject headings: a script takes in the components, breaks them out, and parses them into timeline info. On Dave's Fuseki, there are 34 distinct temporal terms with labels. Will finish today with the actual index. Will break down the loading to increase load speed
          ACTION ITEM: adding Wikidata URIs for any subject headings and broader/narrower URIs to index (today) - DONE for dev VM index, to do next for LD4P3 Solr index (will still be done today). Can be marked done after.
          • 2021-02-12 Each LCSH now has a Wikidata URI where available, broader and narrower URIs where available, periodO extracted spatial URIs and time periods where available, a list of components for multi-part subject headings with URIs and labels for each component, separate lists for temporal components with start and end times mapped, and geographic components with LCSH URIs and equivalent Wikidata URIs extracted and added. 
            • One error (for review perhaps later): two of the LCSH temporal component labels have special characters and are not being matched
          • 2021-02-26: DONE - One script relies on SPARQL queries against Fuseki and took about 5h on dev and data copied to production
          Reached out to IRB to ask about testing: if we want to disseminate results as research data, need to do IRB protocol; has a follow-up. Waiting to hear back but will submit protocol if no word. Simeon's interpretation of reply is that we are crossing line into research and the approval will likely be positive. Depending on how we describe what we're doing it either falls under research OR improving a product... but we're essentially doing research to improve a product so yes to IRB review.
          • ACTION ITEM: IRB did respond and say they wish us to proceed with sending in an application.  Huda will work on this and reach out with any questions if needed.
            • 2021-02-26 After some discussion and clarification, what remains is to submit details of consent forms, potential interview and focus group questions, data de-identification, and compensation information
            • 2021-03-05 Huda sent more replies to IRB
      • 2021-02-19 Tim has been working on the entity page. Notes a number of issues with the Histropedia timeline, such as items with the same date being hidden, but performance is good
        • 2021-03-05 Tim resolved a number of issues. Next week will return to work on this and deal with influence-for and influenced-by presentation
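The subject-heading step noted above (breaking a heading into components and pulling out timeline info) can be sketched as follows. This is a minimal illustration, not the script used for the index: the function name, the output field names, and the use of a regular expression on the label are all assumptions made for the example, whereas the notes describe time periods being mapped via PeriodO.

```python
import re
from typing import Dict, List

# Matches a four-digit start year followed by a hyphen and an optional end year,
# e.g. "1939-1945" or an open range such as "1945-".
YEAR_RANGE = re.compile(r"(\d{4})-(\d{4})?")


def split_heading(heading: str) -> List[Dict[str, object]]:
    """Split an LCSH-style heading on "--" and flag components carrying a date range.

    Hypothetical helper: the field names ("label", "temporal", "start", "end")
    are illustrative only and do not reflect the actual index schema.
    """
    components = []
    for raw in heading.split("--"):
        label = raw.strip()
        match = YEAR_RANGE.search(label)
        components.append({
            "label": label,
            "temporal": match is not None,
            "start": int(match.group(1)) if match else None,
            "end": int(match.group(2)) if match and match.group(2) else None,
        })
    return components


parts = split_heading("World War, 1939-1945--Campaigns--France")
```

Here `parts` holds three components, and only the first is flagged as temporal. Labels with special characters, like the two unmatched temporal labels noted above, would need extra handling that this sketch does not attempt.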
    • 2021-02-05
      • What would D&A user reps favor?
        • Concern that full KP linked from (info) button is too much
        • Is "KP-lite" on autosuggest a good route? We think users would find this valuable. Are there options that minimize index changes? 
        • What warrants a KP?
        • What is the redundancy between KP work and DASH!? Does dashboard mean a fundamental change or is just an enhanced KP?
        • We need to be aware of which options require significant indexing changes. There is already a sense that we want to add ids to the index
        • What about the Open Syllabus project? This relied on the Open Syllabus API; not sure whether it is available as linked data. Essentially a mapping from domain → CSIP codes → ISBN, with very few Wikidata connections
      • What would be the smoothest next step for production?
      • Which option would give us real linked data connections via URI?
      • Steven notes that LTS authorities in FOLIO group is looking at the insertion of URIs into MARC records (resources willing)
      • ACTION - Huda Khan and Tim Worrall to document options and implications in preparation for the user reps presentation, in order to get a steer on where to continue experimentation with a view to future implementation
        • Dashboard (perhaps for some entity types only)
        • Autosuggest with KP-lite
        • Regular facet with KP
        • Open syllabus related items
        • Brainstorming notes
        • 2021-02-12 Agreement that streamlined KP is a good starting point, with possibility of later extension to a full dashboard. Autosuggest and open syllabus good alternative options. 
      • ACTION - Huda Khan to line up meeting with D&A user reps
        • 2021-03-05 Understanding that we aren't going to be asking for review of anything to be deployed before the FOLIO go-live. User reps are happy to provide us with any guidance for ongoing development
    • Tim on ESMIS for the coming weeks. Huda working on IRB and also looking at the dashboard with a new version of Histropedia, which is much faster. Huda is also looking at avenues for recruitment and has found out about student worker lists for Olin and for Mann, and a grad carrel users list
    • Planning for discovery work
      • Work so far has focused on authorities and what we can do in catalog
      • How might we use BF modeling and data from SVDE? At DOG meeting on Monday there was discussion, also similar discussions in DAG about specific use of modeling

...

  • QA Sinopia Collaboration – Support and evolve QA+cache instance for use with QA
    • 2021-03-05:
      • Discussed the status of
      • There were a couple of updates on ShareVDE.  Dave has just about completed the indexing of the first of 6 parts.  There is an issue with URIs including characters such as double quotes, which are not allowed.  Dave is fixing this during ingest and providing feedback to ShareVDE so they can fix it on their end.  Once this is done, he will ingest the other 5 parts.  Steven is exploring the data to determine what extended context we want to extract from it.  Once that is done, Lynette will create the QA configuration.  Sinopia wants to access the data by searching for the desired entity.  This would be done using QA.  Once an entity is selected, they want to populate an entity in Sinopia with the "full" ShareVDE record.  "Full" is in quotes because the edges of the graph that define "full" are open to interpretation.  Retrieving the full entity can be done in one of three ways: 1) through a fetch call to QA by passing in the URI, 2) a direct call to the cache to fetch the graph related to a single URI, 3) a direct call to ShareVDE to fetch the graph related to a single URI.  Which approach we will use is TBD.
      • We explored future topics: where we are with containerization and the next steps (see discussion below), potential topics for the next working group charter (see discussion below), and the Sinopia lookup modal UI. The Sinopia team is still working on prioritization for the next work cycle.
      • Dave is still working on cleaning up the data.  He has a subset that is in Fuseki for experimentation.  Action items (in order): Dave will continue to clean and index the data, Steven will compare PCC templates in Sinopia with PCC data in the cache, Steven will define the shape of data required for extended context and for a single URI dereference, Dave will create a query API based on the results of Steven's exploration, Lynette will create a QA config, Jeremy or Justin will connect the QA config to Sinopia.  At that point, it will be ready for exploring search and clone in Sinopia.
      • Dave is still working on resolving issues with the new indexing scheme.
      • Performance numbers reported in the UI are getting worse over time.  I believe this is related to timeouts.  I would like to revisit the statistics collection code and have it track timeouts separately so that the response stats do not include the timeouts.  This will add a column that counts the number of timeouts as a separate analytic.
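The proposed change to the statistics collection (counting timeouts in their own column so they no longer skew response times) could look roughly like the sketch below. It is illustrative only and is not the QA monitoring code: the class name, the method names, and the convention of passing `None` for a timed-out request are assumptions made for the example.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class LookupStats:
    """Response statistics for one authority, with timeouts tracked separately.

    Hypothetical structure; names are illustrative.
    """
    durations_ms: List[float] = field(default_factory=list)
    timeouts: int = 0

    def record(self, duration_ms: Optional[float]) -> None:
        # None marks a request that timed out: count it, but keep it
        # out of the timing data so it cannot inflate the averages.
        if duration_ms is None:
            self.timeouts += 1
        else:
            self.durations_ms.append(duration_ms)

    def summary(self) -> Dict[str, Optional[float]]:
        return {
            "requests": len(self.durations_ms) + self.timeouts,
            "timeouts": self.timeouts,
            # The average covers completed requests only.
            "avg_ms": mean(self.durations_ms) if self.durations_ms else None,
        }


stats = LookupStats()
for observed in (100.0, 300.0, None):
    stats.record(observed)
report = stats.summary()
```

With this split, a growing timeout count shows up in its own column, and the average reflects only requests that actually returned.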
  • Search API Best Practices for Authoritative Data working group
    • 2021-03-05:
      • There are 23 responses to the survey prioritizing the second charter's topics.  All 4 potential topics are judged almost equally important.  If you limit to only the first choice, linked data tooling and taking user stories to specific recommendations are tied.  If you look at only the top 2 choices, they continue to have a slight advantage over the other topics.  If you take into account the top 3 choices, change management and language processing move to the top.  And with all 4 choices taken into consideration, they are all roughly the same.  I will send out the request for feedback one more time, with the survey closing end of day Monday.
      • Some suggested topics are provenance, an AI approach to selecting a term, enhanced UI for selection, and cataloging efficiency.
      • 5 respondents provided contact information
      • The potential topics for the second working group include:  change management, language processing, linked data approaches, and moving user stories to specific recommendations.
      • The announcement of the end of the first charter, with links to the cataloger user stories prioritization survey and the categorized user story summary documents, went out last Monday.  There were also links to a survey for general feedback and a second for prioritizing the next charter's topic.  Each has a 2-week window for completion.  The feedback survey has 1 response, which discusses the importance of extended context.  The topic survey has 4 responses.  Currently, language processing and moving user stories to specific recommendations are tied for first.  Change management is a close second.  But with only 4 responses, it is too early to tell.  I will send out a reminder on Monday to the same communities as the announcement to try to get more responses.
  • Cache Containerization Plan - Develop a sustainable solution that others can deploy
    • 2021-02-19 Greg completed a CloudFormation template that allows someone to spin up a QA service in AWS easily. About 500 lines of template code that bring this very close to being a turnkey solution (in the services-ci branch). Greg notes prerequisites for spinning this up: an S3 bucket for configs etc., which could be added to another template.
      • When complete Lynette will test, then ask Dave to test, then ask Stanford folks. Greg will also create a demo screencast.
      • What about replacing the current QA setup with this new approach? Would need to check authority configuration and correct setup for load. Lynette notes need to copy over the DB to retain history
      • Next steps
        • start to look at containerizing Dave's setup. Two steps: 1) code to serve from cache, 2) indexing process
        • think about instructions for a vanilla Linux server setup
    • 2021-02-26
      • Cache containerization discussion in QA-Sinopia meeting: We mostly talked about the next steps for the cache, which are to create two containers: 1) a container for API requests to retrieve cached data, 2) a container to ingest data downloads and create the Lucene index.  This is fairly straightforward in the current approach of a full data dump and ingest.  We expect some complexities to resolve in how to update indices once authority providers deploy change management techniques that allow for incremental updates.  We deferred that discussion until the format of change management streams is defined.  Stanford was asked for their preferred deployment platform and indicated that AWS was preferred.
      • Greg will work with Dave when he starts work on containers, acting as tester and sounding board
      • CloudFormation - Greg has written templates and Lynette is going to test these out (will document time taken). Hope to find anything missing in template or documentation, perhaps some permissions issues will be revealed too that will allow documentation of critical permissions
      • Next Greg will look at prerequisites that need to be set up and work to template these in a helper template

...

Next Meeting(s), anyone out?:

  • 2021-03-12 ...