Date: 

Attendees: everyone

Regrets: 

Agenda & Notes

Review actions from 2020-05-08 Cornell LD4P2 Meeting notes

  • E. Lynette Rayle QA performance
    • 2020-04-17 Did an analysis sorting response time by complexity and size, but it didn't show a clear picture; will try a little more analysis next week. When Ligatus and CERL were added, why is Ligatus fast and CERL slow? Dave expects to have more time soon.
    • 2020-04-24 Lynette did an analysis of performance to try to understand whether speed is clearly related to data size or complexity of extended context. The result is that there is no clear correlation. Tried to parallelize parts of QA: in some places saw a slowdown, and in one place, where the complexity is high, found a speed improvement. However, in the complex cases the times are often still rather long (0.5–2s), though not markedly longer than for somewhat less complex queries. The worst cases are still due to the retrieve time from Dave's cache; he is looking at why CERL is slow when we might not expect it to be. Unfortunately there is no clear path to improving everything from the QA side. Lynette will try to understand why the OCLCFAST graph load is so slow.
    • 2020-05-08 No progress (Lynette working on exhibits about half the work, but has worked on accuracy), no updates from Dave
    • 2020-05-15: Dave still working on performance. Reported accuracy results earlier this week. Steven input more tests... will run these in 15 and report then.
  • E. Lynette Rayle to set up best practices working group around linked data APIs for authorities → documenting on Charter 1 - Best Practices for Authoritative Data Working Group
    • 2020-05-08 Now starting first Monday in June and then every other week for 4 months, getting folks to do some work up-front. Think that a later WG might look at change management
    • 2020-05-15: Logistics are in place: Slack channel set up, participants pointed toward Lyrasis to sign up, and a meeting invite sent for all meetings. People are responding to the invite.

Status updates and planning

  • Enhanced Discovery - WHAM! (see also https://wiki.duraspace.org/x/sJI7Bg and https://github.com/LD4P/discovery/projects/1)
    • See: Organizing doc and Pseudonym thoughts
    • Updates also on running notes page
    • John has been working on a gem using overlapping namespaces as the Duke gem does
    • Tim is on D&A sprint this week
    • Had meeting on Monday to review use cases and scenarios with Jenn, Frances, Laura – see notes added to google doc.
    • Huda took a lot of time off this week.
    • John set up a video demo of the packaging approach. He figured out how to make a gem an engine that is included in a Rails app alongside Blacklight (BL). The gem can modify Blacklight's behavior in whatever way we want. The modification we want to make is to point BL's autosuggest feature to the index being built by Huda and Tim, which means modifying BL's assumptions about how Solr interacts with it.
      • Autocomplete and Autosuggest in BL: Duke uses modifications to the out-of-the-box feature. Stanford looked into this but landed on using a different request handler (i.e., not the built-in one, but a separate search mechanism).
      • Since there are multiple examples of people wanting to use a different search index, endpoint, or handler, this packaging would be welcome.
    • Slides about work for Discovery Affinity Group
    • Will regroup next week. Have an index with examples of all the entities, working UI code, and an approach for packaging. Have additional possibilities to try an index-only approach. Need to identify whether additional use cases need to be met and whether we should integrate the knowledge panel work.
    • Also note: had a conversation with Kevin (Usability Working Group) to assess usability testing with students. Currently not looking at testing with students, so would need to do testing with staff.
    • Have a working demo!
  • Authority Lookups for Sinopia (Lookup infrastructure: https://github.com/LD4P/qa_server/projects/2, Authority requests: https://github.com/LD4P/qa_server/projects/1)
    • 2020-05-06 discussion where Astrid shared feedback about Sinopia and QA from catalogers at https://docs.google.com/document/d/14Sh2mBqkB2i9xml-Y7Aw-BGyvSAGwIS0I40jQXz88Pw/ . We note trade-offs between cached access and direct access in control/speed/scalability
    • 2020-05-08 Lynette has done work on merging in accuracy tests that Steven had produced. Everything is in the system pending deployment. Going to put the test harness in rspec so that it can be run in the background on stage. As much as possible will try to run the same tests on direct connection and on cache so we can do comparison. Think this should be straightforward
    • 2020-05-15:
      • Some tests are failing because they were put in place before Dave was given enough direction. It is not just that Dave has to change the indexing; we need to instruct him how to do so. Some are in the index and available in QA without any instruction having been given about what context we want or what we want to search.
      • Ran 64 tests in the last set.
        • 19 tests failed to find the result at all (severe failure).
          • GeoNames is interesting: the same test is used for Direct and Cache, and both fail. Perhaps we need to rethink how we are looking for things. GeoNames keeps data separate (e.g., New York and US are separate fields, so we cannot search for "New York, US" and get a result).
          • No failures are related to diacritics or special characters (i.e., hyphen and parentheses are both passing).
        • 3 failed to find the result in the desired position (e.g., not in results 1-5); not a severe failure.
        • 3 tests do not have a query... might be an issue with the input
        • Next step: look at the tests to see if the problem is unique to the authority or if the data is not there. Do we really expect these tests to work? Should be able to report on some next week. Steven will also work through the queue and ensure that it still reflects our priorities, while managing expectations around turn-around. If we can get attention on the queue, we will also get attention on the tests.
        • When doing triage, there were 6 or so requests for authorities; two were transferred to Sinopia. A few others need to be investigated and followed up.
    • Discussion with Sinopia team around searching, following thread of a few weeks ago
      • spoke about indexing this week. Agreed that user testing needs to be done to determine best route forward. 
      • Lynette suggested: by default, could return just URI and label and, if the user cannot choose from that, offer a button to select more context. OR have a minimal context identified so that a smaller amount of data is passed around. This has UI impact for "select more context". Perhaps have an extra-data option per-entity rather than per-search? User studies can determine the best approach, but they should not be done prior to Dave's work.
  • Meetings (see LD4P2 Cornell Meeting Attendances)
    • Knowledge Graph Conference had around 500 participants; it was very machine-learning oriented, with enterprise approaches. Looked for good recommendations for free back-ends. Oracle said it is looking into embedding SPARQL in SQL queries so the same database can be used for RDF and relational data. Clip and its replacement of Clif... Huda may inquire?
      • What's a Knowledge Graph? There is a Slack channel devoted to this. Basically, it is a semantic way of putting together information to use for inferencing. People don't really know what it is, but Google uses the term. It is an abstract level above a property graph: still have relationships between entities, and want to find things. It is the term where the successes go, whereas semantic web is where we talk about the failures.
      • Presentation yielded a few questions and was well received.
    • Blacklight Virtual Summit: Huda and Melissa presented. Abbreviated format. Might be a follow-up in October. Demo had positive comments / appreciated work shown. 
    • LD4 Conference 2020 was to be May 13/14, 2020, but no dates are set for the virtual form
      • Planning going ahead for meeting to work out virtual presentations. There was a survey of presenters to understand how they might want to present. Steven likes the idea of recorded sessions followed by office hours.
      • 2020-05-08 Planning committee met, noted that presenters need a reasonable amount of time to prepare. The plan is to group sessions into panels and then reach out to speakers about scheduling. Content will be spread over several days. Leaning toward live sessions that will be recorded.
      • Steven reached out to chairs asking about workshop and wondering if they were going to support asynchronous workshops
    • rdfs:seeAlso Conferences Related to Linked Data in Libraries
  • Next meetings
    • ...