Date: 

Attendees: 

Regrets: 

Agenda & Notes

Review actions from 2020-04-17 Cornell LD4P2 Meeting notes

  • E. Lynette Rayle: QA performance
    • 2020-03-06: Dave is in the process of converting everything over. Status is unknown for every authority, including MeSH. Once LC is done, we'll know whether this has an impact, since there is considerable usage data for LC. Dave and Lynette are each working on other projects at the moment.
    • 2020-03-20: Dave is tied up with dept. COVID-19 planning. Log analysis shows two main classes of error: 1) 502, where Dave's cache is not responding; 2) 500, which depends on the query and is a Java null pointer error from Dave's system. A third class, 3) timeout errors, occurs less often.
    • 2020-04-03 On hold because Dave is otherwise tied up
    • 2020-04-10 Lynette has done an analysis of current performance (Performance Analysis for QA Lookups). The Getty authorities are pretty much all sub-second, which is not too bad; however, averaged over all authorities we are around 2s. For AGROVOC and GEONAMES, in contrast to direct access (without extended context), going against the cache typically takes 3-4s, with the QA portion (graph load and normalization) approaching a second. Lynette is not yet sure why, for example, Geonames has such a large normalization hit, or why the graph load time is long for AGROVOC.
    • 2020-04-17 Did an analysis sorting response time by complexity and size, but it didn't show a clear picture; will try a little more analysis next week. Open question from when Ligatus and CERL were added: why is Ligatus fast and CERL slow? Dave expects to have more time soon.
    • 2020-04-24 Lynette did an analysis of performance to try to understand whether speed is clearly related to data size or to complexity of extended context. The result is that there is no clear correlation. Tried to parallelize parts of QA: some places showed a slowdown, and one place, where the complexity is high, showed a speed improvement. However, in the complex cases the times are often still rather long (0.5–2s), though not markedly longer than for somewhat less complex queries. The worst cases are still due to the retrieve time from Dave's cache; he is looking at why CERL is slow when we might not expect it to be. Unfortunately there is no clear path to improving everything from the QA side. Lynette will try to understand why the OCLCFAST graph load is so slow.
  • E. Lynette Rayle to set up a best practices working group around linked data APIs for authorities → documenting on Charter 1 - Best Practices for Authoritative Data Working Group, with target start in early May
    • 2020-04-17 Heard from Christine @ Harvard, Rob @ Getty, Kirk @ OCLC, and likely ISNI; nothing from Wikidata. Have also asked the Samvera community. Have enough to move forward with a meaningful working group.
    • 2020-04-24 Yes from Wikidata; will start early to mid-May

Status updates and planning