2020-01-10 Chicago report will be discussed in DAG meeting on Jan 21 (expected to happen as scheduled). Follow up with David and Astrid to look at what we could apply from what they learned and what additional user studies they could help us with.
DAG presentation did occur and was very interesting (link to meeting): in-depth interviews with open-ended questions to understand research needs. Need more thought on how we might apply lessons from this.
2020-02-14 Chicago uses VuFind, which already has a notion of similar items; question of what types of semantic similarity might be interesting. Also discussion of author browse and articles, and of presenting introductory materials for adjacent topics. They are going to discuss internally what to investigate further. Notes from U Chicago
2020-02-14 Dave has made progress this week; is moving to have all data in an index tailored to search in order to avoid SPARQL queries at search time. Results are stored as a blob of RDF. Working on CERL first, expect to get this out soon. Will then try MeSH and OCLC FAST, then LC.
2020-02-21 CERL was deployed with the new index strategy, but there are no before-and-after measurements to compare. CERL is also small, so we need to wait for LC or similar to get a sense of possible improvement.
Adam Smith to investigate cost and any issues with setting up a D&A Beta system to allow broader testing of some discovery ideas from this work
John Skiles Skinner to continue discussion with HathiTrust about an API or access to their index
2020-01-31 Investigation is under way, but not sure whether it will result in something we can use
2020-02-14 Some more discussions with HathiTrust; the suggestion was to use the current search with its debug facility, which includes things like facet values in machine-readable form. This requires either 1) a user account for testing, or 2) IP-based access for our dev machine, but there is some issue with getting a fixed external IP for our dev VMs.
2020-02-21 HathiTrust sent over the XML version of the results for one of the zero-result queries that John had tried. This is the XML version of the search results page, which includes subject facet values; it is the same XML they may be able to open up for us by allowing the IP address of my dev VM to access the URL that returns it. Huda set up a controller that parses the XML and returns JSON with the list of subject heading strings and the set of search results. John said he should be able to incorporate the subjects, and perhaps the results, into the zero search results page.
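The XML-to-JSON step described above could look roughly like the sketch below. This is not Huda's controller; it only illustrates the idea. The XML layout, the element and attribute names (SearchResults, Facet, Value, Item, Title), the facet name topicStr, and the function names are all assumptions, since the actual HathiTrust XML is not reproduced in these notes.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical XML layout: the real HathiTrust debug output is not shown in
# these notes, so every element and attribute name here is an assumption.
SAMPLE_XML = """\
<SearchResults>
  <Facets>
    <Facet name="topicStr">
      <Value count="12">Printing -- History</Value>
      <Value count="3">Bookbinding</Value>
    </Facet>
    <Facet name="language">
      <Value count="40">English</Value>
    </Facet>
  </Facets>
  <Results>
    <Item id="example.001"><Title>An Example Title</Title></Item>
    <Item id="example.002"><Title>Another Example</Title></Item>
  </Results>
</SearchResults>
"""


def xml_to_summary(xml_text, subject_facet="topicStr"):
    """Extract subject heading strings and search results from the XML."""
    root = ET.fromstring(xml_text)
    # Keep only the values of the facet assumed to hold subject headings.
    subjects = [
        (value.text or "").strip()
        for facet in root.iter("Facet")
        if facet.get("name") == subject_facet
        for value in facet.findall("Value")
    ]
    # One small record per result item.
    results = [
        {"id": item.get("id"), "title": (item.findtext("Title") or "").strip()}
        for item in root.iter("Item")
    ]
    return {"subjects": subjects, "results": results}


def xml_to_json(xml_text, subject_facet="topicStr"):
    """Serialize the summary as JSON, as a controller response might."""
    return json.dumps(xml_to_summary(xml_text, subject_facet))
```

Calling xml_to_json(SAMPLE_XML) returns a JSON object with a "subjects" list and a "results" list, which is the shape the zero search results page would need in order to show suggested subject headings.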
Have a currently insurmountable issue with nested profiles. When creating a Work profile with a nested Instance profile, there isn't a URI for the Instance (it just gets hung from a bnode). Without a URI, the title of the Instance doesn't get indexed. The Sinopia team are unable to fix this in the near term.
Cataloging work continues with the above limitations
Current Beanstalk deployment on AWS is broken. The Ruby bundler is now not working for unknown reasons; cannot deploy even a known-good code state, so the problem seems to be in the environment.
Once the deployment issue is solved, will put out CERL and Ligatus and extended context for MeSH, along with sub-authorities. Also some refactoring associated with the monitoring status page (including fixing a memory leak due to a long-lived hash: Ruby doesn't reclaim space from deleted entries).
Not sure whether Dave has redone the index to avoid SPARQL for MeSH – if it is done then we will have a comparison
ACTION - Lynette to ask Tiziana about SHARE-VDE APIs for real-time up-to-date search and for possible engagement in linked data best practices for authoritative data working group