Discogs data use - in production since January 2021
ACTION (sometime after March D&A sprint) - Tim to follow up on implementation and look at data from tracking use of Discogs. There is an issue to start collecting data after the sprint by integrating data collection into the sprint work; we will need to wait long enough to have data to analyse
This should be moved off the ACTION list since it will be about fall before this is addressed
ACTION (during March D&A sprint): Tim Worrall will raise usability testing for D&A queue (don't carry forward in notes as now outside of LD4P)
3/12: Will be raised in sprint next week
Strand 2: research: how to go from knowledge graph to an index
First goal: DASH! dashboard (full page for entity) that extends the idea of an embedded knowledge panel; aim to have a functional prototype by end of year
DASH! (Displaying Authorities Seamlessly Here)
Dashboard design meeting kickoff notes - will also try to understand what our data will support or connections to other data sources
2021-01-29: Huda working to get scripts in place to populate index; bringing in PeriodO info; focused on locations with Wikidata URIs for consistency. Subject headings: script that takes in components, breaks those out, and parses them into timeline info. On Dave's Fuseki, 34 distinct temporal terms with labels. Will finish today with actual index. Will break down the loading to increase load speed
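A minimal sketch of the subject-heading step noted above, not the actual script. It assumes headings use the conventional "--" separator between components and that temporal components contain four-digit years. The function names and the sample heading in the usage note are hypothetical.

```python
import re

# Hypothetical helper names; not taken from the project's scripts.
YEAR = re.compile(r"\b(\d{4})\b")


def split_heading(heading):
    """Break a subject heading into its components on the '--' separator."""
    return [part.strip() for part in heading.split("--") if part.strip()]


def temporal_range(component):
    """Return (start, end) years if the component contains four-digit years, else None."""
    years = [int(y) for y in YEAR.findall(component)]
    if not years:
        return None
    return (min(years), max(years))


def timeline_info(heading):
    """Collect the components of a heading that carry a year range."""
    info = []
    for component in split_heading(heading):
        years = temporal_range(component)
        if years is not None:
            info.append({"label": component, "start": years[0], "end": years[1]})
    return info
```

For example, timeline_info("United States--History--Civil War, 1861-1865") would keep only the "Civil War, 1861-1865" component, with start 1861 and end 1865. Real headings include temporal terms without explicit years (for example century names), which this sketch does not handle.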
Reached out to IRB to ask about testing: if we want to disseminate results as research data, need to do IRB protocol; has a follow-up. Waiting to hear back but will submit protocol if no word. Simeon's interpretation of reply is that we are crossing line into research and the approval will likely be positive. Depending on how we describe what we're doing it either falls under research OR improving a product... but we're essentially doing research to improve a product so yes to IRB review.
ACTION ITEM: IRB did respond and say they wish us to proceed with sending in an application. Huda will work on this and reach out with any questions if needed.
2021-02-26 After some discussion and clarification, still need to submit details of consent forms, potential interview and focus group questions, data de-identification, and compensation information
2021-03-05 Huda sent more replies to IRB
2021-03-12: Has been IRB approved (i.e. granted exemption)
2021-02-19 Tim has been working on entity page. Notes a number of issues with the Histropedia timeline, such as items with the same date being hidden, but performance is good
2021-03-05 Tim resolved a number of issues. Next week will return to work on this and deal with influence-for and influenced-by presentation
2021-02-05
What would D&A user reps favor?
Concern that full KP linked from button is too much
Is "KP-lite" on autosuggest a good route? We think users would find this valuable. Are there options that minimize index changes?
What warrants a KP?
What is the redundancy between KP work and DASH!? Does dashboard mean a fundamental change or is it just an enhanced KP?
We need to be aware of which options require significant indexing changes. There is already a sense that we want to add ids to the index
What about the Open Syllabus Project? This relied on the Open Syllabus API; not sure whether it is available in LD. Essentially a mapping from domain → CSIP codes → ISBN, with very few Wikidata connections
What would be the smoothest next step for production?
Which option would give us real linked data connections via URI?
Steven notes that LTS authorities in FOLIO group is looking at the insertion of URIs into MARC records (resources willing)
ACTION - Huda Khan, Tim Worrall to document options and implications as preparation for user reps presentation, in order to get a steer on where to continue experimentation with a view to future implementation
2021-02-12 Agreement that streamlined KP is a good starting point, with possibility of later extension to a full dashboard. Autosuggest and open syllabus good alternative options.
2021-03-12: Considering KPAOW zero (streamlined knowledge panels). Have begun discussing what should go in a streamlined version.
ACTION - Huda Khan to line up meeting with D&A user reps
2021-03-05 Understanding that we aren't going to be asking for review of anything to be deployed before the FOLIO go-live. User reps are happy to provide us with guidance for ongoing development
2021-03-12: Huda emailed Lenora and others regarding setting up a User reps meeting, but will follow up again.
Tim on ESMIS for the next weeks; Huda working on IRB and also looking at dashboard with new version of Histropedia, which is much faster. Huda also looking at avenues for recruitment; has found out about student worker lists for Olin and for Mann, and the grad carrel users list
Planning for discovery work
Work so far has focused on authorities and what we can do in catalog
How might we use BF modeling and data from SVDE? At DOG meeting on Monday there was discussion, also similar discussions in DAG about specific use of modeling
Next steps
Usability testing for DASH
Confirm gift card process with financial office
Recruitment emails to be sent out next week
Should be able to tie up any development on author/subject by end of March, so considering first two weeks of April for scheduling tests
User reps D&A meeting: Need to follow up again
Good to have knowledge panel lite mockups or examples ready to show
There will be a meeting with ShareVDE + LD4P teams on Monday. Previous meetings of this group focused on how to get Sinopia data into ShareVDE. The upcoming meetings will focus on how to get ShareVDE data into Sinopia. There are two tasks: 1) find a ShareVDE resource of interest, 2) clone a ShareVDE resource into a Sinopia resource. QA seems a likely candidate for the search. There are several options for cloning, which might include direct de-reference through ShareVDE, de-reference directly from Dave's cache, or de-reference through QA.
There were 53 responses. If you only take into account the first choice, Change Management is the top choice. If you take into account the top 2 choices, Linked Data Tooling takes the lead. With the top 3, Language Processing just barely moves to the top. With all 4 choices taken into account, all but moving to specifics from user stories are roughly equal. I'm inclined to take on Change Management because it is the most straightforward and would be a quick win. Then move to Linked Data Tooling for a 3rd charter.
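The way the leading option shifts with the number of choices counted can be reproduced with a simple top-k tally. This is a sketch only: the option names come from the notes above, but the ballots are invented for illustration and are not the 53 survey responses. The function name top_k_tally is hypothetical.

```python
from collections import Counter


def top_k_tally(ballots, k):
    """Count how often each option appears among the first k choices of each ballot."""
    counts = Counter()
    for ranked_choices in ballots:
        counts.update(ranked_choices[:k])
    return counts


# Hypothetical ballots for illustration only; NOT the actual survey data.
ballots = [
    ["Change Management", "Linked Data Tooling", "Language Processing"],
    ["Change Management", "Linked Data Tooling", "Language Processing"],
    ["Language Processing", "Linked Data Tooling", "Change Management"],
    ["Linked Data Tooling", "Language Processing", "Change Management"],
]

first_choice_only = top_k_tally(ballots, 1)
top_two_choices = top_k_tally(ballots, 2)
```

With these made-up ballots, Change Management leads on first choices alone, while Linked Data Tooling leads once the top two choices are counted, which is the same kind of shift described for the real responses.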
2021-02-19 Greg completed CloudFormation template that allows someone to spin up a QA service in AWS easily. About 500 lines of template code that brings this very close to being a turnkey solution (in services-ci branch). Greg notes pre-reqs for spinning this up: S3 bucket for configs etc., which could be added to another template.
When complete Lynette will test, then ask Dave to test, then ask Stanford folks. Greg will also create a demo screencast.
What about replacing the current QA setup with this new approach? Would need to check authority configuration and correct setup for load. Lynette notes need to copy over the DB to retain history
Next steps
Start to look at containerizing Dave's setup. Two steps: 1) code to serve from cache, 2) indexing process
Think about instructions for a vanilla Linux server setup
2021-02-26
Cache containerization discussion in QA-Sinopia meeting: We mostly talked about the next steps for the cache: creating two containers, 1) a container for API requests to retrieve cached data, 2) a container to ingest data downloads and create the Lucene index. This is fairly straightforward in the current approach of a full-data dump and ingest. It is expected that there will be some complexities to resolve in how to update indices when authority providers deploy change management techniques that allow for incremental updates. We punted that discussion until later, when the format of change management streams is defined. Stanford was asked their preferred deploy platform and they indicated that AWS was preferred.
Greg will work with Dave when he starts work on containers, as tester and sounding board
CloudFormation - Greg has written templates and Lynette is going to test these out (will document time taken). Hope to find anything missing in template or documentation, perhaps some permissions issues will be revealed too that will allow documentation of critical permissions
Next Greg will look at prerequisites that need to be set up and work to template these in a helper template
2021-03-05
Completed prerequisites template which includes S3 bucket and EFS filesystem - next step is to document instructions and how then to move to next template
Greg/Lynette to coordinate Lynette's testing next week - use feedback to refine documentation
Then create demo screencast
2021-03-16
Working on writing documentation
Need to discuss approach with Dave Eichmann. Good to test run the containerization process. Greg and/or Lynette will follow up with Dave.
Lynette can try out the lookup container next week
Developing Cornell's functional requirements in order to move toward linked data
Purpose? Vision for mid-term (3-5 years) transition to support linked-data at Cornell. May include things we don't yet have or cannot yet do, but not long-term vision of post-MARC environment
Important to understand sources of truth (primary data) and where there is derivative data
Imagine landscape with items described in multiple formats including at least MARC, BF, DC (eCommons), JSTOR
Imagine all items indexed and discoverable via D&A
Functions of "Aggregated index, allowing pivoting & ETL"
Includes current functionality of Frances' indexing
Does it include any editing?
Is there interaction with CULAR?
Includes indexing associated with DCP
What interfaces or functionality do we expect for the connecting lines?
Do we need a diagram for now (or at least July 1, 2021 with Voyager gone)?
2021-03-05 Jason plans to update diagram and create narrative around it, hope to discuss next week
Other Topics
OCLC Linked Data / Entities Advisory Group
2021-03-05 See comments above
PCC - Sinopia collaboration
2021-02-05 Charge to form a new group for documentation, mentoring etc. is under review
PCC Task Group on Non-RDA Entities
2021-01-15 PCC reviewed proposal but no decisions made yet, looking at description wrt cataloger use, discussion will continue
ACTION - Huda Khan to look at changing to `main` for LD4P/discovery (and update the Blacklight Cornell fork for LD4P3 to bring in the latest)
SVDE Workshop - several attended
Impressed by clear presentation of models and active APIs (REST and GraphQL)
Expecting models to be fully implemented this summer
At some time might want to add module to QA to query against GraphQL
Authorities in FOLIO
Hope to include URIs as part of Cornell FOLIO migration, possible LD4P work
Upcoming meetings
https://kula.uvic.ca/index.php/kula/announcement/view/1. Call for Proposals - Special Issue: "The Metadata Issue: Metadata as Knowledge". Due January 31, 2021 (abstract 300-500 words). Includes "The use of linked open data to facilitate the interaction between metadata and bodies of knowledge" and "Cultural heritage organization (libraries, archives, galleries, and museums) and academic projects that contribute to or leverage open knowledge platforms such as Wikidata"