2021-07-02 Cornell LD4P3 Meeting notes

Date: 02 Jul 2021

Attendees: Jason, Lynette, Tim (briefly), Simeon

Regrets: Huda, Steven

Linked-Data Authority Support (WP2)

QA Sinopia Collaboration – Support and evolve QA+cache instance for use with Sinopia
- 2021-07-02
  - No meeting with Stanford this week. Met with ShareVDE to talk about datastores and APIs. See Q&A doc for more info and follow-up questions. Primary takeaways:
    - 2 tenants:
      - PCC
      - Everything else – could perhaps use provenance to pull out info for one institution
    - 3 datastores:
      - Postgres DB
        primary datastore into which data is ingested synchronously
        data includes CKB and (I think) institutional bibliographic data as (I think) BibFrame
      - Solr
        derived from Postgres DB asynchronously
        only enough data to fulfill full text searching (I'm pretty sure)
      - Stardog - derived from Postgres DB asynchronously
        data includes CKB and institutional bibliographic data as BibFrame (I'm pretty sure)
    - 3 APIs
      - GraphQL
        search Postgres for keywords using Boolean search; search Solr for full text search
        returned data is JSON with shape based on query
      - REST API
        same datastores searched as GraphQL
        returned data is JSON with predetermined shape based on entity
      - SPARQL endpoints (one for PCC, one for everything else)
        search Stardog
        returned data is RDF with shape determined by SPARQL query
    - Data planned to be released in 3 stages with Stage A including Stanford & LOC
    - Next steps: clarifications with SVDE team, discuss with Sinopia-QA group, develop plan
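Sketch for the Sinopia-QA discussion: how a lookup against one of the SVDE SPARQL endpoints might be formed. Assumptions, none confirmed by SVDE: the endpoint URL is a placeholder, the endpoint accepts standard SPARQL 1.1 Protocol GET requests, and the Stardog data uses the BIBFRAME classes and properties shown. Only the request URL is built; nothing is sent.

```python
from urllib.parse import urlencode

# Placeholder only: the real SVDE endpoint URLs are not recorded in these notes.
PCC_ENDPOINT = "https://example.org/svde/pcc/sparql"

# BIBFRAME 2.0 ontology namespace (assumes SVDE data is exposed as BIBFRAME).
BF = "http://id.loc.gov/ontologies/bibframe/"


def works_with_instances_query(limit: int = 10) -> str:
    """Build a SPARQL query for Works that have at least one Instance."""
    return (
        f"PREFIX bf: <{BF}>\n"
        "SELECT ?work ?instance WHERE {\n"
        "  ?work a bf:Work ;\n"
        "        bf:hasInstance ?instance .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def sparql_get_url(endpoint: str, query: str) -> str:
    """Encode a query as a SPARQL 1.1 Protocol GET request URL."""
    return f"{endpoint}?{urlencode({'query': query})}"


if __name__ == "__main__":
    print(sparql_get_url(PCC_ENDPOINT, works_with_instances_query(5)))
```

The same query text could be pointed at either tenant's endpoint (PCC or everything else) by swapping the endpoint argument.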
Best Practices for Authoritative Data working group (focus on Change Management)
- 2021-07-02
  - No meeting this week.
Cache Containerization Plan - Develop a sustainable solution that others can deploy
- Consider moving live QA instance from EBS to container version? Need to consider update mechanisms (CI/CD). Agreed that this is a good direction; Greg and Lynette will discuss
- 2021-06-25
  - Greg made some progress on containerization. Has a Jenkins job that can pull from GitHub and deploy to the running service. This is key to replacing the Elastic Beanstalk version. Can now design the update process and then think about replacing the Beanstalk version
  - Lynette will be working on this next week: firm up instructions for QA servers and getting containers for Dave's work
  - Also need to think about LD4 conference presentation, week of July 19
- 2021-07-02
  - Lynette has successfully deployed manually to Docker Hub and AWS ECR. Now working on automated build from GitHub commit using GitHub Actions

Discovery (WP3)

https://github.com/LD4P/discovery/projects/2 for issues etc.
Draft of a discovery plan: https://docs.google.com/document/d/1zKYW7FQVVNvyd0XjjW0qWznX9PC3jbmOE6Kz_yygPjs/edit?usp=sharing
Research: how to go from knowledge graph to an index
- Research decision points, Use cases
DASH! (Displaying Authorities Seamlessly Here)
- Dashboard design meeting kickoff notes
- User reps D&A meeting: Expect next follow-up in August (Slides: from user reps meeting 2021-04-09 and result was "not no")
- https://docs.google.com/document/d/1PgQi3xobsPhr9DUHU_YGeimL1OjNiiTdkiNWb36r3Gg/edit
- Usability testing and followup for DASH: Usability results
  - Usability results, a few little things to finish up
  - GitHub issues
  - How long to continue working on DASH! ?
    - 2021-06-11 Tim and Huda had a meeting to discuss priorities and approach. Main concerns were showing user reps advancement in features and . Tim is working on mockups of a redesign; will not implement them, but will present to user reps to decide between options. Not prioritizing anything functional at present. Will make sure pages work well when there is not enough data on the page. Will put the prototype in a position where we feel more comfortable when people play with it. Concerned that if we show the same thing at user reps it will be unproductive... but question whether anyone will remember. With a more robust prototype, hope it will yield more of a decision. Tim does not have much time to work on this (2.5-3 weeks) – aiming for user rep meeting in August.
  - 2021-06-25
    - Huda hasn't really worked much on DASH but plans on devoting 1.5-2 weeks during July. (This should leave the rest for BANG!)
    - FOLIO is somewhat consuming right now
  - 2021-07-02
    - Huda and Tim discussed work to do on DASH before user reps meeting
- Video for DASH!, theme?
  - Sonic? Roadrunner?
  - Youtube Creative Commons License filter (hope maybe?)
BANG! (Bibliographic Aspects Newly GUI'd)
- Jamboard link
- Expect to include Works. Need to do something beyond what we already have live from the OCLC concordance data.
- Full OCLC concordance is 343M rows, and gzipped the file is 3.3GB
- SVDE Works
  - 2021-02-26 Have to develop SPARQL queries to pull out certain sorts of connected Work. Don't expect data to be very dense but do expect that we would get useful connections between print and electronic for example. We already have a link based on the OCLC concordance file from several years ago.
  - ACTION - Steven Folsom and Huda Khan to work on building an equivalent of the OCLC concordance file based on SVDE data and then do a comparison to see how they are similar and different
    - 2021-04-02 Steven and Huda met to think about putting together queries to extract a similar dataset. (Document for recording queries). Open questions about the counts – got 16k works from one view, got about 8k where limited to case with at least one instance. These numbers are much much lower than expected
    - 2021-04-16 Steven working with Dave on how to pull our SVDE data. Dave still working through some errors in ingest of SVDE data – this needs to be resolved before looking for concordance. Has asked Frances for 2015 concordance
    - 2021-04-23 Waiting on indexing of PCC data, have learnt more about the basis for the old OCLC concordance file
    - 2021-05-07 Steven didn't have much luck getting data from SVDE, learning GraphQL endpoint but also problems with timeouts there (HTTP 503)
    - 2021-06-11: At impasse. New modeling is represented in GraphQL data but fuller data are in RDF. Need to talk to SVDE when we have the QA/Sinopia conversation. Asked for test data but unsure when we'll have it all. Could consider doing this via Stanford institutional data, though not ideal. ACTION: Steven will ping Anna to inquire on existing thread
- What is the space of Work ids that we might use and their affordances?
  - OCLC Work ids, SVDE Opus (Work), LC Hubs (more than Hubs), what else?
  - Connections to instances, how to query, number
- Other SVDE entities
  - 2021-05-07 ACTION - Huda will reach out to Jim Hahn about entities other than Works represented in SVDE - DONE
  - Summarized here: Jamboard link - U Penn Enriched Marc: Work Ids in 996 Field. 1.2 million with OCLC Work IDs in > 1 description. ~3.9 million with OCLC Work IDs in only one record.
- Publisher authorities/ids
  - At Cornell we haven't tried to connect authorities with publishers
  - LC working on connecting to publisher identifiers - utility is things also published by a publisher
  - Also possible interest in series and awards
  - 2021-04-23 Might be able to use LC publisher ids in BANG!, Steven will look at whether there is a dump available
- 2021-06-04
  - To plan BANG! we need to think about what can be done with the available data. Perhaps take some concrete examples to consider what LC and SVDE data might give us, no longer sure what we could do with current OCLC works data (hope that entity work will provide new data later)
  - What about providing users with better access using alternative labels etc. that might better match their expectations, including different languages via VIAF connections. Much of our catalog data around languages is very bad because we use roman transliterations based on LC rules that are not well sync'd with actual practices in other locales.
  - Other possible datasets? Wikidata information is quite sparse (see Jamboard). We get Syndetics ToC data for the catalog now; are there other structured data sources for ToC? Perhaps also look at WikiCite – could suggest articles even if we don't generally have article-level data. ACTION - Huda to ask Jesse whether there are any open structured datasets for ToC, even if much smaller.
- 2021-06-11
  - Huda asked Jesse about open structured datasets.
  - Huda reached out to Filip Jakobsen from Samhaeng; asked whether there is anything we can learn about use cases around people wanting to search across institutions to see what works exist (in ReSHARE capacity). Filip made two points: people do not benefit from looking at separate pages for Works and Instances (i.e. the conceptual distinction is not useful for users); users do not want multiple pages, one per institution that has that work. If there are 35 instances that are the same across institutions, they don't care for them to be separated. Context here is ILL – and we wonder whether that would be true in a local library's catalog. Filip had a diagram that showed mapping between hubs and opi (opuses). ACTION: look at what works are and how we would map concrete examples... can you walk through an end-to-end representation of information for a few concrete examples.
- 2021-06-25
  - Huda will ask Jesse again (but after or on July 1st) about other open structured datasets for table of contents information.
  - Filip forwarded link to ReSHARE use cases/UX work: https://projects.samhaeng.com/1006/d/1001/ (documents links at various sections)
  - Steven will look at what works data we have in the last SVDE converted dataset in DAVE for Cornell
- 2021-07-02
  - No update
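Sketch for the concordance comparison ACTION above (SVDE-derived equivalent vs. the OCLC concordance): one way to see where two concordances agree on which records belong to the same Work. Assumptions: each source can be reduced to (record id, work id) pairs, and record ids are comparable across sources. The actual file layouts are not described in these notes, and all ids below are made up.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Row = Tuple[str, str]  # (record id, work id); hypothetical layout


def clusters(rows: Iterable[Row]) -> Set[FrozenSet[str]]:
    """Group record ids by work id, keeping only groups that link 2+ records."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for record_id, work_id in rows:
        groups[work_id].add(record_id)
    return {frozenset(members) for members in groups.values() if len(members) > 1}


def compare(a_rows: Iterable[Row], b_rows: Iterable[Row]) -> Dict[str, Set[FrozenSet[str]]]:
    """Report record groupings shared by both sources and unique to each."""
    a, b = clusters(a_rows), clusters(b_rows)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


if __name__ == "__main__":
    # Made-up sample data; work ids differ between sources, only groupings matter.
    oclc = [("b1", "w1"), ("b2", "w1"), ("b3", "w2")]
    svde = [("b1", "x9"), ("b2", "x9"), ("b3", "x7"), ("b4", "x7")]
    for name, groups in compare(oclc, svde).items():
        print(name, sorted(sorted(g) for g in groups))
```

Comparing groupings rather than work ids sidesteps the fact that OCLC Work ids and SVDE Opus ids are different identifier spaces. At 343M rows the full OCLC file would need streaming or a database rather than in-memory sets.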
DAG Calls
- 2021-06-25 Discussion on outputs from DAG calls, hope to get KP white paper completed over the summer, and then the "lord of the rings" (to bind them all) spreadsheet

Developing Cornell's functional requirements in order to move toward linked data

C.f. Stanford functional requirements document: https://docs.google.com/document/d/18H6zYGwKuCg3SZqm9Q_cxkZThcdmBjknE6HdtQ-RRzk/edit#heading=h.4fu64x8jzm6e
What does success look like? And then how do we get there?
Miro board (diagramming): https://miro.com/app/board/o9J_lfXUUj8=/
Notes space: https://docs.google.com/document/d/1TVPBFak7DkfjBptKl-pCMWQnOaiWHB0XCHswiB3Fr9g/edit?usp=sharing
Purpose: Vision for mid-term (3-5 years) transition to support linked-data at Cornell. May include things we don't yet have or cannot yet do, but not long-term vision of post-MARC environment
Important to understand sources of truth (primary data) and where there is derivative data
Imagine landscape with items described in multiple formats including at least MARC, BF, DC (eCommons), JSTOR
Imagine all items indexed and discoverable via D&A
2021-06-25 At the moment we are working toward the notion that we need BF editing in Sinopia with FOLIO so that is a focus. Perhaps pick this up again to explore more later

Upcoming meetings

https://kula.uvic.ca/index.php/kula/announcement/view/1 . Call for Proposals - Special Issue: "The Metadata Issue: Metadata as Knowledge". Due January 31, 2021 (abstract 300-500 words). Includes "The use of linked open data to facilitate the interaction between metadata and bodies of knowledge" and "Cultural heritage organization (libraries, archives, galleries, and museums) and academic projects that contribute to or leverage open knowledge platforms such as Wikidata"
- Folder Link, CFP + Brainstorming
LD4 Conference 2021 - conference is July 12-23 https://ld42021.sched.com/
- July 20
  - 11-12:30 session, second talk https://ld42021.sched.com/event/jo9t/lux-illuminating-the-collections-of-yales-museums-libraries-and-archives-via-linked-open-usable-data-from-prototypes-to-production-the-continuing-story-of-discovery-in-the-linked-data-for-production-closing-the-loop-grant?iframe=no
    - From prototypes to production: the continuing story of discovery in the Linked Data For Production: Closing the Loop grant
  - 1-2 session https://ld42021.sched.com/event/jo9z/the-journey-of-an-entity-from-marc-or-bibframe-to-a-discovery-interface-a-discussion-of-opportunities-for-and-effects-of-the-use-of-linked-data?iframe=no
    - The journey of an entity from MARC or BIBFRAME to a discovery interface: A discussion of opportunities for and effects of the use of linked data
- July 22
  - 10:30-12 session https://ld42021.sched.com/event/joAl/shared-entity-management-infrastructure-sustainability-and-load-distribution-of-authoritative-data-lookup-services?iframe=no
    - Sustainability and Load Distribution of Authoritative Data Lookup Services (Lynette)
- July 23
  - 10-11 session https://ld42021.sched.com/event/jo9w/lightning-talks-session-2?iframe=no
    - Authoritative Data: User Stories and Change Management (Lynette)
  - competing 10-11 session https://ld42021.sched.com/event/joBU/ld4-meeting-discovery-affinity-group?iframe=no
    - DAG (Huda)
BIBFRAME in Europe workshop - September 21-23 15:00–18:00 CEST = 9am-12noon EDT
- Call is out https://www.casalini.it/bfwe2021/, Jason submitted on Sinopia-FOLIO integration and ARM
SWIB virtual again this year, call for proposals out, due July 12
- Conftool says deadline 13th July 2021, 12:59:59pm CEST (corresponds to 13th July 2021, 06:59:59am EDT)
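Quick check of the time conversions for the deadlines above, as a sketch using Python's standard zoneinfo module. Assumptions: Europe/Berlin stands in for CEST and America/New_York for Ithaca local time; the IANA time zone database must be available on the machine.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

ITHACA = ZoneInfo("America/New_York")


def to_ithaca(local: datetime, source_tz: str) -> datetime:
    """Interpret a naive wall-clock time in source_tz and convert to Ithaca time."""
    return local.replace(tzinfo=ZoneInfo(source_tz)).astimezone(ITHACA)


# SWIB Conftool deadline: 13 July 2021, 12:59:59 CEST
swib_deadline = to_ithaca(datetime(2021, 7, 13, 12, 59, 59), "Europe/Berlin")

# BIBFRAME in Europe workshop, first day start: 21 September 2021, 15:00 CEST
workshop_start = to_ithaca(datetime(2021, 9, 21, 15, 0), "Europe/Berlin")

if __name__ == "__main__":
    print(swib_deadline.strftime("%Y-%m-%d %H:%M:%S %Z"))
    print(workshop_start.strftime("%Y-%m-%d %H:%M %Z"))
```

Daylight time is in effect in both locations on both dates, so the offset between them is six hours.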

Next Meeting(s), anyone out?:

2021-07-09
