Date: 21

Attendees: Greg, Tim, Lynette, Huda, Simeon, Jason, Steven

Regrets: Greg  

Meeting time after July 9 (or some other change)

  • Agreed to keep 9:30am Friday

Discovery (WP3)

  • https://github.com/LD4P/discovery/projects/2 for issues etc. 
  • Draft of a discovery plan: https://docs.google.com/document/d/1zKYW7FQVVNvyd0XjjW0qWznX9PC3jbmOE6Kz_yygPjs/edit?usp=sharing
  • Research: how to go from knowledge graph to an index
  • DASH! (Displaying Authorities Seamlessly Here): a dashboard (full page for an entity) that extends the idea of an embedded knowledge panel; aim to have a functional prototype by the end of the year
      • Dashboard design meeting kickoff notes - will also try to understand what our data will support, and connections to other data sources
      • https://docs.google.com/document/d/1PgQi3xobsPhr9DUHU_YGeimL1OjNiiTdkiNWb36r3Gg/edit
      • Usability testing and followup for DASH: Usability results
      • User reps D&A meeting: Expect next follow-up in August
        • Slides: from user reps meeting 2021-04-09 and result was "not no"
          • Positive feedback on linking back to catalog results
          • Questions about links and display from wikidata, other sources
          • The outcome was to discuss further in August, possibly including how to move into production
          • User reps have slides and wanted demo link
          • Could start with a lite knowledge panel (KP), possibly adding entity pages later, or perhaps both together
        • 2021-04-23 How much more work to do before August? Candidates:
          • Follow up on results from usability tests
          • Work on how smoothly the prototype works
          • Think about a more consistent look between pages; Tim is thinking of making some mockups of a redesign
        • 2021-04-30
          • Working on a list of tasks for the final refinements stage. This includes aligning the look and feel of the entity pages, which Tim is looking into. Current subject pages are optimized for things with date ranges; what do we do for subjects without date ranges (e.g. microbiology)?
        • 2021-05-28
          • Tim is looking at how to handle pages for entities with little or no information; this has required significant page reworking
          • Tim is also looking at a definitive list for the timeline
          • From Slack:
            • I haven't worked on Bang! this week since I focused more on Dash!.
              • (a) Merged the latest D&A catalog code (as of earlier this week) into the dev branch of our fork. Will merge into the Dash work branch later.
              • (b) Started looking at the knowledge panel not showing the view full record link/styling. The panel gets the full record link from an Ajax request to the catalog subject browse (D&A version) and then swaps that link in. The display logic for D&A browse does not include the full record link, so it isn't shown in our version. This relates to the broader approach for the subject knowledge panel, i.e. using the browse page to get content. That worked fine for the demo, but I will rework it to either use more discrete queries or include more control over content.
              • (c) Started a discussion with Tim on other usability-related issues. Resolving some issues, such as clarifying the role of data sources like repositories and digital collections, also relates to making the design of author and subject pages consistent. Tim took on mockup creation to address possible designs. One main difference between the current author and subject Dash pages is that the author page only shows result counts for the various sources, while the subject page shows the first set of results in tabs for each source. Mockups would look at an option for the subject page more aligned with the author page design, as well as a mockup in the other direction, i.e. showing results in the page for authors as well. There are larger philosophical discussions in this area as well.
              • Will also be revisiting timeline and map coordination, and what else we may show for related subjects on the timeline.
          • How long should we continue working on DASH!? Tim and Huda to discuss what could be done by mid-July, and report back next week
      • Video for DASH!, theme?
        • Sonic? Roadrunner?
    • BANG! (Bibliographic Aspects Newly GUI'd)
      • Jamboard link
      • Expect to include Works. Need to do something beyond what we already have live from the OCLC concordance data.
      • The full OCLC concordance is 343M rows; gzipped, the file is 3.3GB
      • SVDE Works
        • 2021-02-26 Have to develop SPARQL queries to pull out certain sorts of connected Works. We don't expect the data to be very dense, but we do expect to get useful connections, for example between print and electronic versions. We already have a link based on the OCLC concordance file from several years ago.
        • ACTION - Steven Folsom and Huda Khan to work on building an equivalent of the OCLC concordance file based on SVDE data and then do a comparison to see how they are similar and different
          • 2021-04-02 Steven and Huda met to think about putting together queries to extract a similar dataset (Document for recording queries). Open questions about the counts: got 16k Works from one view and about 8k when limited to cases with at least one Instance. These numbers are much lower than expected.
          • 2021-04-16 Steven is working with Dave on how to pull our SVDE data. Dave is still working through some errors in the ingest of SVDE data, which need to be resolved before looking for a concordance. Has asked Frances for the 2015 concordance.
          • 2021-04-23 Waiting on indexing of PCC data, have learnt more about the basis for the old OCLC concordance file
          • 2021-05-07 Steven didn't have much luck getting data from SVDE; he is learning the GraphQL endpoint but is also hitting timeouts there (HTTP 503)
      • What is the space of Work ids that we might use and their affordances?
        • OCLC Work ids, SVDE Opus (Work), LC Hubs (more than Hubs), what else?
        • Connections to instances, how to query, number
        • 2021-05-07 ACTION - Huda to start analysis
      • Other SVDE entities
        • 2021-05-07 ACTION - Huda will reach out to Jim Hahn about entities other than Works represented in SVDE - DONE
        • Summarized here: Jamboard link. U Penn enriched MARC: Work IDs in the 996 field. 1.2 million with OCLC Work IDs in more than one description; ~3.9 million with OCLC Work IDs in only one record.
      • Publisher authorities/ids
        • At Cornell we haven't tried to connect authorities with publishers
        • LC is working on connecting to publisher identifiers; the utility is finding other things published by the same publisher
        • Also possible interest in series and awards
        • 2021-04-23 Might be able to use LC publisher ids in BANG!, Steven will look at whether there is a dump available
      • 2021-05-21 - Continuing to work on all of the above to find out what Works data will allow us to develop in BANG!, beyond what we already have from the concordance data
        • ACTION - Steven to look into OCLC Work data. Perhaps ask the rep for ways to access/query. 2021-05-28 DONE: Steven found that WorldCat has removed the links to the RDF because the data was getting stale and wasn't being used. The expectation is that any work on Works would be folded into the Entities project
      • DAG Calls
        • 2021-05-28 Had a high-level topics overview discussion, with interesting philosophical comments about the benefits of linked data, demonstrations/examples that are useful to cite, and BIBFRAME
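The SVDE Works action item under BANG! is to build an equivalent of the OCLC concordance from SVDE data and see how the two are similar and different. OCLC Work IDs and SVDE Opus IDs are different identifier schemes, so one way to compare them is by which Instances each source groups under a single Work, rather than by the Work IDs themselves. The following is a minimal sketch, assuming each concordance can be reduced to hypothetical `(work_id, instance_id)` rows, and that Instance IDs use a scheme shared by both sources (for example, OCLC numbers). The function names are illustrative, not part of any existing tool.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

# Hypothetical input rows: (work_id, instance_id). Instance IDs are assumed to
# use a scheme shared by both sources (e.g. OCLC numbers); work IDs need not.
Row = Tuple[str, str]
Pair = Tuple[str, str]


def co_clustered_pairs(rows: Iterable[Row]) -> Set[Pair]:
    """Return every unordered pair of instances grouped under the same work."""
    by_work: Dict[str, Set[str]] = {}
    for work_id, instance_id in rows:
        by_work.setdefault(work_id, set()).add(instance_id)
    pairs: Set[Pair] = set()
    for instances in by_work.values():
        # sorted() makes each pair canonical, so (a, b) and (b, a) coincide.
        pairs.update(combinations(sorted(instances), 2))
    return pairs


def compare_concordances(oclc_rows: Iterable[Row],
                         svde_rows: Iterable[Row]) -> Dict[str, int]:
    """Count instance pairs linked by both sources or by only one of them."""
    oclc = co_clustered_pairs(oclc_rows)
    svde = co_clustered_pairs(svde_rows)
    return {
        "both": len(oclc & svde),
        "oclc_only": len(oclc - svde),
        "svde_only": len(svde - oclc),
    }


if __name__ == "__main__":
    oclc = [("w1", "i1"), ("w1", "i2"), ("w1", "i3")]
    svde = [("opus9", "i1"), ("opus9", "i2"), ("opus7", "i3")]
    print(compare_concordances(oclc, svde))
    # {'both': 1, 'oclc_only': 2, 'svde_only': 0}
```

The number of pairs grows quadratically with the number of Instances under one Work, so at the scale of the full OCLC concordance (343M rows), a real comparison would probably need a more compact representation than explicit pairs.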

    Linked-Data Authority Support (WP2)

    • QA Sinopia Collaboration – Support and evolve QA+cache instance for use with Sinopia
      • 2021-05-28
        • We began discussing the document describing interaction patterns between Sinopia-QA/cache-ShareVDE. The key takeaway from the discussion is that there needs to be clarity about how the PCC data and Stanford Institution data are expected to be used. There are some key questions to be answered that will drive the technological solutions in Sinopia, QA/cache, and ShareVDE. They include:
          • How is data initially ingested into ShareVDE?
          • Once initial ingest is complete, how is new data added (e.g. incremental ingests from original ingest source, newly created entities in Sinopia)?
          • Where is the Source of Truth (e.g. ShareVDE, Sinopia)?
          • How are updates made to entities? The short answer is that edits are expected to happen in Sinopia, but the details of how editing will happen are highly dependent on the answer to the Source of Truth question.
          • We have to recognize what is possible now or soon, with shapes/connections/etc. Sinopia and SVDE data shapes are too different for it to be realistic to edit SVDE data in Sinopia. This leaves us with related but only weakly connected pools of PCC data in Sinopia and SVDE, so what does it then mean to search PCC data? One would need to search both Sinopia and SVDE. One can derive a new description in Sinopia based on a "starter record" from SVDE, but it will not be possible to edit SVDE data in Sinopia.
            • Do we need to start distinguishing between Sinopia PCC data and SVDE PCC data?
    • Best Practices for Authoritative Data working group (focus on Change Management)
      • 2021-05-28
        • Second meeting this past Monday. We continued discussing types of changes. No new types were added. We began discussing what information is needed in the change management stream for each type.
        • We talked some about NEW entities, identifying two options. We did not make a decision on which approach is preferred. (NOTE: Format here is to convey data and is not necessarily the final recommended format.)
          • { 'type': 'NEW', 'URI': 'https://uri.for.new.entity' } with that, the downstream consumer can dereference the URI and use the results to add the entity to the cache
          • { 'type': 'NEW', 'URI': 'https://uri.for.new.entity', entity: { json-ld for new entity }  } with that, the downstream consumer can use the entity in the change management stream to add the entity to the cache

        • We spent a good bit of time on CHANGE LABEL, which turned out to be more complex than expected. The purpose of this type is to facilitate applications that cache labels for quick display in the application or that index labels to facilitate search. Again, two options were identified.
          • { 'type': 'CHANGE_LABEL', 'URI': 'https://uri.for.new.entity', 'predicate': 'skos:primaryLabel', 'OLD_LABEL': 'old literal'@en, 'NEW_LABEL': 'new literal'@en } with OLD_LABEL being optional.  Without OLD_LABEL, this is a new label.  With it, the OLD_LABEL is being replaced with the new label.  Applications can search their caches for the OLD_LABEL triple and update it to the NEW_LABEL.
          • { 'type': 'REMOVE_LABEL', 'URI': 'https://uri.for.new.entity', 'predicate': 'skos:primaryLabel', 'LABEL': 'old literal'@en }
            { 'type': 'ADD_LABEL', 'URI': 'https://uri.for.new.entity', 'predicate': 'skos:primaryLabel', 'LABEL': 'new literal'@en }
            • I have a question about how the application will process the change management stream. It will need to know that these two change management documents are related. This is fine for a full cache, but may not help applications that are only caching the labels.
    • Cache Containerization Plan - Develop a sustainable solution that others can deploy
      • 2021-04-30
        • Have worked on permissions issues and documented how to implement in AWS
        • Greg is now running out of things to do without more input from Dave. He can document existing work and develop a presentation for the conference
        • Consider moving the live QA instance from EBS to the container version? Need to consider update mechanisms (CI/CD). Agreed that this is a good direction; Greg and Lynette will discuss
      • 2021-05-07
        • Greg spent some time learning about GitHub Actions for CI/CD. He has a permissions issue with the GitHub repo. Expects to continue working on this today and hopes to be able to redeploy the container when content changes
      • 2021-05-28
        • Last week Greg and Lynette solved some issues. Still working on documentation and on CI/CD. Have abandoned the idea of using GitHub Actions, so working with Jenkins instead
        • No progress on work to containerize DAVE; Dave is focused on the SVDE work
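The Best Practices working group compared two message shapes for NEW entities (URI only, or URI plus embedded entity) and two for label changes (a single CHANGE_LABEL, or a REMOVE_LABEL/ADD_LABEL pair). A small consumer makes the trade-offs concrete. The following is a minimal sketch, assuming each message arrives as a Python dict with the keys shown in the notes, a language-tagged literal such as 'new literal'@en is modeled as a `(text, lang)` tuple, and a hypothetical `LocalCache` class receives a caller-supplied `dereference` function that stands in for an HTTP fetch of the entity URI. The group noted that the format is not final, so this is illustrative only.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Optional, Set, Tuple

# A language-tagged literal such as 'new literal'@en, modeled as (text, lang).
TaggedLiteral = Tuple[str, str]

LABEL_TYPES = ("CHANGE_LABEL", "REMOVE_LABEL", "ADD_LABEL")


class LocalCache:
    """Hypothetical downstream cache fed by a change management stream.

    `entities` maps URI -> entity data (e.g. parsed JSON-LD);
    `labels` maps URI -> predicate -> set of cached labels.
    """

    def __init__(self, dereference: Callable[[str], Any]) -> None:
        self.dereference = dereference  # stand-in for fetching an entity URI
        self.entities: Dict[str, Any] = {}
        self.labels: Dict[str, Dict[str, Set[TaggedLiteral]]] = defaultdict(
            lambda: defaultdict(set))

    def apply(self, msg: Dict[str, Any]) -> None:
        kind = msg["type"]
        if kind == "NEW":
            self._apply_new(msg)
        elif kind in LABEL_TYPES:
            self._apply_label(kind, msg)
        else:
            raise ValueError(f"unsupported message type: {kind!r}")

    def _apply_new(self, msg: Dict[str, Any]) -> None:
        uri = msg["URI"]
        if "entity" in msg:
            # Option 2: the entity travels in the stream; no extra request.
            self.entities[uri] = msg["entity"]
        else:
            # Option 1: only the URI is sent, so the consumer dereferences it.
            self.entities[uri] = self.dereference(uri)

    def _apply_label(self, kind: str, msg: Dict[str, Any]) -> None:
        labels = self.labels[msg["URI"]][msg["predicate"]]
        if kind == "CHANGE_LABEL":
            # Option 1: without OLD_LABEL this is simply a new label.
            old: Optional[TaggedLiteral] = msg.get("OLD_LABEL")
            if old is not None:
                labels.discard(old)
            labels.add(msg["NEW_LABEL"])
        elif kind == "REMOVE_LABEL":
            # Option 2, first message: nothing links it to the ADD_LABEL.
            labels.discard(msg["LABEL"])
        else:
            # Option 2, second message (ADD_LABEL).
            labels.add(msg["LABEL"])


if __name__ == "__main__":
    fetched = []

    def fake_dereference(uri: str) -> Dict[str, str]:
        fetched.append(uri)  # records that a fetch would have happened
        return {"@id": uri}

    cache = LocalCache(fake_dereference)
    uri, pred = "https://uri.for.new.entity", "skos:primaryLabel"
    cache.apply({"type": "NEW", "URI": uri})
    cache.apply({"type": "CHANGE_LABEL", "URI": uri, "predicate": pred,
                 "NEW_LABEL": ("old literal", "en")})
    cache.apply({"type": "CHANGE_LABEL", "URI": uri, "predicate": pred,
                 "OLD_LABEL": ("old literal", "en"),
                 "NEW_LABEL": ("new literal", "en")})
    print(fetched)                  # ['https://uri.for.new.entity']
    print(cache.labels[uri][pred])  # {('new literal', 'en')}
```

When option 2 is applied one message at a time, as in this sketch, a label-only cache is briefly left without any label between the REMOVE_LABEL and the ADD_LABEL. Nothing in either message tells the consumer that the two belong together, which is the open question raised in the CHANGE LABEL notes. Option 1 avoids this gap because the removal and the addition arrive in a single message.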

    Developing Cornell's functional requirements in order to move toward linked data

    ...