
Draft notes in Google Doc

VIVO - DataConnect - ORCID - UQAM Demo

  1. Walking through context and use case
    • "A professor wishes to add a reference to a scientific article; whether he enters it in ORCID or in VIVO, the information will be synchronized between the two platforms"
  2. Goal of using Kafka with VIVO:
    • VIVO becomes one component in the enterprise architecture rather than its center
  3. Main idea of Kafka
    • An event-driven messaging system
    • Allows many-to-many producers and consumers
  4. Recent sprint
    • Ingest ORCID data into VIVO
  5. Walkthrough of the flow:
    • Extract all ORCID iDs associated with UQAM members
      • './orcid_get_all_records.sh'
      • Convert the ORCID JSON into RDF
    • Transform the RDF into the VIVO representation
    • Send to Kafka (a producer sketch follows this list)
      • Kafka then passes the data on to VIVO
  6. Demo
    • 25,171 statements pushed through Kafka
    • 763 users, each with name, organization, and competencies
  7. Summary
    • The ORCID ontology needs to be refined and clarified
    • The mapping between ORCID and VIVO also needs further work
    • The structure of the Kafka message has to be designed to respect the add/delete/modify record actions (one possible envelope is sketched after this list)
    • Several minor bugs in the scripts need to be fixed
  8. Future plans
    • Building a POC for VIVO → Kafka → ORCID
    • Proving that the architecture can operate in event-driven, real-time mode
    • Porting the POCs to Java
    • Redesigning the mapping process, the ORCID ontology structure, and the message structure
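
A minimal sketch of the "send to Kafka" step above, assuming the transform step writes its VIVO-shaped RDF to an N-Triples file and that the topic is named vivo-import (the file name, topic name, and class name are illustrative; the actual UQAM scripts were not shown):

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.Properties;

    public class OrcidRdfProducer {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Each line of the transform step's output is one N-Triples statement.
                for (String triple : Files.readAllLines(Paths.get("orcid-vivo.nt"))) {
                    producer.send(new ProducerRecord<>("vivo-import", triple));
                }
                producer.flush(); // ensure all statements reach the broker before exiting
            }
        }
    }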
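
One possible shape for the Kafka message envelope mentioned in the summary, assuming an explicit action flag plus an N-Triples payload (the field names are illustrative, not the agreed design):

    import java.util.UUID;

    public class VivoChangeMessage {
        public enum Action { ADD, MODIFY, DELETE }

        public final String id = UUID.randomUUID().toString(); // lets consumers deduplicate replays
        public final Action action;   // which record operation the consumer should apply
        public final String graph;    // named graph in VIVO the statements belong to
        public final String nTriples; // RDF payload, serialized as N-Triples

        public VivoChangeMessage(Action action, String graph, String nTriples) {
            this.action = action;
            this.graph = graph;
            this.nTriples = nTriples;
        }
    }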

TIB

  1. Using Kafka as a consumer of VIVO messages
  2. Tasks
    • Listener in VIVO to capture internal changes
    • Producer to send to Kafka
  3. VIVO Kafka module
    • ModelChangedListener and ChangeListener (a listener sketch follows this list)
    • Kafka start-up listener
    • HTTP connection
  4. VIVO producer
    • Spring Boot service (a minimal producer sketch also follows)
  5. The code will be on GitHub soon
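
A sketch of how such a change listener might forward triple-level changes to Kafka, assuming Jena's StatementListener convenience base class (the TIB module itself was not shown; a real implementation would serialize statements as proper N-Triples rather than via toString()):

    import org.apache.jena.rdf.model.Statement;
    import org.apache.jena.rdf.model.listeners.StatementListener;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    // Registered on the VIVO model via model.register(listener).
    public class KafkaChangeListener extends StatementListener {
        private final KafkaProducer<String, String> producer;
        private final String topic;

        public KafkaChangeListener(KafkaProducer<String, String> producer, String topic) {
            this.producer = producer;
            this.topic = topic;
        }

        @Override
        public void addedStatement(Statement s) {
            producer.send(new ProducerRecord<>(topic, "ADD", s.toString()));
        }

        @Override
        public void removedStatement(Statement s) {
            producer.send(new ProducerRecord<>(topic, "DELETE", s.toString()));
        }
    }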
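
And a minimal sketch of the Spring Boot producer side, assuming the spring-kafka KafkaTemplate and a hypothetical topic named vivo-changes:

    import org.springframework.kafka.core.KafkaTemplate;
    import org.springframework.stereotype.Service;

    @Service
    public class VivoMessageService {
        private final KafkaTemplate<String, String> kafkaTemplate;

        public VivoMessageService(KafkaTemplate<String, String> kafkaTemplate) {
            this.kafkaTemplate = kafkaTemplate;
        }

        // Publish one change message, keyed by action (ADD/MODIFY/DELETE).
        public void publish(String action, String nTriples) {
            kafkaTemplate.send("vivo-changes", action, nTriples);
        }
    }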

Discussion

  1. Interest in the architecture presented
    • Allows for integration with any number of source systems
  2. This initiative enables data to flow out of VIVO, not only into it
  3. Can past initiatives be reused in this context?
    • such as ORCID-to-VIVO
    • such as Dimensions-to-VIVO
  4. Could this support large-scale ingest?
    • 100M+ triples?
    • Are there Kafka buffer limits or throttling concerns? (see the tuning sketch after this list)
    • Kafka is designed for "big data"
  5. Next steps
    • TIB: VIVO to other systems by Feb/May
    • TIB: Other systems to VIVO... timeline is further out
    • UQAM: Ingest timeframe in Q1 of 2021
    • Next meeting in January? Ralph to organize
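
On the buffer question: a Kafka producer bounds its client-side buffer with buffer.memory, and send() blocks once that buffer fills, which throttles the producer naturally. An illustrative high-throughput configuration (the values are starting points to benchmark, not figures from the meeting):

    import java.util.Properties;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class IngestProducerConfig {
        public static Properties highThroughput(String bootstrapServers) {
            Properties props = new Properties();
            props.put("bootstrap.servers", bootstrapServers);
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());
            props.put("batch.size", 262144);        // group records into larger batches
            props.put("linger.ms", 50);             // wait up to 50 ms to fill a batch
            props.put("compression.type", "lz4");   // compress batches on the wire
            props.put("buffer.memory", 134217728L); // 128 MB buffered before send() blocks
            return props;
        }
    }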


Actions
