VIVO - DataConnect - ORCID - UQAM Demo
- Walking through context and use case
- "A professor wishes to add a reference to a scientific article; whether they enter it in ORCID or VIVO, the information will be synchronized between the two platforms"
- Goal of using Kafka with VIVO:
- VIVO becomes one component in the enterprise architecture, rather than the center
- Main idea of Kafka
- An event-driven messaging system
- Allows many-to-many producers and consumers
- Recent sprint
- Ingest ORCID data into VIVO
- Walkthrough of flow:
- Extract all ORCID_IDs associated with UQAM members
- './orcid_get_all_records.sh'
- Converting ORCID JSON into RDF
- Transform RDF into VIVO representation
- Send to Kafka
- Then pass to VIVO
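The flow above can be sketched in outline as follows. This is a minimal Python sketch, not the actual UQAM pipeline: the record fields, predicate URIs, and topic name are assumptions, and the real ORCID-to-VIVO mapping is considerably richer.

```python
import json

# Hypothetical predicate URIs; the real ORCID-to-VIVO mapping differs.
VIVO = "http://vivoweb.org/ontology/core#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

def orcid_record_to_rdf(record):
    """Convert a simplified ORCID JSON record into N-Triples lines."""
    subject = f"<https://orcid.org/{record['orcid-id']}>"
    triples = [f'{subject} <{RDFS}label> "{record["name"]}" .']
    for kw in record.get("keywords", []):
        triples.append(f'{subject} <{VIVO}freetextKeyword> "{kw}" .')
    return triples

def to_kafka_payload(triples):
    """Wrap the triples in a JSON payload destined for a Kafka topic
    (topic name "vivo-import" is assumed, not the actual configuration)."""
    return json.dumps({"topic": "vivo-import", "triples": triples})

record = {"orcid-id": "0000-0002-1825-0097", "name": "Josiah Carberry",
          "keywords": ["psychoceramics"]}
payload = to_kafka_payload(orcid_record_to_rdf(record))
```

In the real flow, the payload would then be published to Kafka and consumed on the VIVO side.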
- Demo
- 25,171 statements pushed through Kafka
- 763 users, with name, org, and competencies
- Summary
- The ORCID ontology needs to be refined and clarified.
- The mapping between ORCID and VIVO also needs to be worked on
- The structure of the Kafka message has to be designed to respect the add/delete/modify record actions
- Several minor bugs need to be fixed in the scripts.
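One possible shape for a Kafka message that carries the add/delete/modify action, sketched in Python. The field names and envelope are assumptions for illustration, not the final design discussed above.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List

@dataclass
class VivoChangeMessage:
    # Assumed envelope; the final Kafka message structure may differ.
    action: str                # one of "add", "delete", "modify"
    record_id: str             # e.g. an ORCID iD
    triples: List[str] = field(default_factory=list)  # affected RDF statements

    def to_json(self):
        if self.action not in ("add", "delete", "modify"):
            raise ValueError(f"unknown action: {self.action}")
        return json.dumps(asdict(self))

msg = VivoChangeMessage("add", "0000-0002-1825-0097",
                        ['<s> <p> "o" .']).to_json()
```

Keeping the action explicit in the message, rather than inferring it from the payload, lets consumers apply deletes and modifications without re-diffing the triple store.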
- Future plans
- Building a POC VIVO → Kafka → ORCID
- Proving that the architecture can operate in an event-driven, real-time mode
- Porting the POCs to Java
- Redesigning the mapping process, ORCID ontology structure and message structure
TIB
- Using Kafka as a consumer of VIVO messages
- Tasks
- Listener in VIVO to capture internal changes
- Producer to send to Kafka
- VIVO Kafka-Module
- ModelChangedListener and ChangeListener
- Kafka start-up listener
- HTTP connection
- VIVO producer
- Spring-boot service
- Code will be in GitHub soon
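A rough sketch of the listener-to-producer idea in Python. The actual TIB module is Java/Spring-boot and builds on VIVO's ModelChangedListener/ChangeListener; the class, method, and topic names here are hypothetical.

```python
class KafkaChangeListener:
    """Capture statement-level changes inside VIVO and forward them to a
    producer. `send` is any callable taking (topic, payload); in the real
    module this would be a Kafka producer client."""

    def __init__(self, send, topic="vivo-changes"):  # topic name assumed
        self.send = send
        self.topic = topic

    def added_statement(self, stmt):
        self.send(self.topic, {"action": "add", "statement": stmt})

    def removed_statement(self, stmt):
        self.send(self.topic, {"action": "delete", "statement": stmt})

# Usage with a stub producer that just collects messages:
sent = []
listener = KafkaChangeListener(lambda topic, payload: sent.append((topic, payload)))
listener.added_statement('<s> <p> "o" .')
```

Separating the listener from the producer this way keeps the VIVO-side hook small and makes the Kafka transport swappable.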
Discussion
- Interest in the architecture presented
- Allows for integration with any number of source systems
- This initiative allows for outputs from VIVO
- Can past initiatives be used in this context?
- e.g. ORCID-to-VIVO
- e.g. Dimensions-to-VIVO
- Could this support large-scale ingest?
- 100M+ triples?
- Are there Kafka buffer limits or throttling concerns?
- Kafka is designed for "big data"
- Next steps
- TIB: VIVO to other systems by Feb/May
- TIB: Other systems to VIVO... timeline is further out
- UQAM: Ingest timeframe in Q1 of 2021
- Next meeting in January? - Ralph to organize