VIVO - DataConnect - ORCID - UQAM Demo
- Walking through context and use case
- "A professor wishes to add a reference to a scientific article; whether they enter it in ORCID or VIVO, the information will be synchronized between the two platforms"
- Goal of using Kafka with VIVO:
- VIVO becomes one component in the enterprise architecture, rather than the center
- Main idea of Kafka
- An event-driven messaging system
- Allows many-to-many producers and consumers
- Recent sprint
- Ingest ORCID data into VIVO
- Walkthrough of flow:
- Extract all ORCID_IDs associated with UQAM members
- './orcid_get_all_records.sh'
- Converting ORCID JSON into RDF
- Transform RDF into VIVO representation
- Send to Kafka
- Then pass to VIVO
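The flow above can be sketched in outline as follows. This is a minimal Python sketch, not the actual UQAM pipeline: the record fields, predicate URIs, and topic name are assumptions, and the real ORCID-to-VIVO mapping is considerably richer.

```python
import json

# Hypothetical predicate URIs; the real ORCID-to-VIVO mapping differs.
VIVO = "http://vivoweb.org/ontology/core#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

def orcid_record_to_rdf(record):
    """Convert a simplified ORCID JSON record into N-Triples lines."""
    subject = f"<https://orcid.org/{record['orcid-id']}>"
    triples = [f'{subject} <{RDFS}label> "{record["name"]}" .']
    for kw in record.get("keywords", []):
        triples.append(f'{subject} <{VIVO}freetextKeyword> "{kw}" .')
    return triples

def to_kafka_payload(triples):
    """Wrap the triples in a JSON payload destined for a Kafka topic
    (topic name "vivo-import" is assumed, not the actual configuration)."""
    return json.dumps({"topic": "vivo-import", "triples": triples})

record = {"orcid-id": "0000-0002-1825-0097", "name": "Josiah Carberry",
          "keywords": ["psychoceramics"]}
payload = to_kafka_payload(orcid_record_to_rdf(record))
```

In the real flow, the payload would then be published to Kafka and consumed on the VIVO side.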
- Demo
- 25,171 statements pushed through Kafka
- 763 users, with name, org, and competencies
- Summary
- The ORCID ontology needs to be refined and clarified.
- The mapping between ORCID and VIVO also needs to be worked on
- The structure of the Kafka message has to be designed to respect the add/delete/modify record actions
- Several minor bugs need to be fixed in the scripts.
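One possible shape for a Kafka message that carries the add/delete/modify action, sketched in Python. The field names and envelope are assumptions for illustration, not the final design discussed above.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List

@dataclass
class VivoChangeMessage:
    # Assumed envelope; the final Kafka message structure may differ.
    action: str                # one of "add", "delete", "modify"
    record_id: str             # e.g. an ORCID iD
    triples: List[str] = field(default_factory=list)  # affected RDF statements

    def to_json(self):
        if self.action not in ("add", "delete", "modify"):
            raise ValueError(f"unknown action: {self.action}")
        return json.dumps(asdict(self))

msg = VivoChangeMessage("add", "0000-0002-1825-0097",
                        ['<s> <p> "o" .']).to_json()
```

Keeping the action explicit in the message, rather than inferring it from the payload, lets consumers apply deletes and modifications without re-diffing the triple store.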
- Future plans
- Building a POC VIVO → Kafka → ORCID
- Proving that the architecture can operate in an event-driven, real-time mode
- Porting the POCs to Java
- Redesigning the mapping process, ORCID ontology structure and message structure
TIB
- Using Kafka as a consumer of VIVO messages
- Tasks
- Listener in VIVO to capture internal changes
- Producer to send to Kafka
- VIVO Kafka-Module
- ModelChangedListener and ChangeListener
- Kafka start-up listener
- HTTP connection
- VIVO producer
- Spring-boot service
- Code will be in GitHub soon
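A rough sketch of the listener-to-producer idea in Python. The actual TIB module is Java/Spring-boot and builds on VIVO's ModelChangedListener/ChangeListener; the class, method, and topic names here are hypothetical.

```python
class KafkaChangeListener:
    """Capture statement-level changes inside VIVO and forward them to a
    producer. `send` is any callable taking (topic, payload); in the real
    module this would be a Kafka producer client."""

    def __init__(self, send, topic="vivo-changes"):  # topic name assumed
        self.send = send
        self.topic = topic

    def added_statement(self, stmt):
        self.send(self.topic, {"action": "add", "statement": stmt})

    def removed_statement(self, stmt):
        self.send(self.topic, {"action": "delete", "statement": stmt})

# Usage with a stub producer that just collects messages:
sent = []
listener = KafkaChangeListener(lambda topic, payload: sent.append((topic, payload)))
listener.added_statement('<s> <p> "o" .')
```

Separating the listener from the producer this way keeps the VIVO-side hook small and makes the Kafka transport swappable.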
Discussion
- Interest in the architecture presented
- Allows for integration with any number of source systems
- This initiative allows for outputs from VIVO
- Can past initiatives be used in this context?
- e.g. ORCID-to-VIVO
- e.g. Dimensions-to-VIVO
- Could this support large-scale ingest?
- 100M+ triples?
- Are there Kafka buffer limits or throttling concerns?
- Kafka is designed for "big data"
- Next steps
- TIB: VIVO to other systems by Feb/May
- TIB: Other systems to VIVO... timeline is further out
- UQAM: Ingest timeframe in Q1 of 2021
- Next meeting in January? - Ralph to organize