Date

07 Jul 2020

Call-in Information

Time: 11:00 am, Eastern Time (New York, GMT-04:00)

To join the online meeting:

Go to: https://lyrasis.zoom.us/my/vivo1
One tap mobile:
- US: +16699006833,,9358074182# or +19292056099,,9358074182#
Or Telephone:
- US: +1 669 900 6833 or +1 929 205 6099 or 877 853 5257
- Meeting ID: 935 807 4182
International numbers available: https://zoom.us/u/aeANHanzED

Slack

https://vivo-project.slack.com
- Self-register at: http://bit.ly/vivo-slack

Attendees

Indicating note-taker

Agenda

Welcome to the VIVO Committers team: William Welling
i18n Sprint updates
1. Mini-sprint focused on VIVO committer review/refactor: July 14-16th
Moving priorities forward: Data ingest
1. What are the use cases?
2. What are the "entities" to be ingested? (reference) ...let's start simple
2020-07-15 - Special Topic - VIVO Scholar Next Steps
...

Future topics

Vitro JMS messaging approaches - redux
1. Which architectural pattern should we take?
2. What should the body of the messages be?
Incremental development initiatives
1. Integration test opportunities with the switch to TDB - requires startup/shutdown of external Solr via Maven

Tickets

Status of In-Review tickets


Notes

Draft notes in Google-Doc

Welcome to the VIVO Committers team: William Welling

Working with TaMU VIVO work

i18n Sprint updates

Mini-sprint focused on VIVO committer review/refactor: July 14-16th

Committers will review work from previous sprint

Benjamin will help with code review
Don has also reserved time on the 14th and 17th
Michel will be available during this time if anyone has questions or needs support

Michel: Plans to have good Selenium tests ready before the 14th. These tests need to run against both the non-i18n and i18n versions of VIVO and compare the front-ends to ensure that they are exactly the same for the English language version. The aim is to ensure that some part of the code can be merged into master.

https://wiki.lyrasis.org/display/VIVO/vivo-regression-test%3A+a+Test+Bench+Tool+for+the+Continuous+Evaluation+of+VIVO%27s+Development
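The front-end comparison Michel describes could be approximated, very roughly, by diffing the visible text of the same page rendered by both builds. The sketch below is not the vivo-regression-test tool and does not use Selenium; it assumes the two HTML documents have already been fetched by some other means, and the function names are hypothetical.

```python
import difflib
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text chunks, ignoring <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Collapse whitespace so formatting-only changes do not count as differences.
        text = " ".join(data.split())
        if text and not self._skip_depth:
            self.chunks.append(text)


def visible_text(html):
    """Return the visible text of an HTML document as a list of chunks."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


def page_diff(baseline_html, candidate_html):
    """Return unified-diff lines; an empty list means the visible text matches."""
    return list(
        difflib.unified_diff(
            visible_text(baseline_html),
            visible_text(candidate_html),
            "baseline",
            "i18n",
            lineterm="",
        )
    )
```

An empty result from `page_diff` for every English page would be one signal that the i18n branch has not changed the English front-end; it says nothing about layout or styling.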

Another sprint 2: 24th
Benjamin: pull request for Mandarin - wants to do that once i18n branch has been merged into master

Meeting last week with Andrew, Dominique, and Matthias (and Michel)

Discussed the sprint
One important goal was fixed: merging into the master branch. There is not a way to prove that the code is stable enough to put in the master branch.

Moving priorities forward: Data ingest

What are the use cases?
What are the "entities" to be ingested? (reference) ...let's start simple
Thoughts?

Don: Using old VIVO harvester and SPARQL program going from CSV -> constructs to get data into target database

Large amounts of data appear to hang the system. Tried various possibilities
Prevents them from upgrading the system
Might be the only institution using the harvester
Seems like every institution is doing their own brand of updates
Appropriate thing: map data to triples, and then input through SPARQL Update API, with indexer and inferencer turned off
VIVO pump? There don't appear to be any production sites using it.
Mike suggested RML Mapper. Looking at that. Need rule-based mechanism to map data into triples.
YARRRML - https://rml.io/yarrrml/matey/# . Kent. YAML-based version of RML that uses rules to mint subjects and objects. Reads from databases, JSON, triples.

Example:

prefixes:
  vitro: "http://vitro.mannlib.cornell.edu/ns/vitro/0.7/"
  core: "http://vivoweb.org/ontology/core/"
  vlocal: "https://experts.colorado.edu/ontology/vivo-fis/"
  vcard: "http://www.w3.org/2006/vcard/ns#"
  arg: "http://purl.obolibrary.org/obo/"
  cub: "https://experts.colorado.edu/individual/"

mappings:
  person:
    sources:
      - ['fisperson.csv~csv']
    s: cub:fisid_$(FISID)
    po:
      - [a, foaf:Person]
      - [a, foaf:Agent]
      - [rdfs:label, $(LABEL)]
      - [ex:name, $(FIRSTNAME)]
      - p: arg:ARG_2000028
        o:
          - mapping: vcard
            condition:
              function: equal
              parameters:
                - [str1, $(FISID)]
                - [str2, $(FISID)]
  vcard:
    sources:
      - ['fisperson.csv~csv']
    s: cub:vcard_$(FISID)
    po:
      - [rdf:type, vcard:Kind]
      - [rdf:type, arg:ARG_2000379]
      - [vitro:mostSpecificType, arg:ARG_2000379]
      - [vcard:hasEmail, cub:vcard_name_$(FISID)~iri]
      - [vcard:hasName, cub:vcard_name_$(FIS

What are other people using?
Michel: How robust is YARRRML?
Don: RML seems more robust. YARRRML translates to RML. RML: creates a set of rules and applies them to the input to create triples. Rules are written in RDF/Turtle.
Main objective: simple way to map the data without having to be an ontologist.
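To make the rule-based idea concrete, here is a minimal Python sketch of what such a mapper does: it fills `$(COLUMN)` placeholders from each CSV row and emits N-Triples. It is not RML Mapper or YARRRML, only an illustration of the principle. The `RULES` structure, the function names, and the sample row are hypothetical; the column names `FISID` and `LABEL` and the `cub:` base IRI are taken from the example above.

```python
import csv
import io
import re

# Hypothetical rule set, loosely modelled on the "person" mapping in the notes.
BASE = "https://experts.colorado.edu/individual/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

RULES = {
    "subject": BASE + "fisid_$(FISID)",
    "po": [
        (RDF_TYPE, FOAF + "Person", "iri"),
        (RDFS_LABEL, "$(LABEL)", "literal"),
    ],
}

_PLACEHOLDER = re.compile(r"\$\((\w+)\)")


def fill(template, row):
    """Replace each $(COLUMN) placeholder with the value from the CSV row."""
    return _PLACEHOLDER.sub(lambda m: row[m.group(1)], template)


def escape_literal(value):
    """Escape characters that may not appear raw in an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def row_to_ntriples(row, rules=RULES):
    """Apply the rules to one CSV row and return N-Triples lines."""
    subject = fill(rules["subject"], row)
    lines = []
    for predicate, obj_template, kind in rules["po"]:
        obj = fill(obj_template, row)
        term = f"<{obj}>" if kind == "iri" else f'"{escape_literal(obj)}"'
        lines.append(f"<{subject}> <{predicate}> {term} .")
    return lines


def csv_to_ntriples(text, rules=RULES):
    """Map every row of a CSV document (given as a string) to N-Triples lines."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        triples.extend(row_to_ntriples(row, rules))
    return triples


if __name__ == "__main__":
    sample = 'FISID,LABEL\n101,"Doe, Jane"\n'  # made-up sample row
    print("\n".join(csv_to_ntriples(sample)))
```

The point of the design is that the person maintaining `RULES` only needs to know column names and target predicates, which matches the stated objective of not having to be an ontologist.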

Potential use cases

Data in CSV or other format, need to map into RDF, and use SPARQL Update API

Huda: Anyone remember what RIALTO did and their issues? (We can look up documentation)
Don: SPARQL Update API: performance issues. TDB Loader: the VM is used for VIVO while a separate one is needed for the TDB loader, which is problematic.
Don: Want to turn on a switch and upload all the triples. Named graph for every class type. Then turn on inferencer and indexer.

Don’t have real-time use case, so turn off one VIVO instance and use another one for upload.
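One way to picture the bulk upload Don describes is to split the triples into batches and wrap each batch in a SPARQL 1.1 `INSERT DATA` request targeting a named graph. The sketch below only builds the update strings. Sending them, and switching the inferencer and indexer off and on, are left out. The graph IRI and the batch size are hypothetical.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def insert_data_updates(ntriples, graph_iri, batch_size=1000):
    """Yield one INSERT DATA update string per batch of N-Triples lines.

    Each input line is expected to be a complete triple ending in " ." with
    full IRIs, so it can be placed inside the GRAPH block unchanged.
    """
    for batch in chunked(ntriples, batch_size):
        body = "\n".join("    " + triple for triple in batch)
        yield f"INSERT DATA {{\n  GRAPH <{graph_iri}> {{\n{body}\n  }}\n}}"
```

Keeping the batch size as a parameter makes it easy to experiment with how much data a single request can carry before the target system starts to struggle.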

Benjamin: Brian worked on a service for turning off indexing and inference for data ingest. Code may be available on DTU repository

Brian's indexing and inferencing controller service: https://github.com/RAP-research-output-impact/rap-custom-vivo/blob/69afce74333405e532cd348a7758d690e91fdec3/custom-vivo/webapp/src/main/java/dk/dtu/adm/rap/controller/IndexingInferenceService.java

Michel: Consider VIVO as enterprise data source like other sources inside the organization.

Need to have messaging system where we can communicate data/exchange data from different data sources
Evaluating Kafka as a system to do so
Serialization/mapping data from CSV to RDF: will need to address this in the Kafka consumer/updater. Want streaming communication from data sources to VIVO
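The consumer/updater role can be sketched without committing to a particular client library. In the sketch below a standard-library `queue.Queue` stands in for a Kafka topic, so this is not Kafka code; the message shape (`id`, `label`), the base IRI, and the function names are all hypothetical. It shows the two responsibilities mentioned above: mapping each incoming record to RDF, and batching records before they are handed on.

```python
import json
import queue

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
BASE = "http://example.org/individual/"  # hypothetical base IRI


def drain_batches(topic, batch_size, timeout=0.1):
    """Yield lists of decoded messages until the queue stays empty for `timeout` seconds."""
    batch = []
    while True:
        try:
            raw = topic.get(timeout=timeout)
        except queue.Empty:
            if batch:
                yield batch  # flush the final, possibly short, batch
            return
        batch.append(json.loads(raw))
        if len(batch) >= batch_size:
            yield batch
            batch = []


def record_to_triple(record):
    """Map one decoded message to a single N-Triples line (label only)."""
    label = record["label"].replace("\\", "\\\\").replace('"', '\\"')
    return f'<{BASE}{record["id"]}> <{RDFS_LABEL}> "{label}" .'
```

The `batch_size` argument is the kind of control over how much is sent at once that a streaming ingest would need to expose.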

(Don from chat: RMLStreamer seems to work with Kafka: https://github.com/RMLio/RMLStreamer)

Architecture/sources discussed at the VIVO conference; see: http://doi.org/10.13140/RG.2.2.22501.83681

Different use cases with common need for mapping CSV or other data sources to RDF.

Michel: Don’s use case is more ETL with VIVO at the center. In our case, the goal is to integrate across data sources, including VIVO.
Don: Open to other ways of bringing in data that don’t have to be traditional batch-driven ETL. Streaming messages could serve the same purpose, as long as there are enough controls to determine how the app behaves (e.g. how many triples are sent over).
Don: Other paradigm: don’t track differences in data. Every class group in own named graph. Easy enough to truncate and reload everything.
Benjamin: Prefer to upload/edit data through the interface
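Don's truncate-and-reload paradigm maps naturally onto two SPARQL 1.1 Update operations per named graph: `CLEAR` to empty the graph and `LOAD` to refill it from a file. The sketch below only generates the update text. The graph base IRI, the class group names, and the file URLs are hypothetical, and whether a given endpoint permits `LOAD` from those locations is an assumption to verify.

```python
def reload_graph_update(graph_iri, source_url):
    """Return a SPARQL Update request that empties a graph and reloads it."""
    return (
        f"CLEAR SILENT GRAPH <{graph_iri}> ;\n"
        f"LOAD <{source_url}> INTO GRAPH <{graph_iri}>"
    )


def reload_plan(class_group_sources, graph_base):
    """Build one reload request per class group, keyed by group name.

    `class_group_sources` maps a class group name to the URL of its RDF dump;
    each group gets its own named graph under `graph_base`.
    """
    return {
        group: reload_graph_update(graph_base + group, url)
        for group, url in sorted(class_group_sources.items())
    }
```

Because each class group lives in its own graph, a failed reload of one group leaves the others untouched, and no differences in the data need to be tracked.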

Rachid: Is there a task force for ingest?

Huda: Good idea to have one, to gather use cases, determine common requirements, and work on common goals.
Rachid: No task force or interest group yet

RDF messaging

William: The approach is tied to Apache Jena, not agnostic. Standard JMS messaging with an Artemis broker is significantly different from a Kafka stream. They are not interchangeable, though similar in goal.

To do: bring up interest group/task force idea when Andrew is back

2020-07-15 - Special Topic - VIVO Scholar Next Steps

Several people on the call planning on attending
Potential areas of interest

Robust GraphQL specs
Customizable indexes

Similar perhaps to the use of LDPath in Fedora: LDPath traversal for making configurable Solr documents
https://marmotta.apache.org/ldpath/language.html
Compound documents are challenging and mapping to Solr documents from them is challenging
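The idea of configurable index documents can be illustrated with a small path-traversal sketch: each index field is defined by a sequence of predicates to follow from the starting resource. This is written in the spirit of LDPath's path traversal but is neither LDPath syntax nor the Marmotta implementation. The graph data, field names, and predicate names below are made up.

```python
# Hypothetical in-memory graph: subject -> predicate -> list of objects.
GRAPH = {
    "person:1": {
        "rdfs:label": ["Jane Doe"],
        "ex:hasPosition": ["position:9"],
    },
    "position:9": {
        "rdfs:label": ["Professor"],
        "ex:inOrganization": ["org:3"],
    },
    "org:3": {"rdfs:label": ["Department of Physics"]},
}

# Hypothetical field configuration: field name -> path of predicates.
FIELD_PATHS = {
    "name": ["rdfs:label"],
    "positionTitle": ["ex:hasPosition", "rdfs:label"],
    "organization": ["ex:hasPosition", "ex:inOrganization", "rdfs:label"],
}


def follow(graph, start, path):
    """Follow a predicate path from `start` and return every value reached."""
    nodes = [start]
    for predicate in path:
        nodes = [
            obj
            for node in nodes
            for obj in graph.get(node, {}).get(predicate, [])
        ]
    return nodes


def build_document(graph, subject, field_paths):
    """Flatten a compound resource into a single multi-valued document."""
    document = {"id": subject}
    for field, path in field_paths.items():
        document[field] = follow(graph, subject, path)
    return document
```

Changing what gets indexed then means editing `FIELD_PATHS` rather than code, which is the kind of customization raised above. Compound documents still require deciding how multi-step paths with several branches should be flattened.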

Benjamin: Would be nice to get comments on Andrew's PR here: https://github.com/vivo-project/Vitro/pull/169

Better way to load only the language RDF you will be using
Rearranges the RDF directory structure

Would put all language files underneath RDF
Non-specific language files in directory named core

Q: Is this better or worse?
If people have comments, please feel free to add

2020-07-07 - VIVO Development IG
