Date
Call-in Information
Time: 11:00 am, Eastern Time (New York, GMT-04:00)
To join the online meeting:
- Go to: https://lyrasis.zoom.us/my/vivo1
One tap mobile:
US: +16699006833,,9358074182# or +19292056099,,9358074182#
Or Telephone:
US: +1 669 900 6833 or +1 929 205 6099 or 877 853 5257
Meeting ID: 935 807 4182
International numbers available: https://zoom.us/u/aeANHanzED
Slack
- https://vivo-project.slack.com
- Self-register at: http://bit.ly/vivo-slack
Attendees
Indicating note-taker
Agenda
- Welcome to the VIVO Committers team: William Welling
- i18n Sprint updates
- Mini-sprint focused on VIVO committer review/refactor: July 14-16th
- Moving priorities forward: Data ingest
- What are the use cases?
- What are the "entities" to be ingested? (reference) ...let's start simple
- 2020-07-15 - Special Topic - VIVO Scholar Next Steps
- ...
Future topics
- Vitro JMS messaging approaches - redux
- Which architectural pattern should we take?
- What should the body of the messages be?
- Incremental development initiatives
- Integration test opportunities with the switch to TDB - requires startup/shutdown of external Solr ..via Maven
Tickets
Status of In-Review tickets
Notes
- Welcome to the VIVO Committers team: William Welling
- Working with TaMU VIVO work
- i18n Sprint updates
- Mini-sprint focused on VIVO committer review/refactor: July 14-16th
- Committers will review work from previous sprint
- Benjamin will help with code review
- Don has also reserved time on the 14th and 17th
- Michel will be available during this time if anyone has questions or needs support
- Michel: Plans to have good Selenium tests ready before the 14th. These tests need to run against both the non-i18n and i18n versions of VIVO, comparing the front-ends to ensure that they are exactly the same for the English language version. The aim is to ensure that some part of the code can be merged into master.
- Another sprint 2: 24th
- Benjamin: pull request for Mandarin - wants to do that once i18n branch has been merged into master
- Meeting last week with Andrew, Dominique, and Matthias (and Michel)
- Discussed the sprint
- One important goal was set: making the merge to the master branch. There is not yet a way to prove that it is stable enough to put in the master branch.
- Moving priorities forward: Data ingest
- What are the use cases?
- What are the "entities" to be ingested? (reference) ...let's start simple
- Thoughts?
- Don: Using old VIVO harvester and SPARQL program going from CSV -> constructs to get data into target database
- Large amounts of data appear to hang the system. Tried various possibilities
- Prevents them from upgrading the system
- Might be the only institution using the harvester
- Seems like every institution is doing their own brand of updates
- Appropriate thing: map data to triples, and then input through SPARQL Update API, with indexer and inferencer turned off
- VIVO Pump? There don't appear to be production sites using this.
- Mike suggested RML Mapper. Looking at that. Need a rule-based mechanism to map data into triples.
- YARRRML - https://rml.io/yarrrml/matey/# . Kent. YAML version that uses rules to mint subjects and objects. Reads from databases, JSON, triples.
- Example:
prefixes:
  vitro: "http://vitro.mannlib.cornell.edu/ns/vitro/0.7/"
  core: "http://vivoweb.org/ontology/core/"
  vlocal: "https://experts.colorado.edu/ontology/vivo-fis/"
  vcard: "http://www.w3.org/2006/vcard/ns#"
  arg: "http://purl.obolibrary.org/obo/"
  cub: "https://experts.colorado.edu/individual/"
mappings:
  person:
    sources:
      - ['fisperson.csv~csv']
    s: cub:fisid_$(FISID)
    po:
      - [a, foaf:Person]
      - [a, foaf:Agent]
      - [rdfs:label, $(LABEL)]
      - [ex:name, $(FIRSTNAME)]
      - p: arg:ARG_2000028
        o:
          - mapping: vcard
            condition:
              function: equal
              parameters:
                - [str1, $(FISID)]
                - [str2, $(FISID)]
  vcard:
    sources:
      - ['fisperson.csv~csv']
    s: cub:vcard_$(FISID)
    po:
      - [rdf:type, vcard:Kind]
      - [rdf:type, arg:ARG_2000379]
      - [vitro:mostSpecificType, arg:ARG_2000379]
      - [vcard:hasEmail, cub:vcard_name_$(FISID)~iri]
      - [vcard:hasName, cub:vcard_name_$(FIS
- What are other people using?
- Michel: How robust is YARRRML?
- Don: RML seems more robust. YARRRML translates to RML. RML: creates a set of rules and applies them to the input to create triples. Rules are written in RDF/Turtle.
- Main objective: simple way to map the data without having to be an ontologist.
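A minimal sketch of the rule-based mapping idea discussed above, assuming a CSV with FISID and LABEL columns as in the example. This is not RML or YARRRML itself: the rule dictionary, the function names, and the choice of N-Triples output are hypothetical and only illustrate how a template such as cub:fisid_$(FISID) can be applied to each row to mint subjects and objects.

```python
import csv
import io
import re

# Hypothetical rule, loosely modelled on the "person" mapping in the notes.
# Prefixes are written out in full; each entry is (predicate, object template, kind).
PERSON_RULE = {
    "subject": "https://experts.colorado.edu/individual/fisid_$(FISID)",
    "po": [
        ("http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
         "http://xmlns.com/foaf/0.1/Person", "iri"),
        ("http://www.w3.org/2000/01/rdf-schema#label", "$(LABEL)", "literal"),
    ],
}

_PLACEHOLDER = re.compile(r"\$\((\w+)\)")


def fill(template, row):
    """Replace every $(COLUMN) placeholder with that column's value in the row."""
    return _PLACEHOLDER.sub(lambda m: row[m.group(1)], template)


def escape_literal(value):
    """Escape backslash, quote, and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def row_to_ntriples(rule, row):
    """Apply one rule to one CSV row and return the resulting N-Triples lines."""
    subject = fill(rule["subject"], row)
    lines = []
    for predicate, obj_template, kind in rule["po"]:
        value = fill(obj_template, row)
        # Assumption: IRI values are already safe to embed; no percent-encoding is done.
        obj = f"<{value}>" if kind == "iri" else f'"{escape_literal(value)}"'
        lines.append(f"<{subject}> <{predicate}> {obj} .")
    return lines


def csv_to_ntriples(rule, csv_text):
    """Map every row of a CSV document (with a header line) to N-Triples lines."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        triples.extend(row_to_ntriples(rule, row))
    return triples
```

For example, csv_to_ntriples(PERSON_RULE, 'FISID,LABEL\n101,"Doe, Jane"\n') yields one rdf:type triple and one rdfs:label triple for the subject ending in fisid_101. A real mapping tool would also handle joins between mappings (the vcard condition in the example), which this sketch leaves out.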
- Potential use cases
- Data in CSV or other format, need to map into RDF, and use SPARQL Update API
- Huda: Anyone remember what RIALTO did and their issues? (We can look up documentation)
- Don: SPARQL Update API: performance issues. TDB Loader: the VM is used for VIVO while a separate one is needed for the TDB loader, so that is problematic.
- Don: Want to turn on a switch and upload all the triples. Named graph for every class type. Then turn on inferencer and indexer.
- Don’t have real-time use case, so turn off one VIVO instance and use another one for upload.
- Benjamin: Brian worked on a service for turning off indexing and inference for data ingest. Code may be available on DTU repository
- Brian's indexing and inferencing controller service: https://github.com/RAP-research-output-impact/rap-custom-vivo/blob/69afce74333405e532cd348a7758d690e91fdec3/custom-vivo/webapp/src/main/java/dk/dtu/adm/rap/controller/IndexingInferenceService.java
- Michel: Consider VIVO as enterprise data source like other sources inside the organization.
- Need to have messaging system where we can communicate data/exchange data from different data sources
- Evaluating Kafka as a system to do so
- Serialization/mapping data from CSV to RDF: will need to address this in the Kafka consumer/updater. Want streaming communication from data sources to VIVO
- (Don from chat: RMLStreamer seems to work with Kafka: https://github.com/RMLio/RMLStreamer)
- Architecture/sources discussed at the VIVO conference; see: http://doi.org/10.13140/RG.2.2.22501.83681
- Different use cases with common need for mapping CSV or other data sources to RDF.
- Michel: Don’s use case more ETL with VIVO at the center. In our case, integrate across data sources including VIVO.
- Don: Open to other ways of bringing in data that don't have to be traditional batch-driven ETL. Streaming messages could serve the same purpose, as long as there are enough controls to determine how the app behaves (e.g. how many triples are sent over).
- Don: Other paradigm: don’t track differences in data. Every class group in own named graph. Easy enough to truncate and reload everything.
- Benjamin: Prefer to upload/edit data through the interface
- Rachid: Is there a task force for ingest?
- Huda: Good idea to have one, to gather use cases, determine common requirements, and work on common goals.
- Rachid: No task force or interest group yet
- RDF messaging
- William: Approach is tied to Apache Jena, not agnostic. Standard JMS messaging with Artemis broker - significantly different than Kafka stream. Not interchangeable. Similar in goal.
- To do: bring up interest group/task force idea when Andrew is back
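A minimal sketch of the "truncate and reload" paradigm Don described for data ingest, assuming each class group lives in its own named graph and the triples are already available as N-Triples lines. The function name, the example graph IRI, and the batch_size parameter are hypothetical; the sketch only builds SPARQL 1.1 Update strings and does not send them to any endpoint.

```python
def reload_graph_updates(graph_iri, ntriples_lines, batch_size=1000):
    """Build SPARQL Update strings that empty one named graph and refill it.

    The first string clears the graph; each following string inserts at most
    batch_size triples, which gives some control over how much is sent at once.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # SILENT keeps the update from failing when the graph does not exist yet.
    updates = [f"CLEAR SILENT GRAPH <{graph_iri}>"]
    for start in range(0, len(ntriples_lines), batch_size):
        body = "\n".join(ntriples_lines[start:start + batch_size])
        updates.append(f"INSERT DATA {{ GRAPH <{graph_iri}> {{\n{body}\n}} }}")
    return updates


if __name__ == "__main__":
    # Hypothetical graph IRI and triple, for illustration only.
    example = ['<http://example.org/a> <http://example.org/p> "x" .'] * 3
    for update in reload_graph_updates("http://example.org/graph/people", example, 2):
        print(update)
        print("---")
```

Because nothing is diffed against the existing data, the reload is only safe when the named graph holds nothing but the ingested triples; the indexer and inferencer would be switched off and back on around the updates, as described in the notes.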
- 2020-07-15 - Special Topic - VIVO Scholar Next Steps
- Several people on the call planning on attending
- Potential areas of interest
- Robust GraphQL specs
- Customizable indexes
- Perhaps similar to the use of LDPath in Fedora: LDPath traversal for making configurable Solr documents
- https://marmotta.apache.org/ldpath/language.html
- Compound documents are challenging and mapping to Solr documents from them is challenging
- Benjamin: Would be nice to get comments on Andrew's PR here: https://github.com/vivo-project/Vitro/pull/169
- Better way to load only the language RDF you will be using
- Rearranges the RDF directory structure
- Would put all language files underneath RDF
- Non-specific language files in directory named core
- Q: Is this better or worse?
- If people have comments, please feel free to add
Actions