Date
- Time: 9:00 am, Eastern Time (New York, GMT-04:00)
Call-in Information
To join the online meeting:
- Go to: https://lyrasis.zoom.us/my/vivo1
One tap mobile:
US: +16699006833,,9358074182# or +19292056099,,9358074182#
Or Telephone:
US: +1 669 900 6833 or +1 929 205 6099 or 877 853 5257
Meeting ID: 935 807 4182
International numbers available: https://zoom.us/u/aeANHanzED
Slack
- https://vivo-project.slack.com
- Self-register at: http://bit.ly/vivo-slack
Attendees
Indicating note-taker
- Andrew Woods
- Andy Seaborne
- Hunter Jarrell
- Taeber Rapczak
- Brian Lowe
- Don Elsborg
- Benjamin Gross
- Graham Triggs
- Ralph O'Flinn
- Alexander (Sacha) Jerabek
- William Welling
- Douglas C. Hahn
- Steven McCauley
- Kevin Hanson
- Mike Conlon
Objective
Moving towards a decision on VIVO's default triplestore (and community recommendations)
- SDB
- TDB
- External triplestore
Agenda
- Brief introductions: What is your interest in the conversation?
- Pros / Cons of each option (see table in notes)
- Performance characteristics (benchmarks on READ?)
- Reliability
- ACID compliance
- Maintenance implications
- Future-proofing
- Community impact
- Is there a recommendation from this group?
- Follow-on actions
Notes
Recording
Goal to understand differences between SDB/TDB(2). Recommend best practices. Set a default for VIVO.
Andy Seaborne -- answers questions. Apache Jena is an open source project with what that entails.
Andy - settling on TDB2
Can’t go directly into SDB unless you understand how the access works on the lowest levels.
50 million triples (a wild guess) is a practical limit with SDB. The issue is the interaction between basic graph patterns and filters.
TDB doesn’t support incremental loading. Massive parallelism is recommended. Set the flags in the bulk-loader. Be sure to try different ones to see which works best for your system/setup.
TDB1 is slightly better at small commits. TDB2 currently has additional commit overhead that is expected to be removed eventually. TDB2 is better at large commits -- adding 200 million is possible.
Each index loads on a separate thread. Load named graphs in parallel.
Corruption possible across technologies. Bizarre cases. Record what you put in. Dump regularly.
- However, regarding stability, TDB has the most community usage, and therefore is the most bullet-proof
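One way to act on "record what you put in" is an append-only manifest of every file loaded into the store, kept outside the triplestore. The sketch below is illustrative only: the function names, the manifest layout, and the example graph URI are assumptions, not something discussed on the call, and it does not talk to Jena at all.

```python
import hashlib
import json
import time
from pathlib import Path


def record_load(manifest_path, data_file, graph_uri):
    """Append one JSON line describing a file that was loaded into the store.

    Hypothetical helper: the manifest format is an assumption for illustration.
    """
    data = Path(data_file).read_bytes()
    entry = {
        "file": str(data_file),
        "graph": graph_uri,
        # The checksum lets you confirm later that a re-load uses the same bytes.
        "sha256": hashlib.sha256(data).hexdigest(),
        "bytes": len(data),
        "loaded_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }
    # Append-only: earlier entries are never rewritten.
    with open(manifest_path, "a", encoding="utf-8") as manifest:
        manifest.write(json.dumps(entry) + "\n")
    return entry


def read_manifest(manifest_path):
    """Return all recorded loads, oldest first."""
    with open(manifest_path, encoding="utf-8") as manifest:
        return [json.loads(line) for line in manifest if line.strip()]
```

Replaying the manifest in order, together with regular dumps, gives a path to rebuild a store whichever backend is in use.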
Queries can affect performance greatly. And in some cases the optimizer spends measurable time evaluating the query. It’s programming.
Can use TDB and SDB together. Queries in TDB. Data recovery in SDB.
Suggestion regarding "future-proofing": avoid coupling too tightly to any given technology... implement against standards
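As a rough illustration of "implement against standards": a client that speaks only the SPARQL 1.1 Protocol over HTTP should work against any conforming endpoint, whether it is backed by TDB, SDB, or an external triplestore. The sketch below builds such a request with the Python standard library. The function name and the endpoint URL in the usage note are hypothetical, chosen for the example.

```python
import urllib.parse
import urllib.request


def build_sparql_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol query request (URL-encoded POST).

    Nothing here is specific to one triplestore; `endpoint` is whatever
    SPARQL query URL the deployment exposes (an assumption of this sketch).
    """
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,  # supplying a body makes urllib send a POST
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            # Ask for the standard JSON results format.
            "Accept": "application/sparql-results+json",
        },
    )
```

For example, `build_sparql_request("http://localhost:3030/vivo/sparql", "SELECT * WHERE { ?s ?p ?o } LIMIT 1")` returns a request that can be sent with `urllib.request.urlopen`. Switching backends then means changing the endpoint URL rather than the client code.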
AWS Neptune isn’t Blazegraph. Neptune overwrote the SERVICE calls.
Next steps:
- What are the outstanding questions at this point?
- Should we have a follow-on call (in the new year) to reach a community recommendation?
Actions