...
Indicating note-taker
- Andrew Woods
- Andy Seaborne
- Hunter Jarrell
- Taeber Rapczak
- Brian Lowe
- Don Elsborg
- Benjamin Gross
- Graham Triggs
- Ralph O'Flinn
- Alexander (Sacha) Jerabek
- William Welling
- Douglas C. Hahn
- Steven McCauley
- Kevin Hanson
- Mike Conlon...
Objective
Moving towards a decision on VIVO's default triplestore (and community recommendations)
- SDB
- TDB
- External triplestore
...
Goal: understand the differences between SDB and TDB/TDB2, recommend best practices, and set a default for VIVO.
Andy Seaborne -- answering questions. Apache Jena is an open source project, with all that entails.
Andy - settling on TDB2
Can’t go directly into SDB unless you understand how the access works at the lowest levels.
50 million triples (wild guess) is a practical limit with SDB. The limiting factor is the interaction between basic graph patterns and filters.
TDB doesn’t support incremental loading. Massive parallelism is recommended. Set the flags in the bulk loader, and be sure to try different settings to see which works best for your system/setup.
TDB1 is slightly better at small commits. TDB2 currently has additional commit overhead, which is expected to be removed eventually. TDB2 is better at large commits -- adding 200 million triples is possible.
Each index loads on a separate thread. Load named graphs in parallel.
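One way to act on the advice to try different bulk-loader settings is a small timing harness. The sketch below was not shown on the call: `time_commands`, `fastest`, and `build_candidates` are hypothetical helpers, the file and directory names are placeholders, and the `tdb2.tdbloader --loader=... --loc ...` command line is an assumption that should be checked against the Jena version you have installed.

```python
import subprocess
import time


def build_candidates(data_file, db_root, loaders):
    """Build one command line per loader setting (assumed tdb2.tdbloader syntax).

    Each candidate loads into its own database directory so runs do not
    interfere with one another.
    """
    return {
        name: ["tdb2.tdbloader", f"--loader={name}", "--loc", f"{db_root}/{name}", data_file]
        for name in loaders
    }


def time_commands(candidates):
    """Run each command once and return {label: elapsed seconds, or None}.

    A command that cannot be started or exits non-zero is recorded as None,
    so a failed load is never mistaken for a fast one.
    """
    timings = {}
    for label, argv in candidates.items():
        start = time.perf_counter()
        try:
            result = subprocess.run(
                argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
            )
            succeeded = result.returncode == 0
        except OSError:
            succeeded = False
        elapsed = time.perf_counter() - start
        timings[label] = elapsed if succeeded else None
    return timings


def fastest(timings):
    """Return the label of the quickest successful run, or None if all failed."""
    successful = {label: secs for label, secs in timings.items() if secs is not None}
    return min(successful, key=successful.get) if successful else None
```

A run might look like `fastest(time_commands(build_candidates("data.nq", "/tmp/loadtest", ["sequential", "phased", "parallel"])))`. A single run per setting is only a rough guide; repeating the runs on representative data gives a more trustworthy comparison.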
Corruption possible across technologies. Bizarre cases. Record what you put in. Dump regularly.
- However, regarding stability, TDB has the most community usage, and therefore is the most bullet-proof
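The "record what you put in" advice can be illustrated with a minimal sketch that keeps a copy of every batch of added data alongside a checksum manifest. It is independent of any triplestore, and everything in it (`record_batch`, `verify_log`, the `manifest.jsonl` layout) is a hypothetical illustration rather than anything VIVO or Jena provides. Regular dumps of the store itself would still be taken with the store's own tools.

```python
import hashlib
import json
import time
from pathlib import Path


def record_batch(log_dir, label, ntriples_text):
    """Save one batch of added data and append its checksum to a manifest."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)

    data = ntriples_text.encode("utf-8")
    digest = hashlib.sha256(data).hexdigest()
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())

    # The file name carries the time, a label, and a digest prefix.
    path = log_dir / f"{stamp}-{label}-{digest[:12]}.nt"
    path.write_bytes(data)

    entry = {"file": path.name, "sha256": digest, "bytes": len(data), "label": label}
    with open(log_dir / "manifest.jsonl", "a", encoding="utf-8") as manifest:
        manifest.write(json.dumps(entry) + "\n")
    return path


def verify_log(log_dir):
    """Return names of recorded files that are missing or no longer match their checksum."""
    log_dir = Path(log_dir)
    manifest = log_dir / "manifest.jsonl"
    if not manifest.exists():
        return []

    damaged = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        entry = json.loads(line)
        path = log_dir / entry["file"]
        if not path.exists():
            damaged.append(entry["file"])
        elif hashlib.sha256(path.read_bytes()).hexdigest() != entry["sha256"]:
            damaged.append(entry["file"])
    return damaged
```

With a log like this, a corrupted store can in principle be rebuilt by reloading the recorded batches in order, and `verify_log` gives a quick check that the log itself is intact.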
Queries can affect performance greatly. In some cases the optimizer spends measurable time evaluating the query. It’s programming.
Can use TDB and SDB together. Queries in TDB. Data recovery in SDB.
Suggestion regarding "future-proofing": avoid coupling too tightly to any given technology... implement against standards
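As a rough illustration of implementing against standards, the sketch below talks to a triplestore only through the SPARQL 1.1 Protocol (query via POST with URL-encoded parameters), so switching stores should, in principle, mean changing only the endpoint URL. The helper names and the example endpoint URL are assumptions for illustration; this is not how VIVO itself accesses its store.

```python
import json
import urllib.parse
import urllib.request


def build_sparql_query_request(endpoint_url, query):
    """Build a SPARQL 1.1 Protocol query request (POST, URL-encoded parameters)."""
    body = urllib.parse.urlencode({"query": query}).encode("ascii")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            # Standard media type for SELECT/ASK results in JSON.
            "Accept": "application/sparql-results+json",
        },
    )


def run_select(endpoint_url, query, timeout=30):
    """Send the query and return the result bindings as a list of dicts."""
    request = build_sparql_query_request(endpoint_url, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        results = json.load(response)
    return results["results"]["bindings"]
```

For example, `run_select("http://localhost:3030/ds/sparql", "SELECT * WHERE { ?s ?p ?o } LIMIT 5")` would query a locally running endpoint at that (hypothetical) address. Vendor-specific extensions and behaviours fall outside what the protocol guarantees and would still need per-store testing.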
AWS Neptune isn’t Blazegraph. Neptune overwrote the SERVICE calls.
Next steps:
- What are the outstanding questions at this point?
- Should we have a follow-on call (in the new year) to reach a community recommendation?