
Objective

Moving towards a decision on VIVO's default triplestore (and community recommendations)
1. SDB
2. TDB
3. External triplestore

...

Goal: understand the differences between SDB and TDB (TDB1/TDB2), recommend best practices, and set a default for VIVO.

Andy Seaborne answers questions. Apache Jena is an open source project, with everything that entails.

Andy - settling on TDB2

You can't go directly into SDB unless you understand how access works at the lowest levels.

A practical limit for SDB is around 50 million triples (a wild guess). The limiting factor is the interaction between basic graph patterns and filters.

TDB doesn't support incremental loading. Massive parallelism is recommended. Set the flags in the bulk loader, and try different ones to see which works best for your system/setup.
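Trying different bulk-loader settings is easiest when each trial gets its own fresh database directory. The sketch below only builds the command lines; it assumes TDB2's `tdb2.tdbloader` with its `--loc` and `--loader=` options, and the loader names listed are our reading of the Jena TDB2 documentation (check `tdb2.tdbloader --help` for your Jena version). The directory layout and function names are hypothetical.

```python
import shlex

# Loader strategies for tdb2.tdbloader's --loader option, as we understand
# them from the Jena TDB2 docs (assumption: verify with `tdb2.tdbloader --help`).
LOADERS = ("basic", "sequential", "light", "phased", "parallel")


def loader_command(db_dir: str, files: list[str], loader: str = "parallel") -> list[str]:
    """Build an argv list for one tdb2.tdbloader run (hypothetical helper)."""
    if loader not in LOADERS:
        raise ValueError(f"unknown loader: {loader}")
    return ["tdb2.tdbloader", "--loc", db_dir, f"--loader={loader}", *files]


def trial_commands(base_dir: str, files: list[str]) -> dict[str, str]:
    """One shell command per strategy, each loading into its own empty directory
    so the trials don't interfere with each other."""
    return {
        loader: shlex.join(loader_command(f"{base_dir}/{loader}", files, loader))
        for loader in LOADERS
    }
```

Each argv can be passed to `subprocess.run(..., check=True)` and timed; compare the results on your own hardware rather than assuming one strategy always wins.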

TDB1 is slightly better at small commits; TDB2 currently has additional per-commit overhead, which should eventually be removed. TDB2 is better at large commits: a commit adding 200 million triples is possible.
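Commit size is often something the loading code controls. As a minimal sketch, assuming data arrives as N-Triples lines and is sent through a standard SPARQL Update endpoint, the helper below groups triples into `INSERT DATA` requests of a chosen size; each request would be one commit, so `batch_size` is the knob for trading many small commits against a few large ones. The function names are hypothetical.

```python
from typing import Iterable, Iterator, Optional


def insert_data_batches(
    ntriples_lines: Iterable[str], batch_size: int, graph: Optional[str] = None
) -> Iterator[str]:
    """Group N-Triples lines into SPARQL Update INSERT DATA strings.

    Each yielded string is meant to be sent as one update (one commit).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: list[str] = []
    for line in ntriples_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and N-Triples comments
        batch.append(line)
        if len(batch) == batch_size:
            yield _wrap(batch, graph)
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield _wrap(batch, graph)


def _wrap(triples: list[str], graph: Optional[str]) -> str:
    """Wrap triples in INSERT DATA, optionally inside a named GRAPH block."""
    body = "\n".join(triples)
    if graph is not None:
        body = f"GRAPH <{graph}> {{\n{body}\n}}"
    return f"INSERT DATA {{\n{body}\n}}"
```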

Each index loads on a separate thread. Load named graphs in parallel.

Corruption is possible with any of these technologies, though only in bizarre cases. Record what you put in, and dump regularly.
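One way to "record what you put in" is an append-only manifest with a checksum for every file loaded, so a store can be rebuilt from known inputs. The sketch below assumes a JSON-lines manifest (file name and layout are hypothetical) and pairs it with a TDB2 dump command; `tdb2.tdbdump --loc` writes the whole dataset as N-Quads to stdout, which you would redirect to a dated file.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def record_load(manifest_path: Path, data_file: Path, graph: Optional[str] = None) -> dict:
    """Append one JSON line describing a loaded file to the manifest."""
    entry = {
        "file": str(data_file),
        "sha256": hashlib.sha256(data_file.read_bytes()).hexdigest(),
        "graph": graph,  # named graph the file went into, if any
        "loaded_at": datetime.now(timezone.utc).isoformat(),
    }
    with manifest_path.open("a", encoding="utf-8") as out:
        out.write(json.dumps(entry) + "\n")
    return entry


def dump_command(db_dir: str) -> list[str]:
    """argv for a full N-Quads dump of a TDB2 database (output goes to stdout)."""
    return ["tdb2.tdbdump", "--loc", db_dir]
```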

However, regarding stability, TDB has the most community usage and is therefore the most bullet-proof.

How a query is written can affect performance greatly, and in some cases the optimizer spends measurable time evaluating the query. It's programming.
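Because query formulation matters this much, alternative versions of a query are worth timing against real data rather than guessed at. A minimal, store-agnostic sketch: `run_query` is a hypothetical callable that executes one query against whichever store you use; warm-up runs let caches fill before measuring, and the median damps outliers.

```python
import statistics
import time
from typing import Callable


def median_runtime(run_query: Callable[[], object], repeats: int = 5, warmup: int = 1) -> float:
    """Median wall-clock seconds for run_query(), measured after warm-up runs."""
    for _ in range(warmup):
        run_query()  # not timed: lets caches and the JVM settle
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)
```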

TDB and SDB can be used together: queries in TDB, data recovery in SDB.

Suggestion regarding "future-proofing": avoid coupling too tightly to any given technology; implement against standards.
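Implementing against standards can be as simple as talking to the store only through the W3C SPARQL 1.1 Protocol, which Fuseki, Blazegraph, Neptune and others expose. A minimal sketch using only the Python standard library; the endpoint URL in the example is hypothetical, and requesting JSON results assumes the endpoint supports the SPARQL 1.1 Query Results JSON format.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def sparql_query_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol query: POST with a URL-encoded 'query' field."""
    return Request(
        endpoint,
        data=urlencode({"query": query}).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
    )


def run_select(endpoint: str, query: str) -> dict:
    """Send the request and parse the JSON results document."""
    with urlopen(sparql_query_request(endpoint, query)) as resp:
        return json.load(resp)
```

For example, `run_select("http://localhost:3030/vivo/sparql", "SELECT * WHERE { ?s ?p ?o } LIMIT 1")` would work the same way whichever compliant store sits behind the (hypothetical) URL.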

AWS Neptune isn't Blazegraph. Neptune overrode the SERVICE calls.

Next steps:

What are the outstanding questions at this point?
Should we have a follow-on call (in the new year) to reach a community recommendation?
