It's important to note that this is not a recommendation for a temporary table size. We're just testing the hypothesis: we may be able to get away with a smaller size, or your system may need a larger one.
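
As a rough illustration of what testing this hypothesis involves, here is a minimal Python sketch. It assumes the setting in question is MySQL's tmp_table_size. MySQL caps an in-memory internal temporary table at the smaller of tmp_table_size and max_heap_table_size, so raising tmp_table_size alone may have no effect if max_heap_table_size is still lower. The helper names and the 512 MiB trial value are hypothetical placeholders, not the size used here.

```python
from typing import List

MIB = 1024 * 1024


def effective_tmp_table_limit(tmp_table_size: int, max_heap_table_size: int) -> int:
    """In-memory internal temp tables are capped at the smaller of the two settings."""
    return min(tmp_table_size, max_heap_table_size)


def session_settings(size_bytes: int) -> List[str]:
    """SET statements raising both limits together, for the current session only."""
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    return [
        f"SET SESSION tmp_table_size = {size_bytes}",
        f"SET SESSION max_heap_table_size = {size_bytes}",
    ]


if __name__ == "__main__":
    trial = 512 * MIB  # hypothetical trial value, not a recommendation
    for statement in session_settings(trial):
        print(statement)
    # Raising only tmp_table_size leaves the 16 MiB max_heap_table_size default in charge.
    print(effective_tmp_table_limit(trial, 16 * MIB) // MIB, "MiB")  # prints: 16 MiB
```

Setting the values per session keeps the experiment away from other connections; making them permanent would mean changing the server configuration instead.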


But for now, we've put in a big number, which should be sufficient for the query above. So how long does the query take?

SIX SECONDS.

That's from cold. Repeat the query with warm caches and it dips under 5 seconds, even with the DISTINCT present, for a query that originally took 20 seconds.
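
To reproduce a cold-versus-warm comparison like this, a small timing harness is enough. The Python sketch below assumes a hypothetical run_query callable that executes the query and fetches every row; the first timing only counts as "cold" if the caches really were cold beforehand.

```python
import time
from typing import Callable, Dict, List


def time_runs(run_query: Callable[[], object], repeats: int = 3) -> List[float]:
    """Time `repeats` executions of run_query; index 0 is the first (cold) run."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()  # should execute the query *and* fetch every row
        timings.append(time.perf_counter() - start)
    return timings


def summarise(timings: List[float], baseline_s: float) -> Dict[str, float]:
    """Compare the first run and the best later run against a baseline time."""
    if len(timings) < 2:
        raise ValueError("need one cold run and at least one warm run")
    cold, warm = timings[0], min(timings[1:])
    return {
        "cold_s": cold,
        "warm_s": warm,
        "cold_speedup": baseline_s / cold,
        "warm_speedup": baseline_s / warm,
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a (hypothetical) function running the real query.
    print(time_runs(lambda: sum(range(100_000))))
    # The rounded figures above: 20 s originally, 6 s cold, under 5 s warm.
    print(summarise([6.0, 5.0], baseline_s=20.0))
```

Plugging in the rounded figures (20 seconds originally, 6 seconds cold, and 5 seconds as an upper bound for the warm run) gives a cold speed-up of about 3.3x and a warm speed-up of at least 4x.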

Unused Tweaks

The MySQL 5.6 documentation describes a feature called Batched Key Access (BKA), which on the surface should help with large joins. However, when we tested it on this query, it appeared to make no difference: neither the execution plan (which would show "Using join buffer (Batched Key Access)") nor the execution time changed. It may still come into play for other queries that we execute. See https://dev.mysql.com/doc/refman/5.6/en/bnl-bka-optimization.html
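
BKA is not used by default in MySQL 5.6; the manual page linked above describes enabling it through optimizer_switch with mrr=on, mrr_cost_based=off and batched_key_access=on, and EXPLAIN shows whether it was actually used. A minimal Python sketch of both steps follows. The function names are hypothetical, and it assumes the EXPLAIN output has already been fetched as one dict per row, keyed by column name.

```python
from typing import Dict, List, Optional

# Flags the MySQL 5.6 manual gives for enabling BKA (it is off by default).
BKA_FLAGS: Dict[str, str] = {
    "mrr": "on",
    "mrr_cost_based": "off",
    "batched_key_access": "on",
}


def optimizer_switch_statement(flags: Dict[str, str]) -> str:
    """Build a session-scoped SET statement; flags not listed keep their values."""
    body = ",".join(f"{name}={value}" for name, value in flags.items())
    return f"SET SESSION optimizer_switch = '{body}'"


def uses_bka(explain_rows: List[Dict[str, Optional[str]]]) -> bool:
    """True if any EXPLAIN row's Extra column reports a BKA join buffer."""
    return any("Batched Key Access" in (row.get("Extra") or "") for row in explain_rows)


if __name__ == "__main__":
    print(optimizer_switch_statement(BKA_FLAGS))
    # Hypothetical EXPLAIN rows, as a dict-returning cursor might produce them.
    rows = [
        {"table": "q1", "Extra": "Using where"},
        {"table": "q2", "Extra": "Using join buffer (Batched Key Access)"},
    ]
    print(uses_bka(rows))  # prints: True
```

Checking EXPLAIN before timing anything separates "BKA was used but did not help" from "the optimizer never chose BKA".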

Returning to Map of Science

Without taking an in-depth look at the queries executed by the Map of Science / Temporal Graph, what difference have the new MySQL settings made?

Originally, "out of the box", the cached model took a little over 2 minutes to build. With the new settings in place, and no other changes, it takes just 1 minute 16 seconds.
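
Since the original build time is only given as "a little over 2 minutes", treating it as exactly 120 seconds gives lower bounds on the improvement. A quick check in Python (the helper name is hypothetical):

```python
from typing import Tuple


def improvement(before_s: float, after_s: float) -> Tuple[float, float, float]:
    """Return (seconds saved, fractional reduction, speed-up factor)."""
    if before_s <= 0 or after_s <= 0:
        raise ValueError("durations must be positive")
    saved = before_s - after_s
    return saved, saved / before_s, before_s / after_s


# "A little over 2 minutes" is treated as 120 s, so each figure is a lower bound.
saved, reduction, speedup = improvement(before_s=120.0, after_s=76.0)  # 1 min 16 s = 76 s
print(f"saved >= {saved:.0f} s, reduction >= {reduction:.0%}, speed-up >= {speedup:.2f}x")
# prints: saved >= 44 s, reduction >= 37%, speed-up >= 1.58x
```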

Conclusion

Despite the bad press, it is evident that you can make a LOT of difference to the performance of SDB by taking the time to optimise the SQL engine. That's the trade-off of using a general-purpose SQL core instead of an engine that already knows which data structures and queries it needs to optimise for. But there is a lot of scope for improving performance simply by tuning the core, without changing the application or the SPARQL queries it uses.