...

Code Block
languagetext
join_buffer_size = 32M
read_rnd_buffer_size = 32M
innodb_buffer_pool_size = 1536M
innodb_file_per_table = 1
innodb_file_format = barracuda
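
After restarting the server, it is worth confirming that MySQL actually picked up the new values - these are standard statements that can be run from the mysql client:

Code Block
languagetext
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW VARIABLES LIKE 'innodb_file_per_table';
SHOW VARIABLES LIKE 'join_buffer_size';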

...

Info

There are a number of different philosophies for sizing innodb_buffer_pool_size. On a dedicated server, MySQL recommends 70% of the available memory. If you are hosting Tomcat on the same server, then it should perhaps be no more than 70% of the system memory minus the maximum Tomcat heap size.

Actual requirements depend on data shape and usage patterns. For the SDB layout, after accessing a number of different pages in VIVO and checking the free buffers, a good rule of thumb appears to be roughly 1MB for every 16,000 triples (rows in the Quads table) - e.g. 1.5GB for a 25 million triple store. If you have the resources, increasing the buffer size doesn't hurt performance - in which case allocating at least 100MB per 1 million triples would be a good idea.
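
The free-buffer check mentioned above can be done with the standard InnoDB status counters (buffer pool pages are 16KB by default, so multiply the free page count by 16KB to get bytes):

Code Block
languagetext
-- rows in the Quads table, for the rule of thumb above
SELECT COUNT(*) FROM Quads;
-- pages still unused in the buffer pool
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_free';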

The InnoDB buffer pool can be pre-loaded at startup, which would help the initial query performance times - see: https://dev.mysql.com/doc/refman/5.7/en/innodb-preload-buffer-pool.html
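
In MySQL 5.7, the dump/reload behaviour is controlled by a pair of server variables (shown here with explicit values; both default to ON in 5.7):

Code Block
languagetext
innodb_buffer_pool_dump_at_shutdown = ON
innodb_buffer_pool_load_at_startup = ON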

...

Code Block
languagetext
max_heap_table_size=256M

tmp_table_size=256M

It's important to note that this is not a recommendation for temporary table sizes - we are simply testing the hypothesis. We may be able to get away with smaller values; alternatively, your system may need larger ones.
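
One way to test that hypothesis is to watch how often implicit temporary tables spill to disk. If Created_tmp_disk_tables keeps climbing relative to Created_tmp_tables, the sizes above are too small:

Code Block
languagetext
SHOW GLOBAL STATUS LIKE 'Created_tmp%';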

...

Info
Unused Tweaks

Reading the MySQL documentation, 5.6 has a feature called "batched key access" which, on the surface, should help with large joins. However, when tested on this query it appeared to make no difference - there was no change in either the execution plan (had it been used, the plan would say Using join buffer (Batched Key Access)) or the execution time. But there may be other queries that we execute where it could come into play. See https://dev.mysql.com/doc/refman/5.6/en/bnl-bka-optimization.html
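
For reference, batched key access is not enabled by default in 5.6. To rule out the feature simply being switched off, it can be enabled per-session via the optimizer_switch flags described in the linked documentation:

Code Block
languagetext
SET optimizer_switch = 'mrr=on,mrr_cost_based=off,batched_key_access=on';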

Even with a large query with OPTIONALs (e.g. the listview-authorInAuthorship select [view the Author in Authorship Query as SQL]), MySQL can't be provoked into using batched key access on a join. It does however start doing batched nested loops. This isn't hideous - under 4 seconds on a 1700+ publication profile - but not quite as fast as Virtuoso (2 seconds).

Despite the SDB layout using hashes for keys, and all joins being based on equivalence, MySQL's architecture leans heavily on BTREE indexes rather than hash indexes. MariaDB (a MySQL fork) adds batched hash joins to those offered by MySQL - although when MariaDB was provoked into using them, the observed performance was awful.
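
In MariaDB, the block-based join variants are selected through the join_cache_level variable (default 2); raising it allows the hashed and BKA-style caches, which is a plausible way to provoke the behaviour described above:

Code Block
languagetext
-- MariaDB only: allow all block-based join algorithms, including hashed variants
SET join_cache_level = 8;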

MariaDB uses the Aria engine for temporary tables by default; in those cases, setting aria_pagecache_buffer_size can make a profound difference. For the authorship query, execution time dropped to as low as 2.5 seconds for repeated executions.
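
For the MariaDB case, that is a single server variable set in the config file (the 256M here is an illustrative value, not a tested recommendation):

Code Block
languagetext
aria_pagecache_buffer_size = 256M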

Returning to Map of Science

...

Originally, "out of the box", the cached model took a little over 2 minutes to build. With the new settings in place - and no other changes - it takes just 1 minute 25 seconds.

Conclusion

Despite the bad press, it is evident that you can make a LOT of difference to the performance of SDB by taking the time to look at optimisations that can be made to the SQL engine. That's the trade-off of using a general-purpose SQL core instead of an engine that already knows what data structures and queries it is trying to optimise for. But there is a lot of scope for improving performance simply by tuning the core, without affecting the application or the SPARQL queries that it is using.

...