Benchmarking MySQL NDB Cluster with ScaleArc using SysBench
While NoSQL and NewSQL systems are maturing as high-performance data stores and are increasingly being adopted, relational databases rest on a proven and solid model. Many scalable products still use them and, as their databases grow into large clusters, need sharding, caching, and routing to protect their investment, avoid significant re-engineering, and operate 24×7.
Lack of adequate caching is one of the most common problems that performance engineers come across when investigating performance issues. Many such problems can be solved by applying caching effectively to reduce the frequency of expensive operations, such as database accesses or web page fetches, or to cut the execution count of expensive functions through memoization. Caching can thus be leveraged at all layers: from processors to disks, and from CDNs for web applications to web servers, databases, and filesystems. Caches can be as simple as the dictionaries or hash tables that programming languages provide as data structures, or as complex as distributed hash tables (DHTs) or enterprise grids. However, we use caching when evidence of a bottleneck demands it, not as a golden hammer or a band-aid. Among the various caching solutions, relational caches or caching database middleware are not uncommon (see the ‘transparent sharding middleware’ section in this paper on NewSQL systems).
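As a minimal sketch of the memoization idea mentioned above, the following Python snippet caches the results of a function using the standard library's functools.lru_cache. The function expensive_lookup and the counter call_count are hypothetical stand-ins for a costly operation such as a database query; they are not part of ScaleArc or SysBench.

```python
from functools import lru_cache

# Counts how many times the expensive body actually runs (illustrative only).
call_count = 0


@lru_cache(maxsize=None)
def expensive_lookup(key: int) -> int:
    """Hypothetical stand-in for a costly operation such as a database query."""
    global call_count
    call_count += 1
    return key * key


# Repeated keys are answered from the cache instead of re-running the body.
results = [expensive_lookup(k) for k in (3, 4, 3, 3, 4)]
print(results, "computed", call_count, "times")
```

Only the two distinct keys trigger the expensive body; the remaining calls are served from the cache. A plain memoizing cache like this one never invalidates its entries, so it only suits results that do not change underneath it.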
ScaleArc is database load-balancing middleware with a long roster of customers and impressive features, including zero downtime and real-time monitoring. Of these features, GS Lab’s performance engineering team thought ScaleArc’s transparent caching could be particularly helpful in improving the performance of products that use relational databases and, combined with the other features, in helping them meet high scalability goals. To assess this promise, GS Lab engineering came up with a reproducible benchmark as the proof of the pudding.
Tools in the SysBench benchmark suite are widely used to measure the performance of various subsystems. In a recent investigation of suboptimal I/O performance, GS Lab used the SysBench I/O benchmark to find that an improper RAID level was being used for a relational database running on expensive hardware. Switching to the correct RAID level led to a big speedup without the need for end-to-end performance testing. SysBench is widely available and suitable as an independently reproducible benchmark. Therefore, we used the SysBench OLTP benchmark in this study to measure the performance of a MySQL NDB Cluster with ScaleArc’s ACID-compliant cache.
ScaleArc has published a similar benchmarking exercise, conducted by Percona on Percona’s variant of MySQL. We used NDB Cluster because one of our customers uses it and, as far as we know, no such study exists for NDB Cluster. Also, we evaluated caching only for the subset of the SysBench OLTP workload that consists of read-only queries (skipping the read-write workload) to find an upper limit on the performance gains achievable through caching.
The results show a big improvement (up to 9x) in throughput of cached read-only queries and a great reduction in response times.
Though the speedup will not be as spectacular for typical OLTP workloads consisting of a mix of reads and writes (compared to analytical workloads with a high percentage of reads), the results are highly promising given that systems can get a big performance boost with zero change in the application code or database.
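To illustrate why a mixed workload gains less than a read-only one, here is a rough back-of-the-envelope sketch based on Amdahl's law. It assumes that a fraction of total query time is spent on cacheable reads, that those reads are sped up by a fixed factor, and that everything else is unchanged. The function name overall_speedup and the fractions used below are hypothetical illustrations, not measurements from this study; only the 9x figure comes from the results above, and it is an upper bound.

```python
def overall_speedup(cached_fraction: float, cached_speedup: float) -> float:
    """Amdahl's-law estimate of the end-to-end speedup.

    cached_fraction: share of total query time spent on cacheable reads (0..1).
    cached_speedup:  speedup factor observed for those cached reads.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    if cached_speedup <= 0:
        raise ValueError("cached_speedup must be positive")
    # Uncached work keeps its original cost; cached work shrinks by the factor.
    return 1.0 / ((1.0 - cached_fraction) + cached_fraction / cached_speedup)


# Hypothetical read shares, using the best-case 9x cached-read speedup.
for fraction in (1.0, 0.9, 0.5):
    print(f"{fraction:.0%} cacheable: {overall_speedup(fraction, 9.0):.2f}x")
```

Under these assumptions, a workload that spends only half of its time on cacheable reads gains less than 2x overall, even when the cached reads themselves are 9x faster.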
All the artifacts required to reproduce the exercise, including the test environment configuration, load generator code, supporting scripts, raw results, and summary data, are available in this GitHub repository.