Thursday, September 27, 2012

Speaking at MySQL Connect This Week-end

I will give a talk at MySQL Connect where I present examples of how MySQL 5.6 improves the performance of many of the queries in the DBT-3 benchmark. I will also be giving a brief description of the relevant new optimization techniques and examples of the types of queries that will benefit from these techniques. My presentation is on Sunday (September 30) at 1.15pm.

I will also like to point you to the other presentations made by member of the MySQL Optimizer team:

Olav Sandstå: MySQL Optimizer Overview (Saturday at 11:30am)
Manyi Lu: Overview of New Optimizer Features in MySQL 5.6 (Saturday at 1:00pm)
Evgeny Potemkin: Powerful EXPLAIN in MySQL 5.6 (Sunday at 4:15pm)

We will also be having a BOF on Saturday evening (7:00 am) where we like people to come and give us some input on which query optimizations they would like us to work on for future releases.

Saturday, July 7, 2012

From Months to Seconds with Subquery Materialization


In an earlier blog post, I showed how optimizer improvements in MySQL 5.6 gave better performance for several of the queries in the DBT-3 benchmark.
However, for one of the queries, Query 18, I was not able to give exact numbers for the improvement since the query took very long in MySQL 5.5. I decided to try to find out exactly how long the query would take, but when the query had run for one month, I gave up. How can a query take so long? Especially, when I had set up InnoDB with a buffer pool that should be large enough to hold the entire database. Let's have a look at the query:

select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
from customer, orders, lineitem
where o_orderkey in (
                select l_orderkey
                from lineitem
                group by l_orderkey
                having sum(l_quantity) > 313
  )
  and c_custkey = o_custkey
  and o_orderkey = l_orderkey
group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice
order by o_totalprice desc, o_orderdate
LIMIT 100;

This query will find the orders from customers that have placed big orders. The reason that this takes so long in MySQL 5.5, is that the subquery in the WHERE clause will be executed for each processed row of the table for which this subquery is part of the WHERE predicate. Let's look at the EXPLAIN output for this query:
+----+--------------------+----------+-------+--------------------------------------------+-----------------------+---------+-------------------------+---------+---------------------------------+
| id | select_type        | table    | type  | possible_keys                              | key                   | key_len | ref                     | rows    | Extra                           |
+----+--------------------+----------+-------+--------------------------------------------+-----------------------+---------+-------------------------+---------+---------------------------------+
|  1 | PRIMARY            | customer | ALL   | PRIMARY                                    | NULL                  | NULL    | NULL                    |  150000 | Using temporary; Using filesort | 
|  1 | PRIMARY            | orders   | ref   | PRIMARY,i_o_custkey                        | i_o_custkey           | 5       | dbt3.customer.c_custkey |       7 | Using where                     | 
|  1 | PRIMARY            | lineitem | ref   | PRIMARY,i_l_orderkey,i_l_orderkey_quantity | i_l_orderkey_quantity | 4       | dbt3.orders.o_orderkey  |       2 | Using index                     | 
|  2 | DEPENDENT SUBQUERY | lineitem | index | NULL                                       | PRIMARY               | 8       | NULL                    | 6001215 | NULL                            | 
+----+--------------------+----------+-------+--------------------------------------------+-----------------------+---------+-------------------------+---------+---------------------------------+

The select_type for the subquery is DEPENDENT SUBQUERY. Since the left hand side of the IN-expression is a field from the orders table, the subquery will be executed for each row processed from this table. This implies that the index scan of the lineitem table will be performed more than one million times given the rows estimates in the EXPLAIN output (150000*7). I have measured that one execution of the sub-query takes about 3.5 seconds when all the data is in memory. Hence, it will take more than 40 days to execute it one million times.

In MySQL 5.6, Subquery Materialization may be used to avoid the repeated execution of subqueries. This implies that the subquery is executed once and the result stored (materialized) in a temporary table. Then, for each row where the subquery was earlier executed, a hash-based look-up into the temporary table will be made instead to check whether there is a match. The EXPLAIN output for Query 18, looks like this in MySQL 5.6:

+----+-------------+----------+-------+--------------------------------------------+-----------------------+---------+-------------------------+---------+---------------------------------+
| id | select_type | table    | type  | possible_keys                              | key                   | key_len | ref                     | rows    | Extra                           |
+----+-------------+----------+-------+--------------------------------------------+-----------------------+---------+-------------------------+---------+---------------------------------+
|  1 | PRIMARY     | customer | ALL   | PRIMARY                                    | NULL                  | NULL    | NULL                    |  150000 | Using temporary; Using filesort | 
|  1 | PRIMARY     | orders   | ref   | PRIMARY,i_o_custkey                        | i_o_custkey           | 5       | dbt3.customer.c_custkey |       7 | Using where                     | 
|  1 | PRIMARY     | lineitem | ref   | PRIMARY,i_l_orderkey,i_l_orderkey_quantity | i_l_orderkey_quantity | 4       | dbt3.orders.o_orderkey  |       2 | Using index                     | 
|  2 | SUBQUERY    | lineitem | index | NULL                                       | PRIMARY               | 8       | NULL                    | 6001215 | NULL                            | 
+----+-------------+----------+-------+--------------------------------------------+-----------------------+---------+-------------------------+---------+---------------------------------+

The only difference is that select_type of the subquery now is SUBQUERY. In other words, it is no longer dependent on the preceding tables, and can be executed only once. In my run of DBT-3 with MySQL 5.6, Query 18 takes only 6.8 seconds. Hence, we have gone from more than a month to just a few seconds! My colleague Guilhem has earlier written about another DBT-3 query that is improved with Subquery Materialization.

Wednesday, April 11, 2012

Improved DBT-3 Results with MySQL 5.6.5

In addition to the Optimizer features added in earlier 5.6 Development Milestone Releases, the newly released MySQL 5.6.5 Development Milestone Release adds a few more. I think this is a good time to check how all these new optimizer features impact the performance of the DBT-3 benchmark queries. In this blog post, I will compare the performance of the DBT-3 queries in MySQL 5.5 and MySQL 5.6.5.

Test Setup

I used a DBT-3 scale 1 database (InnoDB tables) that was stored on a traditional hard disk, and the InnoDB database size was a bit more than 2.5 GB. The DBT-3 queries were run in two settings: a disk-bound setting with a very small InnoDB buffer pool (50 MB), and CPU-bound with a 3 GB InnoDB buffer pool (all data in memory). The query cache was disabled, and by setting --innodb_flush_method=O_DIRECT, the file buffering in the file system was also out of the way. If not stated otherwise, default settings were used for all other MySQL system variables.

For the disk-bound scenario, each query were executed 10 times in a row, and the results presented are the average of the last 8 runs. Since the buffer pool was very small, and there was no buffering in the file system, there were very little difference between the first and the tenth execution of the queries. The only exceptions were Query 16 and Query 22 where the working data sets were so small that subsequent executions were not disk-bound. Due to this, I have not included the result for these queries in the presentation of the disk-bound scenario.

For the CPU-bound scenario, each query were executed 20 times in a row, but the results presented are still the average of the last 8 runs. (The reason for this, is that I observed that for several of the queries, the 8th or so execution of the query took significantly longer than the other executions.)

The order in which the different queries were run were randomized, but the same order was used in all experiments.

In order to get stable execution plans, I enabled InnoDB Persistent Statistics and recorded exact statistics in the statistics tables as discussed here. Since Persistent Statistics is not available in 5.5, I used optimizer hints to force a similar join order and index usage as for 5.6 where necessary.

For MySQL 5.6.5, the DBT-3 queries were run both with default settings for the optimizer_switch variable, and with setting optimizer_switch='mrr_cost_based=off,batched_key_access=on' in order to activate the Disk-Sweep Multi-Range Read (DS-MRR)and Batched Key Access (BKA) features.

OK, enough talking, let's move on to the results.

Disk-Bound Workload

The below chart shows the execution times for the DBT-3 queries in a disk-bound setting. The executions times in MySQL 5.5, MySQL 5.6.5 with default optimizer_switch settings, and MySQL 5.6.5 with DS-MRR and BKA activated are compared. The execution times for MySQL 5.6.5 default is set to 1, and the relative execution times are shown for the other two variants.

Except for Query 18, there are no significant differences between MySQL 5.5 and MySQL 5.6.5 with default settings. However, for Query 18 the improvement is dramatic. While it takes days to execute this query in MySQL 5.5, it takes less 45 minutes in MySQL 5.6.5. This improvement is due to Subquery Materialization, and I will discuss this further in a later blog post.

As discussed in an earlier blog post, BKA can give very big improvements when the workload is disk-bound. This is illustrated by the improvements of queries 2, 5, 8, 9, 13, 18, 19, and 21. Note also that there are some queries that does not benefit from BKA. As discussed in the aforementioned blog, Query 17 gets better locality for the disk accesses without BKA, and turning on BKA makes it take 16 times longer to execute. We also see that Query 11 performs worse with BKA. I plan to investigate further the reason for this.

Queries 3, 4, 10, 14, and 15, are improved by using the DS-MRR algorithm when doing range access. That is, the rows in the base table are not accessed in the order given by the index used for range access, but in sequential order as viewed from the clustered primary key index.

CPU-Bound Workload

The below chart shows the execution times for the DBT-3 queries in a CPU-bound setting. As above the executions times in MySQL 5.5, MySQL 5.6.5 with default optimizer_switch settings, and MySQL 5.6.5 with DS-MRR and BKA activated are compared.

For MySQL 5.6.5 with default settings, the most significant improvement is for Query 18 also with a CPU-bound workload. In addition, there is a 10% improvement for Query 16. This is also due to Subquery Materialization, and my colleague Guilhem discusses this improvement in more detail.

As discussed in my previous blog post on BKA, several queries perform worse with BKA in a CPU-bound setting (e.g., queries 2, 11, 13, and 17). Only query 18 performs better with BKA in this setting.

On the other hand, many of the queries that benefited from DS-MRR with a disk-bound workload, still show some improvement with a CPU-bound workload.

Conclusions

The results from comparing the new MySQL 5.6.5 release with MySQL 5.5, show that Subquery Materialization have significant effects on the execution time of the few DBT-3 queries where it applies.

Also, as shown earlier, BKA has a good effect for disk-bound workloads, while it in many cases will cause worse performance for CPU-bound workloads.

Disk-sweep MRR has a good effect on the performace of range scans in for disk-bound workloads, and it also shows a small improvement for many queries with CPU-bound workloads.