Wednesday, February 28, 2018

FOSDEM: Histogram support in MySQL 8.0


The slides and video from my FOSDEM presentation on histogram support in MySQL can now be downloaded.  Check it out if you want to learn more about how you can create histograms to improve query plans in MySQL 8.0.

The presentation shows you how to create histograms over column values, 
presents the types of histograms you can create, and discusses best practices for using histograms.

Prior to MySQL 8.0, the query optimizer lacked statistics on columns that are not indexed.  With histograms the query optimizer will be able to make more accurate selectivity estimates for conditions on such columns.   With better estimates, the query optimizer can produce better query plans, especially for join queries.  Check out the presentation for a couple of examples of join queries that runs faster due to histograms!


Thursday, February 1, 2018

Speaking at FOSDEM

I will be speaking at FOSDEM the coming Sunday (February 4) on Histogram support in MySQL 8.0. If you are at FOSDEM, and want to learn about how you can use histograms to improve your query execution plans, visit the MySQL and Friends devroom at 11:10.

Also, please, checkout the entire program for the MySQL devroom.  It is full of interesting talks.

Friday, January 26, 2018

Query Optimizer Takes Data Buffering into Account

Last month I published a blog post at mysqlserverteam.com about how the Query Optimizer in MySQL 8.0 takes advantage of that InnoDB provides buffer estimates per table and index. In that blog post I showed how we got different query plans for Query 8 of the DBT-3 benchmark depending on whether the data was cached InnoDB's buffer pool or not.

In MySQL 5.6, we got a query plan that was well suited for a disk-bound scenario, while MySQL 5.7 switched to a query plan that reduces the execution time by 90% when all data is in memory. However, that came at the expense of longer execution times when data had to be read from disk. In MySQL 8.0 you get the best of both worlds. One query plan will be used when all data is in memory, and another plan will be used when most data need to be fetched from disk.

DBT-3 Query 21

I earlier blogged about another DBT-3 query, Query 21. This query showed a performance regression from MySQL 5.6 to MySQL 5.7 because the query optimizer overestimated the filtering effect of the WHERE clause. In that blog post, I promised that this can be avoided in MySQL 8.0 since you can create histograms to improve the filtering estimates. It turns out that the problem is avoided even without histograms.

In MySQL 5.7 we got this query plan for Query 21:

As discussed in the previous blog post, in MySQL 5.7, the optimizer chooses to start with table orders because it overestimates the filtering on this table. In MySQL 8.0 we still get the above query plan when all the data needs to be read from disk. However, if all data is in memory we get the following query plan:

As you can see from the diagram, the join order has been reversed. Using this new query plan reduces the execution time by 85% when all data is in memory. However, since this plan uses secondary indexes for two of the tables, it is not a good plan when most data has to be read from disk. In fact, if the optimizer had picked this plan in a disk-bound scenario, the query would have been 2.5 times slower than with the plan from 5.7. Hence, by taking into account whether data is in memory or needs to be read from disk, the optimizer is able to find a good plan for Query 21 in both scenarios.

Caveat

One thing to be aware of with this new optimizer behavior, is that running EXPLAIN on a query after it was executed, will not necessarily show the query plan that was used during execution. The previous execution of the query may have changed the content of the buffer pool. Hence, the optimizer may pick a different query plan for the next execution of the query.