Thursday, May 2, 2013

When is a Subquery Executed?

In an earlier blog post, I managed to confuse myself as to when a subquery was executed. It is not very clear from the output of EXPLAIN where the execution of a subquery takes place. Let's take a look at the following example query (Query 17 in the DBT-3 benchmark):

select sum(l_extendedprice) / 7.0 as avg_yearly
from lineitem, part
where p_partkey = l_partkey
  and p_brand = 'Brand#33' and p_container = 'LG CAN'
  and l_quantity < (
 select 0.2 * avg(l_quantity)
 from lineitem
 where l_partkey = p_partkey

If you run EXPLAIN on this query, you will see the following execution plan:

| id | select_type        | table    | type | possible_keys                   | key                 | key_len | ref                 | rows    | Extra       |
|  1 | PRIMARY            | part     | ALL  | PRIMARY                         | NULL                | NULL    | NULL                | 2000000 | Using where |
|  1 | PRIMARY            | lineitem | ref  | i_l_suppkey_partkey,i_l_partkey | i_l_suppkey_partkey | 5       | dbt3.part.p_partkey |      14 | Using where |
|  2 | DEPENDENT SUBQUERY | lineitem | ref  | i_l_suppkey_partkey,i_l_partkey | i_l_suppkey_partkey | 5       | dbt3.part.p_partkey |      14 | NULL        |

It is not straightforward to determine from this plan when the subquery will be executed. We see that both tables of the join has "Using where" in the Extra column. In other words, the original WHERE clause has been split between the two tables. The question is: Which part contains the subquery?

In this specific case, the answer is pretty evident when looking closer at the query. The subquery is part of an expression which refers to a column of the lineitem table, l_quantity, and within the subquery, there is also also a reference to p_partkey of the part table. In other words, columns from both tables of the join is needed when executing the subqquery. Hence, the subquery will be part of the where clause attached to the last table of the join.

This means that the above query will, for each row in the part table that satisfies the condition attached to the part table, look up matching rows in the lineitem table, and, for every matching row, execute the subquery. As you probably realize, one will have to know quite a bit about how the optimizer handles WHERE conditions in order to deduce this. But don't despair! An extension to EXPLAIN in MySQL 5.6 make things much clearer.

Structured EXPLAIN

With MySQL 5.6, you can get a structured version of the query plan in JSON format. All you need to do is to write "EXPLAIN FORMAT=JSON" instead of just EXPLAIN. Let's take a look at what the JSON EXPLAIN output for the query discussed above looks like:

  "query_block": {
    "select_id": 1,
    "nested_loop": [
        "table": {
          "table_name": "part",
          "access_type": "ALL",
          "possible_keys": [
          ] /* possible_keys */,
          "rows": 2000000,
          "filtered": 100,
          "attached_condition": "((`dbt3`.`part`.`p_container` = 'LG CAN') and (`dbt3`.`part`.`p_brand` = 'Brand#33'))"
        } /* table */
        "table": {
          "table_name": "lineitem",
          "access_type": "ref",
          "possible_keys": [
          ] /* possible_keys */,
          "key": "i_l_suppkey_partkey",
          "used_key_parts": [
          ] /* used_key_parts */,
          "key_length": "5",
          "ref": [
          ] /* ref */,
          "rows": 14,
          "filtered": 100,
          "attached_condition": "(`dbt3`.`lineitem`.`l_quantity` < (/* select#2 */ select (0.2 * avg(`dbt3`.`lineitem`.`l_quantity`)) from `dbt3`.`lineitem` where (`dbt3`.`lineitem`.`l_partkey` = `dbt3`.`part`.`p_partkey`)))",
          "attached_subqueries": [
              "dependent": true,
              "cacheable": false,
              "query_block": {
                "select_id": 2,
                "table": {
                  "table_name": "lineitem",
                  "access_type": "ref",
                  "possible_keys": [
                  ] /* possible_keys */,
                  "key": "i_l_suppkey_partkey",
                  "used_key_parts": [
                  ] /* used_key_parts */,
                  "key_length": "5",
                  "ref": [
                  ] /* ref */,
                  "rows": 14,
                  "filtered": 100
                } /* table */
              } /* query_block */
          ] /* attached_subqueries */
        } /* table */
    ] /* nested_loop */
  } /* query_block */

You probably recognize much of the information from traditional EXPLAIN, but there is also some information that traditional EXPLAIN do not show. The "attached_condition" field shows which parts of the WHERE condition is evaluated at which stage of the query execution. That is, the condition attached to a table will be evaluated for each row that is read from this table. In this specific example, it shows that the subquery is part of the condition attached to the lineitem table, and the "attached_subqueries" field for that table contains the execution plan for the subquery.


Structured EXPLAIN, or EXPLAIN in JSON format, is also much more suitable for tools that want to process and display query plans. MySQL Workbench 5.2 uses structural EXPLAIN to visualize MySQL query plans. Below is shown the result of Visual EXPLAIN for the example query used in this blog post:

Note that MySQL Workbench colors the table boxes based on which access method is used. A red box indicates a tables scan, while a green box is used for index look-ups (ref access).

As we all know: "A picture is worth a thousand words", and from the above picture it should be pretty clear when the subquery is executed.

Friday, February 22, 2013

When and How to Take Advantage of New Optimizer Features in MySQL 5.6

A few weeks ago, I made a presentation with the above title at FOSDEM in Brussels. My slides can be found here. The MySQL and Friends devroom had many interesting presentations and the room was full for most of them.

Thursday, September 27, 2012

Speaking at MySQL Connect This Week-end

I will give a talk at MySQL Connect where I present examples of how MySQL 5.6 improves the performance of many of the queries in the DBT-3 benchmark. I will also be giving a brief description of the relevant new optimization techniques and examples of the types of queries that will benefit from these techniques. My presentation is on Sunday (September 30) at 1.15pm.

I will also like to point you to the other presentations made by member of the MySQL Optimizer team:

Olav Sandstå: MySQL Optimizer Overview (Saturday at 11:30am)
Manyi Lu: Overview of New Optimizer Features in MySQL 5.6 (Saturday at 1:00pm)
Evgeny Potemkin: Powerful EXPLAIN in MySQL 5.6 (Sunday at 4:15pm)

We will also be having a BOF on Saturday evening (7:00 am) where we like people to come and give us some input on which query optimizations they would like us to work on for future releases.

Saturday, July 7, 2012

From Months to Seconds with Subquery Materialization

In an earlier blog post, I showed how optimizer improvements in MySQL 5.6 gave better performance for several of the queries in the DBT-3 benchmark.
However, for one of the queries, Query 18, I was not able to give exact numbers for the improvement since the query took very long in MySQL 5.5. I decided to try to find out exactly how long the query would take, but when the query had run for one month, I gave up. How can a query take so long? Especially, when I had set up InnoDB with a buffer pool that should be large enough to hold the entire database. Let's have a look at the query:

select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
from customer, orders, lineitem
where o_orderkey in (
                select l_orderkey
                from lineitem
                group by l_orderkey
                having sum(l_quantity) > 313
  and c_custkey = o_custkey
  and o_orderkey = l_orderkey
group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice
order by o_totalprice desc, o_orderdate
LIMIT 100;

This query will find the orders from customers that have placed big orders. The reason that this takes so long in MySQL 5.5, is that the subquery in the WHERE clause will be executed for each processed row of the table for which this subquery is part of the WHERE predicate. Let's look at the EXPLAIN output for this query:
| id | select_type        | table    | type  | possible_keys                              | key                   | key_len | ref                     | rows    | Extra                           |
|  1 | PRIMARY            | customer | ALL   | PRIMARY                                    | NULL                  | NULL    | NULL                    |  150000 | Using temporary; Using filesort | 
|  1 | PRIMARY            | orders   | ref   | PRIMARY,i_o_custkey                        | i_o_custkey           | 5       | dbt3.customer.c_custkey |       7 | Using where                     | 
|  1 | PRIMARY            | lineitem | ref   | PRIMARY,i_l_orderkey,i_l_orderkey_quantity | i_l_orderkey_quantity | 4       | dbt3.orders.o_orderkey  |       2 | Using index                     | 
|  2 | DEPENDENT SUBQUERY | lineitem | index | NULL                                       | PRIMARY               | 8       | NULL                    | 6001215 | NULL                            | 

The select_type for the subquery is DEPENDENT SUBQUERY. Since the left hand side of the IN-expression is a field from the orders table, the subquery will be executed for each row processed from this table. This implies that the index scan of the lineitem table will be performed more than one million times given the rows estimates in the EXPLAIN output (150000*7). I have measured that one execution of the sub-query takes about 3.5 seconds when all the data is in memory. Hence, it will take more than 40 days to execute it one million times.

In MySQL 5.6, Subquery Materialization may be used to avoid the repeated execution of subqueries. This implies that the subquery is executed once and the result stored (materialized) in a temporary table. Then, for each row where the subquery was earlier executed, a hash-based look-up into the temporary table will be made instead to check whether there is a match. The EXPLAIN output for Query 18, looks like this in MySQL 5.6:

| id | select_type | table    | type  | possible_keys                              | key                   | key_len | ref                     | rows    | Extra                           |
|  1 | PRIMARY     | customer | ALL   | PRIMARY                                    | NULL                  | NULL    | NULL                    |  150000 | Using temporary; Using filesort | 
|  1 | PRIMARY     | orders   | ref   | PRIMARY,i_o_custkey                        | i_o_custkey           | 5       | dbt3.customer.c_custkey |       7 | Using where                     | 
|  1 | PRIMARY     | lineitem | ref   | PRIMARY,i_l_orderkey,i_l_orderkey_quantity | i_l_orderkey_quantity | 4       | dbt3.orders.o_orderkey  |       2 | Using index                     | 
|  2 | SUBQUERY    | lineitem | index | NULL                                       | PRIMARY               | 8       | NULL                    | 6001215 | NULL                            | 

The only difference is that select_type of the subquery now is SUBQUERY. In other words, it is no longer dependent on the preceding tables, and can be executed only once. In my run of DBT-3 with MySQL 5.6, Query 18 takes only 6.8 seconds. Hence, we have gone from more than a month to just a few seconds! My colleague Guilhem has earlier written about another DBT-3 query that is improved with Subquery Materialization.

Wednesday, April 11, 2012

Improved DBT-3 Results with MySQL 5.6.5

In addition to the Optimizer features added in earlier 5.6 Development Milestone Releases, the newly released MySQL 5.6.5 Development Milestone Release adds a few more. I think this is a good time to check how all these new optimizer features impact the performance of the DBT-3 benchmark queries. In this blog post, I will compare the performance of the DBT-3 queries in MySQL 5.5 and MySQL 5.6.5.

Test Setup

I used a DBT-3 scale 1 database (InnoDB tables) that was stored on a traditional hard disk, and the InnoDB database size was a bit more than 2.5 GB. The DBT-3 queries were run in two settings: a disk-bound setting with a very small InnoDB buffer pool (50 MB), and CPU-bound with a 3 GB InnoDB buffer pool (all data in memory). The query cache was disabled, and by setting --innodb_flush_method=O_DIRECT, the file buffering in the file system was also out of the way. If not stated otherwise, default settings were used for all other MySQL system variables.

For the disk-bound scenario, each query were executed 10 times in a row, and the results presented are the average of the last 8 runs. Since the buffer pool was very small, and there was no buffering in the file system, there were very little difference between the first and the tenth execution of the queries. The only exceptions were Query 16 and Query 22 where the working data sets were so small that subsequent executions were not disk-bound. Due to this, I have not included the result for these queries in the presentation of the disk-bound scenario.

For the CPU-bound scenario, each query were executed 20 times in a row, but the results presented are still the average of the last 8 runs. (The reason for this, is that I observed that for several of the queries, the 8th or so execution of the query took significantly longer than the other executions.)

The order in which the different queries were run were randomized, but the same order was used in all experiments.

In order to get stable execution plans, I enabled InnoDB Persistent Statistics and recorded exact statistics in the statistics tables as discussed here. Since Persistent Statistics is not available in 5.5, I used optimizer hints to force a similar join order and index usage as for 5.6 where necessary.

For MySQL 5.6.5, the DBT-3 queries were run both with default settings for the optimizer_switch variable, and with setting optimizer_switch='mrr_cost_based=off,batched_key_access=on' in order to activate the Disk-Sweep Multi-Range Read (DS-MRR)and Batched Key Access (BKA) features.

OK, enough talking, let's move on to the results.

Disk-Bound Workload

The below chart shows the execution times for the DBT-3 queries in a disk-bound setting. The executions times in MySQL 5.5, MySQL 5.6.5 with default optimizer_switch settings, and MySQL 5.6.5 with DS-MRR and BKA activated are compared. The execution times for MySQL 5.6.5 default is set to 1, and the relative execution times are shown for the other two variants.

Except for Query 18, there are no significant differences between MySQL 5.5 and MySQL 5.6.5 with default settings. However, for Query 18 the improvement is dramatic. While it takes days to execute this query in MySQL 5.5, it takes less 45 minutes in MySQL 5.6.5. This improvement is due to Subquery Materialization, and I will discuss this further in a later blog post.

As discussed in an earlier blog post, BKA can give very big improvements when the workload is disk-bound. This is illustrated by the improvements of queries 2, 5, 8, 9, 13, 18, 19, and 21. Note also that there are some queries that does not benefit from BKA. As discussed in the aforementioned blog, Query 17 gets better locality for the disk accesses without BKA, and turning on BKA makes it take 16 times longer to execute. We also see that Query 11 performs worse with BKA. I plan to investigate further the reason for this.

Queries 3, 4, 10, 14, and 15, are improved by using the DS-MRR algorithm when doing range access. That is, the rows in the base table are not accessed in the order given by the index used for range access, but in sequential order as viewed from the clustered primary key index.

CPU-Bound Workload

The below chart shows the execution times for the DBT-3 queries in a CPU-bound setting. As above the executions times in MySQL 5.5, MySQL 5.6.5 with default optimizer_switch settings, and MySQL 5.6.5 with DS-MRR and BKA activated are compared.

For MySQL 5.6.5 with default settings, the most significant improvement is for Query 18 also with a CPU-bound workload. In addition, there is a 10% improvement for Query 16. This is also due to Subquery Materialization, and my colleague Guilhem discusses this improvement in more detail.

As discussed in my previous blog post on BKA, several queries perform worse with BKA in a CPU-bound setting (e.g., queries 2, 11, 13, and 17). Only query 18 performs better with BKA in this setting.

On the other hand, many of the queries that benefited from DS-MRR with a disk-bound workload, still show some improvement with a CPU-bound workload.


The results from comparing the new MySQL 5.6.5 release with MySQL 5.5, show that Subquery Materialization have significant effects on the execution time of the few DBT-3 queries where it applies.

Also, as shown earlier, BKA has a good effect for disk-bound workloads, while it in many cases will cause worse performance for CPU-bound workloads.

Disk-sweep MRR has a good effect on the performace of range scans in for disk-bound workloads, and it also shows a small improvement for many queries with CPU-bound workloads.

Monday, October 3, 2011

Batched Key Access Speeds Up Disk-Bound Join Queries

A new feature in MySQL 5.6.3 Development Milestone Release is Batched Key Access (BKA). BKA can be applied when an index lookup can be used to execute a join query. One example of such a query is:

mysql> SELECT * FROM customer JOIN orders ON c_custkey = o_custkey;

Given that the customer table is significantly smaller than the orders table, and assuming that we have an index on the o_custkey column of orders, MySQL have traditionally executed this join as follows: Scan the entire customer table, and for each row in the customer table, do an index look-up into the orders table to find matching rows. This strategy is know as Index Nested Loops Join, or as MySQL EXPLAIN puts it:

| id | select_type | table    | type | possible_keys | key         | key_len | ref                     | rows   | Extra |
|  1 | SIMPLE      | customer | ALL  | PRIMARY       | NULL        | NULL    | NULL                    | 150000 |       |
|  1 | SIMPLE      | orders   | ref  | i_o_custkey   | i_o_custkey | 5       | dbt3.customer.c_custkey |      7 |       |

If BKA is applied, MySQL will instead buffer keys to be used for index look-up in the join buffer, and each time the join buffer is full, it will do a batched look-up on the index. That is, the whole set of keys from the join buffer is sent to the storage engine in one batch. For this purpose, MySQL Server uses the Multi-Range Read interface of the Storage Engine API.

Multi-Range Read (MRR)

The advantage of requesting a batch of rows from the storage engine, is that the storage engine may be able to deliver the rows more efficiently that way. The MRR interface has for some time been used in MySQL Cluster since it reduces communication between nodes by making it possible to request more than one row per round-trip.

A new feature in MySQL 5.6.2 was Disk-Sweep Multi-Range Read (DS-MRR) optimizations for both InnoDB and MyISAM. When DS-MRR is applied, the storage engine will access the data rows in the order given by the base table; instead of in the order given by the index. For a disk-bound query this has two advantages:
  1. All interesting rows on the same disk page will be accessed together. Hence, one do not risk that a page has been removed from the buffer pool between two accesses to the page.
  2. Even if modern disks/file systems do not give any guarantees, accessing the pages in table order, will normally give close to sequential access to the disk. As we know, that should give much better performance compared to random access. (If multiple sessions access the disks simultaneously, the advantage of sequential access within a session will of course be less significant.)

Turning On Batched Key Access

While BKA is advantageous in a disk-bound system, it may actually reduce performance in a CPU-bound setting. (More on this below.) Since the MySQL Optimizer currently has no information on whether a table resides in memory or on disk, BKA is off by default.

You can turn on BKA by setting an optimizer_switch flag. Since BKA depends on MRR, the MRR flag also needs to be on. (This is default.) In addition, since the cost model for MRR is currently way too conservative, in order to use MRR, cost-based MRR optimization needs to be turned off. In other words, the following settings must be done in order for the Optimizer to consider BKA:

mysql> SET optimizer_switch='mrr=on,mrr_cost_based=off,batched_key_access=on';

With BKA turned on, EXPLAIN for the query presented earlier, will look like this:

| id | select_type | table    | type | possible_keys | key         | key_len | ref                     | rows   | Extra                                  |
|  1 | SIMPLE      | customer | ALL  | PRIMARY       | NULL        | NULL    | NULL                    | 150000 |                                        |
|  1 | SIMPLE      | orders   | ref  | i_o_custkey   | i_o_custkey | 5       | dbt3.customer.c_custkey |      7 | Using join buffer (Batched Key Access) |

Performance Impact of BKA on DBT-3 Queries

The chart below shows query execution times, with and without BKA, for all the queries of the DBT-3 benchmark where BKA can be used. The size of the database is DBT-3 scale 1 (4 GB), and in order to get disk bound queries, the size of the InnoDB buffer pool is just 50 MB. We see that, for most of the queries, the run time with BKA is around half the run time without BKA .

Note that Query 17 is an example of a query where it is not beneficial to use BKA even when the query is disk-bound. In that particular case, locality of accesses are very good without BKA, and applying BKA actually spreads accesses to related rows in time, decreasing the buffer cache hit ratio.

Effects of Increasing the Join Buffer Size

As mention above, the size of the batches used by BKA is determined by the size of the join buffer. It follows from the discussion above that the larger the batch, the more efficiently the storage engine may arrange its disk accesses. The results presented above were achieved with a default setting of join_buffer_size (128 kB). We will now look closer at what happens if we increase the size of the join buffer. For this purpose, we will use Query 13 (Customer Distribution Query) from the DBT-3 benchmark:

SELECT c_count, COUNT(*) AS custdist
      SELECT c_custkey, COUNT(o_orderkey) AS c_count
      FROM customer LEFT OUTER JOIN orders ON
        c_custkey = o_custkey AND o_comment NOT LIKE '%express%requests%'
      GROUP BY c_custkey
     ) AS c_orders 
GROUP BY c_count
ORDER BY custdist DESC, c_count DESC;

The heart of this query is the join between customer table and orders table:

| id | select_type | table      | type  | possible_keys | key           | key_len | ref                     | rows   | Extra                                               |
|  1 | PRIMARY     | <derived2> | ALL   | NULL          | NULL          | NULL    | NULL                    | 1050000 | Using temporary; Using filesort                     |
|  2 | DERIVED     | customer   | index | NULL          | i_c_nationkey | 5       | NULL                    |  150000 | Using index; Using temporary; Using filesort        |
|  2 | DERIVED     | orders     | ref   | i_o_custkey   | i_o_custkey   | 5       | dbt3.customer.c_custkey |       7 | Using where; Using join buffer (Batched Key Access) |

Using BKA for this query, MySQL will fill the join buffer with customer keys and send a batch of keys to the storage engine. The storage engine will find the primary keys for the requested rows in the i_o_custkey index, sort the primary keys, and access the orders table in primary key order. In other words, there will be one pass over the orders table for each batch. The graph below shows the effect of increasing the join buffer on Query 13. (Note the logarithmic scale of the y-axis.) The query execution time goes down from 20 minutes with a default join buffer to less than 10 secs with a 32 MB join buffer!

BKA Performance When Queries are CPU-bound

As discussed above, BKA is designed to speed up disk-bound queries. Now we will look at what happens if BKA is used in a system where all the data in the database reside in main memory. The chart below presents the results of running the same DBT-3 queries with a InnoDB buffer pool of 5 GB. While the difference is not very significant for most queries, we see there are a few queries where the impact of BKA is quite negative.

The negative impact of BKA is not just the overhead of sorting the keys before accessing the table. If you look at the execution plan for QUERY 13 as presented above, it shows that the result of the join needs to be sorted in order to perform the group by operation of the subquery. This would not have been necessary, had not BKA been used. Without BKA, the result from the join would have been delivered in customer key order and grouping could be done without sorting. Hence, in this experiment, the execution time for Query 13 increased from 3.6 to 4.8 seconds when using BKA.

Use with Care

As I have shown, join query execution can benefit significantly from the use of Batched Key Access. However, there is also examples of queries where BKA do more harm than good. Hence, it should be used with care. My advice is to study the BKA performance for your typical queries and to only turn on BKA for queries that are known to benefit from it.

Thursday, May 5, 2011

InnoDB Persistent Statistics Save the Day

In my previous blog posting, I explained how I was able to get more stable query execution times by increasing the amount of sampling used to by InnoDB to calculate statistics. However, for my example query, Query 8 of the DBT-3 benchmark, the MySQL Optimizer still toggled between three different indexes to use when accessing one of the 8 tables.

I decided to try out InnoDB Persistent Statistics that is one of the new features in the recent MySQL 5.6.2 Development Milestone Release. According to the advertisement, this should give both more accurate and more stable statistics. The improved accuracy is achieved by using a more precise sampling algorithm, while the increased stability is achieved by giving the user full control over when statistics are recalculated. That is, the persistent statistics are not recalculated automatically.

In order to activate persistent statistics, you first need to run a script to create the tables to be used to store the statistics. Then, you can enable the feature by setting the system variable innodb_analyze_is_persistent. For the details, see the MySQL 5.6 Manual.

More Accurate Statistics

To investigate whether the new sampling algorithm gives more accurate statistics, I looked at the average and variance in the estimated cardinality of indexed columns over 100 runs of ANALYZE on the DBT-3 tables. The average of the estimates did not change much as it is actually pretty spot-on with the old sampling algorithm. It is the great variance that causes the observed inaccuracies.

The below chart shows the coefficient of variation for the estimated cardinality. (Columns that are listed more than once, are part of multiple indexes. If a bar in the bar chart is not visible, it means that there was no or very little variance between runs for this column.) The bar chart clearly shows that for most indexes, the sampling algorithm for persistent statistics gives less variance than the old sampling algorithm used with transient statistics. (The number of sampled index pages was 100 in both cases.)

More Stable Statistics

As mentioned above, the persistent statistics will be more stable because it is not updated automatically like the old transient statistics. This way, for our performance regression test, we can run ANALYZE once for each table, and all later runs on this database will use the same statistics. Another advantage of being able to control when statistics are recalculated, is that it can be scheduled at times when one can afford to use a higher sampling rate. The disadvantage is that it becomes the burden of the DBA to make sure to regularly initiate a recalculation of statistics to prevent that the statistics become outdated.

InnoDB Statistics Tables

The InnoDB statistics tables are ordinary tables that are created in a database called innodb. There are two tables: table_stats and index_stats that contain per-table and per-index statistics, respectively:

mysql> describe innodb.table_stats;
| Field                    | Type                | Null | Key | Default           | Extra                       |
| database_name            | varchar(64)         | NO   | PRI | NULL              |                             |
| table_name               | varchar(64)         | NO   | PRI | NULL              |                             |
| stats_timestamp          | timestamp           | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| n_rows                   | bigint(20) unsigned | NO   |     | NULL              |                             |
| clustered_index_size     | bigint(20) unsigned | NO   |     | NULL              |                             |
| sum_of_other_index_sizes | bigint(20) unsigned | NO   |     | NULL              |                             |
6 rows in set (0.01 sec)

mysql> describe innodb.index_stats;
| Field            | Type                | Null | Key | Default           | Extra                       |
| database_name    | varchar(64)         | NO   | PRI | NULL              |                             |
| table_name       | varchar(64)         | NO   | PRI | NULL              |                             |
| index_name       | varchar(64)         | NO   | PRI | NULL              |                             |
| stat_timestamp   | timestamp           | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| stat_name        | varchar(64)         | NO   | PRI | NULL              |                             |
| stat_value       | bigint(20) unsigned | NO   |     | NULL              |                             |
| sample_size      | bigint(20) unsigned | YES  |     | NULL              |                             |
| stat_description | varchar(1024)       | NO   |     | NULL              |                             |
8 rows in set (0.00 sec)

table_stats contains one row per table:
mysql> select * from innodb.table_stats where table_name='customer';
| database_name | table_name | stats_timestamp     | n_rows | clustered_index_size | sum_of_other_index_sizes |
| dbt3          | customer   | 2011-04-01 20:53:30 | 149911 |                 1764 |                      225 |

index_stats contains several rows per index, and the column stat_name identifies the type of statistics as described by the content of the stat_description column. The n_diff_pfx_ rows contain the cardinality statistics: For rows with stat_name = 'n_diff_pfx_01', stat_value contains the number of distinct values for the first column of the given index. For rows with stat_name = 'n_diff_pfx_02', stat_value contains the number of unique combinations of the two first columns of the given index, and so on. (Note that InnoDB indexes, in addition to the columns that were specified when the index was created, contain all columns of the primary key.)
mysql> select * from innodb.index_stats where table_name='customer';
| database_name | table_name | index_name    | stat_timestamp      | stat_name    | stat_value | sample_size | stat_description                  |
| dbt3          | customer   | i_c_nationkey | 2011-04-01 11:40:29 | n_diff_pfx01 |         25 |         171 | c_nationkey                       |
| dbt3          | customer   | i_c_nationkey | 2011-04-01 11:40:29 | n_diff_pfx02 |     150000 |         171 | c_nationkey,c_custkey             |
| dbt3          | customer   | i_c_nationkey | 2011-04-01 11:40:29 | n_leaf_pages |        171 |        NULL | Number of leaf pages in the index |
| dbt3          | customer   | i_c_nationkey | 2011-04-01 11:40:29 | size         |        225 |        NULL | Number of pages in the index      |
| dbt3          | customer   | PRIMARY       | 2011-04-01 11:40:29 | n_diff_pfx01 |     149911 |         100 | c_custkey                         |
| dbt3          | customer   | PRIMARY       | 2011-04-01 11:40:29 | n_leaf_pages |       1746 |        NULL | Number of leaf pages in the index |
| dbt3          | customer   | PRIMARY       | 2011-04-01 11:40:29 | size         |       1764 |        NULL | Number of pages in the index      |

Manually Updating InnoDB Statistics Tables

Since the statistics tables are normal tables, they can be updated like any other MySQL table. For our performance regression tests, we have decided to record our own numbers in these tables. That way, we will be able to get the exact same numbers even if we for some reason need to recreate the statistics tables (e.g, in another database instance).

When we manually update the statistics tables for the DBT-3 benchmarks, we first turn on persistent statistics and run ANALYZE once for each table. This inserts all necessary rows in the statistics tables, and we will only need to update the relevant rows. Here are an example UPDATE statement for each table:
UPDATE innodb.table_stats SET n_rows=150000 
  WHERE database_name='dbt3' AND table_name='customer' ;
UPDATE innodb.index_stats SET stat_value=25, sample_size=NULL 
  WHERE database_name='dbt3' AND table_name='customer' 
    AND index_name='i_c_nationkey' AND stat_name='n_diff_pfx01';
By convention, I set index_stats.sample_size to NULL to indicate that the value is recorded and not computed by sampling.

A World of New Possibilities

As discussed above, InnoDB Persistent Statistics will be very useful in order to ensure that the same query execution plan is used every time a query is executed. It also gives the users the ability to control when the statistics are recalculated. However, persistent statistics opens up for even more new possibilities.

We will be able to use this feature to extend our testing of the query optimizer. By "faking" the statistics, we will be able to explore and test the execution of different query plans without having to actually modify the database.

It will also be possible for users to change the statistics in order to force a specific query plan. However, one risks introducing side-effects on other queries so this will have to be done with caution.