MySQL’s “everything is a nested-loop join” approach to query execution isn’t ideal for optimizing every kind of query. Fortunately, there are only a limited number of cases where the MySQL query optimizer does a poor job, and it’s usually possible to rewrite such queries more efficiently.
MySQL sometimes optimizes subqueries very badly. The worst offenders are IN() subqueries in the WHERE clause. As an example, let’s find all films in the Sakila sample database’s sakila.film table whose casts include the actress Penelope Guiness (actor_id=1). This feels natural to write with a subquery: select the rows of sakila.film whose film_id is IN() the list returned by SELECT film_id FROM sakila.film_actor WHERE actor_id = 1.
It’s tempting to think that MySQL will execute this query from the inside out, by finding a list of film_id values and substituting them into the IN() list. We said an IN() list is generally very fast, so you might expect the query to be optimized to something like this:

```sql
-- SELECT GROUP_CONCAT(film_id) FROM sakila.film_actor WHERE actor_id = 1;
-- Result: 1,23,25,106,140,166,277,361,438,499,506,509,605,635,749,832,939,970,980
SELECT * FROM sakila.film
WHERE film_id IN(1,23,25,106,140,166,277,361,438,499,506,509,605,635,749,832,939,970,980);
```
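Unfortunately, exactly the opposite happens. MySQL tries to “help” the subquery by pushing a correlation into it from the outer table, which it thinks will let the subquery find rows more efficiently. It rewrites the query into a correlated EXISTS() subquery: for each row of film, it checks whether sakila.film_actor has a row with actor_id = 1 and film_actor.film_id = film.film_id.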
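Now the subquery requires the film_id from the outer film table and can’t be executed first. EXPLAIN shows it as a DEPENDENT SUBQUERY (you can use EXPLAIN EXTENDED followed by SHOW WARNINGS to see exactly how the query is rewritten).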
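According to the EXPLAIN output, MySQL will table-scan the film table and execute the subquery once for each row it finds. This won’t cause a noticeable performance hit on small tables, but if the outer table is very large, performance will be extremely bad. Fortunately, it’s easy to rewrite such a query as an INNER JOIN between film and film_actor on film_id, with actor_id = 1 in the WHERE clause.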
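To make the cost difference concrete, here is a toy Python model. It is an assumption for illustration, not MySQL’s actual executor: it counts how many outer rows each plan examines when a hypothetical 1,000-row film table is checked against film_actor for actor_id = 1.

```python
# Toy model (illustrative assumption, not MySQL's executor): compare how many
# outer rows a DEPENDENT SUBQUERY plan examines with an inside-out plan.

def dependent_subquery(film_ids, film_actor, actor_id):
    """Scan every outer row and run the correlated check (one probe on
    (actor_id, film_id)) for each one."""
    examined = 0
    result = []
    for film_id in film_ids:                   # full scan of the outer table
        examined += 1
        if (actor_id, film_id) in film_actor:  # one inner probe per outer row
            result.append(film_id)
    return result, examined

def inside_out(film_ids, film_actor, actor_id):
    """Run the inner query once, then fetch only the matching outer rows
    by primary key."""
    # The comprehension stands in for an index range read on actor_id.
    wanted = sorted(f for a, f in film_actor if a == actor_id)
    pk = set(film_ids)                         # stands in for the PRIMARY KEY
    result = [f for f in wanted if f in pk]
    return result, len(wanted)                 # one PK lookup per listed id

films = list(range(1, 1001))                   # 1,000 hypothetical film rows
film_actor = {(1, 1), (1, 23), (1, 25), (2, 5), (2, 23)}

print(dependent_subquery(films, film_actor, 1))  # ([1, 23, 25], 1000)
print(inside_out(films, film_actor, 1))          # ([1, 23, 25], 3)
```

Both plans find the same films. The dependent plan's work grows with the size of the outer table, while the inside-out plan's work grows with the number of matches.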
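Another good optimization is to manually generate the IN() list by executing the subquery as a separate query with GROUP_CONCAT(). Sometimes this can be faster than a JOIN.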
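A quick way to check that these rewrites are equivalent is to run them side by side. The sketch below uses Python’s built-in sqlite3 module and a few hypothetical rows shaped like sakila.film and sakila.film_actor. SQLite’s optimizer is not MySQL’s, so this confirms only that the IN() subquery, the correlated EXISTS() form, the JOIN, and the hand-built GROUP_CONCAT() list return the same films. It says nothing about which form runs fastest.

```python
import sqlite3

# Hypothetical stand-ins for two Sakila tables. SQLite is used only because it
# ships with Python; it checks result equivalence, not MySQL's plans or speed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE film_actor (
        actor_id INTEGER NOT NULL,
        film_id  INTEGER NOT NULL,
        PRIMARY KEY (actor_id, film_id)
    );
    CREATE INDEX idx_fk_film_id ON film_actor (film_id);
    INSERT INTO film VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
    INSERT INTO film_actor VALUES (1, 1), (1, 3), (2, 1), (2, 2), (3, 4);
""")

def film_ids(sql, params=()):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql, params))

# 1. The natural IN() subquery.
in_subquery = film_ids("""
    SELECT film_id FROM film
    WHERE film_id IN (SELECT film_id FROM film_actor WHERE actor_id = 1)""")

# 2. The correlated EXISTS() form that MySQL rewrites it into.
correlated = film_ids("""
    SELECT film_id FROM film
    WHERE EXISTS (SELECT * FROM film_actor
                  WHERE actor_id = 1 AND film_actor.film_id = film.film_id)""")

# 3. The JOIN rewrite; no duplicates here because (actor_id, film_id) is unique.
joined = film_ids("""
    SELECT film.film_id FROM film
    INNER JOIN film_actor ON film_actor.film_id = film.film_id
    WHERE film_actor.actor_id = 1""")

# 4. Inside-out by hand: build the IN() list first with GROUP_CONCAT().
(id_list,) = conn.execute(
    "SELECT GROUP_CONCAT(film_id) FROM film_actor WHERE actor_id = 1").fetchone()
ids = [int(x) for x in id_list.split(",")]
placeholders = ",".join("?" * len(ids))
inside_out = film_ids(
    f"SELECT film_id FROM film WHERE film_id IN ({placeholders})", ids)

print(in_subquery, correlated, joined, inside_out)  # [1, 3] four times
```

If you generate the list this way in MySQL, keep in mind that GROUP_CONCAT() output is truncated at group_concat_max_len bytes (1024 by default). Long lists need that variable raised.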
MySQL has been criticized thoroughly for this particular type of subquery execution plan. Although it definitely needs to be fixed, the criticism often confuses two different issues: execution order and caching. Executing the query from the inside out is one way to optimize it; caching the inner query’s result is another. Rewriting the query yourself lets you take control over both aspects. Future versions of MySQL should be able to optimize this type of query much better, although this is no easy task. There are very bad worst cases for any execution plan, including the inside-out execution plan that some people think would be simple to optimize.
When a correlated subquery is good
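MySQL doesn’t always optimize correlated subqueries badly. If you hear advice to always avoid them, don’t listen! Instead, benchmark and make your own decision. Sometimes a correlated subquery is a perfectly reasonable, or even optimal, way to get a result. Let’s look at an example: a query that finds the films in sakila.film that have no actors, selecting film_id and language_id WHERE NOT EXISTS() a matching row in sakila.film_actor.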
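The standard advice for this query is to write it as a LEFT OUTER JOIN from film to film_actor on film_id, keeping only the rows where film_actor.film_id IS NULL. In theory, MySQL’s execution plan will be essentially the same either way, and EXPLAIN bears that out.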
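The plans are nearly identical, but there are some differences. The subquery version shows the access to film_actor as DEPENDENT SUBQUERY, while the join shows it as SIMPLE. This difference simply reflects the syntax. The join also reports “Not exists” in film_actor’s Extra column, meaning MySQL stops reading film_actor for the current film as soon as it finds one matching row. That early termination is exactly what a NOT EXISTS() subquery does.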
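The plan comparison only matters if the two forms return the same rows. The sketch below again uses sqlite3 with hypothetical rows as a stand-in for MySQL and Sakila. It checks that the NOT EXISTS() subquery and the LEFT OUTER JOIN rewrite, which keeps rows where film_actor.film_id IS NULL, agree. It writes the join with ON rather than USING, and it says nothing about the plans MySQL would choose.

```python
import sqlite3

# Hypothetical data: films 2 and 4 have no actors.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE film (film_id INTEGER PRIMARY KEY, language_id INTEGER);
    CREATE TABLE film_actor (actor_id INTEGER NOT NULL, film_id INTEGER NOT NULL,
                             PRIMARY KEY (actor_id, film_id));
    INSERT INTO film VALUES (1, 1), (2, 1), (3, 2), (4, 1);
    INSERT INTO film_actor VALUES (1, 1), (2, 1), (1, 3);
""")

# Correlated NOT EXISTS(): keep films with no matching film_actor row.
not_exists = conn.execute("""
    SELECT film_id, language_id FROM film
    WHERE NOT EXISTS (SELECT * FROM film_actor
                      WHERE film_actor.film_id = film.film_id)
    ORDER BY film_id""").fetchall()

# LEFT OUTER JOIN rewrite: unmatched films get NULLs on the film_actor side.
left_join = conn.execute("""
    SELECT film.film_id, film.language_id FROM film
    LEFT OUTER JOIN film_actor ON film_actor.film_id = film.film_id
    WHERE film_actor.film_id IS NULL
    ORDER BY film.film_id""").fetchall()

print(not_exists)  # [(2, 1), (4, 1)]
print(left_join)   # [(2, 1), (4, 1)]
```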
So, in theory, MySQL will execute the queries almost identically. In reality, benchmarking is the only way to tell which approach is really faster. We benchmarked both queries on our standard setup. The results are shown in Table 4-1.
Table 4-1. NOT EXISTS versus LEFT OUTER JOIN
Result in queries per second (QPS)
Our benchmark found that the subquery is quite a bit slower!
However, this isn’t always the case. Sometimes a subquery can be faster. For example, it can work well when you just want to see rows from one table that match rows in another table. Although that sounds like it describes a join perfectly, it’s not always the same thing. An INNER JOIN of sakila.film to sakila.film_actor on film_id, designed to find every film that has an actor, returns duplicates because some films have multiple actors.
We need to use DISTINCT or GROUP BY to eliminate the duplicates.
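But what are we really trying to express with this query, and is it obvious from the SQL? The EXISTS operator expresses the logical concept of “has a match” without producing duplicated rows, and it avoids a GROUP BY or DISTINCT operation, which might require a temporary table. Written that way, the query selects film_id from film WHERE EXISTS() a row in film_actor with the same film_id.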
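The difference between a join and an EXISTS() check is easy to see on a few rows. This sqlite3 sketch uses hypothetical data, with SQLite standing in for MySQL. It shows the plain join repeating a film once per actor, DISTINCT removing those duplicates after they are produced, and EXISTS() never producing them at all.

```python
import sqlite3

# Hypothetical data: film 1 has three actors, film 3 has one, film 2 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE film (film_id INTEGER PRIMARY KEY);
    CREATE TABLE film_actor (actor_id INTEGER NOT NULL, film_id INTEGER NOT NULL,
                             PRIMARY KEY (actor_id, film_id));
    INSERT INTO film VALUES (1), (2), (3);
    INSERT INTO film_actor VALUES (1, 1), (2, 1), (3, 1), (1, 3);
""")

def ids(sql):
    """Return the first column of a query as a list."""
    return [row[0] for row in conn.execute(sql)]

# Plain join: one output row per (film, actor) pair, so film 1 repeats.
plain_join = ids("""SELECT film.film_id FROM film
                    INNER JOIN film_actor ON film_actor.film_id = film.film_id
                    ORDER BY film.film_id""")

# DISTINCT discards the duplicates after the join has generated them.
distinct_join = ids("""SELECT DISTINCT film.film_id FROM film
                       INNER JOIN film_actor ON film_actor.film_id = film.film_id
                       ORDER BY film.film_id""")

# EXISTS asks only "is there at least one match?", so nothing is duplicated.
exists = ids("""SELECT film_id FROM film
                WHERE EXISTS (SELECT * FROM film_actor
                              WHERE film_actor.film_id = film.film_id)
                ORDER BY film_id""")

print(plain_join)     # [1, 1, 1, 3]
print(distinct_join)  # [1, 3]
print(exists)         # [1, 3]
```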
Again, we benchmarked to see which strategy was faster. The results are shown in Table 4-2.
Table 4-2. EXISTS versus INNER JOIN
Result in queries per second (QPS)
In this example, the subquery performs much faster than the join.
We showed this lengthy example to illustrate two points: you should not heed categorical advice about subqueries, and you should use benchmarks to prove your assumptions about query plans and execution speed.
MySQL sometimes can’t “push down” conditions from the outside of a UNION to the inside, where they could be used to limit results or enable additional optimizations.
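If you think any of the individual queries inside a UNION would benefit from a LIMIT, or if you know they’ll be subject to an ORDER BY clause once combined with other queries, you need to put those clauses inside each part of the UNION. For example, if you UNION together two huge tables and LIMIT the result to the first 20 rows, MySQL will store both huge tables in a temporary table and then retrieve just 20 rows from it. You can avoid this by placing LIMIT 20 on each query inside the UNION.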
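Why pushing the clauses down is safe, and how much it saves, can be sketched in plain Python. The model below assumes UNION ALL semantics and a hypothetical sort on a single numeric column. Each arm keeps only its own first 20 rows in sorted order, and a final sort-and-limit over those 40 rows gives the same answer as sorting every row of both arms.

```python
import heapq
import random

def union_then_limit(arm_a, arm_b, n):
    """Without pushdown: materialize every row of both arms (the temporary
    table), sort, then keep the first n."""
    temp_table = list(arm_a) + list(arm_b)
    return sorted(temp_table)[:n], len(temp_table)

def limit_each_arm(arm_a, arm_b, n):
    """Hand-pushed version: each arm applies its own ORDER BY and LIMIT n,
    so at most 2 * n rows reach the outer sort and limit."""
    top_a = heapq.nsmallest(n, arm_a)          # returned in ascending order
    top_b = heapq.nsmallest(n, arm_b)
    merged = heapq.merge(top_a, top_b)         # merge two sorted inputs
    return list(merged)[:n], len(top_a) + len(top_b)

rng = random.Random(42)                        # deterministic hypothetical data
arm_a = [rng.randrange(1_000_000) for _ in range(50_000)]
arm_b = [rng.randrange(1_000_000) for _ in range(50_000)]

full, rows_full = union_then_limit(arm_a, arm_b, 20)
pushed, rows_pushed = limit_each_arm(arm_a, arm_b, 20)
print(full == pushed, rows_full, rows_pushed)  # True 100000 40
```

With a plain, duplicate-eliminating UNION, each arm’s LIMIT is applied before duplicates are removed. An arm whose first 20 rows contain duplicates can therefore contribute too few distinct rows, so check that case before pushing a LIMIT into the arms.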
Index merge optimizations
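Index merge algorithms, introduced in MySQL 5.0, let MySQL use more than one index per table in a query. Earlier versions of MySQL could use only a single index, so when no single index was good enough to help with all the restrictions in the WHERE clause, MySQL often chose a table scan. For example, sakila.film_actor has a primary key that begins with actor_id and a separate index on film_id (idx_fk_film_id), but neither index alone serves both conditions in WHERE actor_id = 1 OR film_id = 1.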
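In older MySQL versions, that query would produce a table scan unless you wrote it as the UNION of two queries: one selecting the rows WHERE actor_id = 1, combined with UNION ALL with one selecting the rows WHERE film_id = 1 AND actor_id <> 1, so that no row is returned twice.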
In MySQL 5.0 and newer, however, the query can use both indexes, scanning them simultaneously and merging the results. There are three variations on the algorithm: union for OR conditions, intersection for AND conditions, and unions of intersections for combinations of the two. A query such as SELECT film_id, actor_id FROM sakila.film_actor WHERE actor_id = 1 OR film_id = 1 uses a union of two index scans, as you can see by examining the Extra column of its EXPLAIN output: Using union(PRIMARY,idx_fk_film_id); Using where.
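The merge step itself is simple when each index scan yields row identifiers in sorted order. The sketch below assumes exactly that, using two hypothetical sorted id lists. It shows the union variant (for OR) and the intersection variant (for AND). It is not MySQL’s implementation, which differs in detail, for example by sorting row ids when a scan does not return them in order.

```python
import heapq

# Hypothetical results of two index scans: sorted row identifiers matching
# condition A and condition B respectively.
rows_matching_a = [1, 23, 25, 106, 140]
rows_matching_b = [1, 2, 9, 140, 300]

def index_merge_union(*id_lists):
    """OR: merge sorted id lists and drop duplicates, so a row matched by
    more than one index is fetched only once."""
    result = []
    for row_id in heapq.merge(*id_lists):
        if not result or result[-1] != row_id:
            result.append(row_id)
    return result

def index_merge_intersection(a, b):
    """AND: walk two sorted id lists in step and keep ids present in both."""
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result

print(index_merge_union(rows_matching_a, rows_matching_b))
# [1, 2, 9, 23, 25, 106, 140, 300]
print(index_merge_intersection(rows_matching_a, rows_matching_b))
# [1, 140]
```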
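MySQL can use this technique on complex WHERE clauses, so you may see nested operations in the Extra column for some queries. This often works very well, but sometimes the algorithm’s buffering, sorting, and merging operations use lots of CPU and memory resources. This is especially true if not all of the indexes are very selective, so the parallel scans return lots of rows to the merge operation. The optimizer doesn’t account for this cost, so it can “underprice” the query, which might in fact run more slowly than a plain table scan.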
If your queries run more slowly because of this optimizer limitation, you can work around it by disabling some indexes with IGNORE INDEX, or just fall back to the old UNION tactic.
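Equality propagation can have unexpected costs sometimes. For example, consider a huge IN() list on a column that the optimizer knows will be equal to columns in other tables, because a WHERE, ON, or USING clause sets those columns equal to each other.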
The optimizer will “share” the list by copying it to the corresponding columns in all related tables. This is normally helpful, because it gives the query optimizer and execution engine more options for where to actually execute the IN() check. But when the list is very large, it can result in slower optimization and execution.
MySQL can’t execute a single query in parallel on many CPUs. This is a feature offered by some other database servers, but not MySQL. We mention it so that you won’t spend a lot of time trying to figure out how to get parallel query execution on MySQL!
MySQL can’t do true hash joins at the time of this writing: everything is a nested-loop join. However, you can emulate hash joins using hash indexes. If you aren’t using the Memory storage engine, you’ll have to emulate the hash indexes, too. We showed you how to do this in “Building your own hash indexes.”
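Here is the general shape of a hash join, sketched in Python under stated assumptions. The smaller input is hashed once, and each row of the other input is probed against it. The hash function is zlib.crc32, echoing the CRC32-column approach to emulated hash indexes. Because CRC32 values can collide, every candidate match is rechecked against the original column. The rows and the url column are hypothetical.

```python
import zlib
from collections import defaultdict

def crc(value: str) -> int:
    """Stand-in for a CRC32() column used to emulate a hash index."""
    return zlib.crc32(value.encode("utf-8"))

def hash_join(build_rows, probe_rows, key):
    """Hash the build input once, then look up each probe row in the hash
    table instead of rescanning the build input for every row."""
    buckets = defaultdict(list)
    for row in build_rows:                     # build phase
        buckets[crc(row[key])].append(row)
    for row in probe_rows:                     # probe phase
        for match in buckets.get(crc(row[key]), []):
            # Hashes can collide, so compare the real values as well, just as
            # an emulated hash index must check both the CRC and the column.
            if match[key] == row[key]:
                yield {**match, **row}

urls = [{"url": "http://www.mysql.com", "id": 1},
        {"url": "http://example.com", "id": 2}]
visits = [{"url": "http://www.mysql.com", "hits": 5},
          {"url": "http://nowhere.test", "hits": 1}]

joined = list(hash_join(urls, visits, "url"))
print(joined)  # [{'url': 'http://www.mysql.com', 'id': 1, 'hits': 5}]
```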
MySQL has historically been unable to do loose index scans, which scan noncontiguous ranges of an index. MySQL’s index scans generally require a defined start point and a defined end point in the index, even if only a few noncontiguous rows in the middle are really desired for the query. MySQL will scan the entire range of rows within these end points.
An example will help clarify this. Suppose we have a table with an index on columns (a, b), and we want to run a query that filters only on the second column, such as WHERE b BETWEEN 2 AND 3.
Because the index begins with column a, but the query’s WHERE clause doesn’t specify column a, MySQL will do a table scan and eliminate the nonmatching rows with a WHERE clause, as shown in Figure 4-5.
Figure 4-5. MySQL scans the entire table to find rows
It’s easy to see that there’s a faster way to execute this query. The index’s structure (but not MySQL’s storage engine API) lets you seek to the beginning of each range of values, scan until the end of the range, and then backtrack and jump ahead to the start of the next range. Figure 4-6 shows what that strategy would look like if MySQL were able to do it.
Notice the absence of a WHERE clause, which isn’t needed because the index alone lets us skip over the unwanted rows.
Figure 4-6. A loose index scan, which MySQL cannot currently do, would be more efficient
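The seek, scan, and jump strategy in Figure 4-6 is easy to model over a sorted list of (a, b) tuples, which stands in here for the index. This is a sketch of the idea, not something MySQL’s storage engine API can do. It counts how many index entries each approach reads for b BETWEEN 2 AND 3 on hypothetical data.

```python
import math
from bisect import bisect_left, bisect_right

# Hypothetical index on (a, b): entries kept in sorted order, as in a B-Tree.
index = sorted((a, b) for a in range(1, 5) for b in range(1, 6))

def full_scan(entries, lo, hi):
    """Read every entry and filter on b afterward."""
    read, hits = 0, []
    for a, b in entries:
        read += 1
        if lo <= b <= hi:
            hits.append((a, b))
    return hits, read

def loose_scan(entries, lo, hi):
    """For each distinct a, seek to (a, lo), read entries while b <= hi,
    then jump past the rest of that a's entries."""
    read, hits, pos = 0, [], 0
    while pos < len(entries):
        a = entries[pos][0]
        pos = bisect_left(entries, (a, lo), pos)          # seek to range start
        while pos < len(entries) and entries[pos][0] == a and entries[pos][1] <= hi:
            hits.append(entries[pos])
            read += 1
            pos += 1
        pos = bisect_right(entries, (a, math.inf), pos)   # jump to the next a
    return hits, read

print(full_scan(index, 2, 3)[1], loose_scan(index, 2, 3)[1])  # 20 8
```

Both functions return the same matching entries. The loose scan reads only the entries inside each range, at the cost of one seek per distinct value of a.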
This is admittedly a simplistic example, and we could easily optimize the query we’ve shown by adding a different index. However, there are many cases where adding another index can’t solve the problem. One example is a query that has a range condition on the index’s first column and an equality condition on the second column.
Beginning in MySQL 5.0, loose index scans are possible in certain limited circumstances, such as queries that find maximum and minimum values in a grouped query, for example SELECT actor_id, MAX(film_id) FROM sakila.film_actor GROUP BY actor_id.
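The “Using index for group-by” information in the EXPLAIN plan for such a query indicates a loose index scan. This is a good optimization for this special purpose, but it is not a general-purpose loose index scan. It might be better termed a “loose index probe.”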
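A loose index probe for MAX() per group can be modeled the same way. For each distinct leading value, it jumps straight to the end of that value’s range instead of reading every entry in the group. The (actor_id, film_id) entries below are hypothetical, and the code illustrates the access pattern only.

```python
import math
from bisect import bisect_left

# Hypothetical (actor_id, film_id) index entries, sorted as in a B-Tree.
index = sorted([(1, 1), (1, 23), (1, 25), (2, 3), (2, 31), (3, 17)])

def max_per_group(entries):
    """For each distinct leading value, find where its range ends; the last
    entry in the range holds the group's MAX(film_id)."""
    result = {}
    pos = 0
    while pos < len(entries):
        actor_id = entries[pos][0]
        # First position past this actor_id's range (a binary search, not a read
        # of every entry in the group).
        end = bisect_left(entries, (actor_id, math.inf), pos)
        result[actor_id] = entries[end - 1][1]
        pos = end
    return result

print(max_per_group(index))  # {1: 25, 2: 31, 3: 17}
```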
Until MySQL supports general-purpose loose index scans, the workaround is to supply a constant or list of constants for the leading columns of the index. We showed several examples of how to get good performance with these types of queries in our indexing case study in the previous chapter.
MySQL doesn’t optimize certain MIN() and MAX() queries very well. Here’s an example: SELECT MIN(actor_id) FROM sakila.actor WHERE first_name = 'PENELOPE'.
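Because there’s no index on first_name, this query performs a table scan. If MySQL scans the primary key, it can theoretically stop after reading the first matching row, because the primary key is strictly ascending and any subsequent row will have a greater actor_id. However, in this case, MySQL will scan the whole table, which you can verify by profiling the query. The workaround is to remove the MIN() and rewrite the query with a LIMIT: select actor_id from sakila.actor with USE INDEX(PRIMARY), keep the same WHERE clause, and add LIMIT 1.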
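The LIMIT rewrite helps because of early termination. This Python sketch uses hypothetical actor rows already in primary-key order and counts how many rows each approach examines. Like the rewrite, it assumes that rows really are read in ascending actor_id order.

```python
# Hypothetical actor rows in primary-key order: (actor_id, first_name).
actors = [(1, "NICK"), (2, "ED"), (3, "PENELOPE"),
          (4, "JENNIFER"), (5, "PENELOPE"), (6, "THORA")]

def min_by_full_scan(rows, name):
    """MIN(actor_id) ... WHERE first_name = name, evaluated over every row."""
    examined = 0
    best = None
    for actor_id, first_name in rows:
        examined += 1
        if first_name == name and (best is None or actor_id < best):
            best = actor_id
    return best, examined

def first_match_in_pk_order(rows, name):
    """The LIMIT 1 rewrite: rows arrive in ascending primary-key order, so
    the first match is already the minimum and the scan can stop there."""
    examined = 0
    for actor_id, first_name in rows:
        examined += 1
        if first_name == name:
            return actor_id, examined
    return None, examined

print(min_by_full_scan(actors, "PENELOPE"))         # (3, 6)
print(first_match_in_pk_order(actors, "PENELOPE"))  # (3, 3)
```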
This general strategy often works well when MySQL would otherwise choose to scan more rows than necessary. If you’re a purist, you might object that this query is missing the point of SQL. We’re supposed to be able to tell the server what we want and it’s supposed to figure out how to get that data, whereas, in this case, we’re telling MySQL how to execute the query and, as a result, it’s not clear from the query that what we’re looking for is a minimal value. True, but sometimes you have to compromise your principles to get high performance.
SELECT and UPDATE on the same table
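MySQL doesn’t let you SELECT from a table in a subquery while the same statement runs an UPDATE on that table. This isn’t really an optimizer limitation, but knowing how MySQL executes queries can help you work around it. For example, an UPDATE that sets a count column on each row to the number of rows of the same type, computed with a correlated subquery against the table being updated, is standard SQL, but MySQL rejects it with ERROR 1093 (HY000).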
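To work around this limitation, you can use a derived table, because MySQL materializes it as a temporary table. This effectively executes two queries: one SELECT inside the subquery, and one multitable UPDATE that joins the table to the subquery’s results. The subquery opens and closes the table before the outer UPDATE opens it, so the statement succeeds.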
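The two-phase execution can be mimicked in Python to show why it avoids the conflict: the grouped counts are computed completely before any row changes. The rows are hypothetical. The code stands in for a statement of the form UPDATE tbl INNER JOIN (SELECT type, COUNT(*) AS cnt FROM tbl GROUP BY type) AS der USING(type) SET tbl.cnt = der.cnt, where tbl, type, and cnt are illustrative names.

```python
from collections import Counter

# Hypothetical rows of tbl: each dict is one row with a type and a cnt column.
tbl = [{"id": 1, "type": "a", "cnt": 0},
       {"id": 2, "type": "b", "cnt": 0},
       {"id": 3, "type": "a", "cnt": 0},
       {"id": 4, "type": "a", "cnt": 0}]

def update_with_derived_table(rows):
    # Phase 1, the derived table: SELECT type, COUNT(*) ... GROUP BY type.
    # It is fully materialized before any row is modified.
    der = Counter(row["type"] for row in rows)
    # Phase 2, the multitable UPDATE: join each row to der on type, set cnt.
    for row in rows:
        row["cnt"] = der[row["type"]]
    return rows

update_with_derived_table(tbl)
print([(r["id"], r["cnt"]) for r in tbl])  # [(1, 3), (2, 1), (3, 3), (4, 3)]
```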