MySQL’s “everything is a nested-loop join” approach to query execution isn’t ideal for optimizing every kind of query. Fortunately, there are only a limited number of cases where the MySQL query optimizer does a poor job, and it’s usually possible to rewrite such queries more efficiently.

MySQL sometimes optimizes subqueries very badly. The worst offenders are IN() subqueries in the WHERE clause. As an example, let’s find all films in the Sakila sample database’s sakila.film table whose casts include the actress Penelope Guiness (actor_id = 1). This feels natural to write with a subquery: select the rows of sakila.film whose film_id is IN() the set of film_id values that sakila.film_actor holds for actor_id = 1.

It’s tempting to think that MySQL will execute this query from the inside out, by finding a list of film_id values and substituting them into the IN() list.
We said an IN() list is generally very fast, so you might expect the query to be optimized to something like this:

-- SELECT GROUP_CONCAT(film_id) FROM sakila.film_actor WHERE actor_id = 1;
-- Result: 1,23,25,106,140,166,277,361,438,499,506,509,605,635,749,832,939,970,980
SELECT * FROM sakila.film
WHERE film_id
IN(1,23,25,106,140,166,277,361,438,499,506,509,605,635,749,832,939,970,980);

Unfortunately, exactly the opposite happens. MySQL tries to “help” the subquery by pushing a correlation into it from the outer table, which it thinks will let the subquery find rows more efficiently. It rewrites the query into a correlated EXISTS form, in which the subquery compares film_actor.film_id with film.film_id.

Now the subquery requires the film_id from the outer film table and can’t be executed first. EXPLAIN shows the result as DEPENDENT SUBQUERY (you can use EXPLAIN EXTENDED to see exactly how the query is rewritten).

According to the EXPLAIN output, MySQL will table-scan the film table and execute the subquery for each row it finds. This won’t cause a noticeable performance hit on small tables, but if the outer table is very large, the performance will be extremely bad.
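As a rough sketch of the two query shapes discussed above, assuming the standard Sakila schema (sakila.film and sakila.film_actor, both with a film_id column): the first statement is the natural IN() subquery, and the second is the correlated form the optimizer effectively runs instead. The text that the server itself reports is formatted differently and varies by version, so the second statement is an illustration, not verbatim server output.

```sql
-- The natural way to write the query: an uncorrelated IN() subquery.
SELECT * FROM sakila.film
WHERE film_id IN (
    SELECT film_id FROM sakila.film_actor WHERE actor_id = 1
);

-- Roughly what the optimizer executes instead: a correlated EXISTS
-- that has to be evaluated once per row of sakila.film.
SELECT * FROM sakila.film
WHERE EXISTS (
    SELECT * FROM sakila.film_actor
    WHERE actor_id = 1
      AND film_actor.film_id = film.film_id
);

-- Inspect the plan, then the rewritten statement.
-- EXPLAIN EXTENDED applies to the older versions this text describes;
-- it was removed in MySQL 8.0, where plain EXPLAIN followed by
-- SHOW WARNINGS serves the same purpose.
EXPLAIN EXTENDED
SELECT * FROM sakila.film
WHERE film_id IN (
    SELECT film_id FROM sakila.film_actor WHERE actor_id = 1
);
SHOW WARNINGS;
```

A select_type of DEPENDENT SUBQUERY in the EXPLAIN output is the sign that the correlated form is being used.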
Fortunately, it’s easy to rewrite such a query as a JOIN: join sakila.film to sakila.film_actor on film_id and filter on actor_id = 1.

Another good optimization is to manually generate the IN() list by executing the subquery as a separate query with GROUP_CONCAT(). Sometimes this can be faster than a JOIN.

MySQL has been criticized thoroughly for this particular type of subquery execution plan. Although it definitely needs to be fixed, the criticism often confuses two different issues: execution order and caching. Executing the query from the inside out is one way to optimize it; caching the inner query’s result is another. Rewriting the query yourself lets you take control over both aspects. Future versions of MySQL should be able to optimize this type of query much better, although this is no easy task. There are very bad worst cases for any execution plan, including the inside-out execution plan that some people think would be simple to optimize.

When a correlated subquery is good

MySQL doesn’t always optimize correlated subqueries badly. If you hear advice to always avoid them, don’t listen! Instead, benchmark and make your own decision. Sometimes a correlated subquery is a perfectly reasonable, or even optimal, way to get a result. Consider a query written with a NOT EXISTS subquery.

The standard advice for this query is to write it as a LEFT OUTER JOIN instead of using a subquery. In theory, MySQL’s execution plan will be essentially the same either way. Comparing the EXPLAIN output for the two forms shows that the plans are nearly identical, but there are some differences.
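A sketch of the two forms being compared, assuming the Sakila schema and, as an illustrative choice, a query that looks for films with no rows in sakila.film_actor:

```sql
-- Films with no cast entries, written as a correlated subquery.
EXPLAIN SELECT film_id, language_id
FROM sakila.film
WHERE NOT EXISTS (
    SELECT * FROM sakila.film_actor
    WHERE film_actor.film_id = film.film_id
);

-- The same question written as an exclusion join: keep only the
-- film rows for which the outer join found no film_actor match.
EXPLAIN SELECT film.film_id, film.language_id
FROM sakila.film
    LEFT OUTER JOIN sakila.film_actor USING (film_id)
WHERE film_actor.film_id IS NULL;
```

Running both EXPLAIN statements side by side and comparing the select_type, type, and Extra columns is one way to see where the two plans diverge on a given server version.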
So, in theory, MySQL will execute the queries almost identically. In reality, benchmarking is the only way to tell which approach is really faster. We benchmarked both queries on our standard setup. The results are shown in Table 4-1.

Table 4-1. NOT EXISTS versus LEFT OUTER JOIN

Query                   Result in queries per second (QPS)
NOT EXISTS subquery     360 QPS
LEFT OUTER JOIN         425 QPS

Our benchmark found that the subquery is quite a bit slower! However, this isn’t always the case. Sometimes a subquery can be faster. For example, it can work well when you just want to see rows from one table that match rows in another table. Although that sounds like it describes a join perfectly, it’s not always the same thing. A join of sakila.film to sakila.film_actor on film_id, which is designed to find every film that has an actor, will return duplicates because some films have multiple actors. We need to use DISTINCT or GROUP BY to eliminate the duplicates.

But what are we really trying to express with this query, and is it obvious from the SQL? The EXISTS operator expresses the logical concept of “has a match” without producing duplicated rows and avoids a GROUP BY or DISTINCT operation, which might require a temporary table. The same query can therefore be written as an EXISTS subquery instead of a join.

Again, we benchmarked to see which strategy was faster. The results are shown in Table 4-2.

Table 4-2. EXISTS versus INNER JOIN

Query                   Result in queries per second (QPS)
INNER JOIN              185 QPS
EXISTS subquery         325 QPS

In this example, the subquery performs much faster than the join. We showed this lengthy example to illustrate two points: you should not heed categorical advice about subqueries, and you should use benchmarks to prove your assumptions about query plans and execution speed.

MySQL sometimes can’t “push down” conditions from the outside of a UNION to the inside, where they could be used to limit results or enable additional optimizations.
If you think any of the individual queries inside a UNION would benefit from a LIMIT, or if you know they’ll be subject to an ORDER BY clause once combined with other queries, you need to put those clauses inside each part of the UNION. For example, if you UNION together two huge tables and LIMIT the result to the first 20 rows, MySQL will store both huge tables into a temporary table and then retrieve just 20 rows from it. You can avoid this by placing LIMIT 20 on each query inside the UNION.

Index merge optimizations

Index merge algorithms, introduced in MySQL 5.0, let MySQL use more than one index per table in a query. Earlier versions of MySQL could use only a single index, so when no single index was good enough to help with all the restrictions in the WHERE clause, MySQL often chose a table scan. For example, the film_actor table has an index on film_id and an index on actor_id, but neither is a good choice for both WHERE conditions in a query that combines a condition on actor_id with a condition on film_id using OR.

In older MySQL versions, that query would produce a table scan unless you wrote it as the UNION of two queries, one for each condition. In MySQL 5.0 and newer, however, the query can use both indexes, scanning them simultaneously and merging the results.
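The OR query and its manual UNION rewrite can be sketched as follows, assuming the Sakila film_actor table; the literal values are arbitrary examples:

```sql
-- One statement with an OR across two differently indexed columns.
SELECT film_id, actor_id FROM sakila.film_actor
WHERE actor_id = 1 OR film_id = 1;

-- The manual rewrite: each branch can use its own index.
-- The extra actor_id <> 1 condition keeps the second branch from
-- repeating rows already returned by the first, so UNION ALL is safe
-- and no duplicate-eliminating step is needed.
SELECT film_id, actor_id FROM sakila.film_actor
WHERE actor_id = 1
UNION ALL
SELECT film_id, actor_id FROM sakila.film_actor
WHERE film_id = 1 AND actor_id <> 1;

-- On MySQL 5.0 and newer, check whether the single-statement form
-- is resolved with an index merge (see the type and Extra columns).
EXPLAIN SELECT film_id, actor_id FROM sakila.film_actor
WHERE actor_id = 1 OR film_id = 1;
```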
There are three variations on the algorithm: union for OR conditions, intersection for AND conditions, and unions of intersections for combinations of the two. An OR query of this kind uses a union of two index scans, as you can see by examining the Extra column of its EXPLAIN output.

MySQL can use this technique on complex WHERE clauses, so you may see nested operations in the Extra column for some queries. This often works very well, but sometimes the algorithm’s buffering, sorting, and merging operations use lots of CPU and memory resources. This is especially true if not all of the indexes are very selective, so the parallel scans return lots of rows to the merge operation. Recall that the optimizer doesn’t account for this cost; it optimizes just the number of random page reads. This can make it “underprice” the query, which might in fact run more slowly than a plain table scan. The intensive memory and CPU usage also tends to impact concurrent queries, but you won’t see this effect when you run the query in isolation. This is another reason to design realistic benchmarks.

If your queries run more slowly because of this optimizer limitation, you can work around it by disabling some indexes with IGNORE INDEX, or just fall back to the old UNION tactic.

Equality propagation can have unexpected costs sometimes.
For example, consider a huge IN() list on a column the optimizer knows will be equal to some columns on other tables, due to a WHERE, ON, or USING clause that sets the columns equal to each other. The optimizer will “share” the list by copying it to the corresponding columns in all related tables. This is normally helpful, because it gives the query optimizer and execution engine more options for where to actually execute the IN() check. But when the list is very large, it can result in slower optimization and execution. There’s no built-in workaround for this problem at the time of this writing; you’ll have to change the source code if it’s a problem for you. (It’s not a problem for most people.)

MySQL can’t execute a single query in parallel on many CPUs. This is a feature offered by some other database servers, but not MySQL. We mention it so that you won’t spend a lot of time trying to figure out how to get parallel query execution on MySQL!

MySQL can’t do true hash joins at the time of this writing; everything is a nested-loop join. However, you can emulate hash joins using hash indexes. If you aren’t using the Memory storage engine, you’ll have to emulate the hash indexes, too. We showed you how to do this in “Building your own hash indexes.”

MySQL has historically been unable to do loose index scans, which scan noncontiguous ranges of an index.
MySQL’s index scans generally require a defined start point and a defined end point in the index, even if only a few noncontiguous rows in the middle are really desired for the query. MySQL will scan the entire range of rows within these end points.

An example will help clarify this. Suppose we have a table with an index on columns (a, b) and we want to run a query whose WHERE clause filters only on a range of column b. Because the index begins with column a, but the query’s WHERE clause doesn’t specify column a, MySQL will do a table scan and eliminate the nonmatching rows with a WHERE clause, as shown in Figure 4-5.

Figure 4-5. MySQL scans the entire table to find rows

It’s easy to see that there’s a faster way to execute this query. The index’s structure (but not MySQL’s storage engine API) lets you seek to the beginning of each range of values, scan until the end of the range, and then backtrack and jump ahead to the start of the next range. Figure 4-6 shows what that strategy would look like if MySQL were able to do it. Notice the absence of a WHERE clause, which isn’t needed because the index alone lets us skip over the unwanted rows. (Again, MySQL can’t do this yet.)

Figure 4-6. A loose index scan, which MySQL cannot currently do, would be more efficient

This is admittedly a simplistic example, and we could easily optimize the query we’ve shown by adding a different index. However, there are many cases where adding another index can’t solve the problem. One example is a query that has a range condition on the index’s first column and an equality condition on the second column.
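To make the two situations concrete, here is a minimal sketch. The table name tbl, the columns a, b, and c, the index name, and the literal ranges are all placeholders chosen for illustration.

```sql
-- Hypothetical table with a two-column index on (a, b).
CREATE TABLE tbl (
    a INT NOT NULL,
    b INT NOT NULL,
    c VARCHAR(20) NOT NULL,
    KEY idx_a_b (a, b)
);

-- Filters only on the second index column. Without a condition on a,
-- the (a, b) index offers no start and end point for a range scan.
SELECT a, b, c FROM tbl WHERE b BETWEEN 2 AND 3;

-- Range on the first column, equality on the second. The index can
-- bound the scan using a, but the b = 2 condition cannot narrow that
-- range further; it is applied to every entry within it.
SELECT a, b, c FROM tbl WHERE a BETWEEN 1 AND 100 AND b = 2;
```

In the second query, swapping the index column order to (b, a) would help, but that is not always an option when other queries depend on the existing order.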
Beginning in MySQL 5.0, loose index scans are possible in certain limited circumstances, such as queries that find maximum and minimum values in a grouped query. The “Using index for group-by” information in the EXPLAIN plan for such a query indicates a loose index scan. This is a good optimization for this special purpose, but it is not a general-purpose loose index scan. It might be better termed a “loose index probe.”

Until MySQL supports general-purpose loose index scans, the workaround is to supply a constant or list of constants for the leading columns of the index. We showed several examples of how to get good performance with these types of queries in our indexing case study in the previous chapter.

MySQL doesn’t optimize certain MIN() and MAX() queries very well. For example, consider a query that asks for the MIN() of actor_id among the rows of sakila.actor that match a condition on first_name. Because there’s no index on first_name, this query performs a table scan. If MySQL scans the primary key, it can theoretically stop after reading the first matching row, because the primary key is strictly ascending and any subsequent row will have a greater actor_id. However, in this case, MySQL will scan the whole table, which you can verify by profiling the query.
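One way to do that profiling is to compare the session’s handler counters for the MIN() form and for a LIMIT-based form that walks the primary key. This is a sketch under assumptions: the 'PENELOPE' literal is an example value, and the ORDER BY is added here to make the “smallest actor_id” intent explicit.

```sql
-- Reset the session's status counters (requires the RELOAD privilege).
FLUSH STATUS;

-- No index on first_name, so finding the minimum reads every row.
SELECT MIN(actor_id) FROM sakila.actor WHERE first_name = 'PENELOPE';

-- A large Handler_read_rnd_next value indicates a full scan.
SHOW SESSION STATUS LIKE 'Handler_read%';

-- For comparison: read rows in primary key order and stop at the
-- first one that matches.
FLUSH STATUS;
SELECT actor_id FROM sakila.actor USE INDEX (PRIMARY)
WHERE first_name = 'PENELOPE'
ORDER BY actor_id
LIMIT 1;
SHOW SESSION STATUS LIKE 'Handler_read%';
```

If the second form stops early, its counters should be much smaller than those of the first. How much smaller depends on how far into the primary key the first matching row sits.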
The workaround is to remove the MIN() and rewrite the query with a LIMIT. This general strategy often works well when MySQL would otherwise choose to scan more rows than necessary. If you’re a purist, you might object that this query is missing the point of SQL. We’re supposed to be able to tell the server what we want and it’s supposed to figure out how to get that data, whereas, in this case, we’re telling MySQL how to execute the query and, as a result, it’s not clear from the query that what we’re looking for is a minimal value. True, but sometimes you have to compromise your principles to get high performance.

SELECT and UPDATE on the same table

MySQL doesn’t let you SELECT from a table while simultaneously running an UPDATE on it. This isn’t really an optimizer limitation, but knowing how MySQL executes queries can help you work around it. One example of a query that’s disallowed, even though it is standard SQL, is an UPDATE that sets each row to the number of similar rows in the same table.

To work around this limitation, you can use a derived table, because MySQL materializes it as a temporary table.
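A minimal sketch of the disallowed statement and the derived-table workaround follows. The table tbl and its columns id, type, and cnt are hypothetical names used only for illustration.

```sql
-- Hypothetical table: cnt should hold the number of rows sharing type.
CREATE TABLE tbl (
    id   INT NOT NULL PRIMARY KEY,
    type INT NOT NULL,
    cnt  INT NOT NULL DEFAULT 0
);

-- Disallowed in MySQL: the subquery reads the table being updated.
-- Fails with error 1093 (can't specify target table for update in
-- FROM clause).
UPDATE tbl AS outer_tbl
SET cnt = (
    SELECT COUNT(*) FROM tbl AS inner_tbl
    WHERE inner_tbl.type = outer_tbl.type
);

-- Workaround: compute the counts in a derived table, which MySQL
-- materializes as a temporary table, then join the target to it.
UPDATE tbl
    INNER JOIN (
        SELECT type, COUNT(*) AS cnt
        FROM tbl
        GROUP BY type
    ) AS der USING (type)
SET tbl.cnt = der.cnt;
```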
This effectively executes two queries: one SELECT inside the subquery, and one multitable UPDATE with the joined results of the table and the subquery. The subquery opens and closes the table before the outer UPDATE opens the table, so the query will now succeed.