MySQL subquery slows down drastically, but the queries run fine independently

Question

Query 1:

select distinct email from mybigtable where account_id=345

This takes 0.1 seconds.

Query 2:

Select count(*) as total from mybigtable where account_id=123 and email IN (<include all from above result>)

This takes 0.2 seconds.

Query 3:

Select count(*) as total from mybigtable where account_id=123 and email IN (select distinct email from mybigtable where account_id=345)

This takes 22 minutes, and 90% of that time is spent in the "preparing" state. Why does it take so long?

The table is InnoDB with 3.2 million rows, on MySQL 5.0.

RolandoMySQLDBA · Accepted Answer

In Query 3, you are essentially executing the subquery once for every row of mybigtable, against the table itself.
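One way to check this, assuming the table and column names from the question, is to prefix Query 3 with EXPLAIN. On MySQL 5.0 the inner SELECT would be expected to show up with select_type DEPENDENT SUBQUERY, meaning it is evaluated again for each candidate outer row. The exact plan depends on your indexes and data, so treat this as a sketch rather than a guaranteed result.

```sql
-- Show the execution plan for Query 3.
-- Expectation (an assumption, not measured here): the row for the inner
-- SELECT has select_type = DEPENDENT SUBQUERY.
EXPLAIN
SELECT COUNT(*) AS total
FROM mybigtable
WHERE account_id = 123
  AND email IN (SELECT DISTINCT email
                FROM mybigtable
                WHERE account_id = 345);
```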

To avoid this, you need to make two major changes.

MAJOR CHANGE #1: Refactor the query

Here is your original query:

Select count(*) as total from mybigtable where account_id=123 and email IN (select distinct email from mybigtable where account_id=345)

You could try

select count(*) EmailCount from ( select tbl123.email from (select email from mybigtable where account_id=123) tbl123 INNER JOIN (select distinct email from mybigtable where account_id=345) tbl345 using (email) ) A;

or, to get the count per email,

select email,count(*) EmailCount from ( select tbl123.email from (select email from mybigtable where account_id=123) tbl123 INNER JOIN (select distinct email from mybigtable where account_id=345) tbl345 using (email) ) A group by email;

MAJOR CHANGE #2: Proper indexing

I think you already have this, since Query 1 and Query 2 run fast. Make sure you have a compound index on (account_id, email). Run SHOW CREATE TABLE mybigtable\G to confirm it is there. If you do not have it, or you are not sure, create the index anyway:

ALTER TABLE mybigtable ADD INDEX account_id_email_ndx (account_id,email);
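To check that the index exists and is actually being used, something like the following could be run. The index name account_id_email_ndx comes from the ALTER TABLE above. Whether the optimizer picks it depends on the other indexes on the table, so the expected plan in the comments is an assumption.

```sql
-- List the indexes on the table; look for two rows with
-- Key_name = account_id_email_ndx (Seq_in_index 1 and 2).
SHOW INDEX FROM mybigtable;

-- Check which index Query 1 uses; the "key" column should name
-- an index whose first column is account_id.
EXPLAIN SELECT DISTINCT email FROM mybigtable WHERE account_id = 345;
```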

UPDATE 2012-03-07 13:26 EST

If you want to do a NOT IN(), change the INNER JOIN to a LEFT JOIN and check that the right side is NULL, like this:

select count(*) EmailCount from ( select tbl123.email from (select email from mybigtable where account_id=123) tbl123 LEFT JOIN (select distinct email from mybigtable where account_id=345) tbl345 using (email) WHERE tbl345.email IS NULL ) A;

UPDATE 2012-03-07 14:13 EST

Please read these two links on doing JOINs

Here is a great YouTube video where I learned to refactor queries, and the book it was based on

Aaron Brown · Answer

In MySQL, a subselect inside an IN clause is re-executed for every row of the outer query, which makes the cost O(n^2). In short, do not use IN (SELECT).
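To illustrate why the cost grows this way: older MySQL versions, including 5.0, typically evaluate an IN (SELECT ...) predicate as if it had been rewritten into a correlated EXISTS. The following is a sketch of that logically equivalent form of Query 3 from the question. The aliases outer_t and inner_t are made up for illustration. It is meant to show the per-row dependency, not to serve as a recommended rewrite.

```sql
-- Correlated form: the inner SELECT refers to outer_t.email,
-- so it has to be evaluated for each outer row with account_id = 123.
SELECT COUNT(*) AS total
FROM mybigtable AS outer_t
WHERE outer_t.account_id = 123
  AND EXISTS (SELECT 1
              FROM mybigtable AS inner_t
              WHERE inner_t.account_id = 345
                AND inner_t.email = outer_t.email);
```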

Stephen Senkomago Musoke · Answer

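1. Do you have an index on Account_id?
2. The second problem may be the nested subquery; these perform horribly in 5.0.
3. GROUP BY with a HAVING clause is faster than DISTINCT.
4. In addition to item #3, what are you trying to do that might be done better with joins?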
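As a sketch of the GROUP BY suggestion above, Query 1 from the question could be written without DISTINCT. Both forms return the same set of emails. Whether one is faster on a given MySQL version is something to measure rather than assume.

```sql
-- Same result set as:
--   SELECT DISTINCT email FROM mybigtable WHERE account_id = 345
SELECT email
FROM mybigtable
WHERE account_id = 345
GROUP BY email;
```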

Derek Downey · Answer

Processing an IN() subquery like yours takes a lot of work. You can read more about it here.

My first suggestion would be to rewrite the subquery as a JOIN instead. Something like (untested):

SELECT COUNT(*) AS total FROM mybigtable AS t1 INNER JOIN (SELECT DISTINCT email FROM mybigtable WHERE account_id=345) AS t2 ON t2.email=t1.email WHERE account_id=123