Substr() with a GROUP BY clause

Question

I'm working on a solution to the problem shown here, and a question came up.

Here are the schema I created, the problem statement, and the test data.

Schema: Students

    CREATE TABLE Students (group_id text, sql_quotient float);
    INSERT INTO Students (group_id, sql_quotient) VALUES
      ('A', 25), ('B', 30), ('C', 40), ('A', 35), ('B', 20);

Task: display the highest per-group average sql_quotient.

group_id is guaranteed to be a single character in the range A to Z.
Here the average for group A is 30, for B it is 25, and for C it is 40, so 40 should be displayed.
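The expected averages can be checked with a short script. This sketch runs query 1 against SQLite through Python's standard `sqlite3` module instead of MySQL. SQLite is an assumption made only so the example is self-contained, but `AVG`, `GROUP BY`, and `ROUND` give the same results for this data.

```python
import sqlite3

def max_group_average(rows):
    """Return (per-group averages, max of the rounded averages) for (group_id, sql_quotient) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Students (group_id TEXT, sql_quotient REAL)")
    conn.executemany("INSERT INTO Students (group_id, sql_quotient) VALUES (?, ?)", rows)
    # Per-group averages, ordered so the printed dict is deterministic.
    averages = dict(conn.execute(
        "SELECT group_id, AVG(sql_quotient) FROM Students GROUP BY group_id ORDER BY group_id"
    ).fetchall())
    # Query 1 from the question: the maximum of the rounded per-group averages.
    (answer,) = conn.execute(
        "SELECT MAX(ROUND(b.avg_quotient, 2)) AS answer "
        "FROM (SELECT AVG(sql_quotient) AS avg_quotient FROM Students GROUP BY group_id) AS b"
    ).fetchone()
    conn.close()
    return averages, answer

rows = [("A", 25), ("B", 30), ("C", 40), ("A", 35), ("B", 20)]
averages, answer = max_group_average(rows)
print(averages)  # {'A': 30.0, 'B': 25.0, 'C': 40.0}
print(answer)    # 40.0
```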

I tried the following two queries. Both give me the correct answer.

Query 1

    select max(round(b.avg_quotient, 2)) as answer
    from (SELECT AVG(sql_quotient) as avg_quotient
          FROM Students
          GROUP BY group_id) as b;

Execution time = 0.002378 s

Query 2

    select max(round(b.avg_quotient, 2)) as answer
    from (SELECT AVG(sql_quotient) as avg_quotient
          FROM Students
          GROUP BY substr(group_id, 1, 1)) as b;

Execution time = 0.000459 s

Difference: the first query groups the data by group_id, while the second groups it by substr(group_id, 1, 1).
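The two groupings are interchangeable only because group_id is guaranteed to be a single character. For a one-character string, substr(group_id, 1, 1) is the string itself. The sketch below, again using SQLite through Python's `sqlite3` as a stand-in for MySQL, runs both queries. The second data set uses hypothetical multi-character ids that break the guarantee: there, substr() merges 'A1' and 'A2' into one group and the answers diverge.

```python
import sqlite3

QUERY_1 = (
    "SELECT MAX(ROUND(b.avg_quotient, 2)) AS answer FROM "
    "(SELECT AVG(sql_quotient) AS avg_quotient FROM Students GROUP BY group_id) AS b"
)
QUERY_2 = (
    "SELECT MAX(ROUND(b.avg_quotient, 2)) AS answer FROM "
    "(SELECT AVG(sql_quotient) AS avg_quotient FROM Students "
    "GROUP BY substr(group_id, 1, 1)) AS b"
)

def run_both(rows):
    """Load rows into a fresh in-memory table and return (query 1 answer, query 2 answer)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Students (group_id TEXT, sql_quotient REAL)")
    conn.executemany("INSERT INTO Students VALUES (?, ?)", rows)
    results = tuple(conn.execute(q).fetchone()[0] for q in (QUERY_1, QUERY_2))
    conn.close()
    return results

# Single-character ids (the guarantee in the task): both groupings coincide.
print(run_both([("A", 25), ("B", 30), ("C", 40), ("A", 35), ("B", 20)]))  # (40.0, 40.0)
# Hypothetical multi-character ids: substr() folds 'A1' and 'A2' into group 'A'.
print(run_both([("A1", 10), ("A2", 50), ("B", 20)]))  # (50.0, 30.0)
```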

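Since the second query applies an extra function, I expected it to take longer. However, as shown above, query 2 runs markedly faster than query 1.

My question: why is query 2 faster than query 1, even though query 2 calls an extra function (substr())?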

Notes:

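- The schema was already defined. I don't know why group_id is text rather than char(1), but that is irrelevant to my question.
- I'm looking for the reason behind the difference in execution time between these two queries, not for a different query that solves the same problem.
- The Students table is created fresh for every run. So it is not the case that the first run has to read the data from disk while the second finds it already in memory. To demonstrate this, I ran the queries four more times, in the order query 2, query 1, query 2, query 1. The execution times (in seconds) were:
  - Q2, 1st run: 0.000493
  - Q1, 1st run: 0.002779
  - Q2, 2nd run: 0.000499
  - Q1, 2nd run: 0.002787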
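A single timing of a sub-millisecond query is mostly noise, so a fairer comparison discards a warm-up run, alternates the two queries many times, and compares medians. The following is a minimal sketch of such a harness. It uses SQLite in memory via Python's `sqlite3`, `time.perf_counter`, and `statistics.median`. That setup is an assumption for illustration only: its numbers say nothing about how MySQL or MariaDB behaves.

```python
import sqlite3
import statistics
import time

def interleaved_medians(conn, queries, repeats=200):
    """Run each query `repeats` times, alternating between them, and return
    the median wall-clock time per query in seconds."""
    timings = {name: [] for name in queries}
    for sql in queries.values():  # one untimed warm-up run per query
        conn.execute(sql).fetchall()
    for _ in range(repeats):
        for name, sql in queries.items():
            start = time.perf_counter()
            conn.execute(sql).fetchall()
            timings[name].append(time.perf_counter() - start)
    return {name: statistics.median(ts) for name, ts in timings.items()}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (group_id TEXT, sql_quotient REAL)")
conn.executemany("INSERT INTO Students VALUES (?, ?)",
                 [("A", 25), ("B", 30), ("C", 40), ("A", 35), ("B", 20)])
queries = {
    "group_id": "SELECT MAX(ROUND(b.a, 2)) FROM (SELECT AVG(sql_quotient) AS a "
                "FROM Students GROUP BY group_id) AS b",
    "substr": "SELECT MAX(ROUND(b.a, 2)) FROM (SELECT AVG(sql_quotient) AS a "
              "FROM Students GROUP BY substr(group_id, 1, 1)) AS b",
}
print(interleaved_medians(conn, queries))
```

The median is used rather than the mean so that an occasional slow run (for example, one interrupted by the OS scheduler) does not dominate the comparison.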

Evan Carroll · Answer

You don't need the substring, and you don't need to compute max().

Something like this:

    SELECT id, avg(price)
    FROM Students
    GROUP BY id
    ORDER BY avg(price) DESC
    LIMIT 1;
    +------+------------+
    | id   | avg(price) |
    +------+------------+
    | C    |    40.0000 |
    +------+------------+

That is probably simpler than computing the maximum explicitly, which would look like this:

    SELECT id, avg(price)
    FROM Students
    GROUP BY id
    HAVING avg(price) = (
      SELECT max(avg)
      FROM (
        SELECT id, avg(price) AS avg
        FROM Students
        GROUP BY id
      ) AS t
    );
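The two queries above agree on the sample data but differ when groups tie for the highest average. ORDER BY ... LIMIT 1 returns exactly one row, and which tied group wins is unspecified. The HAVING version returns every tied group. This sketch runs both queries, using this answer's `id`/`price` column names, against SQLite through Python's `sqlite3` (a stand-in for MySQL, assumed here only to keep the example self-contained). The second data set is made up to produce a tie.

```python
import sqlite3

ORDER_LIMIT = ("SELECT id, avg(price) FROM Students GROUP BY id "
               "ORDER BY avg(price) DESC LIMIT 1")
HAVING_MAX = ("SELECT id, avg(price) FROM Students GROUP BY id "
              "HAVING avg(price) = (SELECT max(avg) FROM "
              "(SELECT id, avg(price) AS avg FROM Students GROUP BY id) AS t)")

def compare(rows):
    """Return (rows from ORDER BY ... LIMIT 1, sorted rows from HAVING = max)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Students (id TEXT, price REAL)")
    conn.executemany("INSERT INTO Students VALUES (?, ?)", rows)
    limited = conn.execute(ORDER_LIMIT).fetchall()
    having = sorted(conn.execute(HAVING_MAX).fetchall())
    conn.close()
    return limited, having

print(compare([("A", 25), ("B", 30), ("C", 40), ("A", 35), ("B", 20)]))
# ([('C', 40.0)], [('C', 40.0)])
print(compare([("A", 10), ("A", 30), ("B", 20), ("C", 5)]))
# A and B both average 20.0: LIMIT 1 keeps one of them, HAVING keeps both.
```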

Timing (SHOW PROFILES output; durations in seconds):

    Query_ID | Duration   | Query
    ---------+------------+--------------------------------------------------------------------------------
           1 | 0.00008716 | SELECT id, avg(price) FROM Students GROUP BY id HAVING avg(price) = ( SELECT max(avg) FROM ( SELECT id, avg(price) AS avg FROM Students GROUP BY id) AS t )
           2 | 0.00008053 | SELECT id, avg(price) FROM Students GROUP BY id ORDER BY avg(price) DESC LIMIT 1
           3 | 0.00011303 | SELECT id, avg(price) FROM Students GROUP BY id HAVING avg(price) = ( SELECT max(avg) FROM ( SELECT id, avg(price) AS avg FROM Students GROUP BY id) AS t )
           4 | 0.00006121 | SELECT id, avg(price) FROM Students GROUP BY id ORDER BY avg(price) DESC LIMIT 1

Evan Carroll · Answer

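I don't think you're aware of cold versus hot caches. Run the same query again.

I think the entire premise of the question is wrong. Query 2 isn't faster. The first table lookup is slow, and after that the data is cached.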

    MariaDB [test]> show profiles;
    Query_ID | Duration   | Query
    ---------+------------+--------------------------------------------------------------------------------
           1 | 0.00366737 | select max(round(b.avg_price,2)) as answer from (SELECT AVG(price) as avg_price FROM Students GROUP BY id )as b
           2 | 0.00080117 | select max(round(b.avg_price,2)) as answer from (SELECT AVG(price) as avg_price FROM Students GROUP BY substr(id,1,1) )as b
           3 | 0.00010088 | select max(round(b.avg_price,2)) as answer from (SELECT AVG(price) as avg_price FROM Students GROUP BY id )as b
           4 | 0.00015381 | select max(round(b.avg_price,2)) as answer from (SELECT AVG(price) as avg_price FROM Students GROUP BY substr(id,1,1) )as b

MySQL sucks here, because it gives you no way to compare heap (memory) hits against hard-disk hits. But in a real database:

    explain (ANALYZE, VERBOSE, BUFFERS) SELECT id, avg(price) FROM Students GROUP BY id ORDER BY avg(price) DESC LIMIT 1;
                                             QUERY PLAN
    ------------------------------------------------------------------------------------------------
     Limit  (cost=31.50..31.50 rows=1 width=40) (actual time=0.091..0.091 rows=1 loops=1)
       Output: id, (avg(price))
       Buffers: shared hit=3 read=1
       ->  Sort  (cost=31.50..32.00 rows=200 width=40) (actual time=0.090..0.090 rows=1 loops=1)
             Output: id, (avg(price))
             Sort Key: (avg(students.price)) DESC
             Sort Method: top-N heapsort  Memory: 25kB
             Buffers: shared hit=3 read=1
             ->  HashAggregate  (cost=28.00..30.50 rows=200 width=40) (actual time=0.045..0.046 rows=3 loops=1)
                   Output: id, avg(price)
                   Group Key: students.id
                   Buffers: shared read=1
                   ->  Seq Scan on public.students  (cost=0.00..22.00 rows=1200 width=40) (actual time=0.027..0.028 rows=5 loops=1)
                         Output: id, price
                         Buffers: shared read=1
     Planning time: 0.541 ms
     Execution time: 0.332 ms
    (17 rows)

    test=# explain (ANALYZE, VERBOSE, BUFFERS) SELECT id, avg(price) FROM Students GROUP BY id ORDER BY avg(price) DESC LIMIT 1;
                                             QUERY PLAN
    ------------------------------------------------------------------------------------------------
     Limit  (cost=31.50..31.50 rows=1 width=40) (actual time=0.059..0.059 rows=1 loops=1)
       Output: id, (avg(price))
       Buffers: shared hit=1
       ->  Sort  (cost=31.50..32.00 rows=200 width=40) (actual time=0.057..0.057 rows=1 loops=1)
             Output: id, (avg(price))
             Sort Key: (avg(students.price)) DESC
             Sort Method: top-N heapsort  Memory: 25kB
             Buffers: shared hit=1
             ->  HashAggregate  (cost=28.00..30.50 rows=200 width=40) (actual time=0.037..0.039 rows=3 loops=1)
                   Output: id, avg(price)
                   Group Key: students.id
                   Buffers: shared hit=1
                   ->  Seq Scan on public.students  (cost=0.00..22.00 rows=1200 width=40) (actual time=0.013..0.015 rows=5 loops=1)
                         Output: id, price
                         Buffers: shared hit=1
     Planning time: 0.114 ms
     Execution time: 0.152 ms
    (17 rows)

In PostgreSQL, observe

 Buffers: shared hit=3 read=1

versus

 Buffers: shared hit=1

On the first run the table has to be read (read=1), and there are 3 shared hits in RAM. The second time the table is already in RAM, so everything is a hit: just hit=1.
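In EXPLAIN (BUFFERS) output, `hit` counts blocks that PostgreSQL found already in its shared buffer cache. `read` counts blocks it had to request from the operating system, which may still serve them from its own page cache rather than from disk. When comparing many runs, it can help to pull these counters out of the plan text. The helper below is hypothetical, written only for illustration: a minimal regex-based sketch that handles the `Buffers: shared ...` lines shown above and ignores any other counters on the line.

```python
import re

# Matches the shared-buffer part of a PostgreSQL EXPLAIN (BUFFERS) line; either counter may be absent.
BUFFERS_RE = re.compile(r"Buffers: shared(?: hit=(\d+))?(?: read=(\d+))?")

def shared_buffer_counts(line):
    """Return (hit, read) from a 'Buffers: shared ...' line; a missing counter counts as 0."""
    m = BUFFERS_RE.search(line)
    if m is None:
        raise ValueError(f"not a shared Buffers line: {line!r}")
    hit, read = (int(g) if g else 0 for g in m.groups())
    return hit, read

first = shared_buffer_counts("Buffers: shared hit=3 read=1")
second = shared_buffer_counts("Buffers: shared hit=1")
print(first, second)  # (3, 1) (1, 0)
```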