PostgreSQL planner choosing btree or GiST index for a small number of result rows

Question

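I have a scenario where a range search with the same value on both ends uses a plan with a different index than the composite index I expected. After some work, I was able to generate sample data that shows the different plans. The first query uses the idx_hourts_btree index, but the second query uses the other composite index. In the real scenario, the plan that uses the "unnatural" GiST composite index produces a very slow query.

Setup: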

    create table sampledata as
    select (row_number() over ())::int
         , extract(hour from generate_series)::int as hour
         , extract(minute from generate_series)::int as minute
         , extract(second from generate_series)::int as second
         , generate_series as ts
    from generate_series (timestamptz '2004-03-08 18:29:00'
                        , timestamptz '2004-03-08 18:31:00'
                        , interval '1 millisecond');

    create index idx_HourTs_btree on sampledata using btree(hour,ts);
    create index idx_idts_Gist on sampledata using Gist(row_number,minute,second,ts);
    analyze sampledata;

    # \d sampledata
                         Table "public.sampledata"
       Column   |           Type           | Collation | Nullable | Default
    ------------+--------------------------+-----------+----------+---------
     row_number | integer                  |           |          |
     hour       | integer                  |           |          |
     minute     | integer                  |           |          |
     second     | integer                  |           |          |
     ts         | timestamp with time zone |           |          |
    Indexes:
        "idx_hourts_btree" btree (hour, ts)
        "idx_idts_gist" gist (row_number, minute, second, ts)

Queries:

    explain analyze
    select * from sampledata
    where hour = 18
      and ts between '2004-03-08 18:30:00.991' and '2004-03-08 18:30:00.993';

                                  QUERY PLAN
    ----------------------------------------------------------------------
     Index Scan using idx_hourts_btree on sampledata  (cost=0.57..8.59 rows=1 width=16) (actual time=3.077..3.079 rows=3 loops=1)
       Index Cond: ((hour = 18) AND (ts >= '2004-03-08 18:30:00.991-03'::timestamp with time zone) AND (ts <= '2004-03-08 18:30:00.993-03'::timestamp with time zone))
     Planning time: 0.114 ms
     Execution time: 3.101 ms
    (4 rows)

    Time: 4.090 ms

    explain analyze
    select * from sampledata
    where hour = 18
      and ts between '2004-03-08 18:30:00.993' and '2004-03-08 18:30:00.993';

                                  QUERY PLAN
    ----------------------------------------------------------------------
     Index Scan using idx_idts_gist on sampledata  (cost=0.55..8.57 rows=1 width=16) (actual time=5.985..5.988 rows=1 loops=1)
       Index Cond: ((ts >= '2004-03-08 18:30:00.993-03'::timestamp with time zone) AND (ts <= '2004-03-08 18:30:00.993-03'::timestamp with time zone))
       Filter: (hour = 18)
     Planning time: 0.153 ms
     Execution time: 6.030 ms
    (5 rows)

    Time: 7.318 ms

PG server: 9.5.9.

Why do the plans differ?

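Here is the analysis from the real table, obfuscated. "quebec" is the GiST composite index (another column, mike) and "uniform" is the btree index (hotel, mike). With the same value on both ends of the range (bad exec time): https://explain.depesz.com/s/O5Gf

With slightly different values (note the difference in execution time): https://explain.depesz.com/s/67yL

The problem can be reproduced with the sample table data provided in the question.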

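Update 2018-03-14

@Erwin's answer gave me more ideas for making the sample data closer to the real case. In the previous sample I was trying to simulate the real data with redundant columns that do not exist in the real world. Since the GiST composite index is needed for another use case, I want to check whether I am missing something simpler before making a bigger design change. The new sample should show the real difference in execution and planning times. I also tried increasing the statistics target of the columns, without success.

Setup script: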

    show default_statistics_target;
    show random_page_cost;

    drop table if exists sampledata2;

    create table sampledata2 as (
      with a as (select generate_series(1,50) as id)
      select id
           , md5(random()::text) Rand
           , generate_series (timestamptz '2004-03-07', timestamptz '2004-03-17', interval '1 minute') ts
      from a
    );

    select * from sampledata2 limit 3;

    create index idx_idTs_btree on sampledata2 using btree(id, ts);
    create index idx_randTs_Gist on sampledata2 using Gist(Rand, ts);
    analyze sampledata2;

    explain analyze select * from sampledata2 where id=42 and ts between '2004-03-07 00:22:00-03' and '2004-03-07 00:22:00-03';
    explain analyze select * from sampledata2 where id=42 and ts between '2004-03-07 00:22:00-03' and '2004-03-07 00:23:00-03';

    alter table sampledata2 alter column id set statistics 10000;
    alter table sampledata2 alter column Rand set statistics 10000;
    alter table sampledata2 alter column ts set statistics 10000;
    analyze sampledata2;

    explain analyze select * from sampledata2 where id=42 and ts between '2004-03-07 00:22:00-03' and '2004-03-07 00:22:00-03';
    explain analyze select * from sampledata2 where id=42 and ts between '2004-03-07 00:22:00-03' and '2004-03-07 00:23:00-03';

Output:

     default_statistics_target
    ---------------------------
     50
    (1 row)

    Time: 0.640 ms
     random_page_cost
    ------------------
     4
    (1 row)

    Time: 0.257 ms
    DROP TABLE
    Time: 35.850 ms
    SELECT 720050
    Time: 1438.842 ms (00:01.439)
     id |               rand               |           ts
    ----+----------------------------------+------------------------
      1 | 8e1d3920ef44f94e71291b2371178ece | 2004-03-07 00:00:00-03
      1 | 664fcfc94e09ea0ff050b934e6cb486f | 2004-03-07 00:01:00-03
      1 | ac52031c8d98df67e2aacaf7d10b3af7 | 2004-03-07 00:02:00-03
    (3 rows)

    Time: 0.651 ms
    CREATE INDEX
    Time: 356.923 ms
    CREATE INDEX
    Time: 35661.019 ms (00:35.661)
    ANALYZE
    Time: 86.580 ms
                                  QUERY PLAN
    ----------------------------------------------------------------------
     Index Scan using idx_randts_gist on sampledata2  (cost=0.41..8.43 rows=1 width=45) (actual time=10.851..18.430 rows=1 loops=1)
       Index Cond: ((ts >= '2004-03-07 00:22:00-03'::timestamp with time zone) AND (ts <= '2004-03-07 00:22:00-03'::timestamp with time zone))
       Filter: (id = 42)
       Rows Removed by Filter: 49
     Planning time: 0.224 ms
     Execution time: 18.479 ms
    (6 rows)

    Time: 19.422 ms
                                  QUERY PLAN
    ----------------------------------------------------------------------
     Index Scan using idx_idts_btree on sampledata2  (cost=0.42..8.45 rows=1 width=45) (actual time=0.040..0.041 rows=2 loops=1)
       Index Cond: ((id = 42) AND (ts >= '2004-03-07 00:22:00-03'::timestamp with time zone) AND (ts <= '2004-03-07 00:23:00-03'::timestamp with time zone))
     Planning time: 0.144 ms
     Execution time: 0.067 ms
    (4 rows)

    Time: 0.803 ms
    ALTER TABLE
    Time: 1.220 ms
    ALTER TABLE
    Time: 0.924 ms
    ALTER TABLE
    Time: 0.894 ms
    ANALYZE
    Time: 2675.784 ms (00:02.676)
                                  QUERY PLAN
    ----------------------------------------------------------------------
     Index Scan using idx_randts_gist on sampledata2  (cost=0.41..8.43 rows=1 width=45) (actual time=6.472..11.493 rows=1 loops=1)
       Index Cond: ((ts >= '2004-03-07 00:22:00-03'::timestamp with time zone) AND (ts <= '2004-03-07 00:22:00-03'::timestamp with time zone))
       Filter: (id = 42)
       Rows Removed by Filter: 49
     Planning time: 0.757 ms
     Execution time: 11.524 ms
    (6 rows)

    Time: 12.948 ms
                                  QUERY PLAN
    ----------------------------------------------------------------------
     Index Scan using idx_idts_btree on sampledata2  (cost=0.42..8.45 rows=1 width=45) (actual time=0.021..0.022 rows=2 loops=1)
       Index Cond: ((id = 42) AND (ts >= '2004-03-07 00:22:00-03'::timestamp with time zone) AND (ts <= '2004-03-07 00:23:00-03'::timestamp with time zone))
     Planning time: 0.505 ms
     Execution time: 0.045 ms
    (4 rows)

    Time: 1.170 ms
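The output above only shows the cost of whichever plan won. One way to see what the planner estimated for the btree plan on the degenerate range is to hide the GiST index inside a transaction and roll back afterwards. This is a minimal sketch, assuming a test database where briefly locking the table is acceptable; the index name is the lower-case form that PostgreSQL stores for the unquoted name in the setup script.

```sql
begin;

-- Hide the GiST index from the planner for this transaction only.
-- Caution: DROP INDEX holds an ACCESS EXCLUSIVE lock on sampledata2
-- until the transaction ends, so avoid this on a busy production table.
drop index idx_randts_gist;

-- Same degenerate range as in the setup script; with the GiST index
-- gone, the plan shows the cost estimated for the remaining options.
explain analyze
select * from sampledata2
where id = 42
  and ts between '2004-03-07 00:22:00-03' and '2004-03-07 00:22:00-03';

-- Undo the drop; the index is restored without being rebuilt.
rollback;
```

Comparing the cost figures of this plan with the 0.41..8.43 reported for the GiST plan shows how close the two estimates are.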

Update 2018-03-15

A bug was reported. As of now, an improvement related to this issue is planned for v11. https://www.postgresql.org/message-id/31902.1521064417%40sss.pgh.pa.us

Gerard H. Pille · Answer

With a minute difference of 2 milliseconds (no pun intended), PostgreSQL drops the GiST index and performance stays optimal.

                                  QUERY PLAN
    ----------------------------------------------------------------------
     Index Scan using idx_hourts_btree on sampledata  (cost=0.57..8.59 rows=1 width=16) (actual time=0.023..0.025 rows=3 loops=1)
       Index Cond: ((hour = 18) AND (ts >= '2004-03-08 18:30:00.999+01'::timestamp with time zone) AND (ts <= '2004-03-08 18:30:01.001+01'::timestamp with time zone))
     Planning time: 0.101 ms
     Execution time: 0.044 ms
    (4 rows)

With just 1 millisecond in between:

                                  QUERY PLAN
    ----------------------------------------------------------------------
     Index Scan using idx_idts_gist on sampledata  (cost=0.55..8.57 rows=1 width=16) (actual time=0.057..0.058 rows=2 loops=1)
       Index Cond: ((ts >= '2004-03-08 18:30:00.999+01'::timestamp with time zone) AND (ts <= '2004-03-08 18:30:01+01'::timestamp with time zone))
       Filter: (hour = 18)
     Planning time: 0.103 ms
     Execution time: 0.085 ms
    (5 rows)

PostgreSQL 9.6.6 on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18) 6.3.0 20170516, 64-bit
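The two plans above are shown without their statements. Judging from the Index Cond lines, they were presumably of the following form. The literals are reconstructed from the plans and written without the +01 offset, on the assumption that the session time zone supplied it, so treat this as a sketch rather than the exact statements that were run.

```sql
-- Presumed statement for the first plan: 2 ms range, btree index chosen.
explain analyze
select * from sampledata
where hour = 18
  and ts between '2004-03-08 18:30:00.999' and '2004-03-08 18:30:01.001';

-- Presumed statement for the second plan: 1 ms range, GiST index chosen.
explain analyze
select * from sampledata
where hour = 18
  and ts between '2004-03-08 18:30:00.999' and '2004-03-08 18:30:01';
```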