postgresでウィンドウクエリを最適化する方法

Question

約175,000レコードの次の表があります。

 Column | Type | Modifiers ----------------+-----------------------------+------------------------------------- id | uuid | not null default uuid_generate_v4() competition_id | uuid | not null user_id | uuid | not null first_name | character varying(255) | not null last_name | character varying(255) | not null image | character varying(255) | country | character varying(255) | slug | character varying(255) | not null total_votes | integer | not null default 0 created_at | timestamp without time zone | updated_at | timestamp without time zone | featured_until | timestamp without time zone | image_src | character varying(255) | hidden | boolean | not null default false photos_count | integer | not null default 0 photo_id | uuid | Indexes: "entries_pkey" PRIMARY KEY, btree (id) "index_entries_on_competition_id" btree (competition_id) "index_entries_on_featured_until" btree (featured_until) "index_entries_on_hidden" btree (hidden) "index_entries_on_photo_id" btree (photo_id) "index_entries_on_slug" btree (slug) "index_entries_on_total_votes" btree (total_votes) "index_entries_on_user_id" btree (user_id)

また、次のクエリを実行して、エントリのランクと次および前のエントリのスラッグを取得しています。

WITH entry_with_global_rank AS ( SELECT id , rank() OVER w AS global_rank , LAG(slug) OVER w AS previous_slug , LEAD(slug) OVER w AS next_slug FROM entries WHERE competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b' WINDOW w AS (PARTITION BY competition_id ORDER BY total_votes DESC) ) SELECT * FROM entry_with_global_rank WHERE id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0' LIMIT 1;

EXPLAINの結果は次のとおりです：

 QUERY PLAN ----------------------------------------------------------------------------------------------- Limit (cost=516228.88..516233.37 rows=1 width=88) CTE entry_with_global_rank -> WindowAgg (cost=510596.59..516228.88 rows=250324 width=52) -> Sort (cost=510596.59..511222.40 rows=250324 width=52) Sort Key: entries.total_votes -> Seq Scan on entries (cost=0.00..488150.74 rows=250324 width=52) Filter: (competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'::uuid) -> CTE Scan on entry_with_global_rank (cost=0.00..5632.29 rows=1252 width=88) Filter: (id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0'::uuid) (9 rows)

このクエリの所要時間は約1400ミリ秒です。これをスピードアップする方法はありますか？

編集：

EXPLAIN ANALYZEの結果は次のとおりです：

 QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=516228.88..516233.37 rows=1 width=88) (actual time=1232.824..1232.824 rows=1 loops=1) CTE entry_with_global_rank -> WindowAgg (cost=510596.59..516228.88 rows=250324 width=52) (actual time=1202.101..1226.846 rows=8727 loops=1) -> Sort (cost=510596.59..511222.40 rows=250324 width=52) (actual time=1202.069..1213.992 rows=8728 loops=1) Sort Key: entries.total_votes Sort Method: quicksort Memory: 8128kB -> Seq Scan on entries (cost=0.00..488150.74 rows=250324 width=52) (actual time=89.970..1174.083 rows=50335 loops=1) Filter: (competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'::uuid) Rows Removed by Filter: 125477 -> CTE Scan on entry_with_global_rank (cost=0.00..5632.29 rows=1252 width=88) (actual time=1232.822..1232.822 rows=1 loops=1) Filter: (id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0'::uuid) Rows Removed by Filter: 8726 Total runtime: 1234.424 ms (13 rows)

編集2：

データベースでVACUUM ANALYZEを実行したところ、クエリ時間が改善されましたが、パフォーマンスを向上させる方法がいくつかあるはずです。

 QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=475372.26..475376.76 rows=1 width=88) (actual time=138.388..138.388 rows=1 loops=1) CTE entry_with_global_rank -> WindowAgg (cost=470662.23..475372.26 rows=209335 width=35) (actual time=125.489..132.214 rows=4178 loops=1) -> Sort (cost=470662.23..471185.56 rows=209335 width=35) (actual time=125.462..126.724 rows=4179 loops=1) Sort Key: entries.total_votes Sort Method: quicksort Memory: 5510kB -> Bitmap Heap Scan on entries (cost=71390.90..452161.77 rows=209335 width=35) (actual time=29.381..87.130 rows=50390 loops=1) Recheck Cond: (competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'::uuid) -> Bitmap Index Scan on index_entries_on_competition_id (cost=0.00..71338.56 rows=209335 width=0) (actual time=23.593..23.593 rows=51257 loops=1) Index Cond: (competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'::uuid) -> CTE Scan on entry_with_global_rank (cost=0.00..4710.04 rows=1047 width=88) (actual time=138.387..138.387 rows=1 loops=1) Filter: (id = '9470ec4f-fed1-4f95-bbed-1e3dbba5f53b'::uuid) Rows Removed by Filter: 4177 Total runtime: 138.588 ms (14 rows)

編集3：

要求に応じて、VACUUM ANALYZEの直後にカバリングインデックスを配置した最終クエリプラン：

 QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.42..6771.99 rows=1 width=88) (actual time=46.765..46.765 rows=1 loops=1) -> Subquery Scan on entry_with_global_rank (cost=0.42..6771.99 rows=1 width=88) (actual time=46.763..46.763 rows=1 loops=1) Filter: (entry_with_global_rank.id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0'::uuid) Rows Removed by Filter: 9128 -> WindowAgg (cost=0.42..5635.06 rows=90955 width=35) (actual time=0.090..40.002 rows=9129 loops=1) -> Index Only Scan using entries_extra_special_idx on entries (cost=0.42..3815.96 rows=90955 width=35) (actual time=0.071..10.973 rows=9130 loops=1) Index Cond: (competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b'::uuid) Heap Fetches: 166 Total runtime: 46.867 ms (9 rows)

Erwin Brandstetter · Accepted Answer

[〜＃〜] cte [〜＃〜] はここでは不要であり、最適化バリアとして機能します。単純なサブクエリは通常、パフォーマンスが向上します。

_SELECT * FROM ( SELECT id ,rank() OVER w AS global_rank ,lag(slug) OVER w AS previous_slug ,lead(slug) OVER w AS next_slug FROM entries WHERE competition_id = 'bdd94eee-25a4-481f-b7b5-37aaed953c6b' WINDOW w AS (ORDER BY total_votes DESC) ) entry_with_global_rank WHERE id = 'f2df68b7-d720-459d-8c4d-d11e28e0f0c0' LIMIT 1; _

@ Danielがコメントとして、とにかく単一の_PARTITION BY_に制限しているため、ウィンドウ定義から_competition_id_句を削除しました。

テーブルレイアウト

テーブルレイアウトを最適化して、ディスク上のストレージサイズをわずかに減らすことができます。これにより、すべてが少し高速になりますが、

_ Column | Type | Modifiers ----------------+-----------------------------+------------------------------------- id | uuid | not null default uuid_generate_v4() competition_id | uuid | not null user_id | uuid | not null total_votes | integer | not null default 0 photos_count | integer | not null default 0 hidden | boolean | not null default false slug | character varying(255) | not null first_name | character varying(255) | not null last_name | character varying(255) | not null image | character varying(255) | country | character varying(255) | image_src | character varying(255) | photo_id | uuid | created_at | timestamp without time zone | updated_at | timestamp without time zone | featured_until | timestamp without time zone | _

詳細：

読み取りパフォーマンスのためのPostgreSQLの構成

また、実際にそれらすべてのuuid列が必要ですか？ intまたはbigintは機能しませんか？テーブルとインデックスを少し小さくし、すべてを高速化します。

文字データにはtextを使用するだけですが、クエリのパフォーマンスには役立ちません。

余談：character varying(255)はPostgresではほとんどの場合無意味です。他のいくつかのRDBMSは長さの制限から利益を得ますが、Postgresの場合はすべて同じです（実際にはあり得ない最大長の255文字を強制する必要がある場合を除きます）。

特別インデックス

最後に、高度に専門化されたインデックスを構築できます（インデックスのメンテナンスが特別なケースに値する場合のみ）：

_CREATE INDEX entries_special_idx ON entries (competition_id, total_votes DESC, id, slug); _

インデックスに_(id, slug)_を追加することは、これから index-only scans を取得できる場合にのみ意味があります。（自動バキュームを無効にしたり、多数の同時書き込みを行ったりすると、その作業は無効になります。）そうでない場合は、最後の2つの列を削除します。

そこにいる間、インデックスを監査します。それらはすべて使用中ですか？ここにいくつかの死んだ貨物があるかもしれません。