Bitmap Heap Scan is slow, and ordering is killing the execution time

Question

A query on the users table is slow. The table has 5 million records, and to improve query performance we often index the output of functions over aggregated data. The function outputs are not jsonb values.

Postgres version 10.1

Schema:

    CREATE TABLE users (
        id SERIAL PRIMARY KEY NOT NULL,
        social jsonb,
        flags text[],
        full_name text,
        email text,
        location jsonb,
        contact_info jsonb,
        created_at TIMESTAMP WITHOUT TIME ZONE
    );

    CREATE INDEX available_channels_idx ON public.users USING gin (public.available_channels(social, contact_info));
    CREATE INDEX mixed_frequent_locations_idx ON public.users USING gin (public.mixed_frequent_locations(location));
    CREATE INDEX idx_in_social_follower_count ON public.users USING btree (public.social_follower_count(social) DESC NULLS LAST);
    CREATE INDEX created_at_idx ON public.users USING btree (created_at);
    CREATE INDEX idx_in_social_follower_count_and_created_at ON public.users USING btree (public.social_follower_count(social) DESC, created_at);

Slow query (here the bitmap heap scan removes too many rows in the index recheck):

    EXPLAIN ANALYZE
    SELECT * FROM users
    WHERE (social_follower_count(social) > '30000')
      AND (engagement_level(social) <= 3)
      AND (available_channels(social, contact_info) <@ array['yt'])
      AND (has_emails(contact_info) = TRUE)
      AND (not_business(social) = TRUE)
      AND (array['United States'] <@ mixed_frequent_locations(location))
      AND (is_visible(social, flags) = TRUE)
    ORDER BY social_follower_count(social) DESC, "users"."created_at" ASC
    LIMIT 12 OFFSET 0;

Without the limit, this query returns 11616 results. The variable inputs to the query are array['United States'], has_emails(contact_info) = TRUE, array['yt'], and social_follower_count(social) > '30000'.

Query plan:

    Limit (cost=59629.20..59629.23 rows=12 width=1531) (actual time=330055.413..330055.418 rows=12 loops=1)
      -> Sort (cost=59629.20..59629.69 rows=199 width=1531) (actual time=330055.411..330055.412 rows=12 loops=1)
            Sort Key: (social_follower_count(social)) DESC, created_at
            Sort Method: top-N heapsort Memory: 65kB
            -> Bitmap Heap Scan on users (cost=24767.69..59624.63 rows=199 width=1531) (actual time=551.864..330000.716 rows=11616 loops=1)
                  Recheck Cond: ((available_channels(social, contact_info) <@ '{yt}'::text[]) AND ('{"United States"}'::text[] <@ mixed_frequent_locations(location)) AND (social_follower_count(social) > 30000))
                  Rows Removed by Index Recheck: 883451
                  Filter: (has_emails(contact_info) AND not_business(social) AND is_visible(social, flags) AND (engagement_level(social) <= 3))
                  Rows Removed by Filter: 9775
                  Heap Blocks: exact=17001 lossy=132075
                  -> BitmapAnd (cost=24767.69..24767.69 rows=6344 width=0) (actual time=422.660..422.660 rows=0 loops=1)
                        -> Bitmap Index Scan on available_channels_idx (cost=0.00..4046.06 rows=363475 width=0) (actual time=116.792..116.792 rows=442083 loops=1)
                              Index Cond: (available_channels(social, contact_info) <@ '{yt}'::text[])
                        -> Bitmap Index Scan on mixed_frequent_locations_idx (cost=0.00..5550.85 rows=617447 width=0) (actual time=143.090..143.090 rows=620980 loops=1)
                              Index Cond: ('{"United States"}'::text[] <@ mixed_frequent_locations(location))
                        -> Bitmap Index Scan on idx_in_social_follower_count (cost=0.00..15170.13 rows=821559 width=0) (actual time=132.214..132.215 rows=834091 loops=1)
                              Index Cond: (social_follower_count(social) > 30000)
    Planning time: 0.534 ms
    Execution time: 393793.472 ms

Server specs: 63 GB of RAM, Intel Core i7-6700K, 250 GB SSD

Increasing work_mem eliminated the lossy blocks, but the index recheck still removes rows:

    Limit (cost=59629.20..59629.23 rows=12 width=1531) (actual time=42330.685..42330.691 rows=12 loops=1)
      -> Sort (cost=59629.20..59629.69 rows=199 width=1531) (actual time=42330.662..42330.665 rows=12 loops=1)
            Sort Key: (social_follower_count(social)) DESC, created_at
            Sort Method: top-N heapsort Memory: 65kB
            -> Bitmap Heap Scan on users (cost=24767.69..59624.63 rows=199 width=1531) (actual time=846.650..42281.071 rows=11616 loops=1)
                  Recheck Cond: ((available_channels(social, contact_info) <@ '{yt}'::text[]) AND ('{"United States"}'::text[] <@ mixed_frequent_locations(location)) AND (social_follower_count(social) > 30000))
                  Rows Removed by Index Recheck: 7149
                  Filter: (has_emails(contact_info) AND not_business(social) AND is_visible(social, flags) AND (engagement_level(social) <= 3))
                  Rows Removed by Filter: 9775
                  Heap Blocks: exact=28018
                  -> BitmapAnd (cost=24767.69..24767.69 rows=6344 width=0) (actual time=820.608..820.608 rows=0 loops=1)
                        -> Bitmap Index Scan on available_channels_idx (cost=0.00..4046.06 rows=363475 width=0) (actual time=207.050..207.050 rows=442083 loops=1)
                              Index Cond: (available_channels(social, contact_info) <@ '{yt}'::text[])
                        -> Bitmap Index Scan on mixed_frequent_locations_idx (cost=0.00..5550.85 rows=617447 width=0) (actual time=276.099..276.099 rows=620980 loops=1)
                              Index Cond: ('{"United States"}'::text[] <@ mixed_frequent_locations(location))
                        -> Bitmap Index Scan on idx_in_social_follower_count (cost=0.00..15170.13 rows=821559 width=0) (actual time=290.351..290.351 rows=834091 loops=1)
                              Index Cond: (social_follower_count(social) > 30000)
    Planning time: 20.168 ms
    Execution time: 42338.700 ms

Running set enable_bitmapscan = off; changed the query plan substantially:

    Limit (cost=0.43..192792.76 rows=12 width=1526) (actual time=25.710..145.877 rows=12 loops=1)
      Buffers: shared hit=8508
      -> Index Scan using idx_in_social_follower_count_and_created_at2 on users (cost=0.43..3454196.26 rows=215 width=1526) (actual time=25.707..145.864 rows=12 loops=1)
            Index Cond: (social_follower_count(social) > 30000)
            Filter: (has_emails(contact_info) AND not_business(social) AND is_visible(social, flags) AND (engagement_level(social) <= 3) AND (available_channels(social, contact_info) <@ '{yt}'::text[]) AND ('{"United States"}'::text[] <@ mixed_frequent_locations(location)))
            Rows Removed by Filter: 346
            Buffers: shared hit=8508
    Planning time: 0.830 ms
    Execution time: 145.949 ms

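The execution time changed dramatically. Is it possible to enable or disable bitmap scans depending on the query inputs?

I also tried raising default_statistics_target to 1000 and to 2000, but saw no significant improvement. The query plan stays the same: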

    Limit (cost=122386.75..122386.78 rows=12 width=1528) (actual time=106296.479..106296.484 rows=12 loops=1)
      Buffers: shared hit=5010705 read=408947 written=467
      -> Sort (cost=122386.75..122390.13 rows=1349 width=1528) (actual time=106296.477..106296.478 rows=12 loops=1)
            Sort Key: (social_follower_count(social)) DESC, created_at
            Sort Method: top-N heapsort Memory: 61kB
            Buffers: shared hit=5010705 read=408947 written=467
            -> Bitmap Heap Scan on users (cost=45119.25..122355.83 rows=1349 width=1528) (actual time=556.192..106183.697 rows=41736 loops=1)
                  Recheck Cond: ((available_channels(social, contact_info) <@ '{yt}'::text[]) AND ('{"United States"}'::text[] <@ mixed_frequent_locations(location)))
                  Rows Removed by Index Recheck: 17700
                  Filter: (has_emails(contact_info) AND not_business(social) AND is_visible(social, flags) AND (engagement_level(instagram) <= 3))
                  Rows Removed by Filter: 20977
                  Heap Blocks: exact=76531
                  Buffers: shared hit=5010705 read=408947 written=467
                  -> BitmapAnd (cost=45119.25..45119.25 rows=15170 width=0) (actual time=533.751..533.751 rows=0 loops=1)
                        Buffers: shared hit=4 read=5492
                        -> Bitmap Index Scan on available_channels_idx (cost=0.00..4094.86 rows=369981 width=0) (actual time=100.521..100.522 rows=442083 loops=1)
                              Index Cond: (available_channels(social, contact_info) <@ '{yt}'::text[])
                              Buffers: shared hit=3 read=105
                        -> Bitmap Index Scan on mixed_frequent_locations_idx (cost=0.00..5576.65 rows=620353 width=0) (actual time=156.311..156.311 rows=620980 loops=1)
                              Index Cond: ('{"United States"}'::text[] <@ mixed_frequent_locations(location))
                              Buffers: shared hit=1 read=141
                        -> Bitmap Index Scan on idx_in_contact_info_has_emails (cost=0.00..35446.23 rows=1919173 width=0) (actual time=243.802..243.803 rows=1918927 loops=1)
                              Index Cond: (has_emails(contact_info) = true)
                              Buffers: shared read=5246
    Planning time: 13.691 ms
    Execution time: 106296.586 ms

Selectivity of each condition:

    (social_follower_count(social) > '30000') - 834091 records
    (engagement_level(social) <= 3) - 4311859 records
    (available_channels(social, contact_info) <@ array['yt']) - 342815 records
    (has_emails(contact_info) = TRUE) - 1918927 records
    (not_business(social) = TRUE) - 3869626 records
    (array['United States'] <@ mixed_frequent_locations(location)) - 620980 records
    (is_visible(social, flags) = TRUE) - 4959302 records

As @jjanes suggested, I removed the social_follower_count(social) > 30000 clause. The query plan changed dramatically: it now uses a full index scan and no bitmap heap scan.

    Limit (cost=0.43..88024.52 rows=12 width=1436) (actual time=20.779..181.868 rows=12 loops=1)
      -> Index Scan using idx_in_social_follower_count_and_created_at on users (cost=0.43..9543278.28 rows=1301 width=1436) (actual time=20.777..181.858 rows=12 loops=1)
            Filter: (has_emails(contact_info) AND not_business(social) AND is_visible(social, flags) AND (engagement_level(social) <= 3) AND (available_channels(social, contact_info) <@ '{yt}'::text[]) AND ('{"United States"}'::text[] <@ mixed_frequent_locations(location)))
            Rows Removed by Filter: 347
    Planning time: 0.523 ms
    Execution time: 181.963 ms

Luciano Andress Martini · Answer

I ran into the same problem after upgrading from PostgreSQL 9.1 to 9.6: disabling enable_bitmapscan improves performance.

You are probably running with a low default_statistics_target:

    SHOW default_statistics_target;

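In newer PostgreSQL versions the default value of 100 is ridiculously low, because of the many new query planner features, so you need to increase it. Leave enable_bitmapscan on, raise the target, and try the query again until the best query plan is chosen.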

You can try a range of values by issuing the following command:

    SET default_statistics_target TO '2000'; -- for example; this is a reasonable value for most databases...

Then:

    ANALYZE; -- (or VACUUM ANALYZE)

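Then rerun the query, repeating these steps until you find the best statistics target...

After that you can put the value into postgresql.conf as the default and reload/restart the service.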
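As an alternative to editing postgresql.conf by hand, the last step can be sketched with ALTER SYSTEM. This is a minimal sketch, assuming a superuser session on PostgreSQL 9.4 or later; the value 2000 is only the example figure used above, not a tuned recommendation.

```sql
-- Persist the setting in postgresql.auto.conf (superuser only; cannot run inside a transaction block).
ALTER SYSTEM SET default_statistics_target = 2000;

-- Ask the server to re-read its configuration; this parameter does not need a restart.
SELECT pg_reload_conf();

-- Confirm the value now in effect for this session.
SHOW default_statistics_target;

-- Re-collect statistics for the table so the planner sees the larger samples.
-- Statistics for expression indexes are gathered when their table is analyzed.
ANALYZE users;
```

A session-level SET issued earlier takes precedence over the reloaded default, so run RESET default_statistics_target first if SHOW still reports the old value.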

jjanes · Answer

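The recheck is necessary because the full bitmap doesn't fit in the allowed memory, so it has to use lossy compression: it stores just the blocks, rather than the blocks plus the row offsets within each block. Increase the "work_mem" setting until the entire bitmap fits in memory, which is to say, until the "Heap Blocks: ... lossy=..." entry goes away.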
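One way to try this without touching the server configuration is a session-level setting. The following is a minimal sketch that reuses the query from the question; the 256MB figure is an assumed starting point for illustration and should be adjusted until the lossy count disappears from the plan.

```sql
-- Session-level only; 256MB is an illustrative value, not a recommendation.
SET work_mem = '256MB';

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM users
WHERE (social_follower_count(social) > '30000')
  AND (engagement_level(social) <= 3)
  AND (available_channels(social, contact_info) <@ array['yt'])
  AND (has_emails(contact_info) = TRUE)
  AND (not_business(social) = TRUE)
  AND (array['United States'] <@ mixed_frequent_locations(location))
  AND (is_visible(social, flags) = TRUE)
ORDER BY social_follower_count(social) DESC, "users"."created_at" ASC
LIMIT 12 OFFSET 0;

-- In the output, look at the "Heap Blocks:" line of the Bitmap Heap Scan node.
-- If it still shows "lossy=...", raise work_mem and run the EXPLAIN again.

-- Return to the server default when done.
RESET work_mem;
```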

Now, that still might not be good enough, but at least it has a fighting chance.

As shown above, it looks like that helped by about 10-fold, but that still isn't good enough.

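It is certainly possible to run set enable_bitmapscan = off before a particular query and reset enable_bitmapscan after it. One potential problem is that some programming frameworks that try to abstract away SQL make it hard to get your hands on the query in order to do that. It can also be hard to identify which queries need that setting. If the PostgreSQL planner were good at figuring that out, you wouldn't have this problem in the first place. So you are on your own there.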
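A variation on the set/reset pair is to scope the setting to a single transaction with SET LOCAL, so it cannot leak into later queries on the same connection. This is a sketch of that approach using the query from the question; whether your framework lets you wrap a query in an explicit transaction is an assumption.

```sql
BEGIN;

-- Applies only until the end of this transaction, whether it commits or rolls back.
SET LOCAL enable_bitmapscan = off;

SELECT * FROM users
WHERE (social_follower_count(social) > '30000')
  AND (engagement_level(social) <= 3)
  AND (available_channels(social, contact_info) <@ array['yt'])
  AND (has_emails(contact_info) = TRUE)
  AND (not_business(social) = TRUE)
  AND (array['United States'] <@ mixed_frequent_locations(location))
  AND (is_visible(social, flags) = TRUE)
ORDER BY social_follower_count(social) DESC, "users"."created_at" ASC
LIMIT 12 OFFSET 0;

COMMIT;
```

This is particularly useful with connection pooling, where a plain SET without a matching RESET would otherwise stay in effect for whoever gets the connection next.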

The fundamental problem is probably this line:

    -> Bitmap Heap Scan on users (cost=24767.69..59624.63 rows=199 width=1531) (actual time=846.650..42281.071 rows=11616 loops=1)

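It finds almost 60 times more rows than it expected (11616 actual versus 199 estimated). The reason is probably cross-column correlation. For example, people who have YouTube channels are more likely to be popular than chance alone would suggest. Newer versions of PostgreSQL have a way to collect statistics on cross-column correlations, but that won't work here, because (currently) they must be on actual columns rather than expressions, and I also think they only work on scalars, not arrays.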

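One idea would be to remove the social_follower_count(social) > '30000' part of the query. You are already sorting by that value and taking only the top 12, so is it really necessary to also restrict by a quantitative cutoff? If you omit that part, it can no longer be correlated with the other conditions, and so it can no longer cause the estimation problem.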
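Concretely, the suggestion amounts to the following sketch, which is the query from the question with only the follower-count condition dropped:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM users
WHERE (engagement_level(social) <= 3)
  AND (available_channels(social, contact_info) <@ array['yt'])
  AND (has_emails(contact_info) = TRUE)
  AND (not_business(social) = TRUE)
  AND (array['United States'] <@ mixed_frequent_locations(location))
  AND (is_visible(social, flags) = TRUE)
-- The cutoff is gone; the ordering alone brings the highest follower counts first.
ORDER BY social_follower_count(social) DESC, "users"."created_at" ASC
LIMIT 12 OFFSET 0;
```

The two queries are not strictly equivalent. If fewer than 12 matching rows exceed 30000 followers, this version also returns rows below the cutoff, which the application would then need to discard. Also, DESC ordering in PostgreSQL places NULLs first by default, so if social_follower_count can return NULL, a condition such as social_follower_count(social) IS NOT NULL may be needed to keep those rows out of the top 12.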