Distinct / group by performance

Question

I'm trying to select the most recent authors of messages in a room, using the following (simplified) table:

                         Table "public.message"
     Column  |  Type   | Nullable |               Default
    ---------+---------+----------+-------------------------------------
     id      | bigint  | not null | nextval('message_id_seq'::regclass)
     room    | integer | not null |
     author  | integer | not null |
     created | integer | not null |
    Indexes:
        "message_pkey" PRIMARY KEY, btree (id)
        "message_author_created_room" btree (author, created, room)
        "message_room_author_created" btree (room, author, created)
        "message_room_created" btree (room, created)
        "message_room_id" btree (room, id)

The problem is that a query like the following is slow:

    select message.author as id, max(message.created) as mc
    from message
    where room=12
    group by message.author
    order by mc desc
    limit 50;

And here is the explain (analyze, verbose, buffers) output:

    miaou=> explain (analyze, verbose, buffers) select message.author as id, max(message.created) as mc from message where room=12 group by message.author order by mc desc limit 50;
                                                                                 QUERY PLAN
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     Limit  (cost=10627.14..10627.26 rows=50 width=8) (actual time=54.887..54.901 rows=50 loops=1)
       Output: author, (max(created))
       Buffers: shared hit=490
       ->  Sort  (cost=10627.14..10629.19 rows=820 width=8) (actual time=54.885..54.891 rows=50 loops=1)
             Output: author, (max(created))
             Sort Key: (max(message.created)) DESC
             Sort Method: top-N heapsort  Memory: 29kB
             Buffers: shared hit=490
             ->  Finalize GroupAggregate  (cost=1000.46..10599.90 rows=820 width=8) (actual time=14.019..54.788 rows=160 loops=1)
                   Output: author, max(created)
                   Group Key: message.author
                   Buffers: shared hit=490
                   ->  Gather Merge  (cost=1000.46..10583.50 rows=1640 width=8) (actual time=14.007..54.636 rows=248 loops=1)
                         Output: author, (PARTIAL max(created))
                         Workers Planned: 2
                         Workers Launched: 2
                         Buffers: shared hit=490
                         ->  Partial GroupAggregate  (cost=0.43..9394.18 rows=820 width=8) (actual time=3.439..34.733 rows=83 loops=3)
                               Output: author, PARTIAL max(created)
                               Group Key: message.author
                               Buffers: shared hit=2989
                               Worker 0: actual time=0.297..49.593 rows=116 loops=1
                                 Buffers: shared hit=1550
                               Worker 1: actual time=6.624..40.612 rows=60 loops=1
                                 Buffers: shared hit=949
                               ->  Parallel Index Only Scan using message_room_author_created on public.message  (cost=0.43..8904.09 rows=96377 width=8) (actual time=0.030..20.067 rows=73907 loops=3)
                                     Output: author, created
                                     Index Cond: (message.room = 12)
                                     Heap Fetches: 139
                                     Buffers: shared hit=2989
                                     Worker 0: actual time=0.035..28.355 rows=109834 loops=1
                                       Buffers: shared hit=1550
                                     Worker 1: actual time=0.030..23.723 rows=79112 loops=1
                                       Buffers: shared hit=949
     Planning time: 0.211 ms
     Execution time: 57.071 ms

I'd like to know how to make it faster, but the really important goal is to get the N most recent authors. Is there a faster way to query for that information?

Evan Carroll · Accepted Answer

_select count(author) from message where room=12;_

Since author cannot be null, just run SELECT count(*). That makes an index-only scan on either _message_room_created_ or _message_room_id_ likely. If you want unique authors, which this query does not give you, you need count(DISTINCT author) instead.
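The two variants mentioned above could be written as follows. This is only a sketch against the public.message table from the question. Whether the planner actually picks an index-only scan depends on the PostgreSQL version, the table statistics, and the visibility map, so it is worth checking with EXPLAIN.

```sql
-- Assumes the public.message table shown in the question.
-- author is NOT NULL, so count(author) and count(*) return the same number.
SELECT count(*)
FROM message
WHERE room = 12;

-- Counting unique authors is a different question and needs DISTINCT.
SELECT count(DISTINCT author)
FROM message
WHERE room = 12;
```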

Also, this is being planned as a parallel query, which is a relatively new feature. You could try _SET max_parallel_workers_per_gather = 0_ to disable parallel query, then repost the results.
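A session along these lines would produce the non-parallel plan for comparison. It is a sketch that reuses the query from the question. The SET only affects the current session, and the RESET at the end is an addition here to restore the configured default.

```sql
-- Disable parallel plans for the current session only.
SET max_parallel_workers_per_gather = 0;

-- Re-run the query from the question and capture the new plan.
EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT message.author AS id, max(message.created) AS mc
FROM message
WHERE room = 12
GROUP BY message.author
ORDER BY mc DESC
LIMIT 50;

-- Restore the configured default afterwards.
RESET max_parallel_workers_per_gather;
```

With the setting at 0, the plan should no longer contain Gather Merge or Parallel nodes, which makes it easier to see how much of the time goes to the index scan itself.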

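Either way,

"This is slow (same duration, about 45ms)"

When an index scan with heap fetches is reading hundreds of thousands of rows (roughly 220,000 in the plan above), calling 45ms "slow" isn't really fair.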