Very slow simple JOIN query

Question

Simple DB structure (for an online forum):

    CREATE TABLE users (
        id integer NOT NULL PRIMARY KEY,
        username text
    );
    CREATE INDEX ON users (username);

    CREATE TABLE posts (
        id integer NOT NULL PRIMARY KEY,
        thread_id integer NOT NULL REFERENCES threads (id),
        user_id integer NOT NULL REFERENCES users (id),
        date timestamp without time zone NOT NULL,
        content text
    );
    CREATE INDEX ON posts (thread_id);
    CREATE INDEX ON posts (user_id);

There are around 80k entries in users and 2.6 million entries in posts. This simple query to get the top 100 users by post count takes 2.4 seconds:

    EXPLAIN ANALYZE
    SELECT u.id, u.username, COUNT(p.id) AS PostCount
    FROM users u
    INNER JOIN posts p ON p.user_id = u.id
    WHERE u.username IS NOT NULL
    GROUP BY u.id
    ORDER BY PostCount DESC
    LIMIT 100;

    Limit  (cost=316926.14..316926.39 rows=100 width=20) (actual time=2326.812..2326.830 rows=100 loops=1)
      ->  Sort  (cost=316926.14..317014.83 rows=35476 width=20) (actual time=2326.809..2326.820 rows=100 loops=1)
            Sort Key: (count(p.id)) DESC
            Sort Method: top-N heapsort  Memory: 32kB
            ->  HashAggregate  (cost=315215.51..315570.27 rows=35476 width=20) (actual time=2311.296..2321.739 rows=34608 loops=1)
                  Group Key: u.id
                  ->  Hash Join  (cost=1176.89..308201.88 rows=1402727 width=16) (actual time=16.538..1784.546 rows=1910831 loops=1)
                        Hash Cond: (p.user_id = u.id)
                        ->  Seq Scan on posts p  (cost=0.00..286185.34 rows=1816634 width=8) (actual time=0.103..1144.681 rows=2173916 loops=1)
                        ->  Hash  (cost=733.44..733.44 rows=35476 width=12) (actual time=15.763..15.763 rows=34609 loops=1)
                              Buckets: 65536  Batches: 1  Memory Usage: 2021kB
                              ->  Seq Scan on users u  (cost=0.00..733.44 rows=35476 width=12) (actual time=0.033..6.521 rows=34609 loops=1)
                                    Filter: (username IS NOT NULL)
                                    Rows Removed by Filter: 11335
    Execution time: 2301.357 ms

With set enable_seqscan = false it is even worse:

    Limit  (cost=1160881.74..1160881.99 rows=100 width=20) (actual time=2758.086..2758.107 rows=100 loops=1)
      ->  Sort  (cost=1160881.74..1160970.43 rows=35476 width=20) (actual time=2758.084..2758.098 rows=100 loops=1)
            Sort Key: (count(p.id)) DESC
            Sort Method: top-N heapsort  Memory: 32kB
            ->  GroupAggregate  (cost=0.79..1159525.87 rows=35476 width=20) (actual time=0.095..2749.859 rows=34608 loops=1)
                  Group Key: u.id
                  ->  Merge Join  (cost=0.79..1152157.48 rows=1402727 width=16) (actual time=0.036..2537.064 rows=1910831 loops=1)
                        Merge Cond: (u.id = p.user_id)
                        ->  Index Scan using users_pkey on users u  (cost=0.29..2404.83 rows=35476 width=12) (actual time=0.016..41.163 rows=34609 loops=1)
                              Filter: (username IS NOT NULL)
                              Rows Removed by Filter: 11335
                        ->  Index Scan using posts_user_id_index on posts p  (cost=0.43..1131472.19 rows=1816634 width=8) (actual time=0.012..2191.856 rows=2173916 loops=1)
    Planning time: 1.281 ms
    Execution time: 2758.187 ms

GROUP BY username is missing because Postgres does not require it (SQL Server says I have to group by username if I want to select username). Grouping by username adds a few ms to the execution time on Postgres, or makes no difference at all.

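For science, I installed Microsoft SQL Server on the same server (running archlinux, 8 core xeon, 24 GB RAM, SSD) and migrated all the data from Postgres: same table structure, same indexes, same data. The same query to get the top 100 posters runs in 0.3 seconds: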

    SELECT TOP 100 u.id, u.username, COUNT(p.id) AS PostCount
    FROM dbo.users u
    INNER JOIN dbo.posts p ON p.user_id = u.id
    WHERE u.username IS NOT NULL
    GROUP BY u.id, u.username
    ORDER BY PostCount DESC

It yields the same results from the same data, but about 8 times faster. And this is a beta version of MS SQL on Linux; it might be even faster running on its "home" OS, Windows Server.

Is my PostgreSQL query completely wrong, or is PostgreSQL just slow?

Additional info

The version is almost the latest (9.6.1; the latest is currently 9.6.2. ArchLinux ships outdated packages and is very slow to update). Config:

    max_connections = 75
    shared_buffers = 3584MB
    effective_cache_size = 10752MB
    work_mem = 24466kB
    maintenance_work_mem = 896MB
    dynamic_shared_memory_type = posix
    min_wal_size = 1GB
    max_wal_size = 2GB
    checkpoint_completion_target = 0.9
    wal_buffers = 16MB
    default_statistics_target = 100

EXPLAIN ANALYZE output: https://Pastebin.com/HxucRgnk

I have tried all kinds of indexes, including GIN and GiST. The fastest way for PostgreSQL (and Googling confirms this for queries touching many rows) is to use a sequential scan.

MS SQL Server 14.0.405.200-1, default configuration.

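I use this in an API (a plain select without analyze), and calling this API endpoint from Chrome takes about 2500 ms, plus roughly 50 ms of HTTP and web server overhead (the API and SQL run on the same server). So the timing is the same.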

explain analyze SELECT user_id, count(9) FROM posts group by user_id; takes 700 ms. The size of the posts table is 2154 MB.

funny_falcon · Accepted Answer

Another good query variant is:

    SELECT p.user_id, p.cnt AS PostCount
    FROM users u
    INNER JOIN (
        select user_id, count(id) as cnt
        from posts
        group by user_id
    ) as p on p.user_id = u.id
    WHERE u.username IS NOT NULL
    ORDER BY PostCount DESC
    LIMIT 100;

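It does not use a CTE and gives the correct answer (whereas the CTE example can, in theory, produce fewer than 100 rows, because it applies the limit first and only then joins with users).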
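The row-count difference can be reproduced on a toy data set. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, a made-up four-user table, a limit of 2 instead of 100, and a hypothetical CTE name top_posts. It also assumes the username IS NOT NULL filter is added to the outer query of the CTE variant (the CTE answer as posted does not include that filter).

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption:
# only the query semantics matter here, not performance).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id integer PRIMARY KEY, username text);
CREATE TABLE posts (id integer PRIMARY KEY,
                    user_id integer NOT NULL REFERENCES users (id));
-- User 2 has no username and is the second most active poster.
INSERT INTO users VALUES (1, 'alice'), (2, NULL), (3, 'bob'), (4, 'carol');
""")

# Made-up post counts per user: 1 -> 5, 2 -> 4, 3 -> 3, 4 -> 2.
post_counts = {1: 5, 2: 4, 3: 3, 4: 2}
conn.executemany(
    "INSERT INTO posts (user_id) VALUES (?)",
    [(user_id,) for user_id, n in post_counts.items() for _ in range(n)],
)

# Variant 1: limit inside the CTE, then join and filter.
limit_then_join = conn.execute("""
    WITH top_posts AS (
        SELECT user_id, count(id) AS cnt
        FROM posts
        GROUP BY user_id
        ORDER BY cnt DESC
        LIMIT 2
    )
    SELECT u.id, t.cnt
    FROM users u
    INNER JOIN top_posts t ON t.user_id = u.id
    WHERE u.username IS NOT NULL
    ORDER BY t.cnt DESC
""").fetchall()

# Variant 2: aggregate in a subquery, join and filter, then limit.
join_then_limit = conn.execute("""
    SELECT p.user_id, p.cnt
    FROM users u
    INNER JOIN (
        SELECT user_id, count(id) AS cnt
        FROM posts
        GROUP BY user_id
    ) AS p ON p.user_id = u.id
    WHERE u.username IS NOT NULL
    ORDER BY p.cnt DESC
    LIMIT 2
""").fetchall()

print("limit then join:", limit_then_join)
print("join then limit:", join_then_limit)
```

In this toy data the limit-first variant loses a row because one of its two pre-selected users is filtered out afterwards, while the subquery variant still returns two named users.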

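I suppose MSSQL is able to perform such a transformation in its query optimizer, and PostgreSQL is not able to push the aggregation down below the join. Or MSSQL simply has a much faster hash join implementation.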

Scoots · Answer

This may or may not work. I am basing it on a gut feeling that the tables are being joined before the group and filter. I suggest trying the following: filter and group using a CTE before attempting the join:

    with __posts as (
        select user_id, count(1) as num_posts
        from posts
        group by user_id
        order by num_posts desc
        limit 100
    )
    select users.username, __posts.num_posts
    from users
    inner join __posts on (__posts.user_id = users.id)
    order by num_posts desc

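The query planner sometimes just needs a little guidance. This solution works well here, but CTEs can be terrible in some circumstances. CTEs are stored exclusively in memory. As a result, if a large amount of data is returned, it can exceed the memory allocated to Postgres and start swapping (paging in MS terms). CTEs also cannot be indexed, so a sufficiently large query can still cause a significant slowdown when the CTE is queried.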

The best advice you can really take away is to try it multiple ways and check your query plans.
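Before comparing plans, it helps to confirm that the variants being compared actually return the same rows. The following is a minimal sketch of such a check, assuming Python's built-in sqlite3 module as a stand-in for PostgreSQL, a made-up toy data set, and a limit of 2 instead of 100. The dictionary names VARIANTS and explain_statements are hypothetical. The generated EXPLAIN (ANALYZE, BUFFERS) statements are meant to be run by hand against the real PostgreSQL database; they are not executed here.

```python
import sqlite3

# Toy data in an in-memory SQLite database (assumption: a stand-in for
# the real PostgreSQL tables, used only to compare query results).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id integer PRIMARY KEY, username text);
CREATE TABLE posts (id integer PRIMARY KEY,
                    user_id integer NOT NULL REFERENCES users (id));
INSERT INTO users VALUES (1, 'alice'), (2, NULL), (3, 'bob'), (4, 'carol');
INSERT INTO posts (user_id)
VALUES (1), (1), (1), (2), (2), (2), (2), (3), (3), (4);
""")

# Two formulations of the same question: join first, or aggregate first.
VARIANTS = {
    "join_then_group": """
        SELECT u.id, COUNT(p.id) AS PostCount
        FROM users u
        INNER JOIN posts p ON p.user_id = u.id
        WHERE u.username IS NOT NULL
        GROUP BY u.id
        ORDER BY PostCount DESC
        LIMIT 2
    """,
    "group_then_join": """
        SELECT p.user_id, p.cnt AS PostCount
        FROM users u
        INNER JOIN (
            SELECT user_id, COUNT(id) AS cnt
            FROM posts
            GROUP BY user_id
        ) AS p ON p.user_id = u.id
        WHERE u.username IS NOT NULL
        ORDER BY PostCount DESC
        LIMIT 2
    """,
}

# Run every variant on the toy data and collect its rows.
results = {name: conn.execute(sql).fetchall() for name, sql in VARIANTS.items()}
for name, rows in results.items():
    print(name, rows)

# Statements to paste into psql on the real database to compare plans.
explain_statements = {
    name: "EXPLAIN (ANALYZE, BUFFERS) " + " ".join(sql.split())
    for name, sql in VARIANTS.items()
}
for statement in explain_statements.values():
    print(statement + ";")
```

Matching results on toy data do not prove equivalence in general; here the rewrite is safe because users.id is unique, so the join cannot multiply post rows.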