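I have two tables, customers and purchases. There are many (thousands of) purchases per customer. I usually need only the latest purchase for each customer, so I added a latest_purchase_id column and update it with a trigger whenever a purchase is added (see https://dba.stackexchange.com/a/243988/186435).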
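For readers who have not seen that approach, a trigger of this kind might look roughly like the sketch below. It is not the code from the linked answer: the function name set_latest_purchase and the trigger name purchases_set_latest are hypothetical, and only the tables and columns come from the definitions in this question. EXECUTE FUNCTION assumes PostgreSQL 11 or later; older versions spell it EXECUTE PROCEDURE.

```sql
-- Hypothetical names: set_latest_purchase, purchases_set_latest.
-- Keeps customers.latest_purchase_id pointing at the newest purchase.
CREATE FUNCTION set_latest_purchase() RETURNS trigger
LANGUAGE plpgsql AS
$$
BEGIN
   UPDATE customers
   SET    latest_purchase_id = NEW.id
   WHERE  id = NEW.customer_id
   AND    (latest_purchase_id IS NULL OR latest_purchase_id < NEW.id);
   RETURN NULL;  -- the return value is ignored for AFTER row triggers
END
$$;

CREATE TRIGGER purchases_set_latest
AFTER INSERT ON purchases
FOR EACH ROW EXECUTE FUNCTION set_latest_purchase();
```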
I would rather not use a trigger, so I tried a DISTINCT ON query backed by an index instead, but it is slower and I don't understand why.
Table customers:
Column | Type | Modifiers | Storage | Stats target | Description
---------------------+----------+--------------------------------------------------------+----------+--------------+-------------
id | integer | not null default nextval('customers_id_seq'::regclass) | plain | |
latest_purchase_id | integer | | plain | |
Indexes:
"customers_pkey" PRIMARY KEY, btree (id)
"customers_latest_purchase_id" btree (latest_purchase_id)
Foreign-key constraints:
"customers_latest_purchase_fk" FOREIGN KEY (latest_purchase_id) REFERENCES purchases(id) DEFERRABLE INITIALLY DEFERRED
Referenced by:
TABLE "purchases" CONSTRAINT "purchases_customer_fk" FOREIGN KEY (customer_id) REFERENCES customers(id) DEFERRABLE INITIALLY DEFERRED
Has OIDs: no
Table purchases:
Column | Type | Modifiers | Storage | Stats target | Description
--------------+-----------+--------------------------------------------------------+----------+--------------+-------------
id | integer | not null default nextval('purchases_id_seq'::regclass) | plain | |
customer_id | integer | | plain | |
Indexes:
"purchases_pkey" PRIMARY KEY, btree (id)
"purchases_customer_id_id" btree (customer_id, id)
"purchases_customer_id" btree (customer_id)
Foreign-key constraints:
"purchases_customer_fk" FOREIGN KEY (customer_id) REFERENCES customers(id) DEFERRABLE INITIALLY DEFERRED
Referenced by:
TABLE "customers" CONSTRAINT "customers_latest_purchase_id" FOREIGN KEY (latest_purchase_id) REFERENCES purchases(id) DEFERRABLE INITIALLY DEFERRED
Has OIDs: no
DISTINCT ON query:
EXPLAIN ANALYZE SELECT DISTINCT ON (customer_id) id, customer_id FROM purchases ORDER BY customer_id DESC, id DESC;
Result (cost=0.43..162516.37 rows=381 width=8) (actual time=0.050..1478.196 rows=823 loops=1)
-> Unique (cost=0.43..162516.37 rows=381 width=8) (actual time=0.047..1477.754 rows=823 loops=1)
-> Index Only Scan Backward using purchases_customer_id_id on purchases (cost=0.43..157850.96 rows=1866163 width=8) (actual time=0.045..1066.759 rows=1866132 loops=1)
Heap Fetches: 1363529
Planning Time: 0.096 ms
Execution Time: 1478.408 ms
INNER JOIN query based on latest_purchase:
EXPLAIN ANALYZE SELECT c.id, p.id FROM customers c JOIN purchases p ON c.latest_purchase = p.id;
Nested Loop (cost=0.43..43877.27 rows=7594 width=8) (actual time=0.508..112.665 rows=755 loops=1)
-> Seq Scan on customers d (cost=0.00..213.94 rows=7594 width=8) (actual time=0.006..2.861 rows=7594 loops=1)
-> Index Only Scan using customers_purchase_pkey on purchases p (cost=0.43..5.75 rows=1 width=4) (actual time=0.014..0.014 rows=0 loops=7594)
Index Cond: (id = c.latest_purchase)
Heap Fetches: 583
Planning Time: 1.032 ms
Execution Time: 112.861 ms
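The answer is in this statement from the question:
"There are many (thousands of) purchases per customer."
DISTINCT ON is fast when there are only few purchases per customer. Here, the index-only scan in the first plan reads 1866132 rows just to return 823, and that is where the time goes.
This should be much faster: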
SELECT c.id AS customer_id, p.id AS purchase_id
FROM customers c
LEFT JOIN LATERAL (
   SELECT p.id
   FROM purchases p
   WHERE p.customer_id = c.id
   ORDER BY p.id DESC
   LIMIT 1
   ) p ON true;
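Subtle difference: every customer is included in the result, including customers without any purchase (purchase_id is NULL for those).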
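If customers without purchases should be left out, as in the DISTINCT ON query, one possible variant is to switch to CROSS JOIN LATERAL. This is a sketch using the same tables and columns as above, not something the original query requires:

```sql
-- Same subquery as before; CROSS JOIN LATERAL drops customers
-- for which the subquery returns no row.
SELECT c.id AS customer_id, p.id AS purchase_id
FROM customers c
CROSS JOIN LATERAL (
   SELECT p.id
   FROM purchases p
   WHERE p.customer_id = c.id
   ORDER BY p.id DESC
   LIMIT 1
   ) p;
```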
The index "purchases_customer_id_id" btree (customer_id, id) is good for this. An index on (customer_id, id DESC) would be slightly better still.
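A minimal sketch of creating such an index. The index name purchases_customer_id_id_desc is hypothetical; the columns are those of the purchases table shown in the question.

```sql
-- Hypothetical index name. The sort order matches
-- "WHERE customer_id = ... ORDER BY id DESC LIMIT 1" in the lateral subquery.
CREATE INDEX purchases_customer_id_id_desc
ON purchases (customer_id, id DESC);
```

On a busy table, CREATE INDEX CONCURRENTLY avoids blocking writes while the index is built, at the cost of a slower build.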
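Aside 1:
The first plan shows rows=823, the second plan shows rows=755. That seems to indicate that there are purchases.customer_id values without a match in table customers, which normally should not happen. Add a FK constraint from purchases.customer_id to customers.id and make purchases.customer_id NOT NULL to enforce referential integrity.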
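One way to check for such rows and then tighten the column is sketched below. Note that the table definition in the question already lists a constraint purchases_customer_fk on purchases.customer_id, so only the NOT NULL part may be missing; treat this as an illustration under that assumption.

```sql
-- Purchases whose customer_id has no matching customer.
-- Rows with a NULL customer_id are reported as well.
SELECT p.id, p.customer_id
FROM purchases p
WHERE NOT EXISTS (
   SELECT 1 FROM customers c WHERE c.id = p.customer_id
   );

-- Fails if any row still has a NULL customer_id, so clean up first.
ALTER TABLE purchases ALTER COLUMN customer_id SET NOT NULL;
```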
Aside 2:
Both query plans report a lot of Heap Fetches for their index-only scans. Are you vacuuming enough?
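An index-only scan has to visit the heap for every page that is not marked all-visible in the visibility map, and VACUUM is what sets those marks. A sketch of a manual vacuum plus a look at when the tables were last vacuumed, using only the two tables from the question:

```sql
-- VACUUM cannot run inside a transaction block.
VACUUM (ANALYZE) purchases;
VACUUM (ANALYZE) customers;

-- When were the tables last vacuumed, manually or by autovacuum?
SELECT relname, last_vacuum, last_autovacuum, n_dead_tup
FROM pg_stat_user_tables
WHERE relname IN ('customers', 'purchases');
```

Re-running EXPLAIN ANALYZE afterwards should show whether Heap Fetches dropped.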