
Why is an indexed DISTINCT ON so much slower than an inner join?

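I have two tables, customers and purchases. There are many (thousands of) purchases per customer. I usually need only the latest purchase for each customer, so I added a latest_purchase_id column and update it with a trigger whenever a purchase is added (see https://dba.stackexchange.com/a/243988/186435).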
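For readers who have not seen the linked answer, a trigger of the kind described above might look roughly like this. It is only a sketch under assumptions, not the code from that answer: the names set_latest_purchase and purchases_set_latest are hypothetical, and it assumes purchases.id comes from a sequence, so the most recently inserted row is also the latest purchase.

```sql
-- Hypothetical trigger function: copy the new purchase id to its customer.
CREATE FUNCTION set_latest_purchase() RETURNS trigger
LANGUAGE plpgsql AS
$$
BEGIN
   UPDATE customers
   SET    latest_purchase_id = NEW.id
   WHERE  id = NEW.customer_id;
   RETURN NULL;  -- the return value is ignored for AFTER row triggers
END
$$;

-- Hypothetical trigger name. EXECUTE FUNCTION needs PostgreSQL 11 or later;
-- older versions spell it EXECUTE PROCEDURE.
CREATE TRIGGER purchases_set_latest
AFTER INSERT ON purchases
FOR EACH ROW EXECUTE FUNCTION set_latest_purchase();
```

Deleting or reassigning purchases is not handled here; that would need additional triggers.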

I would like to avoid the trigger, so I tried querying with DISTINCT ON backed by an index instead, but it is slower and I cannot tell why.

Table customers

       Column        |  Type    |                       Modifiers                        | Storage  | Stats target | Description
---------------------+----------+--------------------------------------------------------+----------+--------------+-------------
 id                  | integer  | not null default nextval('customers_id_seq'::regclass) | plain    |              |
 latest_purchase_id  | integer  |                                                        | plain    |              |
Indexes:
    "customers_pkey" PRIMARY KEY, btree (id)
    "customers_latest_purchase_id" btree (latest_purchase_id)
Foreign-key constraints:
    "customers_latest_purchase_fk" FOREIGN KEY (latest_purchase_id) REFERENCES purchases(id) DEFERRABLE INITIALLY DEFERRED
Referenced by:
    TABLE "purchases" CONSTRAINT "purchases_customer_fk" FOREIGN KEY (customer_id) REFERENCES customers(id) DEFERRABLE INITIALLY DEFERRED
Has OIDs: no

Table purchases

     Column   |  Type     |                        Modifiers                       | Storage  | Stats target | Description
--------------+-----------+--------------------------------------------------------+----------+--------------+-------------
 id           | integer   | not null default nextval('purchases_id_seq'::regclass) | plain    |              |
 customer_id  | integer   |                                                        | plain    |              |
Indexes:
    "purchases_pkey" PRIMARY KEY, btree (id)
    "purchases_customer_id_id" btree (customer_id, id)
    "purchases_customer_id" btree (customer_id)
Foreign-key constraints:
    "purchases_customer_fk" FOREIGN KEY (customer_id) REFERENCES customers(id) DEFERRABLE INITIALLY DEFERRED
Referenced by:
    TABLE "customers" CONSTRAINT "customers_latest_purchase_id" FOREIGN KEY (latest_purchase_id) REFERENCES purchases(id) DEFERRABLE INITIALLY DEFERRED
Has OIDs: no

DISTINCT ON query:

EXPLAIN ANALYZE SELECT DISTINCT ON (customer_id) id, customer_id FROM purchases ORDER BY customer_id DESC, id DESC;
 Result  (cost=0.43..162516.37 rows=381 width=8) (actual time=0.050..1478.196 rows=823 loops=1)
   ->  Unique  (cost=0.43..162516.37 rows=381 width=8) (actual time=0.047..1477.754 rows=823 loops=1)
         ->  Index Only Scan Backward using purchases_customer_id_id on purchases  (cost=0.43..157850.96 rows=1866163 width=8) (actual time=0.045..1066.759 rows=1866132 loops=1)
               Heap Fetches: 1363529
 Planning Time: 0.096 ms
 Execution Time: 1478.408 ms

Query based on an INNER JOIN using latest_purchase:

EXPLAIN ANALYZE SELECT c.id, p.id FROM customers c JOIN purchases p ON c.latest_purchase = p.id;
 Nested Loop  (cost=0.43..43877.27 rows=7594 width=8) (actual time=0.508..112.665 rows=755 loops=1)
   ->  Seq Scan on customers d  (cost=0.00..213.94 rows=7594 width=8) (actual time=0.006..2.861 rows=7594 loops=1)
   ->  Index Only Scan using customers_purchase_pkey on purchases p  (cost=0.43..5.75 rows=1 width=4) (actual time=0.014..0.014 rows=0 loops=7594)
         Index Cond: (id = c.latest_purchase)
         Heap Fetches: 583
 Planning Time: 1.032 ms
 Execution Time: 112.861 ms

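The answer lies in this sentence from the question:

"There are many (thousands) purchases per customer."

DISTINCT ON is fast when there are few purchases per customer. With thousands of rows per customer, it has to read every index entry (rows=1866132 in the plan above) only to keep one row per customer (rows=823).

This should be much faster: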

SELECT c.id AS customer_id, p.id AS purchase_id
FROM   customers c
LEFT   JOIN LATERAL (
   SELECT p.id
   FROM   purchases p
   WHERE  p.customer_id = c.id
   ORDER  BY p.id DESC
   LIMIT  1
   ) p ON true;

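Subtle difference: every customer is included in the result, including customers without any purchases (purchase_id is NULL for those).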
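If customers without purchases should be left out, as in the DISTINCT ON query, the same lateral subquery can be used with CROSS JOIN LATERAL instead. A sketch using only the tables and columns from the question:

```sql
-- Same lookup, but customers with no purchases produce no row at all.
SELECT c.id AS customer_id, p.id AS purchase_id
FROM   customers c
CROSS  JOIN LATERAL (
   SELECT p.id
   FROM   purchases p
   WHERE  p.customer_id = c.id
   ORDER  BY p.id DESC
   LIMIT  1
   ) p;
```

Either way, the planner can do one short index lookup per customer instead of reading every purchase.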

The index "purchases_customer_id_id" btree (customer_id, id) is good for this. An index on (customer_id, id DESC) would be slightly better still.
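Such an index could be created as follows. The index name is hypothetical; pick whatever fits your naming scheme.

```sql
-- Hypothetical index name. The sort order matches ORDER BY p.id DESC
-- within each customer_id.
CREATE INDEX purchases_customer_id_id_desc
ON purchases (customer_id, id DESC);
```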


Aside 1:

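The first plan shows rows=823, the second shows rows=755. This suggests that some purchases.customer_id values have no match in table customers, which should not normally happen. Add a FK constraint from purchases.customer_id to customers.id and make purchases.customer_id NOT NULL to enforce referential integrity.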
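One way to check for such rows before tightening the column is sketched below. The table definition in the question already lists a constraint purchases_customer_fk, so the FK part may be in place; the column alias orphaned_purchases is just an illustrative name.

```sql
-- Count purchases whose customer_id is NULL or has no matching customers row.
SELECT p.customer_id, count(*) AS orphaned_purchases
FROM   purchases p
LEFT   JOIN customers c ON c.id = p.customer_id
WHERE  c.id IS NULL
GROUP  BY p.customer_id;

-- Once no NULL values remain, forbid them going forward.
ALTER TABLE purchases ALTER COLUMN customer_id SET NOT NULL;
```

The ALTER TABLE fails if any row still has a NULL customer_id, so clean those up first.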

Aside 2:

Each query plan reports many Heap Fetches (1363529 in the first one). Are you vacuuming enough?
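A quick way to look into this, assuming you have the privileges to vacuum these tables, is to run VACUUM manually and check when the tables were last vacuumed:

```sql
-- Refresh the visibility map (and planner statistics) so that
-- index-only scans can skip most heap fetches.
VACUUM (ANALYZE) purchases;
VACUUM (ANALYZE) customers;

-- Check when each table was last vacuumed, manually or by autovacuum.
SELECT relname, last_vacuum, last_autovacuum, n_dead_tup
FROM   pg_stat_user_tables
WHERE  relname IN ('purchases', 'customers');
```

If last_autovacuum is old or NULL on a heavily written table, the autovacuum settings for that table are worth a look.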
