I tried to research this topic as much as I could before coming here to bother you, but here I am anyway.
I would like to implement keyset pagination on this table:
create table api.subscription (
    subscription_id uuid primary key,
    token_id uuid not null,
    product_id uuid not null references api.product(product_id) deferrable,
    spid bigint null,
    attributes_snapshot jsonb not null,
    created_at timestamp not null,
    refreshed_at timestamp,
    enriched_at timestamp null,
    valid_until timestamp not null,
    is_cancelled boolean not null,
    has_been_expired boolean not null,
    has_quality_data boolean not null
);
To do that, I use this query to prepare the pagination metadata:
with book as (
    select created_at, subscription_id
    from api.subscription
    where token_id = $1
      and refreshed_at >= $2
      and valid_until >= now()
      and not is_cancelled
      and not has_been_expired
),
total as (
    select count(*) as total from book
),
page as (
    select * from book
    where (case when $4 is null then true else (created_at > $4 or (created_at = $4 and subscription_id > $5)) end)
    order by created_at asc, subscription_id asc
    limit $3
),
last_row as (
    select last_value(created_at) over () as last_seen_created_at,
           last_value(subscription_id) over () as last_seen_subscription_id
    from page
),
ids as (
    select array_agg(subscription_id) as ids from page
)
select * from last_row, total, ids
This gives me the total number of items the user is currently paginating through, the last seen keys (for the next page), and the IDs of the current page (which I feed into an IN clause (or ANY), but that's another story).
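As an aside, the keyset predicate in the page CTE can also be written as a row-value comparison, which PostgreSQL can match directly against a composite b-tree index on the same columns (a sketch of just the page query; the `$4 is null` branch for the first page is kept from the original):

```sql
-- Row-value form of the keyset predicate (sketch).
-- (created_at, subscription_id) > ($4, $5) is equivalent to the
-- CASE expression above, but the planner can use it for an index
-- range scan on a b-tree over (created_at, subscription_id).
select created_at, subscription_id
from api.subscription
where token_id = $1
  and refreshed_at >= $2
  and valid_until >= now()
  and not is_cancelled
  and not has_been_expired
  and ($4 is null or (created_at, subscription_id) > ($4, $5))
order by created_at asc, subscription_id asc
limit $3;
```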
The problem is how long this takes once the table contains 19 million rows:
Nested Loop (cost=50000.14..50000.22 rows=1 width=64) (actual time=143.677..143.677 rows=0 loops=1)
Output: last_row.last_seen_created_at, last_row.last_seen_subscription_id, total.total, ids.ids
Buffers: shared hit=395 read=24611
CTE book
-> Seq Scan on api.subscription (cost=0.00..50000.00 rows=1 width=24) (actual time=143.646..143.646 rows=0 loops=1)
Output: subscription.created_at, subscription.subscription_id
Filter: ((NOT subscription.is_cancelled) AND (NOT subscription.has_been_expired) AND (subscription.token_id = '73759739-7af5-4d28-91bc-897c959bddcd'::uuid) AND (subscription.valid_until >= now()) AND (subscription.refreshed_at >= (now() - '4 days'::interval)))
Rows Removed by Filter: 1000000
Buffers: shared hit=389 read=24611
CTE total
-> Aggregate (cost=0.02..0.03 rows=1 width=8) (never executed)
Output: count(*)
-> CTE Scan on book (cost=0.00..0.02 rows=1 width=0) (never executed)
Output: book.created_at, book.subscription_id
CTE page
-> Limit (cost=0.03..0.04 rows=1 width=24) (actual time=143.674..143.674 rows=0 loops=1)
Output: book_1.created_at, book_1.subscription_id
Buffers: shared hit=395 read=24611
-> Sort (cost=0.03..0.04 rows=1 width=24) (actual time=143.673..143.673 rows=0 loops=1)
Output: book_1.created_at, book_1.subscription_id
Sort Key: book_1.created_at, book_1.subscription_id
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=395 read=24611
-> CTE Scan on book book_1 (cost=0.00..0.02 rows=1 width=24) (actual time=143.646..143.646 rows=0 loops=1)
Output: book_1.created_at, book_1.subscription_id
Buffers: shared hit=389 read=24611
CTE last_row
-> WindowAgg (cost=0.00..0.03 rows=1 width=24) (actual time=143.675..143.675 rows=0 loops=1)
Output: last_value(page.created_at) OVER (?), last_value(page.subscription_id) OVER (?)
Buffers: shared hit=395 read=24611
-> CTE Scan on page (cost=0.00..0.02 rows=1 width=24) (actual time=143.674..143.674 rows=0 loops=1)
Output: page.created_at, page.subscription_id
Buffers: shared hit=395 read=24611
CTE ids
-> Aggregate (cost=0.02..0.03 rows=1 width=32) (never executed)
Output: array_agg(page_1.subscription_id)
-> CTE Scan on page page_1 (cost=0.00..0.02 rows=1 width=16) (never executed)
Output: page_1.created_at, page_1.subscription_id
-> Nested Loop (cost=0.00..0.05 rows=1 width=32) (actual time=143.677..143.677 rows=0 loops=1)
Output: last_row.last_seen_created_at, last_row.last_seen_subscription_id, total.total
Buffers: shared hit=395 read=24611
-> CTE Scan on last_row (cost=0.00..0.02 rows=1 width=24) (actual time=143.676..143.676 rows=0 loops=1)
Output: last_row.last_seen_created_at, last_row.last_seen_subscription_id
Buffers: shared hit=395 read=24611
-> CTE Scan on total (cost=0.00..0.02 rows=1 width=8) (never executed)
Output: total.total
-> CTE Scan on ids (cost=0.00..0.02 rows=1 width=32) (never executed)
Output: ids.ids
Planning time: 0.580 ms
Execution time: 143.786 ms
Note: this is an example with only 1 million rows, but with more than 19 million rows it takes over 20 seconds.
We've had no luck with the various index setups we've tried:
Indexes:
    "subscription_pkey" PRIMARY KEY, btree (subscription_id)
    "has_been_expired_idx" btree (has_been_expired)
    "is_cancelled_idx" btree (is_cancelled)
    "pagination_idx" btree (token_id, refreshed_at, valid_until, is_cancelled, has_been_expired)
    "refreshed_at_idx" btree (refreshed_at)
    "token_and_created_at_idx" btree (token_id, created_at)
    "token_and_refreshed_at_idx" btree (token_id, refreshed_at)
    "token_id_idx" btree (token_id)
    "valid_until_idx" btree (valid_until)
Foreign-key constraints:
    "subscription_product_id_fkey" FOREIGN KEY (product_id) REFERENCES api.product(product_id) DEFERRABLE
So my question is: how can I make this pagination query fast at this scale? Any hints are welcome :) I hope my question makes sense.
Thanks for reading.
If you don't need the total, you could rewrite it like this:
with page as (
    select created_at, subscription_id
    from api.subscription
    where token_id = $1
      and refreshed_at >= $2
      and valid_until >= now()
      and not is_cancelled
      and not has_been_expired
      and (case when $4 is null then true else (created_at > $4 or (created_at = $4 and subscription_id > $5)) end)
    order by created_at asc, subscription_id asc
    limit $3
),
last_row as (
    select last_value(created_at) over () as last_seen_created_at,
           last_value(subscription_id) over () as last_seen_subscription_id
    from page
),
ids as (
    select array_agg(subscription_id) as ids from page
)
select * from last_row, ids
A composite index on is_cancelled, has_been_expired, token_id, and one of refreshed_at, valid_until, or created_at should help.
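One way to realize that suggestion is a partial index: instead of indexing the two boolean flags as key columns, fold them into the index predicate (a sketch; the index name and the choice of created_at over refreshed_at/valid_until are assumptions, picked to match the keyset sort order):

```sql
-- Partial composite index matching both the filter and the
-- order by created_at, subscription_id of the keyset scan.
-- The boolean flags live in the WHERE clause, so the index only
-- contains rows the query can actually return and stays smaller.
create index subscription_keyset_idx
    on api.subscription (token_id, created_at, subscription_id)
    where not is_cancelled and not has_been_expired;
```

With this, the planner can satisfy token_id = $1 and the (created_at, subscription_id) cursor with a single index range scan, applying refreshed_at and valid_until as residual filters.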