How can I speed up a query for the last value of a time series?

Question

I have a time-series table prices in a PostgreSQL 10 database.
This is a simplified test case that illustrates the problem:

    CREATE TABLE prices (
        currency text NOT NULL,
        side boolean NOT NULL,
        price numeric NOT NULL,
        ts timestamptz NOT NULL
    );

I want to quickly query the last value for each currency/side pair. This gives the current bid and ask price for each currency.

My current solution is:

    create index on prices (currency, side, ts desc);

    select distinct on (currency, side) *
    from prices
    order by currency, side, ts desc;

But this gives me a very slow query (~500 ms) on this table.

The actual table has four columns to group by instead of two. The real table and query look like this:

    create table prices (
        exchange integer not null,
        pair text not null,
        side boolean not null,
        guaranteed_volume numeric not null,
        ts timestamp with time zone not null,
        price numeric not null,
        constraint prices_pkey primary key (exchange, pair, side, guaranteed_volume, ts),
        constraint prices_exchange_fkey foreign key (exchange)
            references exchanges (id) match simple
            on update no action
            on delete no action
    );

    create index prices_exchange_pair_side_guaranteed_volume_ts_idx
        on prices (exchange, pair, side, guaranteed_volume, ts desc);

    create view last_prices as
    select distinct on (exchange, pair, side, guaranteed_volume)
           exchange
         , pair
         , side
         , guaranteed_volume
         , price
         , ts
    from prices
    order by exchange
           , pair
           , side
           , guaranteed_volume
           , ts desc;

It currently has 34441 rows. Some useful debug queries:

    # explain (analyze,buffers) select * from last_prices;
                                                           QUERY PLAN
    ------------------------------------------------------------------------------------------------------------------------
     Unique  (cost=2662.03..2997.71 rows=1224 width=37) (actual time=403.218..459.041 rows=392 loops=1)
       Buffers: shared hit=418
       ->  Sort  (cost=2662.03..2729.17 rows=26854 width=37) (actual time=403.213..411.041 rows=28353 loops=1)
             Sort Key: prices.exchange, prices.pair, prices.side, prices.guaranteed_volume, prices.ts DESC
             Sort Method: quicksort  Memory: 2984kB
             Buffers: shared hit=418
             ->  Seq Scan on prices  (cost=0.00..686.54 rows=26854 width=37) (actual time=0.022..31.407 rows=28353 loops=1)
                   Buffers: shared hit=418
     Planning time: 0.911 ms
     Execution time: 460.190 ms

EXPLAIN ANALYZE with seqscan disabled:

    # explain (analyze,buffers) select * from last_prices;
                                                           QUERY PLAN
    ------------------------------------------------------------------------------------------------------------------------
     Unique  (cost=0.41..4458.07 rows=1224 width=37) (actual time=0.037..122.237 rows=392 loops=1)
       Buffers: shared hit=15182
       ->  Index Scan using prices_exchange_pair_side_guaranteed_volume_ts_idx on prices  (cost=0.41..4189.53 rows=26854 width=37) (actual time=0.034..91.237 rows=29649 loops=1)
             Buffers: shared hit=15182
     Planning time: 0.291 ms
     Execution time: 122.417 ms
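For anyone reproducing the plan above, a minimal sketch of how sequential scans can be switched off for the current session. Note that enable_seqscan = off only discourages sequential scans in the planner's cost model rather than forbidding them, and it is meant for diagnosis, not as a permanent setting.

```sql
-- Session-local planner toggle, for diagnosis only.
SET enable_seqscan = off;

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM last_prices;

-- Restore the default so later queries are planned normally.
RESET enable_seqscan;
```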

Adding a query that runs the view's query directly:

    # explain (analyze, buffers)
      select distinct on (exchange, pair, side, guaranteed_volume)
             exchange
           , pair
           , side
           , guaranteed_volume
           , price
           , ts
      from prices
      order by exchange
             , pair
             , side
             , guaranteed_volume
             , ts desc;
                                                           QUERY PLAN
    ------------------------------------------------------------------------------------------------------------------------
     Unique  (cost=2163.56..2429.99 rows=1224 width=37) (actual time=364.716..391.405 rows=380 loops=1)
       Buffers: shared hit=418
       ->  Sort  (cost=2163.56..2216.85 rows=21314 width=37) (actual time=364.711..370.458 rows=24011 loops=1)
             Sort Key: exchange, pair, side, guaranteed_volume, ts DESC
             Sort Method: quicksort  Memory: 2644kB
             Buffers: shared hit=418
             ->  Seq Scan on prices  (cost=0.00..631.14 rows=21314 width=37) (actual time=0.025..13.751 rows=24011 loops=1)
                   Buffers: shared hit=418
     Planning time: 0.258 ms
     Execution time: 392.110 ms

Erwin Brandstetter · Accepted Answer

> I want to quickly query the last value for each currency/side pair

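_DISTINCT ON_ is good for few rows per combination. But your use case obviously has many rows per distinct _(currency, side)_. So _DISTINCT ON_ is a bad choice as far as performance is concerned. You'll find a detailed assessment and an assortment of solutions in two related answers on SO.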

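If all you need is the latest timestamp ts, the same column serves as both the sort criterion and the desired return value, and the case is very simple. Look at Evan's simple solution with max(ts).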

(Well, ideally you would have the index on _(currency, side, ts desc NULLS LAST)_, since max(ts) ignores NULL values and therefore matches that sort order better. But it hardly matters with the column defined _NOT NULL_.)
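As a sketch of the index variant mentioned in the parenthesis (the index name is made up for illustration and is not from the question):

```sql
-- Hypothetical index name. NULLS LAST makes the index order match
-- max(ts), which skips NULLs; with ts declared NOT NULL the
-- difference from plain "ts DESC" is negligible.
CREATE INDEX prices_currency_side_ts_nulls_last_idx
    ON prices (currency, side, ts DESC NULLS LAST);
```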

Typically there is more to do, because you need additional columns from each selected row (like the current price), or you need to sort by multiple columns.

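Ideally, you have another table listing all currencies, and a FK constraint to enforce referential integrity and disallow nonexistent currency values. Then use the query technique from chapter "2a. LATERAL join" of the linked answer, expanded to account for the added side.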
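The lookup table is not part of the question, so the following is only a sketch under assumptions: the table name currency and its single column match what the query further down expects, and the constraint name is invented.

```sql
-- Assumed lookup table: one row per distinct currency.
CREATE TABLE currency (
    currency text PRIMARY KEY
);

-- Seed it from the existing data before adding the constraint.
INSERT INTO currency (currency)
SELECT DISTINCT currency
FROM   prices;

-- Hypothetical constraint name; rejects prices rows for unknown currencies.
ALTER TABLE prices
    ADD CONSTRAINT prices_currency_fkey
    FOREIGN KEY (currency) REFERENCES currency (currency);
```

With such a table in place, the set of groups is known up front, so the query can do one small index lookup per group instead of sorting the whole prices table.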

Based on your initial simple test case:

    SELECT c.currency, s.side, p.*
    FROM   currency c
    CROSS  JOIN (VALUES (true), (false)) s(side)  -- account for side
    CROSS  JOIN LATERAL (
       SELECT ts, price                           -- more columns?
       FROM   prices
       WHERE  currency = c.currency
       AND    side = s.side
       ORDER  BY ts DESC                          -- ts is NOT NULL
       LIMIT  1
       ) p
    ORDER  BY 1, 2;  -- optional, whatever you prefer

You should see very fast index scans on the index on _(currency, side, ts DESC)_.

If index-only scans are possible and you only need ts and price, it may pay to append price as the last column of the index.
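A sketch of that variant, with an invented index name. On PostgreSQL 10 the extra column has to be a regular key column; the INCLUDE clause for non-key columns only arrived in PostgreSQL 11.

```sql
-- Hypothetical index name. price is appended so the LATERAL subquery
-- can be answered from the index alone (ts and price are both in it).
CREATE INDEX prices_currency_side_ts_price_idx
    ON prices (currency, side, ts DESC, price);

-- Index-only scans rely on the visibility map, which VACUUM maintains.
VACUUM ANALYZE prices;
```

Whether the larger index pays off depends on the write load and on how often the table is vacuumed, so it is worth checking with EXPLAIN (ANALYZE, BUFFERS) before keeping it.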

dbfiddle here

Whether you save this query in a VIEW or not does not affect performance.

Evan Carroll · Answer

If you have an index on (currency, side, ts desc), what are the timings like for something like this?

SELECT currency, side, max(ts) FROM prices GROUP BY currency, side;

That is much faster, but the reason I used DISTINCT was to get the price value associated with the last ts. – ivarec, 56 minutes ago
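One way to get the price alongside max(ts), as the comment asks, is to join the aggregate back to the table. This is only a sketch against the simplified test table, not something proposed in the thread, and it makes no claim about beating the LATERAL approach from the accepted answer. If several rows share the same (currency, side, ts), all of them are returned.

```sql
SELECT p.currency, p.side, p.ts, p.price
FROM  (
   -- latest timestamp per group, as in the query above
   SELECT currency, side, max(ts) AS ts
   FROM   prices
   GROUP  BY currency, side
   ) m
JOIN   prices p USING (currency, side, ts);  -- fetch the matching row(s)
```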