Get rows with the most recent date for each different item

Question

Let's say this is the sample data coming from a join of two tables. The database is Postgres 9.6.

    id  product_id  invoice_id  amount  date
    1   PROD1       INV01       2       01-01-2018
    2   PROD2       INV02       3       01-01-2018
    3   PROD1       INV01       2       05-01-2018
    4   PROD1       INV03       1       05-01-2018
    5   PROD2       INV02       3       08-01-2018
    6   PROD2       INV04       4       08-01-2018

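I want to know whether the following is possible in an optimized way:

Get all the PRODx with their respective INVx that have the latest date, but per product_id. Note that a record left unused on one day may be reported again on a later date. This means: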

    id  product_id  invoice_id  amount  date
    3   PROD1       INV01       2       05-01-2018
    4   PROD1       INV03       1       05-01-2018
    5   PROD2       INV02       3       08-01-2018
    6   PROD2       INV04       4       08-01-2018

Get a daily sum of the amounts for each PRODx, but when a day does not exist, fill the gap with the previous value.

This means:

    product_id  amount  date
    PROD1       2       01-01-2018
    PROD2       3       01-01-2018
    PROD1       2       02-01-2018
    PROD2       3       02-01-2018
    PROD1       2       03-01-2018
    PROD2       3       03-01-2018
    PROD1       2       04-01-2018
    PROD2       3       04-01-2018
    PROD1       3       05-01-2018
    PROD2       3       05-01-2018
    PROD1       3       06-01-2018
    PROD2       3       06-01-2018
    PROD1       3       07-01-2018
    PROD2       3       07-01-2018
    PROD1       3       08-01-2018
    PROD2       7       08-01-2018

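Some thoughts:

For the first question, I could get max(date) for each PRODx and then pick, for each PRODx, the rows where date = max(date), but I was wondering whether there is a faster way to get this given the records in the database.
For the second question, I could generate a series of dates for the interval needed, then use WITH rows AS to run the query grouped by product_id with the sum, and then, for each date, select the previous value from rows with limit 1, but that does not sound optimized either.

Looking forward to any input. Thank you.

Later edit: trying DISTINCT ON ().

With distinct on (product_id, invoice_id), I do not get only the rows from the latest date. If an invoice_id appeared in the past but is not among those on the latest date, it is returned as well.
With distinct on (product_id), the result does come from the latest date, but, as expected, only one row per product is returned, even though PROD1 has two rows on its last day.

Basically I need something like "I need the latest date, all the product_ids and their invoice_ids, keeping in mind that a product_id can have multiple invoice_ids".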

Later edit 2:

Running a query like the one I described for the first question seems to be reasonably fast:

    select product_id, invoice_id, amount
    from mytable
    inner join myOtherTable on...
    inner join (select max(date) as last_date, product_id
                from mytable
                group by product_id) sub
            on mytable.date = sub.last_date

amacvar · Accepted Answer

Tackling Q#1 independently of @ypercube, and slightly differently:

    with cte as (
        select row_number() over (partition by product_id, invoice_id
                                  order by dt desc) as rn,
               product_id, invoice_id, amount, dt
        from product
    )
    select product_id, invoice_id, amount, dt
    from cte
    where rn = 1
    order by product_id, invoice_id;

     product_id | invoice_id | amount |     dt
    ------------+------------+--------+------------
     PROD1      | INV01      |      2 | 2018-01-05
     PROD1      | INV03      |      1 | 2018-01-05
     PROD2      | INV02      |      3 | 2018-01-08
     PROD2      | INV04      |      4 | 2018-01-08
    (4 rows)

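For Q#2, you are on the right track, but the SQL will have a cross join (gasp!).

I think a function with a loop/cursor would be more optimized (I'll try that in my next block of free time).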

    -- the cte will give us the real values
    with cte as (
        select product_id, sum(amount) as amount, dt
        from product
        group by product_id, dt
    )
    select p.product_id,
           (select cte.amount            -- choose the amount
            from cte
            where cte.product_id = p.product_id
              and cte.dt <= d.gdt        -- for same day or earlier
            order by cte.dt desc
            limit 1) as finamt,
           d.gdt
    from (select generate_series(
                     (select min(dt) from product),  -- where clause if some products
                                                     -- don't have an amount
                     (select max(dt) from product),
                     '1 day'
                 )::date as gdt) d
    cross join  -- assuming each listed product has an amount on the min date
         (select distinct product_id from product) p
    left join   -- since we need to fill the gaps
         cte on (d.gdt = cte.dt and p.product_id = cte.product_id)
    order by d.gdt, p.product_id;
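To check the gap-filling logic against the sample data, here is a minimal sketch that runs the same idea on an in-memory SQLite database through Python's sqlite3 module. This is a stand-in, not Postgres: SQLite has no built-in generate_series, so a recursive CTE (the hypothetical names bounds and days) builds the calendar instead, dates are stored as ISO text, and the table and column names (product, dt) are borrowed from the query above.

```python
import sqlite3

# Sample rows from the question: (product_id, invoice_id, amount, dt).
ROWS = [
    ("PROD1", "INV01", 2, "2018-01-01"),
    ("PROD2", "INV02", 3, "2018-01-01"),
    ("PROD1", "INV01", 2, "2018-01-05"),
    ("PROD1", "INV03", 1, "2018-01-05"),
    ("PROD2", "INV02", 3, "2018-01-08"),
    ("PROD2", "INV04", 4, "2018-01-08"),
]

GAP_FILL = """
with recursive
daily as (                      -- real per-day totals
    select product_id, dt, sum(amount) as amount
    from product
    group by product_id, dt
),
bounds(lo, hi) as (             -- first and last date in the data
    select min(dt), max(dt) from product
),
days(gdt) as (                  -- one row per calendar day (replaces generate_series)
    select lo from bounds
    union all
    select date(gdt, '+1 day') from days, bounds where gdt < hi
)
select p.product_id,
       (select daily.amount     -- total from the same day or the closest earlier day
        from daily
        where daily.product_id = p.product_id
          and daily.dt <= days.gdt
        order by daily.dt desc
        limit 1) as finamt,
       days.gdt
from days
cross join (select distinct product_id from product) as p
order by days.gdt, p.product_id
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table product (product_id text, invoice_id text, amount integer, dt text)"
)
conn.executemany("insert into product values (?, ?, ?, ?)", ROWS)
filled = conn.execute(GAP_FILL).fetchall()
conn.close()

for row in filled:
    print(row)
```

On the sample data this yields 16 rows (8 days times 2 products), with PROD1 carrying 2 forward until 2018-01-05 and PROD2 carrying 3 forward until 2018-01-08. As in the original query, a product with no amount on the first date would get NULL for the leading days.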

ypercubeᵀᴹ · Answer

It seems you want all the rows with the latest date for every product (ties included, i.e. all rows with the last date). This can be done with the rank() function:

    select id, product_id, invoice_id, amount, date
    from (
        select id, product_id, invoice_id, amount, date,
               rank() over (partition by product_id
                            order by date desc) as rnk
        from
            -- your joins
    ) as t
    where rnk = 1;
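The difference between ranking per product and numbering per (product_id, invoice_id) pair only shows up when an old invoice is absent from a product's latest date, which is the case described in the question's first edit. Below is a minimal sketch of that comparison, run on in-memory SQLite via Python's sqlite3 (window functions need SQLite 3.25 or later). The table and column names (product, dt) and the extra row with INV00 are assumptions added for illustration; they are not part of the question's data.

```python
import sqlite3

# Sample rows from the question, plus one hypothetical row (id 7):
# an old PROD1 invoice that does not appear on PROD1's latest date.
ROWS = [
    (1, "PROD1", "INV01", 2, "2018-01-01"),
    (2, "PROD2", "INV02", 3, "2018-01-01"),
    (3, "PROD1", "INV01", 2, "2018-01-05"),
    (4, "PROD1", "INV03", 1, "2018-01-05"),
    (5, "PROD2", "INV02", 3, "2018-01-08"),
    (6, "PROD2", "INV04", 4, "2018-01-08"),
    (7, "PROD1", "INV00", 9, "2018-01-01"),  # hypothetical, not in the question
]

# All rows sharing the latest date of each product (ties kept by rank()).
RANK_PER_PRODUCT = """
select product_id, invoice_id, amount, dt
from (select product_id, invoice_id, amount, dt,
             rank() over (partition by product_id order by dt desc) as rnk
      from product) as t
where rnk = 1
order by product_id, invoice_id
"""

# Latest row of each (product_id, invoice_id) pair, whatever its date.
ROW_NUMBER_PER_PAIR = """
select product_id, invoice_id, amount, dt
from (select product_id, invoice_id, amount, dt,
             row_number() over (partition by product_id, invoice_id
                                order by dt desc) as rn
      from product) as t
where rn = 1
order by product_id, invoice_id
"""


def run(query):
    """Load ROWS into a fresh in-memory table and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table product "
        "(id integer, product_id text, invoice_id text, amount integer, dt text)"
    )
    conn.executemany("insert into product values (?, ?, ?, ?, ?)", ROWS)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


latest_per_product = run(RANK_PER_PRODUCT)
latest_per_pair = run(ROW_NUMBER_PER_PAIR)

print(latest_per_product)
print(latest_per_pair)
```

With the extra row present, the rank() query still returns only the four rows from each product's last date, while the per-pair query also returns the stale INV00 row from 2018-01-01.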

user166779 · Answer

I agree with the approach in your later edit:

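    select mytable.product_id, mytable.invoice_id, mytable.amount
    from mytable
    inner join (select max(date) as last_date, product_id, invoice_id
                from mytable
                -- invoice_id is grouped too: Postgres rejects a selected
                -- column that is neither grouped nor aggregated
                group by product_id, invoice_id) sub
            on mytable.date = sub.last_date
           and mytable.product_id = sub.product_id
           and mytable.invoice_id = sub.invoice_id;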

The "key" is date, product_id and invoice_id.