3つのテーブルがあります:units
(アパートメントユニット)、leases
、line_items
(より良い名前は定期的な家賃だったでしょう)。ユニットには多くのリースがあり、リースには多くのラインアイテムがあります(各家賃変更に関連付けられています)。私の目標は、一連の月が与えられた場合、各ユニットをその最新の家賃とともにリストすることです。
Table public.units
Column | Type | Collation | Nullable | Default
----------------+-----------------------------+-----------+----------+-----------------------------------
id | integer | | not null | nextval('units_id_seq'::regclass)
number | character varying | | |
bedrooms | integer | | |
bathrooms | integer | | |
square_footage | integer | | |
community_id | integer | | |
building_id | integer | | |
created_at | timestamp without time zone | | not null |
updated_at | timestamp without time zone | | not null |
slug | character varying | | |
status | integer | | not null | 0
note | text | | |
half_bathrooms | integer | | |
display_number | character varying | | |
Indexes:
units_pkey PRIMARY KEY, btree (id)
index_units_on_building_id btree (building_id)
index_units_on_community_id btree (community_id)
index_units_on_slug btree (slug)
Foreign-key constraints:
fk_Rails_5850136a38 FOREIGN KEY (building_id) REFERENCES buildings(id)
fk_Rails_b860cf198b FOREIGN KEY (community_id) REFERENCES communities(id)
Referenced by:
TABLE leases CONSTRAINT fk_Rails_29210439a5 FOREIGN KEY (unit_id) REFERENCES units(id)
TABLE market_rents CONSTRAINT fk_Rails_7270410e47 FOREIGN KEY (unit_id) REFERENCES units(id)
TABLE incomes_unit_budgets CONSTRAINT fk_Rails_791330e8fe FOREIGN KEY (unit_id) REFERENCES units(id)
Table public.leases
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
---------------------+-----------------------------+-----------+----------+------------------------------------+----------+--------------+-------------
id | integer | | not null | nextval('leases_id_seq'::regclass) | plain | |
start | date | | | | plain | |
unit_id | integer | | | | plain | |
created_at | timestamp without time zone | | not null | | plain | |
updated_at | timestamp without time zone | | not null | | plain | |
vacate | date | | | | plain | |
community_id | bigint | | not null | | plain | |
Indexes:
leases_pkey PRIMARY KEY, btree (id)
index_leases_on_community_id btree (community_id)
index_leases_on_unit_id btree (unit_id)
Foreign-key constraints:
fk_Rails_29210439a5 FOREIGN KEY (unit_id) REFERENCES units(id)
fk_Rails_46ad3e9d34 FOREIGN KEY (community_id) REFERENCES communities(id)
Referenced by:
TABLE line_items CONSTRAINT fk_Rails_02720bca5b FOREIGN KEY (lease_id) REFERENCES leases(id)
TABLE terms CONSTRAINT fk_Rails_49f65f38eb FOREIGN KEY (lease_id) REFERENCES leases(id)
Table public.line_items
Column | Type | Collation | Nullable | Default
-----------------+-----------------------------+-----------+----------+----------------------------------------
id | integer | | not null | nextval('line_items_id_seq'::regclass)
name | character varying | | |
lease_id | integer | | |
amount_cents | integer | | not null | 0
amount_currency | character varying | | not null | 'USD'::character varying
created_at | timestamp without time zone | | not null |
updated_at | timestamp without time zone | | not null |
start | date | | |
expiration | date | | |
Indexes:
line_items_pkey PRIMARY KEY, btree (id)
index_line_items_on_lease_id btree (lease_id)
index_line_items_on_start btree (start)
Foreign-key constraints:
fk_Rails_02720bca5b FOREIGN KEY (lease_id) REFERENCES leases(id)
次のクエリは、私が望んでいることを正確に実行しますが、比較的小さなデータセット(〜1500ユニット、〜5000リース、〜15000ラインアイテム)では非常に低速(8秒)です。
with last_rent as (
SELECT line_items.* FROM (SELECT line_items.*, leases.unit_id, row_number() over (
partition by unit_id, date_trunc('month', line_items.start)
order by line_items.start desc
)
FROM line_items INNER JOIN leases ON leases.id = line_items.lease_id WHERE line_items.name = 'RNT' AND leases.community_id IN (X)) as line_items WHERE line_items.row_number = 1
), month_series as (
SELECT id as unit_id, community_id, bedrooms, bathrooms, generate_series( '2019-03-01', '2019-03-01'::date, interval '1 month' ) - interval '1 day' dt
FROM units WHERE units.community_id IN (X) ORDER BY units.id ASC
), unit_rent_month as (
select unit_id, community_id, bedrooms, bathrooms, amount_cents, date_trunc('month', dt)::date as period
from month_series
left join lateral (
select amount_cents
from last_rent
where
last_rent.unit_id = month_series.unit_id
and last_rent.start <= month_series.dt
order by unit_id, last_rent.start desc
limit 1
) last_rent on true
order by dt, unit_id
)
select *
from unit_rent_month;
私の戦略は、ウィンドウ関数を使用して、毎月の最後の家賃(早期退去またはキャンセルの場合)を分離することでした。すべてのunit_idと月をリストする別のテーブル(month_series)を作成しました。前回の家賃と前月のシリーズを横結合してみました。
analyze
は次のようになります。
CTE Scan on unit_rent_month (cost=2563655.29..2593175.29 rows=1476000 width=24) (actual time=8019.924..8020.607 rows=1476 loops=1)
Output: unit_rent_month.unit_id, unit_rent_month.community_id, unit_rent_month.bedrooms, unit_rent_month.bathrooms, unit_rent_month.amount_cents, unit_rent_month.period
Buffers: shared hit=990
CTE last_rent
-> Subquery Scan on line_items (cost=1682.58..2163.78 rows=60 width=56) (actual time=44.584..63.921 rows=12010 loops=1)
Output: line_items.id, line_items.name, line_items.lease_id, line_items.amount_cents, line_items.amount_currency, line_items.created_at, line_items.updated_at, line_items.start, line_items.expiration, line_items.unit_id, line_items.row_number
Filter: (line_items.row_number = 1)
Rows Removed by Filter: 63
Buffers: shared hit=269
-> WindowAgg (cost=1682.58..2013.40 rows=12030 width=68) (actual time=44.582..60.093 rows=12073 loops=1)
Output: line_items_1.id, line_items_1.name, line_items_1.lease_id, line_items_1.amount_cents, line_items_1.amount_currency, line_items_1.created_at, line_items_1.updated_at, line_items_1.start, line_items_1.expiration, leases.unit_id, row_number() OVER (?), leases.unit_id, (date_trunc('month'::text, (line_items_1.start)::timestamp with time zone))
Buffers: shared hit=269
-> Sort (cost=1682.58..1712.65 rows=12030 width=56) (actual time=44.524..45.478 rows=12073 loops=1)
Output: line_items_1.start, leases.unit_id, (date_trunc('month'::text, (line_items_1.start)::timestamp with time zone)), line_items_1.id, line_items_1.name, line_items_1.lease_id, line_items_1.amount_cents, line_items_1.amount_currency, line_items_1.created_at, line_items_1.updated_at, line_items_1.expiration
Sort Key: leases.unit_id, (date_trunc('month'::text, (line_items_1.start)::timestamp with time zone)), line_items_1.start DESC
Sort Method: quicksort Memory: 2082kB
Buffers: shared hit=269
-> Hash Join (cost=378.43..867.28 rows=12030 width=56) (actual time=7.626..33.901 rows=12073 loops=1)
Output: line_items_1.start, leases.unit_id, date_trunc('month'::text, (line_items_1.start)::timestamp with time zone), line_items_1.id, line_items_1.name, line_items_1.lease_id, line_items_1.amount_cents, line_items_1.amount_currency, line_items_1.created_at, line_items_1.updated_at, line_items_1.expiration
Inner Unique: true
Hash Cond: (line_items_1.lease_id = leases.id)
Buffers: shared hit=269
-> Seq Scan on public.line_items line_items_1 (cost=0.00..395.25 rows=12736 width=44) (actual time=0.012..5.813 rows=12736 loops=1)
Output: line_items_1.id, line_items_1.name, line_items_1.lease_id, line_items_1.amount_cents, line_items_1.amount_currency, line_items_1.created_at, line_items_1.updated_at, line_items_1.start, line_items_1.expiration
Filter: ((line_items_1.name)::text = 'RNT'::text)
Rows Removed by Filter: 4884
Buffers: shared hit=175
-> Hash (cost=292.99..292.99 rows=6835 width=8) (actual time=7.537..7.537 rows=6837 loops=1)
Output: leases.unit_id, leases.id
Buckets: 8192 Batches: 1 Memory Usage: 332kB
Buffers: shared hit=94
-> Seq Scan on public.leases (cost=0.00..292.99 rows=6835 width=8) (actual time=0.010..4.954 rows=6837 loops=1)
Output: leases.unit_id, leases.id
Filter: (leases.community_id = ANY ('{X}'::bigint[]))
Rows Removed by Filter: 401
Buffers: shared hit=94
CTE month_series
-> Result (cost=0.28..33371.24 rows=1476000 width=24) (actual time=0.065..11.553 rows=1476 loops=1)
Output: units.id, units.community_id, units.bedrooms, units.bathrooms, ((generate_series('2019-03-01 00:00:00-08'::timestamp with time zone, ('2019-03-01'::date)::timestamp with time zone, '1 mon'::interval)) - '1 day'::interval)
Buffers: shared hit=712
-> ProjectSet (cost=0.28..7541.24 rows=1476000 width=24) (actual time=0.062..9.869 rows=1476 loops=1)
Output: generate_series('2019-03-01 00:00:00-08'::timestamp with time zone, ('2019-03-01'::date)::timestamp with time zone, '1 mon'::interval), units.id, units.community_id, units.bedrooms, units.bathrooms
Buffers: shared hit=712
-> Index Scan using units_pkey on public.units (cost=0.28..146.48 rows=1476 width=16) (actual time=0.038..4.365 rows=1476 loops=1)
Output: units.id, units.number, units.bedrooms, units.bathrooms, units.square_footage, units.community_id, units.building_id, units.created_at, units.updated_at, units.slug, units.status, units.note, units.half_bathrooms, units.display_number
Filter: (units.community_id = ANY ('{X}'::integer[]))
Rows Removed by Filter: 201
Buffers: shared hit=712
CTE unit_rent_month
-> Sort (cost=2524430.27..2528120.27 rows=1476000 width=32) (actual time=8019.918..8019.998 rows=1476 loops=1)
Output: month_series.unit_id, month_series.community_id, month_series.bedrooms, month_series.bathrooms, last_rent.amount_cents, ((date_trunc('month'::text, month_series.dt))::date), month_series.dt
Sort Key: month_series.dt, month_series.unit_id
Sort Method: quicksort Memory: 164kB
Buffers: shared hit=990
-> Nested Loop Left Join (cost=1.51..2302560.00 rows=1476000 width=32) (actual time=78.693..8018.611 rows=1476 loops=1)
Output: month_series.unit_id, month_series.community_id, month_series.bedrooms, month_series.bathrooms, last_rent.amount_cents, (date_trunc('month'::text, month_series.dt))::date, month_series.dt
Buffers: shared hit=984
-> CTE Scan on month_series (cost=0.00..29520.00 rows=1476000 width=24) (actual time=0.067..13.332 rows=1476 loops=1)
Output: month_series.unit_id, month_series.community_id, month_series.bedrooms, month_series.bathrooms, month_series.dt
Buffers: shared hit=712
-> Limit (cost=1.51..1.51 rows=1 width=12) (actual time=5.419..5.420 rows=1 loops=1476)
Output: last_rent.amount_cents, last_rent.unit_id, last_rent.start
Buffers: shared hit=272
-> Sort (cost=1.51..1.51 rows=1 width=12) (actual time=5.418..5.418 rows=1 loops=1476)
Output: last_rent.amount_cents, last_rent.unit_id, last_rent.start
Sort Key: last_rent.start DESC
Sort Method: top-N heapsort Memory: 25kB
Buffers: shared hit=272
-> CTE Scan on last_rent (cost=0.00..1.50 rows=1 width=12) (actual time=3.395..5.411 rows=8 loops=1476)
Output: last_rent.amount_cents, last_rent.unit_id, last_rent.start
Filter: ((last_rent.start <= month_series.dt) AND (last_rent.unit_id = month_series.unit_id))
Rows Removed by Filter: 12002
Buffers: shared hit=269
Planning time: 3.219 ms
Execution time: 8021.388 ms
analyze
の発言に基づくと、問題はサブクエリとラテラル結合にあると思います。誰かがこれを最適化する方法を理解するのを手伝ってくれる?必要に応じて、より多くの情報を共有してください。
編集:これが私の最後のクエリです:
SELECT id as unit_id, community_id, bedrooms, bathrooms, date_trunc('month', mon.dt)::date AS period, last_rent.amount_cents
FROM "units"
CROSS JOIN generate_series(timestamp '2018-01-01'
, timestamp '2018-12-01'
, interval '1 month') mon(dt)
left join lateral (
SELECT li.amount_cents
FROM leases l
JOIN line_items li ON l.id = li.lease_id
WHERE l."unit_id" = "units"."id"
AND li.start < mon.dt + interval '1 month'
AND li.name = 'RNT'
order by li.start desc
limit 1
) last_rent on true
WHERE "units"."community_id" IN (X)
order by period, unit_id
編集2:
参考までに、行と必要な出力を含むテストケースを次に示します。
Line Item 1:
id: 1
name: "RNT"
lease_id: 1
amount_cents: 100,000
start: January 1, 2018
Line Item 2:
id: 2
name: "RNT"
lease_id: 2
amount_cents: 110,000
start: March 5, 2018
Lease 1:
id: 1
start: January 1, 2018
vacate: March 4, 2018
unit_id: 1
Lease 2:
id: 2
start: March 5, 2018
vacate: null
unit_id: 1
Unit 1:
id: 1
community_id: 1
number: 101
bedrooms: 1
bathrooms: 1
出力:
| unit_id | community_id | bedrooms | bathrooms | period | amount_cents
---------------------------------------------------------------------------
| 1 | 1 | 1 | 1 | Jan 1, 2018 | 100,0000
| 1 | 1 | 1 | 1 | Feb 1, 2018 | 100,0000
| 1 | 1 | 1 | 1 | Mar 1, 2018 | 110,0000
| 1 | 1 | 1 | 1 | Apr 1, 2018 | 110,0000
... Repeating 110,000 for the rest of 2018.
クエリに多くの冗長な作業があります。最初のCTE _last_rent
_は、テーブル全体のeveryユニットの月ごとの「最初の」行を取得し、結果を実体化しますが、その一部のみが後でCTE _unit_rent_month
_。さらに、そこで作業を繰り返します。
基本的に、最初のCTE _last_rent
_を完全に削除できます。そして、その間に、allCTEを削除します。実際に必要なのは、単一のLATERAL
サブクエリだけです。この根本的に書き直されたクエリは、同じことをはるかに高速に行う必要があります(テストされていません)。
_SELECT u.id AS unit_id, u.community_id, u.bedrooms, u.bathrooms
, last_rent.amount_cents
, date_trunc('month', mon.dt)::date AS period
FROM units u
CROSS JOIN generate_series(timestamp '2019-02-01' -- switched to lower bound
, timestamp '2019-02-01' -- and see below
, interval '1 month') mon(dt)
LEFT JOIN LATERAL (
SELECT li.amount_cents
FROM leases l
JOIN line_items li ON l.id = li.lease_id
WHERE l.unit_id = u.id
AND l.community_id = u.community_id -- redundant?
AND li.name = 'RNT'
AND li.start < mon.dt + interval '1 month'
AND li.start >= mon.dt -- also required, see below
ORDER BY li.start DESC -- NULLS LAST ?
LIMIT 1
) last_rent ON true
WHERE u.community_id IN (X) -- ?
ORDER BY mon.dt, u.id;
_
この例では、1か月しか生成されません。意図は明らかに、オプションで一度に数ヶ月連続して生成することです。私はgenerate_series()
式を微妙に変更しました-これが理由です:
また、更新されたクエリで行ったように下限を指定するように切り替えました。それは少し便利です。
条件community_id IN (X)
を繰り返した場合、units
とleases
がdifferent_community_id
_に一致する可能性があります。 IN句に複数の式があります。 AND _l.community_id = u.community_id
_と完全に一致するようにそれを引き締めました。それが希望どおりでない場合は適応します。列_leases.community_id
_はそもそも冗長であるようですが、この述語を完全に削除できます。
この述語は論理的な違いをもたらします。
_ AND li.start >= mon.dt
_
更新されたクエリでドロップしましたが、結果行の月の最新の_amount_cents
_がない場合は常にNULL
を取得するように保持する必要があるかもしれません。それはあなたの元のクエリがしたことです。
ドロップすると、指定した日付範囲内で、おそらく前月からの最新の_amount_cents
_を取得します。つまり日付範囲の下限を変更すると、月ごとに異なる結果を得ることができます。
関連:
非公開の詳細によっては、異なるクエリスタイルを使用するか、1つ以上の multicolumn または partial index を追加することで、より高速な結果が得られる場合があります。暗闇でのショット、これmightは非常に高速なインデックスのみのスキャンを取得します。
_CREATE INDEX foo ON leases (unit_id, id);
CREATE INDEX bar ON line_items (lease_id, start DESC, amount_cents)
WHERE name = 'RNT';
_
または、少し良いのは、Postgres11の場合:
_CREATE INDEX bar ON line_items (lease_id, start DESC) INCLUDING (amount_cents)
WHERE name = 'RNT';
_
_WHERE name = 'RNT'
_は通常、「RNT」があまり一般的でない場合にのみ支払います。
関連(そこで言及されている前提条件を検討してください):