Using PostgreSQL 11, I have created the following table, which holds roughly 450 million rows:
postgres=> \d+ sales
Table "public.sales"
Column | Type | Modifiers | Storage | Stats target | Description
-----------------------------------+-----------------------------+------------------------------------+----------+--------------+-------------
created_terminal_id | integer | not null | plain | |
company_id | integer | not null | plain | |
customer_id | integer | | plain | |
sale_no | character varying(20) | not null | extended | |
sale_type | smallint | not null | plain | |
source_type | smallint | not null | plain | |
sale_date | timestamp without time zone | not null | plain | |
paid_amount | numeric(18,4) | not null default 0.0000 | main | |
change_amount | numeric(18,4) | not null default 0.0000 | main | |
cashup_id | integer | | plain | |
staff_id | integer | | plain | |
payment_terminal_id | integer | not null | plain | |
site_id | integer | not null | plain | |
sale_id | integer | not null | plain | |
deleted | smallint | default 0 | plain | |
is_tax_on | smallint | not null default 1 | plain | |
props | text | not null | extended | |
modified_time | timestamp without time zone | not null default CURRENT_TIMESTAMP | plain | |
sum_line_variation_ex_tax_price | numeric(18,4) | not null default 0.0000 | main | |
sum_line_variation_inc_tax_price | numeric(18,4) | not null default 0.0000 | main | |
sum_line_quantified_ex_tax_price | numeric(18,4) | not null default 0.0000 | main | |
sum_line_quantified_inc_tax_price | numeric(18,4) | not null default 0.0000 | main | |
sum_line_subtotal_ex_tax_price | numeric(18,4) | not null default 0.0000 | main | |
sum_line_subtotal_inc_tax_price | numeric(18,4) | not null default 0.0000 | main | |
sum_line_total_ex_tax_price | numeric(18,4) | not null default 0.0000 | main | |
sum_line_total_inc_tax_price | numeric(18,4) | not null default 0.0000 | main | |
sum_line_cost_inc_tax_price | numeric(18,4) | not null default 0.0000 | main | |
sum_line_cost_ex_tax_price | numeric(18,4) | not null default 0.0000 | main | |
sum_payment_tip_price | numeric(18,4) | not null default 0.0000 | main | |
order_variation_ex_tax_price | numeric(18,4) | not null default 0.0000 | main | |
order_variation_inc_tax_price | numeric(18,4) | not null default 0.0000 | main | |
order_total_ex_tax_price | numeric(18,4) | not null default 0.0000 | main | |
order_total_inc_tax_price | numeric(18,4) | not null default 0.0000 | main | |
order_tip_price | numeric(18,4) | not null default 0.0000 | main | |
order_variation_is_percent | smallint | not null default 0 | plain | |
order_variation_percent | numeric(18,4) | not null default 1.0000 | main | |
order_type | smallint | | plain | |
order_tip_is_percent | smallint | not null default 0 | plain | |
order_tip_percent | numeric(18,4) | not null default 1.0000 | main | |
sale_date_id | integer | not null default 0 | plain | |
voided_sale_id | integer | | plain | |
voided_sale_date | timestamp without time zone | | plain | |
sale_date_utc | timestamp without time zone | | plain | |
foo | numeric(18,4) | | main | |
bar | numeric(18,4) | not null default 0 | main | |
Indexes:
"sales_pkey" PRIMARY KEY, btree (sale_id)
"idx_unique_sale" UNIQUE CONSTRAINT, btree (created_terminal_id, sale_date, sale_no)
"idx_sale_cashup_id" btree (cashup_id)
"idx_sale_customer_id" btree (customer_id)
"idx_sale_modified_time" btree (modified_time)
"idx_sale_payment_terminal_id" btree (payment_terminal_id)
"idx_sale_site_date" btree (sale_date)
"idx_sale_site_id" btree (site_id)
"idx_sale_staff_id" btree (staff_id)
"sales_company_id" btree (company_id)
"sales_sale_date_id" btree (sale_date_id)
Has OIDs: no
The following query takes about 35 minutes to run:
postgres=> EXPLAIN
postgres-> SELECT sales.sale_id as numeric_id, sales.site_id, sales.created_terminal_id as terminal_id, sales.props as props, sales.order_total_inc_tax_price as SaleAmount, sales.staff_id as staff_id, sales.paid_amount as PaidAmount, (sales.order_total_inc_tax_price - sales.order_total_ex_tax_price) as taxAmount, sales.sale_date as SaleDate, sales.sale_no as SaleNo, sales.voided_sale_id as LinkedSaleNo, sales.voided_sale_date as LinkedSaleDate, sales.sale_type as sale_type, sales.deleted
postgres-> FROM sales
postgres-> WHERE sales.deleted = 0
postgres-> AND sales.site_id = 72620
postgres-> AND order_total_inc_tax_price < 40
postgres-> AND sale_date > '2019-03-08'
postgres-> ORDER BY sale_id DESC
postgres-> LIMIT 50;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1000.60..76779.68 rows=50 width=189)
-> Gather Merge (cost=1000.60..29806430.01 rows=19666 width=189)
Workers Planned: 2
-> Parallel Index Scan Backward using sales_pkey on sales (cost=0.57..29803160.04 rows=8194 width=189)
Filter: ((order_total_inc_tax_price < '40'::numeric) AND (sale_date > '2019-03-08 00:00:00'::timestamp without time zone) AND (deleted = 0) AND (site_id = 72620))
(5 rows)
However, with ORDER BY sale_id DESC NULLS LAST I get a radically different plan and a dramatic speedup (the query completes in a few seconds):
postgres=> EXPLAIN
postgres-> SELECT sales.sale_id as numeric_id, sales.site_id, sales.created_terminal_id as terminal_id, sales.props as props, sales.order_total_inc_tax_price as SaleAmount, sales.staff_id as staff_id, sales.paid_amount as PaidAmount, (sales.order_total_inc_tax_price - sales.order_total_ex_tax_price) as taxAmount, sales.sale_date as SaleDate, sales.sale_no as SaleNo, sales.voided_sale_id as LinkedSaleNo, sales.voided_sale_date as LinkedSaleDate, sales.sale_type as sale_type, sales.deleted
postgres-> FROM sales
postgres-> WHERE sales.deleted = 0
postgres-> AND sales.site_id = 72620
postgres-> AND order_total_inc_tax_price < 40
postgres-> AND sale_date > '2019-03-08'
postgres-> ORDER BY sale_id DESC NULLS LAST
postgres-> LIMIT 50;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=216622.14..216622.26 rows=50 width=189)
-> Sort (cost=216622.14..216671.30 rows=19666 width=189)
Sort Key: sale_id DESC NULLS LAST
-> Index Scan using idx_sale_site_id on sales (cost=0.57..215968.85 rows=19666 width=189)
Index Cond: (site_id = 72620)
Filter: ((order_total_inc_tax_price < '40'::numeric) AND (sale_date > '2019-03-08 00:00:00'::timestamp without time zone) AND (deleted = 0))
(6 rows)
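Why does the query planner make this choice, given that I am sorting by a PRIMARY KEY, which cannot contain NULLs?
It is worth noting that this server is used for benchmarking and there is no other load on the database. ANALYZE was also run after the data was loaded.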
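If an index is ordered the same way as the ORDER BY clause specifies, the index can be used to process the ORDER BY clause without a sort.
Now an index is sorted ASC NULLS LAST by default, and since an index can be scanned in both directions, it can support both ORDER BY sale_id ASC NULLS LAST and ORDER BY sale_id DESC NULLS FIRST. Plain ORDER BY sale_id DESC means DESC NULLS FIRST, which is why your first query can walk sales_pkey backward. The index cannot support ORDER BY sale_id DESC NULLS LAST, however, because that is a different ordering. The planner therefore falls back to the index scan on idx_sale_site_id followed by an explicit Sort, which is the second plan you see.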
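The matching rule can be modeled in a few lines of C. This is only a sketch of the idea, not PostgreSQL code: the SortSpec struct and the functions scan_ordering, index_provides_order and check_examples are hypothetical names made up for illustration, and the model assumes a single-column btree index.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one sort key: direction plus NULL placement. */
typedef struct
{
    bool reverse_sort;  /* true = DESC, false = ASC */
    bool nulls_first;   /* true = NULLS FIRST, false = NULLS LAST */
} SortSpec;

/* Ordering delivered by scanning the index; a backward scan flips both flags. */
SortSpec scan_ordering(SortSpec index, bool backward)
{
    SortSpec out = index;

    if (backward)
    {
        out.reverse_sort = !index.reverse_sort;
        out.nulls_first = !index.nulls_first;
    }
    return out;
}

/* True if a forward or a backward scan yields exactly the wanted ordering. */
bool index_provides_order(SortSpec index, SortSpec wanted)
{
    SortSpec fwd = scan_ordering(index, false);
    SortSpec bwd = scan_ordering(index, true);

    return (fwd.reverse_sort == wanted.reverse_sort &&
            fwd.nulls_first == wanted.nulls_first) ||
           (bwd.reverse_sort == wanted.reverse_sort &&
            bwd.nulls_first == wanted.nulls_first);
}

/* The three cases from the text, for a default (ASC NULLS LAST) index. */
void check_examples(void)
{
    SortSpec pkey = {false, false};             /* ASC NULLS LAST */
    SortSpec asc_nulls_last = {false, false};
    SortSpec desc_nulls_first = {true, true};
    SortSpec desc_nulls_last = {true, false};

    assert(index_provides_order(pkey, asc_nulls_last));    /* forward scan */
    assert(index_provides_order(pkey, desc_nulls_first));  /* backward scan */
    assert(!index_provides_order(pkey, desc_nulls_last));  /* needs a Sort */
}
```

Calling check_examples() from a main function passes silently. Note that the comparison looks only at the two flags and knows nothing about whether the column can actually contain NULLs.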
The planner does not take the NOT NULL in the column definition into account. This is decided in build_index_pathkeys in src/backend/optimizer/path/pathkeys.c:
    if (ScanDirectionIsBackward(scandir))
    {
        reverse_sort = !index->reverse_sort[i];
        nulls_first = !index->nulls_first[i];
    }
    else
    {
        reverse_sort = index->reverse_sort[i];
        nulls_first = index->nulls_first[i];
    }

    /*
     * OK, try to make a canonical pathkey for this sort key.  Note we're
     * underneath any outer joins, so nullable_relids should be NULL.
     */
    cpathkey = make_pathkey_from_sortinfo(root,
                                          indexkey,
                                          NULL,
                                          index->sortopfamily[i],
                                          index->opcintype[i],
                                          index->indexcollations[i],
                                          reverse_sort,
                                          nulls_first,
                                          0,
                                          index->rel->relids,
                                          false);
I don't know how easy it would be to take nullability into account here, but at present it isn't done.
Perhaps you could suggest it on the pgsql-hackers mailing list.
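To make concrete what taking nullability into account could mean, here is a purely hypothetical sketch in C. It is not how PostgreSQL works and not a proposed patch: SortSpec, orderings_equivalent and check_not_null_case are invented names, and the assumption is simply that NULL placement stops mattering once the column is known to contain no NULLs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one sort key: direction plus NULL placement. */
typedef struct
{
    bool reverse_sort;  /* true = DESC, false = ASC */
    bool nulls_first;   /* true = NULLS FIRST, false = NULLS LAST */
} SortSpec;

/*
 * Hypothetical comparison: the direction must always match, but the
 * NULLS FIRST/LAST flag is ignored when the column is known to be NOT NULL.
 */
bool orderings_equivalent(SortSpec a, SortSpec b, bool column_not_null)
{
    if (a.reverse_sort != b.reverse_sort)
        return false;
    return column_not_null || a.nulls_first == b.nulls_first;
}

/* Backward scan of a default index versus ORDER BY ... DESC NULLS LAST. */
void check_not_null_case(void)
{
    SortSpec backward_scan = {true, true};   /* DESC NULLS FIRST */
    SortSpec wanted = {true, false};         /* DESC NULLS LAST */

    assert(!orderings_equivalent(backward_scan, wanted, false)); /* today */
    assert(orderings_equivalent(backward_scan, wanted, true));   /* sketch */
}
```

Under that assumption, a backward scan of sales_pkey would also qualify for ORDER BY sale_id DESC NULLS LAST. Whether the real planner data structures make such a check practical is exactly the open question.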