Query results not as expected - Jsonb Postgres

Question

I created the following table:

CREATE TABLE public.influencers
(
    id integer NOT NULL DEFAULT nextval('influencers_id_seq'::regclass),
    location jsonb,
    gender text COLLATE pg_catalog."default",
    birthdate timestamp without time zone,
    ig jsonb,
    contact_info jsonb,
    created_at timestamp without time zone DEFAULT now(),
    updated_at timestamp without time zone DEFAULT now(),
    categories text[] COLLATE pg_catalog."default",
    search_field text COLLATE pg_catalog."default",
    search_vector tsvector,
    ig_updated_at timestamp without time zone,
    CONSTRAINT influencers_pkey PRIMARY KEY (id),
    CONSTRAINT ig_id_must_exist CHECK (ig ? 'id'::text),
    CONSTRAINT ig_username_must_exist CHECK (ig ? 'username'::text)
)

To filter on a field inside ig that holds an integer, we then created the following index:

CREATE INDEX idx_btree_ig_follower_count ON public.influencers USING BTREE ((ig->>'follower_count'));

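The following query should return 427000 results. Instead, it returns a seemingly random number that has nothing to do with what it should return. This is the query: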

SELECT count(*) FROM "influencers" WHERE ((ig ->> 'follower_count') >= '1000') AND ((ig ->> 'follower_count') <= '10000') AND (ig->>'follower_count') IS NOT NULL

I suspect the results make no sense because the query compares strings (text) rather than integers.
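That suspicion can be checked in isolation. As a minimal sketch using literal values (not rows from the table), the same range test gives different answers for text and for integers, because text is compared character by character:

```sql
-- Text comparison: '5000' sorts after '10000' because '5' > '1',
-- so the upper bound of the range fails.
SELECT '5000'::text BETWEEN '1000'::text AND '10000'::text AS text_in_range,  -- false
       5000 BETWEEN 1000 AND 10000                         AS int_in_range;   -- true
```

Since `->>` returns text, the original WHERE clause behaves like the first column.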

However, here is a query that works (it returns the expected rows) but does not use the idx_btree_ig_follower_count index:

SELECT count(*) FROM "influencers" WHERE ((ig -> 'follower_count') >= '1000') AND ((ig -> 'follower_count') <= '10000') AND (ig->'follower_count') IS NOT NULL;
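A likely explanation for both observations, sketched with literal values: `->` returns jsonb rather than text, the quoted bounds are then read as jsonb numbers, and jsonb numbers compare numerically. The index is skipped because it was built on the `->>` expression, which is a different expression from the `->` one used here.

```sql
-- The two operators return different types.
SELECT pg_typeof('{"follower_count": 5000}'::jsonb ->  'follower_count') AS arrow_type,        -- jsonb
       pg_typeof('{"follower_count": 5000}'::jsonb ->> 'follower_count') AS double_arrow_type; -- text

-- jsonb numbers are ordered by numeric value, not by their characters.
SELECT '5000'::jsonb BETWEEN '1000'::jsonb AND '10000'::jsonb AS jsonb_in_range;  -- true
```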

How should I write the query or the index so that the query runs efficiently?

Evan Carroll · Answer

Try this, out of curiosity:

CREATE INDEX idx_btree_ig_follower_count ON public.influencers USING BTREE (CAST(ig->>'follower_count' AS int));

Then cast to int in the query:

SELECT count(*) FROM "influencers" WHERE ((ig ->> 'follower_count')::int >= '1000') AND ((ig ->> 'follower_count')::int <= '10000');
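Put together, the steps might look like the sketch below. It assumes the text index from the question still exists under the same name and so has to be dropped first, and that every follower_count value is an integer that fits in int (otherwise the cast, and with it the index build, fails). The bounds are written as plain integers, and the EXPLAIN is only there to confirm that the planner picks the new index.

```sql
-- Remove the old text-based index so the name can be reused.
DROP INDEX IF EXISTS public.idx_btree_ig_follower_count;

-- Index the value as an integer instead of as text.
CREATE INDEX idx_btree_ig_follower_count
    ON public.influencers
    USING BTREE (CAST(ig->>'follower_count' AS int));

-- Collect statistics for the new expression index.
ANALYZE public.influencers;

-- The WHERE expression must match the indexed expression to use the index.
EXPLAIN
SELECT count(*)
FROM influencers
WHERE (ig->>'follower_count')::int BETWEEN 1000 AND 10000;
```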

If you specify >= and <=, you do not need IS NOT NULL.
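The reason is that a comparison against NULL yields NULL rather than true, and WHERE keeps only rows for which the condition is true. A small sketch with made-up values (the inline table `t` is purely illustrative):

```sql
-- Comparing NULL with anything gives NULL, not false and not true.
SELECT (NULL::int >= 1000) IS NULL AS comparison_is_null;  -- true

-- The NULL row drops out of the range filter without an explicit IS NOT NULL.
SELECT count(*)
FROM (VALUES (500), (5000), (NULL)) AS t(follower_count)
WHERE follower_count >= 1000
  AND follower_count <= 10000;  -- 1
```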

Test case

I created test data with 10 million rows of jsonb and a btree index on it.

CREATE TABLE influences
AS
  SELECT jsonb_build_object('follower_count', trunc(random()*1e7)::int) AS ig
  FROM generate_series(1,1e7);

CREATE INDEX ON influences( CAST( (ig->>'follower_count') AS int) );

I get the same query plan as the one you provided. You can see mine here:

EXPLAIN (ANALYZE,BUFFERS)
SELECT count(*) FROM influences WHERE (ig->>'follower_count')::int BETWEEN 0 AND 854780;

                                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=130124.36..130124.37 rows=1 width=0) (actual time=482.615..482.615 rows=1 loops=1)
   Buffers: shared hit=85674
   ->  Bitmap Heap Scan on influences  (cost=18483.87..127946.94 rows=870969 width=0) (actual time=206.182..426.880 rows=855570 loops=1)
         Recheck Cond: ((((ig ->> 'follower_count'::text))::integer >= 0) AND (((ig ->> 'follower_count'::text))::integer <= 854780))
         Heap Blocks: exact=83334
         Buffers: shared hit=85674
         ->  Bitmap Index Scan on influences_int4_idx  (cost=0.00..18266.13 rows=870969 width=0) (actual time=182.158..182.158 rows=855570 loops=1)
               Index Cond: ((((ig ->> 'follower_count'::text))::integer >= 0) AND (((ig ->> 'follower_count'::text))::integer <= 854780))
               Buffers: shared hit=2340
 Planning time: 0.341 ms
 Execution time: 482.665 ms
(11 rows)

I am aggregating far more rows, 855570 in my example, and it is still more than 10 times faster.

I then clustered the table on that index:

CLUSTER influences USING influences_int4_idx ;

That cut the time in half again, to about 200 ms:

EXPLAIN (ANALYZE,BUFFERS)
SELECT count(*) FROM influences WHERE (ig->>'follower_count')::int BETWEEN 0 AND 854780;

                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=130124.66..130124.67 rows=1 width=0) (actual time=211.880..211.881 rows=1 loops=1)
   Buffers: shared hit=4 read=9466
   ->  Bitmap Heap Scan on influences  (cost=18483.94..127947.22 rows=870976 width=0) (actual time=69.503..155.670 rows=855570 loops=1)
         Recheck Cond: ((((ig ->> 'follower_count'::text))::integer >= 0) AND (((ig ->> 'follower_count'::text))::integer <= 854780))
         Heap Blocks: exact=7130
         Buffers: shared hit=4 read=9466
         ->  Bitmap Index Scan on influences_int4_idx  (cost=0.00..18266.20 rows=870976 width=0) (actual time=68.150..68.150 rows=855570 loops=1)
               Index Cond: ((((ig ->> 'follower_count'::text))::integer >= 0) AND (((ig ->> 'follower_count'::text))::integer <= 854780))
               Buffers: shared hit=3 read=2337
 Planning time: 0.412 ms
 Execution time: 211.929 ms
(11 rows)
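One caveat on the clustering step, offered as a sketch rather than a recommendation: CLUSTER rewrites the table once in index order and does not keep that order as rows are inserted or updated later, and it locks the table exclusively while it runs. On a table that keeps changing, the gain would probably fade unless the step is repeated from time to time, roughly like this:

```sql
-- One-time physical reordering of the table by the integer index.
CLUSTER influences USING influences_int4_idx;

-- Refresh planner statistics after the rewrite.
ANALYZE influences;

-- Later runs can omit USING; PostgreSQL remembers the index used last time.
CLUSTER influences;
```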