DBへのマテリアライズドビューの影響の更新

Question

こんにちは、Amazon RDSでm4.large（2cpu 8gb）とプロビジョニングされたIOPSが1000のPostgreSQL 9.6データベースを実行しています。使用例は次のとおりです。このテーブルのサブセット（2M近似）を使用してマテリアライズドビューを作成し、一部の列タイプをクエリでより効率的になるように変更しました。 pg_confは変更されていません。RDSPostgresのデフォルトです。

これがビューの定義です。

CREATE MATERIALIZED VIEW public.customers_mv as SELECT id, gender, contact_info, location, social, categories, (social ->> 'follower_count')::integer AS social_follower_count, (social ->> 'following_count')::integer AS social_following_count, (social ->> 'peemv')::float AS social_emv, (social ->> 'engagement')::float AS social_engagement, (social ->> 'v')::boolean AS social_validated, search_vector, flags, to_tsvector('english',concat_ws(' ','aal0_'||(customers.location ->> 'aal0'), 'aal1_'||(customers.location ->> 'aal1'), 'aal2_'||(customers.location ->> 'aal2'), 'frequent_location_aal0_'||(customers.location -> 'frequent_location' ->> 'aal0'), 'frequent_location_aal1_'||(customers.location -> 'frequent_location' ->> 'aal1'), 'frequent_location_aal2_'||(customers.location -> 'frequent_location' ->> 'aal2'), 'last_post_location_aal0_'||(customers.location -> 'last_post_location' ->> 'aal0'), 'last_post_location_aal1_'||(customers.location -> 'last_post_location' ->> 'aal1'), 'last_post_location_aal2_'||(customers.location -> 'last_post_location' ->> 'aal2'), 'admin_location_aal0_'||(customers.location -> 'admin_location' ->> 'aal0'), 'bio_location_aal0_'||(customers.location -> 'bio_location' ->> 'aal0'))) as loc_vector FROM public.customers WHERE (customers.social -> 'follower_count') > '5000' AND customers.social ? 'last_posts' AND (customers.flags IS NULL OR NOT customers.flags @> '{"destroy": true}'::jsonb); CREATE INDEX customers_mv_followerc_idx ON customers_mv USING BTREE (social_follower_count); CREATE INDEX customers_mv_folling_idx ON customers_mv USING BTREE (social_following_count); CREATE INDEX customers_mv_emv_idx ON customers_mv USING BTREE (social_emv); CREATE INDEX customers_mv_gin_social_idx ON customers_mv USING GIN (social jsonb_path_ops); CREATE INDEX customers_mv_partial_social_validated_idx ON customers_mv (social_validated) WHERE social_validated = FALSE; CREATE INDEX customers_mv_categories_idx ON customers_mv USING gin (categories); CREATE INDEX customers_mv_gin_location_idx ON customers_mv USING GIN (location jsonb_path_ops); CREATE INDEX customers_mv_gin_loc_vector ON customers_mv USING gin(loc_vector); CREATE UNIQUE INDEX customers_mv_uniq_id_idx ON customers_mv (id);

ビューには、locationまたはsocial（両方のjsonbタイプ）などのデータにアクセスするための列と、loc_vectorクエリを高速化します。

今問題：この問題は、CPUで更新コマンドを起動しようとすると、更新、同時更新、または新しいマテリアライズドビューを作成しようとしたときに発生します。または、書き込みIOPSがDBをクラッシュさせます。

ここでは、それがDBにどれほど激しくヒットするかを確認できます

私たちの要件を考えると、理想的な設定は何ですか？

インフラストラクチャの増加の問題か、構成/最適化の問題か？

jlovison · Answer

問題はおそらくインデックスです。

IIRC、マテリアライズドビューを更新すると、既存のデータが削除され、現在のデータで新しい「テーブル」が作成されます。

これがインデックスに対して行うことは、インデックスに基づいて重要なワークロードのように送信されるデータのサブセット全体のインデックスを再作成することです。

私の推奨事項は、組み込みのマテリアライズドビューを使用せず、代わりに独自の同等のロールバックを使用してデータを段階的に更新し、古いデータ/インデックスのみを書き込むことです。

トレードオフは、これは単純にマテリアライズドビューを作成するよりも構築がはるかに複雑になることであり、古いデータの無効化を追跡する方法を検討する必要があります。