I have been tasked with writing an update query for a table with more than 850 million rows of data. Here are the table structures.
Source tables:
CREATE TABLE [dbo].[SourceTable1](
[ProdClassID] [varchar](10) NOT NULL,
[PriceListDate] [varchar](8) NOT NULL,
[PriceListVersion] [smallint] NOT NULL,
[MarketID] [varchar](10) NOT NULL,
[ModelID] [varchar](20) NOT NULL,
[VariantId] [varchar](20) NOT NULL,
[VariantType] [tinyint] NULL,
[Visibility] [tinyint] NULL,
CONSTRAINT [PK_SourceTable1] PRIMARY KEY CLUSTERED
(
[VariantId] ASC,
[ModelID] ASC,
[MarketID] ASC,
[ProdClassID] ASC,
[PriceListDate] ASC,
[PriceListVersion] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90)
)
CREATE TABLE [dbo].[SourceTable2](
[Id] [uniqueidentifier] NOT NULL,
[ProdClassID] [varchar](10) NULL,
[PriceListDate] [varchar](8) NULL,
[PriceListVersion] [smallint] NULL,
[MarketID] [varchar](10) NULL,
[ModelID] [varchar](20) NULL,
CONSTRAINT [PK_SourceTable2] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 91) ON [PRIMARY]
) ON [PRIMARY]
SourceTable1 contains 52 million rows of data, and SourceTable2 contains 400,000 rows of data.
Here is the TargetTable structure:
CREATE TABLE [dbo].[TargetTable](
[ChassisSpecificationId] [uniqueidentifier] NOT NULL,
[VariantId] [varchar](20) NOT NULL,
[VariantType] [tinyint] NULL,
[Visibility] [tinyint] NULL,
CONSTRAINT [PK_TargetTable] PRIMARY KEY CLUSTERED
(
[ChassisSpecificationId] ASC,
[VariantId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 71) ON [PRIMARY]
) ON [PRIMARY]
The relationships between these tables are as follows:
SourceTable1.VariantID is related to TargetTable.VariantID
SourceTable2.ID is related to TargetTable.ChassisSpecificationId
The update requirements are as follows:
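Get the values of VariantType and Visibility from SourceTable1 for each VariantID, taking the row with the maximum value in the PriceListVersion column.
Get the value of the ID column from SourceTable2 where the values of ModelID, ProdClassID, PriceListDate and MarketID match those of SourceTable1.
Update TargetTable with the VariantType and Visibility values where ChassisSpecificationId matches SourceTable2.ID and VariantID matches SourceTable1.VariantID.
The challenge is to do this update on the live production system with minimal locking. Here is the query I have put together: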
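Taken together, the three requirements amount to a single join. As a hedged sketch (assuming the dbo schema and the column names from the definitions above; `LatestVariant` and `rn` are illustrative names, not from the original), a read-only query like this previews which TargetTable rows would change and to what values, before any update is run:

```sql
-- Sketch only: previews the rows the requirements would update; nothing is modified.
;WITH LatestVariant AS
(
    SELECT V.VariantId,
           V.ModelID,
           V.ProdClassID,
           V.PriceListDate,
           V.MarketID,
           V.VariantType,
           V.Visibility,
           -- Number each variant's price-list versions newest first; rn = 1 is the maximum PriceListVersion
           ROW_NUMBER() OVER (PARTITION BY V.VariantId, V.ModelID, V.ProdClassID,
                                           V.PriceListDate, V.MarketID
                              ORDER BY V.PriceListVersion DESC) AS rn
    FROM dbo.SourceTable1 AS V
)
SELECT TOP (100)                       -- limit the preview; remove to list every affected row
       T.ChassisSpecificationId,
       T.VariantId,
       LV.VariantType AS NewVariantType,
       LV.Visibility  AS NewVisibility
FROM dbo.SourceTable2 AS CS
INNER JOIN LatestVariant AS LV          -- match the chassis specification on the four shared columns
    ON  LV.ModelID       = CS.ModelID
    AND LV.ProdClassID   = CS.ProdClassID
    AND LV.PriceListDate = CS.PriceListDate
    AND LV.MarketID      = CS.MarketID
    AND LV.rn = 1                       -- keep only the latest PriceListVersion per variant
INNER JOIN dbo.TargetTable AS T         -- the TargetTable rows that would receive the new values
    ON  T.ChassisSpecificationId = CS.Id
    AND T.VariantId              = LV.VariantId;
```

Numbering rows per variant and keeping `rn = 1` returns exactly one VariantType/Visibility pair per variant. A `GROUP BY` over VariantType and Visibility with `MAX(PriceListVersion)` can return several rows per variant when those values differ between versions. Over all 52 million rows of SourceTable1 this query is itself expensive, so it is better suited to checking the logic on a test copy than to running as-is on production.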
-- Check if Temp table already exists and drop if it does
-- (OBJECT_ID resolves #CSpec for this session only; a LIKE search on tempdb.sys.tables can match other sessions' temp tables)
IF OBJECT_ID('tempdb..#CSpec') IS NOT NULL
BEGIN
DROP TABLE #CSpec;
END;
-- Create Temp table to assign sequence numbers
CREATE Table #CSpec
(
RowID int,
ID uniqueidentifier,
PriceListDate VarChar(8),
ProdClassID VarChar(10),
ModelID VarChar(20),
MarketID Varchar(10)
);
-- Populate temp table
INSERT INTO #CSpec
SELECT ROW_NUMBER() OVER (ORDER BY MarketID) RowID,
CS.id,
CS.pricelistdate,
CS.prodclassid,
CS.modelid,
CS.marketid
FROM dbo.SourceTable2 CS
WHERE CS.MarketID IS NOT NULL;
-- Declare variables to hold values used for updates
DECLARE @min int,
@max int,
@ID uniqueidentifier,
@PriceListDate varchar(8),
@ProdClassID varchar(10),
@ModelID varchar(20),
@MarketID varchar(10);
-- Set minimum and maximum values for looping
SET @min = 1;
SET @max = (SELECT MAX(RowID) From #CSpec);
-- Populate other variables in a loop
WHILE @min <= @max
BEGIN
SELECT
@ID = ID,
@PriceListDate = PriceListDate,
@ProdClassID = ProdClassID,
@ModelID = ModelID,
@MarketID = MarketID
FROM #CSpec
WHERE RowID = @min;
-- Use a CTE to get the VariantType and Visibility of each variant's latest PriceListVersion from SourceTable1
;WITH Variant_CTE AS
(
SELECT V.VariantId,
V.VariantType,
V.Visibility,
-- rn = 1 marks the row with the highest PriceListVersion for each variant
ROW_NUMBER() OVER (PARTITION BY V.VariantId ORDER BY V.PriceListVersion DESC) AS rn
FROM dbo.SourceTable1 V
WHERE V.ModelID = @ModelID
AND V.ProdClassID = @ProdClassID
AND V.PriceListDate = @PriceListDate
AND V.MarketID = @MarketID
)
-- Update the TargetTable with the values obtained in the CTE
UPDATE SV
SET SV.VariantType = VC.VariantType,
SV.Visibility = VC.Visibility
FROM dbo.TargetTable SV
INNER JOIN Variant_CTE VC
ON SV.VariantId = VC.VariantId
AND VC.rn = 1
WHERE SV.ChassisSpecificationId = @ID
AND SV.VariantType IS NULL
AND SV.Visibility IS NULL;
-- Increment the value of loop variable
SET @min = @min+1;
END
-- Clean up
DROP TABLE #CSpec
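If I hard-code the value of the @max variable to limit the loop to 10 iterations, it takes around 30 seconds. However, if I increase the limit to 50 iterations, it takes about 4 minutes to complete. I am concerned that the 400,000 iterations will take several days to run on production. That might still be acceptable, though, as long as TargetTable does not get locked and block users from accessing it.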
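A rough linear extrapolation from those two timings (assuming every iteration costs about the same as the sampled ones, which the rise from 3 s to 4.8 s per iteration already calls into question) puts the full run at roughly two to three weeks:

$$\frac{30\ \text{s}}{10} = 3\ \text{s per iteration}, \qquad \frac{4 \times 60\ \text{s}}{50} = 4.8\ \text{s per iteration}$$

$$400{,}000 \times 3\ \text{s} = 1{,}200{,}000\ \text{s} \approx 13.9\ \text{days}, \qquad 400{,}000 \times 4.8\ \text{s} = 1{,}920{,}000\ \text{s} \approx 22.2\ \text{days}$$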
All input is welcome.
Thanks, Raj
To speed it up, you could try
The query plan here should show a lot of scans, because the indexes are insufficient for the operations you are running.
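As one hedged illustration of such an index (not necessarily the one the answer had in mind; the index name is hypothetical), a nonclustered index keyed on the four columns the loop filters SourceTable1 by lets each iteration seek instead of scan, and it covers every column the CTE reads:

```sql
-- Hypothetical index name. The key leads with the loop's four equality predicates,
-- followed by VariantId and PriceListVersion so each variant's versions are stored in order.
-- VariantType and Visibility are included so the CTE never has to visit the clustered index.
CREATE NONCLUSTERED INDEX IX_SourceTable1_Model_ProdClass_Date_Market
ON dbo.SourceTable1 (ModelID, ProdClassID, PriceListDate, MarketID, VariantId, PriceListVersion)
INCLUDE (VariantType, Visibility)
WITH (ONLINE = ON,        -- Enterprise/Developer edition only; an offline build blocks updates while it runs
      SORT_IN_TEMPDB = ON);
```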
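The TargetTable indexing looks OK.
Another observation: uniqueidentifier and varchar are poor choices for clustered indexes (your PKs here): they are not ever-increasing, they are too wide, and at the very least they add collation comparison overhead.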
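Edit, another observation (thanks to @Marian):
Your clustered indexes are generally wide. Every nonclustered index points to the clustered index, which means huge NC indexes too.
You could probably achieve the same results by reordering the clustered PK.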
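A sketch of what reordering the clustered PK of SourceTable1 could look like, assuming no foreign keys reference PK_SourceTable1 (any that do would have to be dropped and re-created around the change). The key columns stay the same, so uniqueness is unchanged, but the four columns the loop filters on come first:

```sql
-- Sketch only: rebuild PK_SourceTable1 with the loop's filter columns leading.
-- Rebuilding the clustered index of a 52-million-row table is heavy and blocking:
-- schedule it in a maintenance window and try it on a copy first.
SET XACT_ABORT ON;   -- any error rolls back both steps, so the table is never left without its PK

BEGIN TRANSACTION;

ALTER TABLE dbo.SourceTable1 DROP CONSTRAINT PK_SourceTable1;

ALTER TABLE dbo.SourceTable1
ADD CONSTRAINT PK_SourceTable1 PRIMARY KEY CLUSTERED
(
    ModelID ASC,
    ProdClassID ASC,
    PriceListDate ASC,
    MarketID ASC,
    VariantId ASC,
    PriceListVersion ASC
)
WITH (FILLFACTOR = 90);   -- same fill factor as the original definition

COMMIT TRANSACTION;
```

Queries that currently rely on VariantId being the leading key would then need their own nonclustered index, so whether this beats a separate covering index depends on the rest of the workload.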
For the benefit of the community, here is the final SQL for this process:
/********************************************************************************************************************
* Notes: Since each iteration of this loop runs in its own explicit transaction, locks are acquired and released
* for each iteration, which minimizes the impact on other users accessing the same table at the same time.
*
* This process updates 10,000 to 12,000 rows per second, so at the lower rate it is estimated to run for
* approximately 23 hours on production with 850 million rows in TargetTable. However, we can run the work in
* parallel by statically defining the @min and @max variable values and then running multiple sessions of this
* update. This reduces the time required to 23 hours divided by the number of sessions. In other words, if we
* run 8 sessions of this update query in parallel, it should complete in about 23/8 ~ 3 hours. If multiple
* sessions are used, the temp table needs to be created as a global temp table and populated in its own session.
* Additionally, each session's @min and @max values need to be hard-coded, for example, 1-50000, 50001-100000, etc.
*********************************************************************************************************************/
-- However, to make this possible, we will have to use...
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
-- ... this would be the ideal setting to minimize locking. Before using this, we will need to execute
-- ALTER DATABASE MyDatabase
-- SET ALLOW_SNAPSHOT_ISOLATION ON
-- Alternatively, if access rights permit, executing
-- DBCC TRACEON(1211,-1) will disable lock escalation. Otherwise, the TRANSACTION ISOLATION LEVEL can be left at
-- the default (READ COMMITTED), but that will not allow us to run multiple sessions.
SET NOCOUNT ON;
-- Check if Temp table already exists and drop if it does
-- (OBJECT_ID resolves #CSpec for this session only; a LIKE search on tempdb.sys.tables can match other sessions' temp tables)
IF OBJECT_ID('tempdb..#CSpec') IS NOT NULL
BEGIN
DROP TABLE #CSpec;
END;
-- Create Temp table to assign sequence numbers
CREATE Table #CSpec
(
RowID int PRIMARY KEY,
ID uniqueidentifier,
PriceListDate VarChar(8),
ProdClassID VarChar(10),
ModelID VarChar(20),
MarketID Varchar(10)
);
-- Populate temp table
INSERT INTO #CSpec
SELECT ROW_NUMBER() OVER (ORDER BY MarketID) RowID,
CS.id,
CS.pricelistdate,
CS.prodclassid,
CS.modelid,
CS.marketid
FROM dbo.SourceTable2 CS
WHERE CS.MarketID IS NOT NULL
-- This AND clause will allow this process to be run multiple times in timed sessions and will prevent
-- an attempt to update rows that were already updated in an earlier session. If the process will be run
-- only once from start to finish, this block can be commented out
AND CS.Id NOT IN
(
SELECT DISTINCT ChassisSpecificationId
FROM TargetTable
WHERE VariantType IS NOT NULL AND Visibility IS NOT NULL
);
-- Declare variables to hold values used for updates
DECLARE @min int,
@max int,
@ID uniqueidentifier,
@PriceListDate varchar(8),
@ProdClassID varchar(10),
@ModelID varchar(20),
@MarketID varchar(10);
-- Set minimum and maximum values for looping. See comments in the notes section on top.
SELECT @min = 1,@max = MAX(RowID) From #CSpec;
-- Populate other variables in a loop
WHILE @min <= @max
BEGIN
BEGIN TRY
BEGIN TRANSACTION;
SELECT
@ID = ID,
@PriceListDate = PriceListDate,
@ProdClassID = ProdClassID,
@ModelID = ModelID,
@MarketID = MarketID
FROM #CSpec
WHERE RowID = @min;
-- Use a CTE to get the VariantType and Visibility of each variant's latest PriceListVersion from SourceTable1
;WITH CTE AS
(
SELECT V.VariantId,
V.VariantType,
V.Visibility,
-- rn = 1 marks the row with the highest PriceListVersion for each variant
ROW_NUMBER() OVER (PARTITION BY V.VariantId ORDER BY V.PriceListVersion DESC) AS rn
FROM dbo.SourceTable1 V
WHERE V.ModelID = @ModelID
AND V.ProdClassID = @ProdClassID
AND V.PriceListDate = @PriceListDate
AND V.MarketID = @MarketID
)
-- Update the TargetTable with the values obtained in the CTE
UPDATE SV
SET SV.VariantType = VC.VariantType,
SV.Visibility = VC.Visibility
FROM dbo.TargetTable SV
INNER JOIN CTE VC
ON SV.VariantId = VC.VariantId
AND VC.rn = 1
WHERE SV.ChassisSpecificationId = @ID
AND SV.VariantType IS NULL
AND SV.Visibility IS NULL;
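-- Commit this iteration; any error inside TRY jumps straight to the CATCH block
COMMIT TRANSACTION;
END TRY
BEGIN CATCH
-- Roll back whatever this iteration left open and report the failing row.
-- A rolled-back chassis has no updated rows, so a later run picks it up again via the NOT IN filter above.
IF @@TRANCOUNT > 0
BEGIN
ROLLBACK TRANSACTION;
END
PRINT 'RowID ' + CAST(@min AS varchar(10)) + ' failed: ' + ERROR_MESSAGE();
END CATCH
-- Increment the loop variable outside TRY/CATCH so a persistently failing row cannot loop forever
SET @min = @min + 1;
END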
-- Clean up
SET NOCOUNT OFF;
DROP TABLE #CSpec;
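The comments in the script mention trace flag 1211, which disables lock escalation for the whole instance. On SQL Server 2008 and later, a narrower alternative is the table-level LOCK_ESCALATION option. A sketch, assuming escalation only needs to be disabled on TargetTable while the update runs:

```sql
-- Disable lock escalation for TargetTable only, instead of instance-wide with trace flag 1211.
ALTER TABLE dbo.TargetTable SET (LOCK_ESCALATION = DISABLE);

-- Confirm the setting (expected: DISABLE).
SELECT name, lock_escalation_desc
FROM sys.tables
WHERE name = 'TargetTable';

-- ... run the update loop here ...

-- Restore the default behaviour afterwards.
ALTER TABLE dbo.TargetTable SET (LOCK_ESCALATION = TABLE);
```

With escalation disabled, each transaction keeps all of its row or page locks, so lock memory grows with the number of rows touched per iteration. That is one more reason to keep each transaction small, as the one-chassis-per-transaction loop does.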