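Optimizing a CTE hierarchy
Update below
I have a table of accounts (SQL Server 2012) with the typical account / parent account architecture for representing a hierarchy of accounts. I created a VIEW using a CTE to hash out the hierarchy, and on the whole it works beautifully and as intended. I can query the hierarchy at any level and see the branches easily.
There is one business logic field that needs to be returned as a function of the hierarchy. A field on each account record describes the size of the business (call it CustomerCount). The logic I need to report on has to roll up CustomerCount from the whole branch. In other words, given an account, I need to sum the CustomerCount values for that account and for every child in every branch below it in the hierarchy.
I successfully calculated the field using a hierarchy field built within the CTE, which looks like acct4.acct3.acct2.acct1. The problem I am running into is simply making it run fast. Without this one calculated field, the query runs in about 3 seconds. When I add the calculated field, it turns into a 4-minute query.
Here is the best version I have been able to come up with that returns the correct results. I am looking for ideas on how I can restructure this AS A VIEW without such a large sacrifice in performance.
I understand why it is slow (it has to compute a predicate in the WHERE clause), but I cannot think of another way to structure it and still get the same results.
Here is some sample code that builds a table and runs the CTE pretty much exactly as it works in my environment.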
Use Tempdb
go
CREATE TABLE dbo.Account
(
Acctid varchar(1) NOT NULL
, Name varchar(30) NULL
, ParentId varchar(1) NULL
, CustomerCount int NULL
);
INSERT Account
SELECT 'A','Best Bet',NULL,21 UNION ALL
SELECT 'B','eStore','A',30 UNION ALL
SELECT 'C','Big Bens','B',75 UNION ALL
SELECT 'D','Mr. Jimbo','B',50 UNION ALL
SELECT 'E','Dr. John','C',100 UNION ALL
SELECT 'F','Brick','A',222 UNION ALL
SELECT 'G','Mortar','C',153 ;
With AccountHierarchy AS
( --Root values have no parent
SELECT
Root.AcctId AccountId
, Root.Name AccountName
, Root.ParentId ParentId
, 1 HierarchyLevel
, cast(Root.Acctid as varchar(4000)) IdHierarchy --highest parent reads right to left as in id3.Acctid2.Acctid1
, cast(replace(Root.Name,'.','') as varchar(4000)) NameHierarchy --highest parent reads right to left as in name3.name2.name1 (replace '.' so name parse is easy in last step)
, cast(Root.Acctid as varchar(4000)) HierarchySort --reverse of above, read left to right name1.name2.name3 for sorting on reporting only
, cast(Root.Name as varchar(4000)) HierarchyLabel --use for labels on reporting only, indents names under sorted hierarchy
, Root.CustomerCount CustomerCount
FROM
tempdb.dbo.account Root
WHERE
Root.ParentID is null
UNION ALL
SELECT
Recurse.Acctid AccountId
, Recurse.Name AccountName
, Recurse.ParentId ParentId
, Root.HierarchyLevel + 1 HierarchyLevel --next level in hierarchy
, cast(cast(recurse.Acctid as varchar(40)) + '.' + Root.IdHierarchy as varchar(4000)) IdHierarchy --cast because in real system this is a uniqueidentifier type needs converting
, cast(replace(recurse.Name,'.','') + '.' + Root.NameHierarchy as varchar(4000)) NameHierarchy --replace '.' for parsing in last step, cast to make room for lots of sub levels down the hierarchy
, cast(Root.AccountName + '.' + Recurse.Name as varchar(4000)) HierarchySort
, cast(space(root.HierarchyLevel * 4) + Recurse.Name as varchar(4000)) HierarchyLabel
, Recurse.CustomerCount CustomerCount
FROM
tempdb.dbo.account Recurse INNER JOIN
AccountHierarchy Root on Root.AccountId = Recurse.ParentId
)
SELECT
hier.AccountId
, Hier.AccountName
, hier.ParentId
, hier.HierarchyLevel
, hier.IdHierarchy
, hier.NameHierarchy
, hier.HierarchyLabel
, parsename(hier.IdHierarchy,1) Acct1Id
, parsename(hier.NameHierarchy,1) Acct1Name --This is why we stripped out '.' during recursion
, parsename(hier.IdHierarchy,2) Acct2Id
, parsename(hier.NameHierarchy,2) Acct2Name
, parsename(hier.IdHierarchy,3) Acct3Id
, parsename(hier.NameHierarchy,3) Acct3Name
, parsename(hier.IdHierarchy,4) Acct4Id
, parsename(hier.NameHierarchy,4) Acct4Name
, hier.CustomerCount
/* fantastic up to this point. Next block of code is what causes problem.
Logic of code is "sum of CustomerCount for this location and all branches below in this branch of hierarchy"
In live environment, goes from taking 3 seconds to 4 minutes by adding this one calc */
, (
SELECT
sum(children.CustomerCount)
FROM
AccountHierarchy Children
WHERE
hier.IdHierarchy = right(children.IdHierarchy, (1 /*length of id field*/ * hier.HierarchyLevel) + hier.HierarchyLevel - 1 /*for periods inbetween ids*/)
--"where this location's idhierarchy is within child idhierarchy"
--previously tried a charindex(hier.IdHierarchy,children.IdHierarchy)>0, but that performed even worse
) TotalCustomerCount
FROM
AccountHierarchy hier
ORDER BY
hier.HierarchySort
drop table tempdb.dbo.Account
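Update, 20 November 2013
Some of the suggested solutions got my juices flowing, and I tried a new approach that comes close but introduces a new and different obstacle. Honestly, I don't know whether this warrants a separate post, but it is related to the solution of this problem.
What I decided was that what makes sum(customercount) difficult is identifying the children in the context of a hierarchy that starts at the top and builds down. So I started by creating a hierarchy that builds from the bottom up, using roots defined as "accounts that are not the parent of any other account", and doing the recursive join backwards (root.parentacctid = recurse.acctid).
This way I can simply add the child's customer count to the parent as the recursion happens. Because of how I need reporting and levels, I run this bottom-up CTE in addition to the top-down one and then join them via account id. This approach turns out to be much faster than the original outer-query customer count, but I ran into a few obstacles.
First, I was inadvertently capturing duplicate customer counts for accounts that are the parent of multiple children. I was double or triple counting the customer count for some accounts. My solution was to create yet another CTE that counts how many nodes an account has, and to divide acct.customercount by that number during the recursion, so that when I add up the whole branch the account is not double counted.
So at this point the results of this new version are not correct, but I know why. The bottom-up CTE is creating duplicates. As the recursion passes, it looks for anything in the root (the lowest-level children) that is a child of an account in the account table. On the third recursion, it picks up the same accounts it did in the second recursion and puts them in again.
Any ideas on how to do a bottom-up CTE, or does this get any other ideas flowing?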
Use Tempdb
go
CREATE TABLE dbo.Account
(
Acctid varchar(1) NOT NULL
, Name varchar(30) NULL
, ParentId varchar(1) NULL
, CustomerCount int NULL
);
INSERT Account
SELECT 'A','Best Bet',NULL,1 UNION ALL
SELECT 'B','eStore','A',2 UNION ALL
SELECT 'C','Big Bens','B',3 UNION ALL
SELECT 'D','Mr. Jimbo','B',4 UNION ALL
SELECT 'E','Dr. John','C',5 UNION ALL
SELECT 'F','Brick','A',6 UNION ALL
SELECT 'G','Mortar','C',7 ;
With AccountHierarchy AS
( --Root values have no parent
SELECT
Root.AcctId AccountId
, Root.Name AccountName
, Root.ParentId ParentId
, 1 HierarchyLevel
, cast(Root.Acctid as varchar(4000)) IdHierarchy --highest parent reads right to left as in id3.Acctid2.Acctid1
, cast(replace(Root.Name,'.','') as varchar(4000)) NameHierarchy --highest parent reads right to left as in name3.name2.name1 (replace '.' so name parse is easy in last step)
, cast(Root.Acctid as varchar(4000)) HierarchySort --reverse of above, read left to right name1.name2.name3 for sorting on reporting only
, cast(Root.Acctid as varchar(4000)) HierarchyMatch
, cast(Root.Name as varchar(4000)) HierarchyLabel --use for labels on reporting only, indents names under sorted hierarchy
, Root.CustomerCount CustomerCount
FROM
tempdb.dbo.account Root
WHERE
Root.ParentID is null
UNION ALL
SELECT
Recurse.Acctid AccountId
, Recurse.Name AccountName
, Recurse.ParentId ParentId
, Root.HierarchyLevel + 1 HierarchyLevel --next level in hierarchy
, cast(cast(recurse.Acctid as varchar(40)) + '.' + Root.IdHierarchy as varchar(4000)) IdHierarchy --cast because in real system this is a uniqueidentifier type needs converting
, cast(replace(recurse.Name,'.','') + '.' + Root.NameHierarchy as varchar(4000)) NameHierarchy --replace '.' for parsing in last step, cast to make room for lots of sub levels down the hierarchy
, cast(Root.AccountName + '.' + Recurse.Name as varchar(4000)) HierarchySort
, CAST(CAST(Root.HierarchyMatch as varchar(40)) + '.'
+ cast(recurse.Acctid as varchar(40)) as varchar(4000)) HierarchyMatch
, cast(space(root.HierarchyLevel * 4) + Recurse.Name as varchar(4000)) HierarchyLabel
, Recurse.CustomerCount CustomerCount
FROM
tempdb.dbo.account Recurse INNER JOIN
AccountHierarchy Root on Root.AccountId = Recurse.ParentId
)
, Nodes as
( --counts how many branches are below for any account that is parent to another
select
node.ParentId Acctid
, cast(count(1) as float) Nodes
from AccountHierarchy node
group by ParentId
)
, BottomUp as
( --creates the hierarchy starting at accounts that are not parent to any other
select
Root.Acctid
, root.ParentId
, cast(isnull(root.customercount,0) as float) CustomerCount
from
tempdb.dbo.Account Root
where
not exists ( select 1 from tempdb.dbo.Account OtherAccts where root.Acctid = OtherAccts.ParentId)
union all
select
Recurse.Acctid
, Recurse.ParentId
, root.CustomerCount + cast ((isnull(recurse.customercount,0) / nodes.nodes) as float) CustomerCount
-- divide the recurse customercount by number of nodes to prevent duplicate customer count on accts that are parent to multiple children, see customercount cte next
from
tempdb.dbo.Account Recurse inner join
BottomUp Root on root.ParentId = recurse.acctid inner join
Nodes on nodes.Acctid = recurse.Acctid
)
, CustomerCount as
(
select
sum(CustomerCount) TotalCustomerCount
, hier.acctid
from
BottomUp hier
group by
hier.Acctid
)
SELECT
hier.AccountId
, Hier.AccountName
, hier.ParentId
, hier.HierarchyLevel
, hier.IdHierarchy
, hier.NameHierarchy
, hier.HierarchyLabel
, hier.hierarchymatch
, parsename(hier.IdHierarchy,1) Acct1Id
, parsename(hier.NameHierarchy,1) Acct1Name --This is why we stripped out '.' during recursion
, parsename(hier.IdHierarchy,2) Acct2Id
, parsename(hier.NameHierarchy,2) Acct2Name
, parsename(hier.IdHierarchy,3) Acct3Id
, parsename(hier.NameHierarchy,3) Acct3Name
, parsename(hier.IdHierarchy,4) Acct4Id
, parsename(hier.NameHierarchy,4) Acct4Name
, hier.CustomerCount
, customercount.TotalCustomerCount
FROM
AccountHierarchy hier inner join
CustomerCount on customercount.acctid = hier.accountid
ORDER BY
hier.HierarchySort
drop table tempdb.dbo.Account
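For reference, the double counting described above appears to come from accumulating a running sum while walking upward: a parent is reached once per leaf below it, so its own count is added once per path. A minimal sketch of one alternative, assuming the dbo.Account sample table still exists (that is, run before the DROP TABLE); the CTE name Ancestors and its column names are made up for illustration. It starts a path at every account, carries that account's own CustomerCount unchanged to each of its ancestors, and sums only after the recursion has finished, so each (account, ancestor) pair appears exactly once and no division by node count is needed.

```sql
;WITH Ancestors AS
(   -- anchor: every account counts as its own first "ancestor"
    SELECT
          OriginId      = a.Acctid
        , AncestorId    = a.Acctid
        , NextParentId  = a.ParentId
        , CustomerCount = ISNULL(a.CustomerCount, 0)
    FROM dbo.Account AS a
    UNION ALL
    -- recursive step: move one level up, carrying the origin's own count unchanged
    SELECT
          anc.OriginId
        , p.Acctid
        , p.ParentId
        , anc.CustomerCount
    FROM Ancestors AS anc
    INNER JOIN dbo.Account AS p ON p.Acctid = anc.NextParentId
)
SELECT
      AcctId             = AncestorId
    , TotalCustomerCount = SUM(CustomerCount)
FROM Ancestors
GROUP BY AncestorId
ORDER BY AncestorId;
-- With the 1..7 sample counts above, the totals work out to:
-- A = 28, B = 21, C = 15, D = 4, E = 5, F = 6, G = 7
```

The result could then be joined to the top-down CTE on the account id, in the same way the CustomerCount CTE is joined above.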
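EDIT: this is the second attempt
Based on @Max Vernon's answer, here is a way to avoid using the CTE inside an inline subquery, which is like self-joining the CTE and, I presume, is the reason for the poor efficiency. It uses analytic functions that are available only in SQL Server 2012 and later. Tested at SQL-Fiddle
This part can be skipped; it is a copy-paste from Max's answer: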
;With AccountHierarchy AS
(
SELECT
Root.AcctId AccountId
, Root.Name AccountName
, Root.ParentId ParentId
, 1 HierarchyLevel
, cast(Root.Acctid as varchar(4000)) IdHierarchyMatch
, cast(Root.Acctid as varchar(4000)) IdHierarchy
, cast(replace(Root.Name,'.','') as varchar(4000)) NameHierarchy
, cast(Root.Acctid as varchar(4000)) HierarchySort
, cast(Root.Name as varchar(4000)) HierarchyLabel ,
Root.CustomerCount CustomerCount
FROM
account Root
WHERE
Root.ParentID is null
UNION ALL
SELECT
Recurse.Acctid AccountId
, Recurse.Name AccountName
, Recurse.ParentId ParentId
, Root.HierarchyLevel + 1 HierarchyLevel
, CAST(CAST(Root.IdHierarchyMatch as varchar(40)) + '.'
+ cast(recurse.Acctid as varchar(40)) as varchar(4000)) IdHierarchyMatch
, cast(cast(recurse.Acctid as varchar(40)) + '.'
+ Root.IdHierarchy as varchar(4000)) IdHierarchy
, cast(replace(recurse.Name,'.','') + '.'
+ Root.NameHierarchy as varchar(4000)) NameHierarchy
, cast(Root.AccountName + '.'
+ Recurse.Name as varchar(4000)) HierarchySort
, cast(space(root.HierarchyLevel * 4)
+ Recurse.Name as varchar(4000)) HierarchyLabel
, Recurse.CustomerCount CustomerCount
FROM
account Recurse INNER JOIN
AccountHierarchy Root on Root.AccountId = Recurse.ParentId
)
Here we order the rows of the CTE using IdHierarchyMatch, and we calculate row numbers and a running total (from the next row until the end):
, cte1 AS
(
SELECT
h.AccountId
, h.AccountName
, h.ParentId
, h.HierarchyLevel
, h.IdHierarchy
, h.NameHierarchy
, h.HierarchyLabel
, parsename(h.IdHierarchy,1) Acct1Id
, parsename(h.NameHierarchy,1) Acct1Name
, parsename(h.IdHierarchy,2) Acct2Id
, parsename(h.NameHierarchy,2) Acct2Name
, parsename(h.IdHierarchy,3) Acct3Id
, parsename(h.NameHierarchy,3) Acct3Name
, parsename(h.IdHierarchy,4) Acct4Id
, parsename(h.NameHierarchy,4) Acct4Name
, h.CustomerCount
, h.HierarchySort
, h.IdHierarchyMatch
, Rn = ROW_NUMBER() OVER
(ORDER BY h.IdHierarchyMatch)
, RunningCustomerCount = COALESCE(
SUM(h.CustomerCount)
OVER
(ORDER BY h.IdHierarchyMatch
ROWS BETWEEN 1 FOLLOWING
AND UNBOUNDED FOLLOWING)
, 0)
FROM
AccountHierarchy AS h
)
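To make the frame used for RunningCustomerCount concrete, here is a small standalone sketch (SQL Server 2012 or later). The inline VALUES rows are simply the seven sample accounts with their forward paths written out by hand; they stand in for the AccountHierarchy CTE and are an assumption for illustration only.

```sql
SELECT
      v.IdHierarchyMatch
    , v.CustomerCount
    , RunningCustomerCount = COALESCE(
          SUM(v.CustomerCount) OVER
              (ORDER BY v.IdHierarchyMatch
               ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING)
        , 0)   -- the last row has no following rows, so its SUM is NULL
FROM (VALUES
      ('A',        21)
    , ('A.B',      30)
    , ('A.B.C',    75)
    , ('A.B.C.E', 100)
    , ('A.B.C.G', 153)
    , ('A.B.D',    50)
    , ('A.F',     222)
    ) AS v (IdHierarchyMatch, CustomerCount)
ORDER BY v.IdHierarchyMatch;
-- RunningCustomerCount per row:
-- A = 630, A.B = 600, A.B.C = 525, A.B.C.E = 425,
-- A.B.C.G = 272, A.B.D = 222, A.F = 0
```

Because the forward path sorts the tree depth-first, a branch total is the account's own count plus its running total minus the running total of the last row in its branch. For C that is 75 + 525 - 272 = 328.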
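Then we have one more intermediate CTE where we use the previous running totals and row numbers, basically to find where the end points are for the branches of the tree structure:
, cte2 AS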
(
SELECT
cte1.*
, rn3 = LAST_VALUE(Rn) OVER
(PARTITION BY Acct1Id, Acct2Id, Acct3Id
ORDER BY Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
, rn2 = LAST_VALUE(Rn) OVER
(PARTITION BY Acct1Id, Acct2Id
ORDER BY Acct3Id, Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
, rn1 = LAST_VALUE(Rn) OVER
(PARTITION BY Acct1Id
ORDER BY Acct2Id, Acct3Id, Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
, rcc3 = LAST_VALUE(RunningCustomerCount) OVER
(PARTITION BY Acct1Id, Acct2Id, Acct3Id
ORDER BY Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
, rcc2 = LAST_VALUE(RunningCustomerCount) OVER
(PARTITION BY Acct1Id, Acct2Id
ORDER BY Acct3Id, Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
, rcc1 = LAST_VALUE(RunningCustomerCount) OVER
(PARTITION BY Acct1Id
ORDER BY Acct2Id, Acct3Id, Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
FROM
cte1
)
And finally, we build the last part:
SELECT
hier.AccountId
, hier.AccountName
--- -- columns skipped
, hier.CustomerCount
, TotalCustomerCount = hier.CustomerCount
+ hier.RunningCustomerCount
- ca.LastRunningCustomerCount
, hier.HierarchySort
, hier.IdHierarchyMatch
FROM
cte2 hier
OUTER APPLY
( SELECT LastRunningCustomerCount, Rn
FROM
( SELECT LastRunningCustomerCount
= RunningCustomerCount, Rn
FROM (SELECT NULL a) x WHERE 4 <= HierarchyLevel
UNION ALL
SELECT rcc3, Rn3
FROM (SELECT NULL a) x WHERE 3 <= HierarchyLevel
UNION ALL
SELECT rcc2, Rn2
FROM (SELECT NULL a) x WHERE 2 <= HierarchyLevel
UNION ALL
SELECT rcc1, Rn1
FROM (SELECT NULL a) x WHERE 1 <= HierarchyLevel
) x
ORDER BY Rn
OFFSET 0 ROWS
FETCH NEXT 1 ROWS ONLY
) ca
ORDER BY
hier.HierarchySort ;
And a simplification, using the same cte1 as the code above. Tested at SQL-Fiddle-2. Note that both solutions work under the assumption that the tree has a maximum of four levels:
SELECT
hier.AccountId
--- -- skipping rows
, hier.CustomerCount
, TotalCustomerCount = CustomerCount
+ RunningCustomerCount
- CASE HierarchyLevel
WHEN 4 THEN RunningCustomerCount
WHEN 3 THEN LAST_VALUE(RunningCustomerCount) OVER
(PARTITION BY Acct1Id, Acct2Id, Acct3Id
ORDER BY Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
WHEN 2 THEN LAST_VALUE(RunningCustomerCount) OVER
(PARTITION BY Acct1Id, Acct2Id
ORDER BY Acct3Id, Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
WHEN 1 THEN LAST_VALUE(RunningCustomerCount) OVER
(PARTITION BY Acct1Id
ORDER BY Acct2Id, Acct3Id, Acct4Id
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
END
, hier.HierarchySort
, hier.IdHierarchyMatch
FROM cte1 AS hier
ORDER BY
hier.HierarchySort ;
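Since both queries above hard-code four levels, it may be worth confirming the actual depth of the data before relying on them. A minimal sketch, assuming the dbo.Account table from the question (the CTE name Levels is made up for illustration):

```sql
;WITH Levels AS
(   -- anchor: top-level accounts have no parent
    SELECT
          a.Acctid
        , HierarchyLevel = 1
    FROM dbo.Account AS a
    WHERE a.ParentId IS NULL
    UNION ALL
    -- recursive step: each child sits one level below its parent
    SELECT
          c.Acctid
        , p.HierarchyLevel + 1
    FROM dbo.Account AS c
    INNER JOIN Levels AS p ON p.Acctid = c.ParentId
)
SELECT MaxHierarchyLevel = MAX(HierarchyLevel)
FROM Levels;
-- For the seven sample rows this returns 4 (A > B > C > E)
```

If the result is greater than 4, the level-specific CASE branches and PARSENAME calls would not cover the deepest rows.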
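A third approach, with only one CTE for the recursive part and then only window aggregate functions (SUM() OVER (...)), so it should work in all versions from 2005 upwards. Tested at SQL-Fiddle-. This solution assumes, like the previous ones, that the hierarchy tree has a maximum of four levels:
;WITH AccountHierarchy AS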
(
SELECT
AccountId = Root.AcctId
, AccountName = Root.Name
, ParentId = Root.ParentId
, HierarchyLevel = 1
, HierarchySort = CAST(Root.Acctid AS VARCHAR(4000))
, HierarchyLabel = CAST(Root.Name AS VARCHAR(4000))
, Acct1Id = CAST(Root.Acctid AS VARCHAR(4000))
, Acct2Id = CAST(NULL AS VARCHAR(4000))
, Acct3Id = CAST(NULL AS VARCHAR(4000))
, Acct4Id = CAST(NULL AS VARCHAR(4000))
, Acct1Name = CAST(Root.Name AS VARCHAR(4000))
, Acct2Name = CAST(NULL AS VARCHAR(4000))
, Acct3Name = CAST(NULL AS VARCHAR(4000))
, Acct4Name = CAST(NULL AS VARCHAR(4000))
, CustomerCount = Root.CustomerCount
FROM
account AS Root
WHERE
Root.ParentID IS NULL
UNION ALL
SELECT
Recurse.Acctid
, Recurse.Name
, Recurse.ParentId
, Root.HierarchyLevel + 1
, CAST(Root.AccountName + '.'
+ Recurse.Name AS VARCHAR(4000))
, CAST(SPACE(Root.HierarchyLevel * 4)
+ Recurse.Name AS VARCHAR(4000))
, Root.Acct1Id
, CASE WHEN Root.HierarchyLevel = 1
THEN cast(Recurse.Acctid AS VARCHAR(4000))
ELSE Root.Acct2Id
END
, CASE WHEN Root.HierarchyLevel = 2
THEN CAST(Recurse.Acctid AS VARCHAR(4000))
ELSE Root.Acct3Id
END
, CASE WHEN Root.HierarchyLevel = 3
THEN CAST(Recurse.Acctid AS VARCHAR(4000))
ELSE Root.Acct4Id
END
, Root.Acct1Name -- keep the top-level name (Root.AccountName would be the immediate parent's name below level 2)
, CASE WHEN Root.HierarchyLevel = 1
THEN CAST(Recurse.Name AS VARCHAR(4000))
ELSE Root.Acct2Name
END
, CASE WHEN Root.HierarchyLevel = 2
THEN CAST(Recurse.Name AS VARCHAR(4000))
ELSE Root.Acct3Name
END
, CASE WHEN Root.HierarchyLevel = 3
THEN CAST(Recurse.Name AS VARCHAR(4000))
ELSE Root.Acct4Name
END
, Recurse.CustomerCount
FROM
account AS Recurse INNER JOIN
AccountHierarchy AS Root ON Root.AccountId = Recurse.ParentId
)
SELECT
h.AccountId
, h.AccountName
, h.ParentId
, h.HierarchyLevel
, IdHierarchy =
CAST(COALESCE(h.Acct4Id+'.','')
+ COALESCE(h.Acct3Id+'.','')
+ COALESCE(h.Acct2Id+'.','')
+ h.Acct1Id AS VARCHAR(4000))
, NameHierarchy =
CAST(COALESCE(h.Acct4Name+'.','')
+ COALESCE(h.Acct3Name+'.','')
+ COALESCE(h.Acct2Name+'.','')
+ h.Acct1Name AS VARCHAR(4000))
, h.HierarchyLabel
, h.Acct1Id
, h.Acct1Name
, h.Acct2Id
, h.Acct2Name
, h.Acct3Id
, h.Acct3Name
, h.Acct4Id
, h.Acct4Name
, h.CustomerCount
, TotalCustomerCount =
CASE h.HierarchyLevel
WHEN 4 THEN h.CustomerCount
WHEN 3 THEN SUM(h.CustomerCount) OVER
(PARTITION BY h.Acct1Id, h.Acct2Id, h.Acct3Id)
WHEN 2 THEN SUM(h.CustomerCount) OVER
(PARTITION BY h.Acct1Id, h.Acct2Id)
WHEN 1 THEN SUM(h.CustomerCount) OVER
(PARTITION BY h.Acct1Id)
END
, h.HierarchySort
, IdHierarchyMatch =
CAST(h.Acct1Id
+ COALESCE('.'+h.Acct2Id,'')
+ COALESCE('.'+h.Acct3Id,'')
+ COALESCE('.'+h.Acct4Id,'') AS VARCHAR(4000))
FROM
AccountHierarchy AS h
ORDER BY
h.HierarchySort ;
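A fourth approach, which calculates the closure table of the hierarchy as an intermediate CTE. Tested at SQL-Fiddle-4. The advantage is that the sum calculation has no restriction on the number of levels.
;WITH AccountHierarchy AS
(
-- skipping several lines, identical to the 3rd approach above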
)
, ClosureTable AS
(
SELECT
AccountId = Root.AcctId
, AncestorId = Root.AcctId
, CustomerCount = Root.CustomerCount
FROM
account AS Root
UNION ALL
SELECT
Recurse.Acctid
, Root.AncestorId
, Recurse.CustomerCount
FROM
account AS Recurse INNER JOIN
ClosureTable AS Root ON Root.AccountId = Recurse.ParentId
)
, ClosureGroup AS
(
SELECT
AccountId = AncestorId
, TotalCustomerCount = SUM(CustomerCount)
FROM
ClosureTable AS a
GROUP BY
AncestorId
)
SELECT
h.AccountId
, h.AccountName
, h.ParentId
, h.HierarchyLevel
, h.HierarchyLabel
, h.CustomerCount
, cg.TotalCustomerCount
, h.HierarchySort
FROM
AccountHierarchy AS h
JOIN
ClosureGroup AS cg
ON cg.AccountId = h.AccountId
ORDER BY
h.HierarchySort ;
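Because the question asks for a view, here is a minimal sketch of how the closure-table total alone could be packaged as one. The view name dbo.AccountTotalCustomerCount is hypothetical, and the sketch assumes the dbo.Account table from the question.

```sql
GO
CREATE VIEW dbo.AccountTotalCustomerCount   -- hypothetical view name
AS
WITH ClosureTable AS
(   -- anchor: every account is an ancestor of itself
    SELECT
          AccountId     = a.Acctid
        , AncestorId    = a.Acctid
        , CustomerCount = a.CustomerCount
    FROM dbo.Account AS a
    UNION ALL
    -- recursive step: a child inherits every ancestor of its parent
    SELECT
          c.Acctid
        , ct.AncestorId
        , c.CustomerCount
    FROM dbo.Account AS c
    INNER JOIN ClosureTable AS ct ON ct.AccountId = c.ParentId
)
SELECT
      AccountId          = AncestorId
    , TotalCustomerCount = SUM(CustomerCount)
FROM ClosureTable
GROUP BY AncestorId;
GO
-- Sorting belongs in the query that reads the view, not in the view itself
SELECT AccountId, TotalCustomerCount
FROM dbo.AccountTotalCustomerCount
ORDER BY AccountId;
-- For the original sample counts (21, 30, 75, 50, 100, 222, 153):
-- A = 651, B = 408, C = 328, D = 50, E = 100, F = 222, G = 153
```

Recursive CTEs stop after 100 levels by default. As far as I know, an OPTION (MAXRECURSION n) hint cannot be placed inside the view definition, so a deeper hierarchy would need the hint on the statement that queries the view.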
I believe this should make it faster:
;With AccountHierarchy AS
(
SELECT
Root.AcctId AccountId
, Root.Name AccountName
, Root.ParentId ParentId
, 1 HierarchyLevel
, cast(Root.Acctid as varchar(4000)) IdHierarchyMatch
, cast(Root.Acctid as varchar(4000)) IdHierarchy
, cast(replace(Root.Name,'.','') as varchar(4000)) NameHierarchy
, cast(Root.Acctid as varchar(4000)) HierarchySort
, cast(Root.Name as varchar(4000)) HierarchyLabel ,
Root.CustomerCount CustomerCount
FROM
tempdb.dbo.account Root
WHERE
Root.ParentID is null
UNION ALL
SELECT
Recurse.Acctid AccountId
, Recurse.Name AccountName
, Recurse.ParentId ParentId
, Root.HierarchyLevel + 1 HierarchyLevel
, CAST(CAST(Root.IdHierarchyMatch as varchar(40)) + '.'
+ cast(recurse.Acctid as varchar(40)) as varchar(4000)) IdHierarchyMatch
, cast(cast(recurse.Acctid as varchar(40)) + '.'
+ Root.IdHierarchy as varchar(4000)) IdHierarchy
, cast(replace(recurse.Name,'.','') + '.'
+ Root.NameHierarchy as varchar(4000)) NameHierarchy
, cast(Root.AccountName + '.'
+ Recurse.Name as varchar(4000)) HierarchySort
, cast(space(root.HierarchyLevel * 4)
+ Recurse.Name as varchar(4000)) HierarchyLabel
, Recurse.CustomerCount CustomerCount
FROM
tempdb.dbo.account Recurse INNER JOIN
AccountHierarchy Root on Root.AccountId = Recurse.ParentId
)
SELECT
hier.AccountId
, Hier.AccountName
, hier.ParentId
, hier.HierarchyLevel
, hier.IdHierarchy
, hier.NameHierarchy
, hier.HierarchyLabel
, parsename(hier.IdHierarchy,1) Acct1Id
, parsename(hier.NameHierarchy,1) Acct1Name
, parsename(hier.IdHierarchy,2) Acct2Id
, parsename(hier.NameHierarchy,2) Acct2Name
, parsename(hier.IdHierarchy,3) Acct3Id
, parsename(hier.NameHierarchy,3) Acct3Name
, parsename(hier.IdHierarchy,4) Acct4Id
, parsename(hier.NameHierarchy,4) Acct4Name
, hier.CustomerCount
, (
SELECT
sum(children.CustomerCount)
FROM
AccountHierarchy Children
WHERE
Children.IdHierarchyMatch LIKE hier.IdHierarchyMatch + '%'
) TotalCustomerCount
, HierarchySort
, IdHierarchyMatch
FROM
AccountHierarchy hier
ORDER BY
hier.HierarchySort
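I added a column to the CTE named IdHierarchyMatch, which is the forward version of IdHierarchy, so that the WHERE clause of the TotalCustomerCount subquery can be sargable.
Comparing the estimated subtree costs of the execution plans, this way should be about 5 times faster.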
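One caveat with the prefix match, offered as something to verify rather than as a defect in the query above: with single-character IDs (as in the sample) or fixed-length IDs (such as uniqueidentifier values) a plain prefix test is safe, but with variable-length IDs a path such as A.A1 is also a prefix of A.A10. The sketch below is self-contained; the table variable @Paths and its rows are invented for illustration. It compares the plain prefix test with one that requires the separator after the prefix.

```sql
DECLARE @Paths TABLE
(
      HPath         varchar(100) NOT NULL PRIMARY KEY
    , CustomerCount int          NOT NULL
);

INSERT INTO @Paths (HPath, CustomerCount)
VALUES ('A', 1), ('A.A1', 10), ('A.A10', 100), ('A.A1.X', 1000);

SELECT
      p.HPath
      -- plain prefix: 'A.A1%' also matches the sibling 'A.A10'
    , PlainPrefixTotal =
        ( SELECT SUM(c.CustomerCount)
          FROM @Paths AS c
          WHERE c.HPath LIKE p.HPath + '%' )
      -- the row itself, plus rows whose path continues with the separator
    , DelimitedTotal =
        ( SELECT SUM(c.CustomerCount)
          FROM @Paths AS c
          WHERE c.HPath = p.HPath
             OR c.HPath LIKE p.HPath + '.%' )
FROM @Paths AS p;
-- A      : PlainPrefixTotal = 1111, DelimitedTotal = 1111
-- A.A1   : PlainPrefixTotal = 1110, DelimitedTotal = 1010
-- A.A1.X : PlainPrefixTotal = 1000, DelimitedTotal = 1000
-- A.A10  : PlainPrefixTotal =  100, DelimitedTotal =  100
```

Both predicates still start with a fixed prefix. IDs that contain LIKE wildcard characters (%, _ or [) would need escaping in either form.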
I gave it a try as well. It's not very pretty, but it seems to perform better.
USE Tempdb
go
SET STATISTICS IO ON;
SET STATISTICS TIME OFF;
SET NOCOUNT ON;
--------
-- assuming the original table looks something like this
-- and you cannot control its indexes
-- (only widened the data types a bit for the extra sample rows)
--------
CREATE TABLE dbo.Account
(
Acctid VARCHAR(10) NOT NULL ,
Name VARCHAR(100) NULL ,
ParentId VARCHAR(10) NULL ,
CustomerCount INT NULL
);
--------
-- inserting the same records as in your sample
--------
INSERT Account
SELECT 'A' ,
'Best Bet' ,
NULL ,
21
UNION ALL
SELECT 'B' ,
'eStore' ,
'A' ,
30
UNION ALL
SELECT 'C' ,
'Big Bens' ,
'B' ,
75
UNION ALL
SELECT 'D' ,
'Mr. Jimbo' ,
'B' ,
50
UNION ALL
SELECT 'E' ,
'Dr. John' ,
'C' ,
100
UNION ALL
SELECT 'F' ,
'Brick' ,
'A' ,
222
UNION ALL
SELECT 'G' ,
'Mortar' ,
'C' ,
153;
--------
-- now let's up the ante a bit and add some extra rows with random parents
-- to these 7 items, it is hard to measure differences with so few rows
--------
DECLARE @numberOfRows INT = 25000
DECLARE @from INT = 1
DECLARE @to INT = 7
DECLARE @T1 TABLE ( n INT );
WITH cte ( n )
AS ( SELECT ROW_NUMBER() OVER ( ORDER BY CURRENT_TIMESTAMP )
FROM sys.messages
)
INSERT INTO @T1
SELECT n
FROM cte
WHERE n <= @numberOfRows;
INSERT INTO dbo.Account
( acctId ,
name ,
parentId ,
Customercount
)
SELECT CHAR(64 + RandomNumber) + CAST(n AS VARCHAR(10)) AS Id ,
CAST('item ' + CHAR(64 + RandomNumber) + CAST(n AS VARCHAR(10)) AS VARCHAR(100)) ,
CHAR(64 + RandomNumber) AS parentId ,
ABS(CHECKSUM(NEWID()) % 100) + 1 AS RandomCustCount
FROM ( SELECT n ,
ABS(CHECKSUM(NEWID()) % @to) + @from AS RandomNumber
FROM @T1
) A;
--------
-- Assuming you cannot control its indexes, in my tests we're better off taking the IO hit of copying the data
-- to some structure that is better optimized for this query. Not quite what I initially expected, but we seem
-- to be better off that way.
--------
CREATE TABLE tempdb.dbo.T1
(
AccountId VARCHAR(10) NOT NULL
PRIMARY KEY NONCLUSTERED ,
AccountName VARCHAR(100) NOT NULL ,
ParentId VARCHAR(10) NULL ,
HierarchyLevel INT NULL ,
HPath VARCHAR(1000) NULL ,
IdHierarchy VARCHAR(1000) NULL ,
NameHierarchy VARCHAR(1000) NULL ,
HierarchyLabel VARCHAR(1000) NULL ,
HierarchySort VARCHAR(1000) NULL ,
CustomerCount INT NOT NULL
);
CREATE CLUSTERED INDEX IX_Q1
ON tempdb.dbo.T1 ([ParentId]);
-- for summing customer counts over parents
CREATE NONCLUSTERED INDEX IX_Q2
ON tempdb.dbo.T1 (HPath) INCLUDE(CustomerCount);
INSERT INTO tempdb.dbo.T1
( AccountId ,
AccountName ,
ParentId ,
HierarchyLevel ,
HPath ,
IdHierarchy ,
NameHierarchy ,
HierarchyLabel ,
HierarchySort ,
CustomerCount
)
SELECT Acctid AS AccountId ,
Name AS AccountName ,
ParentId AS ParentId ,
NULL AS HierarchyLevel ,
NULL AS HPath ,
NULL AS IdHierarchy ,
NULL AS NameHierarchy ,
NULL AS HierarchyLabel ,
NULL AS HierarchySort ,
CustomerCount AS CustomerCount
FROM tempdb.dbo.account;
--------
-- I cannot seem to force an efficient way to do the sum while selecting over the recursive cte,
-- so I took it aside. I am sure there is a more elegant way but I can't seem to make it happen.
-- At least it performs better this way. But it remains a very expensive query.
--------
;
WITH AccountHierarchy
AS ( SELECT Root.AccountId AS AcId ,
Root.ParentId ,
1 AS HLvl ,
CAST(Root.AccountId AS VARCHAR(1000)) AS [HPa] ,
CAST(Root.accountId AS VARCHAR(1000)) AS hid ,
CAST(REPLACE(Root.AccountName, '.', '') AS VARCHAR(1000)) AS hn ,
CAST(Root.accountid AS VARCHAR(1000)) AS hs ,
CAST(Root.accountname AS VARCHAR(1000)) AS hl
FROM tempdb.dbo.T1 Root
WHERE Root.ParentID IS NULL
UNION ALL
SELECT Recurse.AccountId AS acid ,
Recurse.ParentId ParentId ,
Root.Hlvl + 1 AS hlvl ,
CAST(Root.HPa + '.' + Recurse.AccountId AS VARCHAR(1000)) AS hpa ,
CAST(recurse.AccountId + '.' + Root.hid AS VARCHAR(1000)) AS hid ,
CAST(REPLACE(recurse.AccountName, '.', '') + '.' + Root.hn AS VARCHAR(1000)) AS hn ,
CAST(Root.hs + '.' + Recurse.AccountName AS VARCHAR(1000)) AS hs ,
CAST(SPACE(root.hlvl * 4) + Recurse.AccountName AS VARCHAR(1000)) AS hl
FROM tempdb.dbo.T1 Recurse
INNER JOIN AccountHierarchy Root ON Root.AcId = Recurse.ParentId
)
UPDATE tempdb.dbo.T1
SET HierarchyLevel = HLvl ,
HPath = Hpa ,
IdHierarchy = hid ,
NameHierarchy = hn ,
HierarchyLabel = hl ,
HierarchySort = hs
FROM AccountHierarchy
WHERE AccountId = AcId;
SELECT --HPath ,
AccountId ,
AccountName ,
ParentId ,
HierarchyLevel ,
IdHierarchy ,
NameHierarchy ,
HierarchyLabel ,
PARSENAME(IdHierarchy, 1) Acct1Id ,
PARSENAME(NameHierarchy, 1) Acct1Name ,
PARSENAME(IdHierarchy, 2) Acct2Id ,
PARSENAME(NameHierarchy, 2) Acct2Name ,
PARSENAME(IdHierarchy, 3) Acct3Id ,
PARSENAME(NameHierarchy, 3) Acct3Name ,
PARSENAME(IdHierarchy, 4) Acct4Id ,
PARSENAME(NameHierarchy, 4) Acct4Name ,
CustomerCount ,
Cnt.TotalCustomerCount
FROM tempdb.dbo.t1 Hier
CROSS APPLY ( SELECT SUM(CustomerCount) AS TotalCustomerCount
FROM tempdb.dbo.t1
WHERE HPath LIKE hier.HPath + '%'
) Cnt
ORDER BY HierarchySort;
DROP TABLE tempdb.dbo.t1;
DROP TABLE tempdb.dbo.Account;