Delete duplicate records in SQL Server?

Question

Consider a column named EmployeeName in a table named Employee. The goal is to delete repeated records based on the EmployeeName field.

    EmployeeName
    ------------
    Anand
    Anand
    Anil
    Dipak
    Anil
    Dipak
    Dipak
    Anil

Using one query, I want to delete the records that are repeated.

How can this be done with T-SQL in SQL Server?

John Gibb · Accepted Answer

You can do this with window functions. The query numbers the duplicates within each EmployeeName in empId order and deletes all but the first one.

    delete x
    from (
        select *, rn = row_number() over (partition by EmployeeName order by empId)
        from Employee
    ) x
    where rn > 1;

Run it as a select first to see what would be deleted:

    select *
    from (
        select *, rn = row_number() over (partition by EmployeeName order by empId)
        from Employee
    ) x
    where rn > 1;
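To try this safely first, here is a minimal sketch on a throwaway temp table. The table name #Employee and its identity column empId are assumptions made for the demo (the question does not show the real schema); the names are the sample data from the question.

```sql
-- Demo only: #Employee and the identity column empId are assumed, not taken from the real schema.
create table #Employee (
    empId int identity(1,1) primary key,
    EmployeeName varchar(50) not null
);

insert into #Employee (EmployeeName)
values ('Anand'), ('Anand'), ('Anil'), ('Dipak'),
       ('Anil'), ('Dipak'), ('Dipak'), ('Anil');

-- Number the rows within each name; every row after the first is a duplicate.
delete x
from (
    select *, rn = row_number() over (partition by EmployeeName order by empId)
    from #Employee
) x
where rn > 1;

-- One row per name should remain (the one with the lowest empId).
select empId, EmployeeName
from #Employee
order by empId;

drop table #Employee;
```

Because the rows are ordered by empId inside each partition, the survivor for each name is the row that was inserted first.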

StuartLC · Answer

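Assuming that your Employee table also has a unique column (ID in the example below), the following will work:

    delete from Employee
    where ID not in (
        select min(ID)
        from Employee
        group by EmployeeName
    );

This leaves the row with the lowest ID for each EmployeeName in the table.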
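If you would rather check the effect before committing to it, one option is to run the delete inside a transaction and roll it back. This is only a sketch; it assumes the same Employee table and ID column as above.

```sql
begin transaction;

delete from Employee
where ID not in (
    select min(ID)
    from Employee
    group by EmployeeName
);

-- Any name returned here still has more than one row, so an empty result is what you want.
select EmployeeName, count(*) as copies
from Employee
group by EmployeeName
having count(*) > 1;

-- Undo the delete; replace with "commit transaction" once the result looks right.
rollback transaction;
```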

Edit

Re McGyver's comment: as of SQL 2012

MIN can be used with numeric, char, varchar, uniqueidentifier, or datetime columns, but not with bit columns

For 2008 R2 and earlier,

MIN can be used with numeric, char, varchar, or datetime columns, but not with bit columns (and it also doesn't work with GUIDs)

For 2008 R2 you'll need to cast the GUID to a type supported by MIN, e.g.

    delete from GuidEmployees
    where CAST(ID AS binary(16)) not in (
        select min(CAST(ID AS binary(16)))
        from GuidEmployees
        group by EmployeeName
    );

SqlFiddle for various types in SQL 2008

SqlFiddle for various types in SQL 2012

Ben Cawley · Answer

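You could try something like the following:

    delete T1
    from MyTable T1, MyTable T2
    where T1.dupField = T2.dupField
      and T1.uniqueField > T2.uniqueField

(This assumes that you have an integer-based unique field.)

Personally, though, I'd say you are better off fixing whatever lets duplicate entries into the database in the first place, rather than cleaning them up afterwards.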
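One common way to stop duplicates at the source is a unique constraint. The following is a sketch, not part of the original answer: the constraint name is made up, and it assumes dupField has an indexable type (not varchar(max), for example) and that the existing duplicates have already been removed, since the statement fails otherwise.

```sql
-- Hypothetical constraint name; MyTable and dupField are the placeholder names used above.
-- Run this only after the existing duplicates are gone.
alter table MyTable
add constraint UQ_MyTable_dupField unique (dupField);
```

Once the constraint exists, an insert or update that would create a second row with the same dupField value is rejected with an error instead of silently succeeding.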

Kumar Manish-PMP · Answer

    DELETE FROM MyTable
    WHERE ID NOT IN (
        SELECT MAX(ID)
        FROM MyTable
        GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3
    );

    WITH TempUsers (FirstName, LastName, duplicateRecordCount) AS (
        SELECT FirstName,
               LastName,
               ROW_NUMBER() OVER (PARTITION BY FirstName, LastName ORDER BY FirstName) AS duplicateRecordCount
        FROM dbo.Users
    )
    DELETE FROM TempUsers
    WHERE duplicateRecordCount > 1;

Mostafa Elmoghazi · Answer

    WITH CTE AS (
        SELECT EmployeeName,
               ROW_NUMBER() OVER (PARTITION BY EmployeeName ORDER BY EmployeeName) AS R
        FROM employee_table
    )
    DELETE CTE
    WHERE R > 1;

The magic of common table expressions.

Anurag Garg · Answer

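Try

    -- Note: SQL Server has no built-in rowid pseudo-column, so this assumes
    -- the table has its own unique column named rowid.
    DELETE FROM employee
    WHERE rowid NOT IN (
        SELECT MAX(rowid)
        FROM employee
        GROUP BY EmployeeName
    );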

Peter · Answer

If you're looking for a way to remove duplicates, but you have a foreign key pointing to the table with duplicates, you could take the following approach using a slow yet effective cursor.

It repoints the foreign keys that reference a duplicate to the row that is kept, then deletes the duplicate.

    create table #properOlvChangeCodes (
        id int not null,
        name nvarchar(max) not null
    )

    DECLARE @name NVARCHAR(MAX);
    DECLARE @id INT;
    DECLARE @newid INT;

    DECLARE OLVTRCCursor CURSOR FOR
        SELECT id, name FROM Sales_OrderLineVersionChangeReasonCode;

    OPEN OLVTRCCursor;
    FETCH NEXT FROM OLVTRCCursor INTO @id, @name;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- determine if it should be replaced (name is already in the temp table)
        if (exists (select * from #properOlvChangeCodes where name = @name))
        begin
            -- if it is, find the id of the row that was kept for this name
            select top 1 @newid = id from #properOlvChangeCodes where name = @name

            -- repoint the foreign key in the order line versions to the kept id
            update Sales_OrderLineVersion
            set ChangeReasonCodeId = @newid
            where ChangeReasonCodeId = @id

            -- delete the duplicate reason code
            delete from Sales_OrderLineVersionChangeReasonCode where Id = @id
        end
        else
        begin
            -- first time this name is seen: remember it as the row to keep
            insert into #properOlvChangeCodes (id, name) values (@id, @name)
        end

        FETCH NEXT FROM OLVTRCCursor INTO @id, @name;
    END;

    CLOSE OLVTRCCursor;
    DEALLOCATE OLVTRCCursor;

    drop table #properOlvChangeCodes

Daniel Marcus · Answer

Here's a nice way of deduplicating records in a table that has an identity column, based on a desired primary key that you can define at runtime. Before I start, I'll populate a sample data set to work with using the following code:

    if exists (select 1 from sys.all_objects where type = 'u' and name = '_original')
        drop table _original

    declare @startyear int = 2017
    declare @endyear int = 2018
    declare @iterator int = 1
    declare @income money = cast((select round(rand() * (5000 - 4990) + 4990, 2)) as money)
    declare @salesrepid int = cast(floor(rand() * (9100 - 9000) + 9000) as varchar(4))

    create table #original (rowid int identity, monthyear varchar(max), salesrepid int, sale money)

    -- insert 50,000 random rows; the narrow value ranges guarantee plenty of duplicates
    while @iterator <= 50000
    begin
        insert #original
        select (select cast(floor(rand() * (@endyear - @startyear) + @startyear) as varchar(4)) + '-' +
                       cast(floor(rand() * (13 - 1) + 1) as varchar(2))),
               @salesrepid,
               @income

        set @salesrepid = cast(floor(rand() * (9100 - 9000) + 9000) as varchar(4))
        set @income = cast((select round(rand() * (5000 - 4990) + 4990, 2)) as money)
        set @iterator = @iterator + 1
    end

    -- zero-pad single-digit months, e.g. 2017-5 becomes 2017-05
    update #original
    set monthyear = replace(monthyear, '-', '-0')
    where len(monthyear) = 6

    select * into _original from #original

Next I'll create a type called ColumnNames:

    create type ColumnNames AS table (Columnnames varchar(max))

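Finally I will create a stored proc with the following 3 caveats:

1. The proc takes a required parameter @tablename that defines the name of the table you are deleting from in your database.
2. The proc has an optional parameter @columns that you can use to define the fields that make up the desired primary key you are deleting against. If it is left empty, all the fields besides the identity column are assumed to make up the desired primary key.
3. When duplicate records are deleted, the record with the lowest value in its identity column is kept.

Here is my delete_dupes stored proc: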

    create proc delete_dupes (@tablename varchar(max), @columns columnnames readonly)
    as
    begin
        declare @table table (iterator int, name varchar(max), is_identity int)
        declare @tablepartition table (idx int identity, type varchar(max), value varchar(max))
        declare @partitionby varchar(max)
        declare @iterator int = 1

        -- if key columns were passed in, join them into a comma-separated list
        if exists (select 1 from @columns)
        begin
            declare @columns1 table (iterator int, columnnames varchar(max))

            insert @columns1
            select 1, columnnames from @columns

            set @partitionby = (
                select distinct
                    substring((select ', ' + t1.columnnames
                               from @columns1 t1
                               where t1.iterator = t2.iterator
                               order by t1.iterator
                               for xml path('')), 2, 1000) partition
                from @columns1 t2
            )
        end

        -- collect the table's columns and note which one is the identity column
        insert @table
        select 1, a.name, is_identity
        from sys.all_columns a
        join sys.all_objects b on a.object_id = b.object_id
        where b.name = @tablename

        declare @identity varchar(max) = (select name from @table where is_identity = 1)

        -- idx 1: 'order by' + identity column; idx 2: 'over (partition by' + all other columns
        while @iterator >= 0
        begin
            insert @tablepartition
            select distinct
                case when @iterator = 1 then 'order by' else 'over (partition by' end,
                substring((select ', ' + t1.name
                           from @table t1
                           where t1.iterator = t2.iterator and is_identity = @iterator
                           order by t1.iterator
                           for xml path('')), 2, 5000) partition
            from @table t2

            set @iterator = @iterator - 1
        end

        declare @originalpartition varchar(max)

        if @partitionby is null
        begin
            select @originalpartition = replace(b.value + ',' + a.type + a.value, 'over (partition by', '')
            from @tablepartition a cross join @tablepartition b
            where a.idx = 2 and b.idx = 1

            select @partitionby = a.type + a.value + ' ' + b.type + a.value + ',' + b.value + ') rownum'
            from @tablepartition a cross join @tablepartition b
            where a.idx = 2 and b.idx = 1
        end
        else
        begin
            select @originalpartition = b.value + ',' + @partitionby
            from @tablepartition a cross join @tablepartition b
            where a.idx = 2 and b.idx = 1

            set @partitionby = (
                select 'OVER (partition by' + @partitionby + ' ORDER BY' + @partitionby + ',' + b.value + ') rownum'
                from @tablepartition a cross join @tablepartition b
                where a.idx = 2 and b.idx = 1
            )
        end

        -- number the rows within each key and stage the result in a global temp table
        exec('select row_number() ' + @partitionby + ', ' + @originalpartition + ' into ##temp from ' + @tablename + '')

        -- delete every row that is not the first of its key
        -- (uses @tablename here; the original hard-coded _original)
        exec('delete a from ' + @tablename + ' a left join ##temp b on a.' + @identity + '=b.' + @identity + ' and rownum=1 where b.rownum is null')

        drop table ##temp
    end

Once this is compiled, you can delete all your duplicate records by running the proc. To delete dupes without defining a desired primary key, use this call:

    exec delete_dupes '_original'

To delete dupes based on a defined desired primary key, use this call:

    declare @table1 as columnnames
    insert @table1 values ('salesrepid'), ('sale')
    exec delete_dupes '_original', @table1
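To confirm the second call did its job, a quick check along these lines should come back empty. It is only a sketch and assumes the _original sample table and the (salesrepid, sale) key used in the call above.

```sql
-- Any row returned here is a (salesrepid, sale) pair that still occurs more than once.
select salesrepid, sale, count(*) as copies
from _original
group by salesrepid, sale
having count(*) > 1;
```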