We don’t live in a perfect world, which means duplicate rows occasionally show up. Sometimes you just need to delete them and make comments about the poor design of the table later.
I wanted to include the script I often use to get the job done. I honestly don’t know where I first saw this, likely on a blog post or stackoverflow.com. Sometimes I wonder if I ever create anything original. I have three kids, so maybe?
Build Your Sandbox
Here’s some code you can use to follow along. It should go without saying but please don’t run this on a production SQL server.
DROP TABLE IF EXISTS #Employee; GO CREATE TABLE #Employee ( EmployeeNumber int, EmployeeFirstName nvarchar(250), EmployeeLastName nvarchar(250), StartDate date ); GO INSERT INTO #Employee VALUES (1,'Bruce','Wayne', '2022-01-01'), (1,'Bruce','Wayne', '2022-01-01'), (2,'Tony','Stark', '2021-05-02'); GO
Can you spot the duplicate?
Use a CTE to Delete Duplicates
Below’s the code I normally use to remove duplicate rows from a table. I’m a huge fan of using CTEs for almost anything. If you don’t like them that’s ok, I won’t hold it against you. The primary driver is the ROW_NUMBER() function. Since every column’s part of the duplicate, you can use anything for the ORDER BY.
WITH MainData AS( SELECT e.EmployeeNumber, e.EmployeeFirstName, e.EmployeeLastName, e.StartDate, ROW_NUMBER() OVER(PARTITION BY e.EmployeeNumber, e.EmployeeFirstName, e.EmployeeLastName, e.StartDate ORDER BY e.EmployeeNumber) AS RowNumber FROM #Employee e ) DELETE FROM MainData WHERE RowNumber > 1; GO
Now let’s see if our statement above removed the duplicate row.
SELECT * FROM #Employee;
Like magic, your duplicate row has vanished.
Summary
If you perform a Bing or Google search, you’re likely to turn up other methods to delete duplicate rows. Please let me know in the comments below if you use a different technique? Or, if you like using CTEs, let me know so that I’m not the only person.