SQL Duplicate Rows
SELECT
SELECT mycol1, COUNT(mycol1),
mycol2, COUNT(mycol2),
mycol3, COUNT(mycol3)
FROM mytable
GROUP BY mycol1,
mycol2,
mycol3
HAVING COUNT(mycol1) > 1
AND COUNT(mycol2) > 1
AND COUNT(mycol3) > 1;
DELETE
DELETE
FROM mytable m1
WHERE m1.ROWID > (SELECT MIN(m2.ROWID)
FROM mytable m2
WHERE m2.mycolumn = m1.mycolumn);
There are two problems with the approaches above...
You do not have opportunity to review the duplicates
The transaction may be huge on very large tables (leading to blocking and/or ORA-01555 errors)
To address these problems use a procedure like the one below...
Create a non-unique index on the columns in question.
This allows the query to scan just the index rather than the data blocks and makes the column set comparison more efficient.CREATE INDEX myindex
ON mytable (mycolumn)
...
Create a mirror table to hold the duplicates, which includes the base table rowid as a column.
CREATE TABLE mymirror
AS (SELECT t.rowid row_id,
t.*
FROM mytable t
WHERE 1=2);
Populate the duplicate mirror table with the duplicates from the base table.
All duplicates except the row with the lowest rowid will be inserted. Do an explain to verify the query will use the index. You could include the lowest rowid with a minor change, but be careful on the delete step.INSERT INTO mymirror
SELECT t1.rowid,
t1.*
FROM mytable t1
WHERE t1.rowid > (SELECT MIN(t2.rowid)
FROM mytable t2
WHERE t1.mycolumn = t2.mycolumn);
Review the duplicates in the mirror table. Then when you are ready to delete the duplicates...
DELETE FROM mytable t
WHERE t.rowid IN (SELECT m.row_id
FROM mymirror m);