dongmou1964 2011-03-17 05:51

Deleting duplicate entries from a MySQL database

I have a table with 8 columns that has picked up numerous duplicates over time. I have looked at the other question on a similar topic, but it does not solve the problem I am having.

+---------------------------------------------------------------------------------------+
| id | market | agent | report_name | producer_code | report_date | entered_date | sync |
+---------------------------------------------------------------------------------------+

A unique entry is defined by the market, agent, report_name, producer_code, and report_date fields. I am looking for a way to list all the duplicate entries and delete them, or simply to delete them outright.

I have thought about doing it with a script, but the table contains 2.5 million entries, so that would take an impractically long time.

Could anybody suggest any alternatives? I have seen people get a list of duplicates using the following query, but I am not sure how to adapt it to my situation:

SELECT id, count(*) AS n
 FROM table_name
GROUP BY id
HAVING n > 1

4 answers

  • dphj737575 2011-03-17 07:01

    Here are two strategies you might think about. You will have to adjust the columns used to select duplicates based upon what you actually consider a duplicate. I just included all of your listed columns other than the id column.
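
    As an illustration of adjusting the columns, the query from the question might be adapted as follows to list each group of duplicates. This is only a sketch: it assumes a duplicate is defined by the five columns named in the question, and the aliases keep_id and n are made up for the example.

    ```sql
    -- Sketch: one row per group of duplicates.
    -- Assumes the five columns below are what define a duplicate.
    SELECT market, agent, report_name, producer_code, report_date,
           MIN(id)  AS keep_id,  -- hypothetical alias: lowest id in the group
           COUNT(*) AS n         -- how many rows share these values
    FROM table_name
    GROUP BY market, agent, report_name, producer_code, report_date
    HAVING COUNT(*) > 1;
    ```

    If entered_date and sync should also count toward uniqueness, add them to both the select list and the group by.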

    The first simply creates a new table without duplicates. Sometimes this is actually faster and easier than trying to delete all the offending rows. Just create a new table, insert the unique rows (I used min(id) for the id of the resulting row), rename the two tables, and (once you are satisfied that everything worked correctly) drop the original table. Of course, if you have any foreign key constraints you'll have to deal with those as well.

    create table table_copy like table_name;
    
    insert into table_copy
    (id, market, agent, report_name, producer_code, report_date, entered_date, sync)
    select min(id), market, agent, report_name, producer_code, report_date, 
           entered_date, sync
    from table_name
    group by market, agent, report_name, producer_code, report_date, 
             entered_date, sync;
    
    RENAME TABLE table_name TO table_old, table_copy TO table_name;
    
    drop table table_old;
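
    One possible way to check the result before running that final drop is to compare row counts. This is a sketch under the assumption that the rename has already happened, so table_old is the original and table_name is the deduplicated copy; the aliases kept_rows, expected_rows, and g are invented for the example.

    ```sql
    -- Run after the RENAME but before the DROP.
    -- Number of rows in the deduplicated table:
    SELECT COUNT(*) AS kept_rows
    FROM table_name;

    -- Number of distinct groups in the original table.
    -- The two counts should be equal if the copy worked.
    SELECT COUNT(*) AS expected_rows
    FROM (
        SELECT 1
        FROM table_old
        GROUP BY market, agent, report_name, producer_code, report_date,
                 entered_date, sync
    ) AS g;
    ```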
    

    The second strategy, which just deletes the duplicates, uses a temporary table to record which rows have duplicates, since MySQL won't allow you to select from the same table you are deleting from in a subquery. Create a temporary table with the columns that identify the duplicates plus an id column holding the id of the row to keep, then run a multi-table delete that joins the two tables to select just the duplicates.

    create temporary table dups
    select min(id) as id,  -- alias needed so the delete below can reference d.id
           market, agent, report_name, producer_code, report_date, 
           entered_date, sync
    from table_name
    group by market, agent, report_name, producer_code, report_date, 
             entered_date, sync
    having count(*) > 1;
    
    delete t 
    from table_name t, dups d
    where t.id != d.id
    and t.market = d.market
    and t.agent = d.agent
    and t.report_name = d.report_name
    and t.producer_code = d.producer_code
    and t.report_date = d.report_date
    and t.entered_date = d.entered_date
    and t.sync = d.sync;
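
    Since the question also asks for a way to list the duplicates, a read-only preview of the rows the delete would remove could look roughly like this. It is a sketch that mirrors the delete's join conditions but uses a derived table instead of the temporary table, which is allowed here because a plain select may read from the same table in a subquery. The alias keep_id is made up for the example.

    ```sql
    -- Sketch: list every row that has a duplicate with a lower id,
    -- i.e. the rows a delete with the same conditions would remove.
    SELECT t.*
    FROM table_name t
    JOIN (
        SELECT MIN(id) AS keep_id,  -- hypothetical alias: the row to keep
               market, agent, report_name, producer_code, report_date,
               entered_date, sync
        FROM table_name
        GROUP BY market, agent, report_name, producer_code, report_date,
                 entered_date, sync
        HAVING COUNT(*) > 1
    ) AS d
      ON  t.market = d.market
      AND t.agent = d.agent
      AND t.report_name = d.report_name
      AND t.producer_code = d.producer_code
      AND t.report_date = d.report_date
      AND t.entered_date = d.entered_date
      AND t.sync = d.sync
    WHERE t.id <> d.keep_id
    ORDER BY t.market, t.agent, t.report_name, t.producer_code,
             t.report_date, t.id;
    ```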
    
    This answer was selected by the asker as the best answer.