dongyan4424 2014-07-20 02:06
浏览 29
已采纳

Mysql - 删除表中的重复条目取决于几个列

I want to delete all duplicate entries in my mysql database.

I found a lot of solutions in the web but in every case the query worked only when you are looking up for 1 column.

In my case I need a query for more than 1 row which will be executed as far as possible - the table is 500MB big with more than 6 Million entries.

My Table (exampe)

id  name status   email
----------------------------
x   Mark  1      Mark@trash-mail.com
x   Anna  1      Anna@trash-mail.com
x   Mark  0      Mark@trash-mail.com
x   Mark  1      Mark@trash-mail.com

What should now happen ? I need a query like this:

DELETE * FROM my_table WHERE (name == name, status == status, email == email)

When executing this query the table has to look like this:

id  name status   email
----------------------------
x   Mark  1      Mark@trash-mail.com
x   Anna  1      Anna@trash-mail.com
x   Mark  0      Mark@trash-mail.com

The last Mark-Entry was deleted because the name, status and email field was equal to another. In some cases there are 20 or more with the same data which has to be deleted so that there is just one left.

Currently I have a bad solution with a PHP-Script .. I am going to each row, looking for the same data and delete it. It works like it should but its too slow ... I think maybe 5.000 Items per Hour ...

You know any solution to solve this in a better way ?

  • 写回答

1条回答 默认 最新

  • duananyu9231 2014-07-20 02:08
    关注

    My suggestion would be to use the method where you truncate the table and reinsert the data. Something like:

    create temporary table temp as
        select min(id) as id, name, status, email
        from mytable
        group by name, status, email;
    
    truncate table mytable;
    
    insert into mytable(id, name, status, email)
        select id, name, status, email
        from temp;
    

    Doing the deleting in place can be quite expensive. If you decide to do so, I would create the same temporary table with an index on id, and then use:

    delete m
        from mytable m left join
             temp
             on m.id = temp.id
        where temp.id is null;
    

    You can use a limit clause (such as limit 10000) to run this in batches.

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥15 arduino控制ps2手柄一直报错
  • ¥15 使用rabbitMQ 消息队列作为url源进行多线程爬取时,总有几个url没有处理的问题。
  • ¥15 求chat4.0解答一道线性规划题,用lingo编程运行,第一问要求写出数学模型和lingo语言编程模型,第二问第三问解答就行,我的ddl要到了谁来求了
  • ¥15 Ubuntu在安装序列比对软件STAR时出现报错如何解决
  • ¥50 树莓派安卓APK系统签名
  • ¥65 汇编语言除法溢出问题
  • ¥15 Visual Studio问题
  • ¥20 求一个html代码,有偿
  • ¥100 关于使用MATLAB中copularnd函数的问题
  • ¥20 在虚拟机的pycharm上