Deleting duplicate entries from a MySQL database

I have a table with 8 columns, but over time it has picked up numerous duplicates. I have looked at the other question on a similar topic, but it does not solve the issue I am currently having.

+---------------------------------------------------------------------------------------+
| id | market | agent | report_name | producer_code | report_date | entered_date | sync |
+---------------------------------------------------------------------------------------+

A unique entry is defined by the market, agent, report_name, producer_code, and report_date fields. I am looking for a way to list all the duplicate entries and delete them, or simply to delete the duplicates.

I have thought about doing it with a script, but the table contains 2.5 million entries, and the time it would take makes that infeasible.

Could anybody suggest any alternatives? I have seen people get a list of duplicates using the following query, but I am not sure how to adapt it to my situation:

SELECT id, count(*) AS n
FROM table_name
GROUP BY id
HAVING n > 1

4 answers

dphj737575 2011-03-17 07:01
Here are two strategies you might think about. You will have to adjust the columns used to select duplicates based upon what you actually consider a duplicate. I just included all of your listed columns other than the id column.
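Before choosing either strategy, it may help to see which rows are affected. The query below is a minimal sketch that adapts the query from the question, assuming a duplicate is defined by the five columns the question names (market, agent, report_name, producer_code, report_date). The aliases n and keep_id are illustrative names, not part of the original schema.

```sql
-- Assumption: two rows are duplicates when these five columns match.
-- Each result row is one group of duplicates.
SELECT market, agent, report_name, producer_code, report_date,
       COUNT(*) AS n,        -- how many rows share these values
       MIN(id)  AS keep_id   -- the row that would be kept
FROM table_name
GROUP BY market, agent, report_name, producer_code, report_date
HAVING COUNT(*) > 1;
```

Grouping by the columns that define uniqueness, rather than by id, is what makes the duplicates show up; grouping by id alone always yields groups of one row.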

The first simply creates a new table without duplicates. Sometimes this is actually faster and easier than trying to delete all the offending rows. Just create a new table, insert the unique rows (I used min(id) for the id of the resulting row), rename the two tables, and (once you are satisfied that everything worked correctly) drop the original table. Of course, if you have any foreign key constraints you'll have to deal with those as well.

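-- Create an empty copy with the same structure and indexes.
CREATE TABLE table_copy LIKE table_name;

-- Copy one row per group of duplicates, keeping the lowest id.
INSERT INTO table_copy (id, market, agent, report_name, producer_code, report_date, entered_date, sync)
SELECT MIN(id), market, agent, report_name, producer_code, report_date, entered_date, sync
FROM table_name
GROUP BY market, agent, report_name, producer_code, report_date, entered_date, sync;

-- Swap the two tables in a single statement.
RENAME TABLE table_name TO table_old, table_copy TO table_name;

-- Run this only after checking that the new table_name is correct.
DROP TABLE table_old;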
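Before the final drop, a quick sanity check may be worthwhile. The sketch below assumes the RENAME TABLE step has already run, so table_old holds the original rows and table_name holds the deduplicated ones. The aliases rows_before, rows_after, expected_rows, keep_id, and g are illustrative.

```sql
-- Assumes RENAME TABLE has run: table_old = original, table_name = deduplicated copy.
SELECT
    (SELECT COUNT(*) FROM table_old)  AS rows_before,
    (SELECT COUNT(*) FROM table_name) AS rows_after,
    -- Number of distinct groups in the original; should equal rows_after.
    (SELECT COUNT(*) FROM (
        SELECT MIN(id) AS keep_id
        FROM table_old
        GROUP BY market, agent, report_name, producer_code, report_date, entered_date, sync
    ) AS g) AS expected_rows;
```

If rows_after and expected_rows match, and no rows were written to the table between the copy and the rename, the copy contains exactly one row per group.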

The second strategy, which just deletes the duplicates, uses a temporary table to hold the information about what rows have duplicates since MySQL won't allow you to select from the same table you are deleting from in a subquery. Simply create a temporary table with the columns that identify the duplicates plus an id column that will actually hold the id to keep and then you can do a multi-table delete where you join the two tables to select just the duplicates.

-- One row per group of duplicates; id is the row to keep.
-- The alias is required: without it the column would be named `min(id)` and d.id would not exist.
CREATE TEMPORARY TABLE dups
SELECT MIN(id) AS id, market, agent, report_name, producer_code, report_date, entered_date, sync
FROM table_name
GROUP BY market, agent, report_name, producer_code, report_date, entered_date, sync
HAVING COUNT(*) > 1;

-- Delete every row that matches a duplicate group but is not the row to keep.
DELETE t
FROM table_name t, dups d
WHERE t.id != d.id
  AND t.market = d.market
  AND t.agent = d.agent
  AND t.report_name = d.report_name
  AND t.producer_code = d.producer_code
  AND t.report_date = d.report_date
  AND t.entered_date = d.entered_date
  AND t.sync = d.sync;
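One way to confirm that the delete did its job is to count the duplicate groups that remain. This is a sketch using the same column list as the delete above; the aliases keep_id, g, and remaining_duplicate_groups are illustrative.

```sql
-- Counts groups that still contain more than one row; 0 means no duplicates are left.
SELECT COUNT(*) AS remaining_duplicate_groups
FROM (
    SELECT MIN(id) AS keep_id
    FROM table_name
    GROUP BY market, agent, report_name, producer_code, report_date, entered_date, sync
    HAVING COUNT(*) > 1
) AS g;
```

A nonzero result is most likely if any of the compared columns can hold NULL: GROUP BY places NULL values in the same group, but the `=` comparisons in the delete never match NULL to NULL, so those rows survive. If that applies to your data, MySQL's NULL-safe equality operator `<=>` can be used in place of `=` in the delete.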
This answer was accepted by the asker as the best answer.
