u014394665
巨蟹糖朵
采纳率0%
2019-02-20 14:42

mysql百万数据查重优化

在mysql数据库中有一张人员信息表,数据量 100万以上,根据姓名、性别、身份证号等基本信息查询出重复的数据。如何能快速查出数据。
图片说明
求教如何优化。

  • 点赞
  • 写回答
  • 关注问题
  • 收藏
  • 复制链接分享
  • 邀请回答

2条回答

  • jess0018 Jess鼎 2年前
    方案1:
    select t.id, t.name, t.ssid from t_case_info t join (
      select id, name, ssid from t_case_info where is_del='0' and reportdate >= '2003-01-01' and reportdate <= '2003-01-31' group by name,ssid having count(1) > 1
    ) r on t.name = r.name and t.ssid = r.ssid 
    where t.is_del='0' and t.reportdate >= '2003-01-01' and t.reportdate <= '2003-01-01';
    
    方案2:
    1、创建临时表,把重复的name和ssid存入临时表
    CREATE TEMPORARY TABLE r_info
    (
      id varchar(32) NOT NULL,
      name varchar(255),
      ssid varchar(25)
      KEY idx(ssid, name)
    )ENGINE=MEMORY DEFAULT CHARSET=utf8 ;
    2、关联查询
    insert into r_info(id, name, ssid) select id, name, ssid from t_case_info where is_del='0' and reportdate >= '2003-01-01' and reportdate <= '2003-01-31' group by name,ssid having count(1) > 1;
    select t.id, t.name, t.ssid from t_case_info t join r_info r on t.name = r.name and t.ssid = r.ssid 
    where t.is_del='0' and t.reportdate >= '2003-01-01' and t.reportdate <= '2003-01-01';
    
    点赞 1 评论 复制链接分享
  • yangxingzou zoyation 2年前

    可以考虑加个字段mdv,字段是‘所有要验证重复字段加起来’的MD5值之类的
    然后给这个字段建立索引
    然后select * from t_case_info a where exists(select 1 from t_case_info b where a.mdv=b.mdv and a.id<>b.id)

    点赞 1 评论 复制链接分享

相关推荐