douou1872 2014-01-17 10:50
Views: 1212
Accepted

Finding duplicate column values in a CSV

I'm importing a CSV that has 3 columns; one of these columns could contain duplicate values.

I have 2 things to check:

1. The field 'NAME' is not null and is a string
2. The field 'ID' is unique

So far, I parse the CSV file once and check the first condition (NAME is valid); if that check fails, the code simply breaks out of the while loop and stops.

I guess the question is: how would I check that ID is unique?

I have fields like the following:

NAME,  ID,
Bob,   1,
Tom,   2,
James, 1,
Terry, 3,
Joe,   4,

This would output something like 'Duplicate ID on line 3'.

Thanks

P.S. This CSV file has more columns and can have around 100,000 records. I have simplified it here to focus on the duplicate column/field problem.

Thanks

4 answers

  • dou4121 2014-01-17 10:59

    I assumed a particular design and stripped out the CSV part, but the idea remains the same:

    <?php
      /* Build a test array of 100,000 rows. (Be careful: holding this in memory can cause issues that you won't have with a CSV read line by line.) */
      $arr = [];
      for ($i = 0; $i < 100000; $i++) {
        $arr[] = [rand(0, 1000000), 'Hey'];
      }

      /* Record every ID seen so far as an array key, so each check is a key lookup */
      $ids = [];
      foreach ($arr as $line => $couple) {
        // isset() avoids an "undefined index" notice the first time an ID is seen
        if (isset($ids[$couple[0]])) {
          echo "Id " . $couple[0] . " on line " . $line . " already used<br />";
        } else {
          $ids[$couple[0]] = true;
        }
      }
    ?>
    

    100,000 rows isn't that many, so this approach is enough. (It ran in 3 seconds on my machine.)

    EDIT: As pointed out, in_array is less efficient than a key lookup, so I've updated my code accordingly.
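
The same key-lookup idea can be applied directly to a CSV that is read line by line, which avoids holding all rows in memory. The following is only a minimal sketch under some assumptions that are not in the original post: the function name `findDuplicateIds` is hypothetical, the first row is treated as a header, the ID sits in column index 1 (as in the NAME, ID sample), and "line" means the data record number, not counting the header.

```php
<?php
/**
 * Sketch: collect duplicate-ID messages while streaming a CSV with fgetcsv().
 * Assumptions (illustrative only): header on the first row, ID in column
 * index 1, values may carry surrounding spaces as in the sample data.
 */
function findDuplicateIds(string $path, int $idColumn = 1): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $seen = [];        // ID => record number where it first appeared
    $duplicates = [];  // one message per repeated ID occurrence
    $record = 0;       // data record counter, header excluded

    // Skip the header row (length 0 means "no line length limit").
    fgetcsv($handle, 0, ',', '"', '\\');

    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        $record++;
        if (!isset($row[$idColumn])) {
            continue; // blank or short line: no ID to check
        }
        $id = trim($row[$idColumn]);
        if (isset($seen[$id])) {
            // Key lookup: no scan over previously seen IDs is needed
            $duplicates[] = "Duplicate ID $id on line $record (first seen on line {$seen[$id]})";
        } else {
            $seen[$id] = $record;
        }
    }

    fclose($handle);
    return $duplicates;
}
```

With the sample data from the question, this sketch would report James's row as the third data record. Only the set of IDs seen so far is kept in memory, not the rows themselves.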

    This answer was selected by the asker as the best answer.