douou1872 2014-01-17 10:50

Finding duplicate column values in a CSV

I'm importing a CSV that has 3 columns, one of which could contain duplicate values.

I have 2 things to check:

1. The field 'NAME' is not null and is a string
2. The field 'ID' is unique

So far, I'm parsing the CSV file once and checking condition 1 (NAME is valid). If that check fails, the code simply breaks out of the while loop and stops.

I guess the question is: how would I check that ID is unique?

I have fields like the following:

NAME,  ID,
Bob,   1,
Tom,   2,
James, 1,
Terry, 3,
Joe,   4,

This would output something like `Duplicate ID on line 3`.

Thanks

P.S. The real CSV file has more columns and can have around 100,000 records. I have simplified it here to focus on the duplicate column/field problem.

Thanks


4 answers

  • dou4121 2014-01-17 10:59

    I assumed a certain design and stripped out the CSV part, but the idea remains the same:

    <?php
      /* Build an array of 100,000 rows. (Be careful: holding this many rows in memory can cause issues that you won't have when reading a CSV line by line.) */
      $arr = [];
      for ($i = 0; $i < 100000; $i++) {
        $arr[] = [rand(0, 1000000), 'Hey'];
      }

      /* Record every ID seen so far as an array key, so each check is a key lookup rather than a scan */
      $ids = [];
      foreach ($arr as $line => $couple) {
        // isset() avoids an "undefined index" notice the first time an ID is seen
        if (isset($ids[$couple[0]])) {
          echo "Id " . $couple[0] . " on line " . $line . " already used<br />";
        } else {
          $ids[$couple[0]] = true;
        }
      }
    ?>
    

    100,000 rows isn't that many, so this will be enough. (It ran in 3 seconds on my machine.)

    EDIT: As pointed out, in_array is less efficient than a key lookup. I've updated my code accordingly.
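
The answer above leaves out the CSV reading itself. As a rough sketch of how the same key-lookup idea might be applied while reading the file line by line with `fgetcsv`, consider the following. The function name `findDuplicateIds` is hypothetical, and the code assumes a header row containing an `ID` column, padded values as in the sample from the question, and data rows numbered from 1 (header and blank lines not counted).

```php
<?php
/**
 * Hypothetical helper: returns one message per data row whose ID
 * has already appeared earlier in the file.
 */
function findDuplicateIds(string $path): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    // Read the header row and locate the ID column (assumed to be labelled 'ID').
    $header = fgetcsv($handle, 0, ',', '"', '\\');
    if ($header === false) {
        fclose($handle);
        return []; // empty file
    }
    $idCol = array_search('ID', array_map('trim', $header), true);
    if ($idCol === false) {
        fclose($handle);
        throw new RuntimeException('No ID column in header');
    }

    $seen = [];     // ID => data row on which it first appeared
    $messages = [];
    $row = 0;       // data row counter, header excluded

    // Only one row is held in memory at a time, plus the set of seen IDs.
    while (($fields = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if ($fields === [null]) {
            continue; // fgetcsv returns [null] for a blank line
        }
        $row++;
        $id = trim((string) ($fields[$idCol] ?? ''));
        if (isset($seen[$id])) {
            $messages[] = "Duplicate ID on line $row";
        } else {
            $seen[$id] = $row;
        }
    }

    fclose($handle);
    return $messages;
}
```

Calling `findDuplicateIds()` on a file containing the sample rows from the question should return a single message, `Duplicate ID on line 3`. Because `$seen` stores the row of the first occurrence, the message could also be extended to say where the ID was first used.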

    This answer was selected as the best answer by the asker.