douou1872 2014-01-17 10:50

Finding duplicate column values in a CSV

I'm importing a CSV that has 3 columns; one of these columns could contain duplicate values.

I have 2 things to check:

1. The field 'NAME' is not null and is a string
2. The field 'ID' is unique

So far, I'm parsing the CSV file once and checking 1. (that NAME is valid). If that check fails, the code simply breaks out of the while loop and stops.

I guess the question is: how would I check that ID is unique?

I have fields like the following:

NAME,  ID,
Bob,   1,
Tom,   2,
James, 1,
Terry, 3,
Joe,   4,

This would output something like `Duplicate ID on line 3`

Thanks

P.S. This CSV file has more columns and can have around 100,000 records. I have simplified it here to focus on the duplicate column/field problem.



4 answers

  • dou4121 2014-01-17 10:59

    I assumed a certain design and stripped out the CSV part, but the idea remains the same:

    <?php
      /* Build an array of 100,000 rows. (Be careful: this can run into memory issues that you won't have when reading a CSV line by line.) */
      $arr = [];
      for ($i = 0; $i < 100000; $i++) {
        $arr[] = [rand(0, 1000000), 'Hey'];
      }

      /* Record every ID already seen as an array key, so each check is a key lookup */
      $ids = [];
      foreach ($arr as $line => $couple) {
        // isset() avoids an undefined index/key notice the first time an ID appears
        if (isset($ids[$couple[0]])) {
          echo "Id " . $couple[0] . " on line " . $line . " already used<br />";
        } else {
          $ids[$couple[0]] = true;
        }
      }
    ?>
    

    100,000 rows aren't that many, so this will be enough. (It ran in 3 seconds on my machine.)
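Since the code above leaves out the CSV part, here is a minimal sketch of the same idea applied to a file read line by line with `fgetcsv`. The function name `findDuplicateIds`, the assumption that the first line is a header, the ID column index, and the convention of counting data rows from 1 (so that James is "line 3") are all mine, not part of the question; adjust them to the real file layout.

```php
<?php
/**
 * Hypothetical helper: returns [dataRowNumber => id] for every repeated ID.
 * Assumes the first line is a header and the ID sits in column $idColumn.
 */
function findDuplicateIds(string $path, int $idColumn = 1): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    fgetcsv($handle, 0, ',', '"', '\\'); // skip the header row
    $seen = [];        // IDs already encountered, stored as array keys
    $duplicates = [];  // data row number => duplicated ID
    $row = 0;

    // Only one row is held in memory at a time, plus the set of seen IDs.
    while (($fields = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        $row++;
        if (!isset($fields[$idColumn])) {
            continue; // blank or short line: nothing to check
        }
        $id = trim($fields[$idColumn]); // the sample data pads IDs with spaces
        if (isset($seen[$id])) {
            $duplicates[$row] = $id;
        } else {
            $seen[$id] = true;
        }
    }
    fclose($handle);

    return $duplicates;
}
```

Looping over the result and echoing `"Duplicate ID on line $row"` for each entry would give the kind of message asked for in the question.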

    EDIT: As pointed out, in_array is less efficient than a key lookup. I've updated my code accordingly.
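To illustrate the difference mentioned in the edit, the following small sketch runs both checks on a made-up list of IDs. `in_array` scans the values one by one, while `isset` on an array key is a hash lookup, so repeating the scan for each of 100,000 rows costs far more than repeating the key lookup.

```php
<?php
// Made-up sample IDs, for illustration only.
$ids = [4, 8, 15, 16, 23, 42];

// Linear scan: in_array() walks the values until it finds a match.
$foundByScan = in_array(15, $ids, true);

// Key lookup: flip the values into keys once, then each check is a hash lookup.
$lookup = array_flip($ids);
$foundByKey = isset($lookup[15]);

var_dump($foundByScan, $foundByKey);
```

Both checks report that 15 is present; only the cost per check differs.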

    This answer was selected by the asker as the best answer.