douweng1935 2017-01-21 09:38
浏览 24
已采纳

使用PHP,如果ID列重复,如何删除CSV行?

My goal is to with PHP, remove the entire row of a CSV-file if duplicate values appears in a certain column, in this example ID-column. I naturally want to keep the first row where the duplicated ID appears (see example below).

I don't want to create a new CSV-file, I want to open the file, remove what needs to be removed, and overwrite the current file.

I also want to store how many rows that has been removed in variable.

Input (notice duplicate ID of 3): file.csv

ID,Date,Name,Age
1,12/3/13,John Doe ,23
2,12/3/19,Jane Doe ,21
3,12/4/19,Jane Doe ,19
3,12/3/18,John Doe  ,33
4,12/3/19,Jane Doe ,21

Expected output: file.csv

ID,Date,Name,Age
1,12/3/13,John Doe ,23
2,12/3/19,Jane Doe ,21
3,12/4/19,Jane Doe ,19
4,12/3/19,Jane Doe ,21

And then also be able to: echo $removedRows; that will output: 1 How to accomplish this?


I've managed to get this in a new file, but I just want to overwrite the current file and i dont know why i got the " " around name column:

ID,Date,Name,Age
1,12/3/13,"John Doe ",23
2,12/3/19,"Jane Doe ",21
3,12/4/19,"Jane Doe ",19
4,12/3/19,"Jane Doe ",21

With the following code:

$input_filename = 'file.csv';

// Move the csv-file to 'newfile' directory
copy($input_filename, 'newfile/'.$input_filename);

$output_filename = 'newfile/'.$input_filename;

$input_file = fopen($input_filename, 'r');
$output_file = fopen($output_filename, 'w');

$IDs = array();

// Read the header
$headers = fgetcsv($input_file, 1000);
fputcsv($output_file, $headers);

// Flip it so it becomes name => ID
$headers = array_flip($headers);

// Read every row
while (($row = fgetcsv($input_file, 1000)) !== FALSE)
{
    $ID = $row[$headers['ID']];
    // Do we already have this ID?
    if (isset($IDs[$ID]))
        continue;

    // Mark this ID as being found
    $IDs[$ID] = true;
    // Write it to the output
    fputcsv($output_file, $row);
}
  • 写回答

1条回答 默认 最新

  • dongqu4443 2017-01-21 14:22
    关注

    Because you cannot read from a file and write to it at the same time, I advise you to write your data to another file and then move this file to your source file, something like:

    $input_filename = 'file.csv';
    $output_filename = 'newfile/' . $input_filename;
    
    // Copy the csv-file to 'newfile' directory
    copy($input_filename, $output_filename);
    $input_file = fopen($input_filename, 'r');
    $output_file = fopen($output_filename, 'w');
    
    $IDs = array();
    
    // Read the header
    $headers = fgetcsv($input_file, 1000);
    fputcsv($output_file, $headers);
    
    // Flip it so it becomes name => ID
    $headers = array_flip($headers);
    
    // Deleted rows counter
    $rows_deleted = 0;
    // Read every row
    while (($row = fgetcsv($input_file, 1000)) !== FALSE)
    {
        $ID = $row[$headers['ID']];
        // Do we already have this ID?
        if (isset($IDs[$ID])) {
            // row skipped - therefore it is deleted
            $rows_deleted++;
            continue;
        }
    
        // Mark this ID as being found
        $IDs[$ID] = true;
        // Write it to the output
        fputcsv($output_file, $row);
    }
    
    // Now we should move output file to input one
    rename($output_filename, $input_filename);
    
    echo "Deleted: " . $rows_deleted;
    

    As for " around your data - this is a result of fputcsv. This is for safety reasons. Imagine if your data will be not

    3,12/4/19,Jane Doe ,19
    

    but

    3,12/4/19,Jane, Doe ,19
    

    And you want to consider Jane, Doe as one element. That's why with " it's more clear for a parser how to treat your rows:

    3,12/4/19,"Jane, Doe ",19    // here `Jane, Doe` is one element
    

    Usually, wrapping data in " does not affect parsing generated csv. But if you're sure that you don't want quotes - you can pass more arguments to fputcsv, though I'm not sure if it will work with empty value as enclosure argument.

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥15 CVRP 图论 物流运输优化
  • ¥15 Tableau online 嵌入ppt失败
  • ¥100 支付宝网页转账系统不识别账号
  • ¥15 基于单片机的靶位控制系统
  • ¥15 真我手机蓝牙传输进度消息被关闭了,怎么打开?(关键词-消息通知)
  • ¥15 下图接收小电路,谁知道原理
  • ¥15 装 pytorch 的时候出了好多问题,遇到这种情况怎么处理?
  • ¥20 IOS游览器某宝手机网页版自动立即购买JavaScript脚本
  • ¥15 手机接入宽带网线,如何释放宽带全部速度
  • ¥30 关于#r语言#的问题:如何对R语言中mfgarch包中构建的garch-midas模型进行样本内长期波动率预测和样本外长期波动率预测