douqian3712 2008-10-09 12:42

Replace or remove newlines in a CSV file using PHP, but only between single or double quotes

I have a CSV file that holds about 200,000 - 300,000 records. Most of the records can be separated and inserted into a MySQL database with a simple

$line = explode("\n", $fileData);

and then the values separated with

$lineValues = explode(',', $line);

and then inserted into the database using the proper data type, i.e. int, float, string, text, etc.

However, some of the records have a text column that includes a \n in the string, which breaks the $line = explode("\n", $fileData); method. Each line of data that needs to be inserted into the database has approximately 216 columns. Not every line has a record with a \n in the string. However, each time a \n is found in the line, it is enclosed between a pair of single quotes (').

Each line is set up in the following format:

id,data,data,data,text,more data

Example:

1,0,0,0,'Hello World',0
2,0,0,0,'Hello
    World',0
3,0,0,0,'Hi',0
4,0,0,0,,0

As you can see from the example, most records can be easily split with the methods shown above. It's the second record in the example that causes the problem.

New lines are only \n; the file does not include \r at all.


  • duane9322 2008-10-09 12:48

    If the CSV data is in a file, you can just use fgetcsv(), as others have pointed out. fgetcsv() handles embedded newlines correctly.
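
As a rough illustration of the fgetcsv() route, here is a minimal sketch using the sample rows from the question. It assumes the single quote is the enclosure character and the backslash is the escape character; the php://temp stream and the $sample variable are stand-ins for the real file and are not part of the original question.

```php
<?php
// Sample data in the question's format; a temp stream stands in for the real file.
$sample = "1,0,0,0,'Hello World',0\n"
        . "2,0,0,0,'Hello\n    World',0\n"
        . "3,0,0,0,'Hi',0\n"
        . "4,0,0,0,,0\n";

$handle = fopen('php://temp', 'r+');
fwrite($handle, $sample);
rewind($handle);

$rows = [];
// Length 0 means no line-length limit; "'" is the enclosure, "\\" the escape character.
while (($row = fgetcsv($handle, 0, ',', "'", "\\")) !== false) {
    if ($row === [null]) {
        continue; // fgetcsv() reports a blank line as a single null field
    }
    $rows[] = $row;
}
fclose($handle);
```

With a real file, the fopen() call would point at the CSV path instead, and each $row could be inserted into the database as it is read rather than collected in memory.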

    However, if your CSV data is in a string (like $fileData in your example), the following method may be useful, as str_getcsv() only works on one row at a time and cannot split a whole file into records.
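
For reference, this is roughly what str_getcsv() does with one complete record. The record below is made up for illustration, and the enclosure and escape characters are assumptions chosen to match the question's format.

```php
<?php
// One complete record; the comma inside the quotes is not a separator.
$record = "3,0,0,0,'Hi, there',0";

// Separator ",", enclosure "'", escape character "\".
$fields = str_getcsv($record, ',', "'", "\\");
// $fields has six elements; the quoted comma stays inside the fifth one.
```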

    You can detect the embedded newlines by counting the quotes in each line. If there is an odd number of quotes, you have an incomplete line, so concatenate this line with the following line. Once you have an even number of quotes, you have a complete record.
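
The quote-counting step on its own can be sketched as below. The function name mergeLinesByQuoteParity() is hypothetical, and this simplified version assumes no backslash-escaped quotes appear in the data.

```php
<?php
// Hypothetical helper: merge physical lines into logical records by quote parity.
// Simplification: backslash-escaped quotes (\') are NOT handled here.
function mergeLinesByQuoteParity(string $fileData, string $quote = "'"): array
{
    $records = [];
    $buffer = null; // null means no record is currently being assembled

    foreach (explode("\n", $fileData) as $line) {
        // Start a new record, or re-attach the newline that explode() removed.
        $buffer = ($buffer === null) ? $line : $buffer . "\n" . $line;

        // An even quote count means every opened quote has been closed.
        if (substr_count($buffer, $quote) % 2 === 0) {
            $records[] = $buffer;
            $buffer = null;
        }
    }

    // Keep a trailing record even if its quotes never balanced.
    if ($buffer !== null) {
        $records[] = $buffer;
    }
    return $records;
}

$sample = "1,0,0,0,'Hello World',0\n2,0,0,0,'Hello\n    World',0\n3,0,0,0,'Hi',0\n4,0,0,0,,0";
$records = mergeLinesByQuoteParity($sample);
```

Here the five physical lines of the sample collapse into four records, with the second record keeping its embedded newline.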

    Once you have a complete record, split it at the quotes (again using explode()). Counting the resulting chunks from zero, odd-numbered chunks are quoted (so embedded commas are not special), and even-numbered chunks are not.

    Example:

    # Split file into physical lines (records may span lines)
    $lines = explode("\n", $fileData);
    
    # Re-assemble records
    $records = array ();
    $record = '';
    $lineSep = '';
    foreach ($lines as $line) {
      # Escape @ symbol so we can use it as a marker (as it does not conflict with
      # any special CSV character.)
      $line = str_replace('@', '@a', $line);
    
      # Escape commas as we don't yet know which ones are separators
      $line = str_replace(',', '@c', $line);
    
      # Escape quotes in a form that uses no special characters
      $line = str_replace("\\'", '@q', $line);
      $line = str_replace('\\', '@b', $line);
    
      $record .= $lineSep . $line;
      $lineSep = "\n";
    
      # Must have an even number of quotes in a complete record!
      if (substr_count($record, "'") % 2 == 0) {
        $records[] = $record;
        $record = '';
        $lineSep = '';
      }
    }
    if (strlen($record) > 0) {
      $records[] = $record;
    }
    
    $rows = array ();
    
    foreach ($records as $record) {
      $chunks_in = explode("'", $record);
      $chunks_out = array ();
    
      # Decode escaped quotes/backslashes.
      # Decode field-separating commas (unless quoted)
      foreach ($chunks_in as $i => $chunk) {
        # Unescape quotes & backslashes
        $chunk = str_replace('@q', "'", $chunk);
        $chunk = str_replace('@b', '\\', $chunk);
        if ($i % 2 == 0) {
          # Unescape commas
          $chunk = str_replace('@c', ',', $chunk);
        }
        $chunks_out[] = $chunk;
      }
    
      # Join back together, discarding unescaped quotes
      $record = join('', $chunks_out);
    
      $chunks_in = explode(',', $record);
      $row = array ();
      foreach ($chunks_in as $chunk) {
        $chunk = str_replace('@c', ',', $chunk);
        $chunk = str_replace('@a', '@', $chunk);
        $row[] = $chunk;
      }
      $rows[] = $row;
    }
    