doujieyu7062 2018-09-19 12:26

CSV parser in Go breaks due to trailing whitespace

We are trying to parse a CSV file using Go's encoding/csv package. This particular CSV is a bit peculiar: each row ends with a trailing space. Because the fields are quoted, decoding fails: after a closing quote the reader expects a newline, a separator or another quote, and the trailing space is none of these.

How would you handle this case? Do you know of another parser that we could use?

Edit:

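f, err := os.Open("file.csv")
if err != nil {
   log.Fatal(err)
}
defer f.Close()
csvr := csv.NewReader(f)
csvr.Comma = csvDelimiter // our delimiter, e.g. ','
for {
   rowAsSlice, err := csvr.Read()
   if err == io.EOF {
      break // no more records
   }
   // Handle row and errors etc.
}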

Edit 2: CSV example, mind the trailing space!

"RECORD_TYPE","COMPANY_SHORTNAME" 
"HDR","COMPANY_EXAMPLE" 

1 answer

  • dongwo6477 2018-09-19 16:33

    One possible solution is to wrap the source file in a custom reader whose Read(...) method silently trims trailing whitespace from whatever the underlying reader returns. Since csv.NewReader accepts any io.Reader, the csv.Reader can consume that type directly.

    For example (Go Playground):

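    package main
    
    import (
      "encoding/csv"
      "fmt"
      "io"
      "log"
      "os"
      "regexp"
    )
    
    // TrimReader wraps an io.Reader and removes the spaces that sit directly
    // in front of a line ending in each chunk it reads.
    type TrimReader struct{ io.Reader }
    
    // trailingws matches one or more spaces followed by an LF or CRLF line ending.
    var trailingws = regexp.MustCompile(` +\r?\n`)
    
    func (tr TrimReader) Read(bs []byte) (int, error) {
      // Perform the requested read on the given reader.
      n, err := tr.Reader.Read(bs)
      if n == 0 {
        return n, err
      }
    
      // Remove trailing whitespace from each line (CRLF becomes LF, which
      // encoding/csv accepts). The result is never longer than bs[:n], so it
      // fits back into bs. Data returned together with io.EOF is trimmed too.
      lines := string(bs[:n])
      trimmed := []byte(trailingws.ReplaceAllString(lines, "\n"))
      copy(bs, trimmed)
      return len(trimmed), err
    }
    
    func main() {
      file, err := os.Open("myfile.csv")
      if err != nil {
        log.Fatal(err)
      }
      defer file.Close()
    
      csvr := csv.NewReader(TrimReader{file})
    
      for {
        record, err := csvr.Read()
        if err == io.EOF {
          break
        }
        fmt.Printf("LINE: record=%#v, err=%v\n", record, err)
      }
      // LINE: record=[]string{"RECORD_TYPE", "COMPANY_SHORTNAME"}, err=<nil>
      // LINE: record=[]string{"HDR", "COMPANY_EXAMPLE"}, err=<nil>
    }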
    

    Note that, as commenter @svsd points out, there is a subtle bug here: trailing whitespace can still slip through when a chunk ends with the spaces and the line terminator is only returned by the subsequent Read call. You can work around this by buffering whole lines or, perhaps best, by simply preprocessing these CSV files to remove the trailing whitespace before attempting to parse them.

    This answer was accepted by the asker as the best answer.
