douzhang8033 2018-11-11 22:39

CSV encoding corrupted when downloading from S3

I'm trying to download a CSV file from S3 using the AWS SDK for Go, but the content comes out with the wrong encoding and the whole file is parsed as a single slice.

input := &s3.GetObjectInput{
    Bucket:                  aws.String(bucket),
    Key:                     aws.String(key),
    ResponseContentType:     aws.String("text/csv"),
    ResponseContentEncoding: aws.String("utf-8"),
}

object, err := s3.New(s).GetObject(input)
if err != nil {
    var obj s3.GetObjectOutput

    return &obj, err
}

defer object.Body.Close()

lines, err := csv.NewReader(object.Body).ReadAll()
if err != nil {
    log.Fatal(err)
}

log.Printf("%q", lines[0])


// prints ["\ufeffH1" "H2" "field1" "field2" "field1" "field200602"]

I'm guessing this is a character-encoding problem, but I'm not sure which encoding the file is in. When I put the file, I specify csv.

I would have expected to see [][]string:

[
  [],
  []
]

Any advice?

Approach 2

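buffer := new(bytes.Buffer)
// Check the read error instead of silently parsing a partial body.
if _, err := buffer.ReadFrom(object.Body); err != nil {
    log.Fatal(err)
}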

str := buffer.String()

lines, err := csv.NewReader(strings.NewReader(str)).ReadAll()
if err != nil {
    log.Fatal(err)
}

log.Printf("length: %v", len(lines))
// still one line

Approach 3

My new approach is to manually remove the problematic byte sequences. This is pretty terrible, and the Go docs on this need work.

This gets closer, but now I have to split on newlines and then again on commas.

Edit: When I print out the bytes, they look like: "\ufeffH1,H2,field1,field2

I have tried using the following encodings:

utf-8, iso-8859-1, iso-8859-1:utf-8
