I'm trying to download a CSV file from S3 using the Go SDK, but it comes out with the wrong encoding and the whole file is parsed as a single slice.
input := &s3.GetObjectInput{
    Bucket: aws.String(bucket),
    Key:    aws.String(key),
    // Note: these only override the response headers; "utf-8" is a
    // character set, not a valid Content-Encoding value (gzip, etc.).
    ResponseContentType:     aws.String("text/csv"),
    ResponseContentEncoding: aws.String("utf-8"),
}

object, err := s3.New(s).GetObject(input) // s is a *session.Session
if err != nil {
    var obj s3.GetObjectOutput
    return &obj, err
}
defer object.Body.Close()

lines, err := csv.NewReader(object.Body).ReadAll()
if err != nil {
    log.Fatal(err)
}
log.Printf("%q", lines[0])
// returns ["\ufeffH1" "H2" "field1" "field2" "field1" "field200602"]
I'm guessing this is an incorrect character encoding, but the problem is that I'm not clear which encoding it is. When I put the file, I specify csv.
I would have expected to see a [][]string:

[
    [],
    []
]

Any advice?
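For reference while reading the output above: the \ufeff prefix is the UTF-8 byte order mark (bytes EF BB BF). Below is a minimal sketch of skipping it before the CSV reader sees it; readCSVWithoutBOM is just a name I made up, not anything from the SDK:

import (
    "bufio"
    "encoding/csv"
    "io"
)

// readCSVWithoutBOM drops a leading UTF-8 BOM, if present, then parses.
func readCSVWithoutBOM(body io.Reader) ([][]string, error) {
    br := bufio.NewReader(body)
    r, _, err := br.ReadRune()
    if err != nil && err != io.EOF {
        return nil, err
    }
    if err == nil && r != '\ufeff' {
        // First rune is real data, not a BOM: push it back.
        if uerr := br.UnreadRune(); uerr != nil {
            return nil, uerr
        }
    }
    return csv.NewReader(br).ReadAll()
}

This would be called as lines, err := readCSVWithoutBOM(object.Body) in place of the ReadAll call above.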
Approach 2
buffer := new(bytes.Buffer)
if _, err := buffer.ReadFrom(object.Body); err != nil { // error was silently ignored before
    log.Fatal(err)
}
str := buffer.String()

lines, err := csv.NewReader(strings.NewReader(str)).ReadAll()
if err != nil {
    log.Fatal(err)
}
log.Printf("length: %v", len(lines))
// still one line
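Since the count stays at one, it may be worth checking which line-ending bytes the object actually contains: as far as I can tell, encoding/csv splits records on '\n' (folding "\r\n" into "\n"), so a file that uses bare '\r' endings parses as a single record. A quick diagnostic over the str from above:

// Diagnostic sketch: count CR and LF bytes in the raw data.
log.Printf("CR: %d, LF: %d", strings.Count(str, "\r"), strings.Count(str, "\n"))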
Approach 3
My new approach is going to be manually removing the byte sequences that are problematic. This is pretty terrible; the godocs on this need work. This gets closer, but now I have to split on newlines and then again on commas.
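A sketch of that manual cleanup, assuming the only problems are the BOM and (possibly) bare \r line endings. Normalizing the bytes first lets encoding/csv do the newline and comma splitting; any \r\n pairs become \n\n, which is harmless because the CSV reader skips blank lines:

// Sketch: normalize the raw bytes from Approach 2's buffer, then parse.
raw := buffer.Bytes()
raw = bytes.TrimPrefix(raw, []byte("\ufeff"))           // drop a UTF-8 BOM
raw = bytes.ReplaceAll(raw, []byte("\r"), []byte("\n")) // bare CRs become LFs

lines, err := csv.NewReader(bytes.NewReader(raw)).ReadAll()
if err != nil {
    log.Fatal(err)
}
log.Printf("length: %v", len(lines))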
Edit
When I print out the bytes, it looks like this:
"\ufeffH1,H2,field1,field2
I have tried using the following encodings:

- utf-8
- iso-8859-1
- iso-8859-1:utf-8
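As far as I can tell, ResponseContentEncoding only rewrites the Content-Encoding header on the response; it never transcodes the bytes, so no value there will fix a charset problem. If the object really were ISO-8859-1, the decoding would have to happen client-side, for example with golang.org/x/text (a sketch only; the \ufeff above suggests the file is actually UTF-8 with a BOM, in which case this is unnecessary):

// Sketch: decode a Latin-1 object client-side before CSV parsing.
// import "golang.org/x/text/encoding/charmap"
decoded := charmap.ISO8859_1.NewDecoder().Reader(object.Body)
lines, err := csv.NewReader(decoded).ReadAll()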