Downloading multiple files from S3 concurrently and consolidating them

I'm trying to download multiple files from S3 concurrently and consolidate their contents into a byte buffer. The files are CSV formatted. My code seems to work most of the time (8 out of 10 tries), but in some runs, when I inspect the consolidated buffer, I get less than I should (usually no more than 100 rows missing). The total number of records expected is 4802. If I run my code sequentially, this problem does not appear, but I need to use goroutines for speed; this is a major requirement of what I'm trying to do. I have run the Go race detector and it reports no data races, and the error statements that I print never print out.

This is the code I use:

    var pingsBuffer = aws.NewWriteAtBuffer([]byte{})
    // range over the contents of the index file
    for _, file := range indexList {
        wg.Add(1)
        go download(key+string(file), pingsBuffer, &wg)
    }
    wg.Wait()

And the download function (which also consolidates the downloaded files):

    func download(key string, buffer *aws.WriteAtBuffer, wg *sync.WaitGroup) {
        defer wg.Done()

        // Download the object into a buffer private to this goroutine.
        awsBuffer := aws.NewWriteAtBuffer([]byte{})

        input := &s3.GetObjectInput{
            Bucket: aws.String(defaultLocationRootBucket),
            Key:    aws.String(key),
        }

        _, downloadError := downloader.Download(awsBuffer, input)
        if downloadError != nil {
            loglib.Log(loglib.LevelError, applicationType, fmt.Sprintf("Failed to download from S3, file(%v) with error : %v.", key, downloadError))
            return
        }

        // Append the downloaded bytes at the current end of the shared buffer.
        lenghts3 := int64(len(buffer.Bytes()))

        _, bufferError := buffer.WriteAt(awsBuffer.Bytes(), lenghts3)
        if bufferError != nil {
            loglib.Log(loglib.LevelError, applicationType, fmt.Sprintf("Failed to write to buffer, the file(%v) downloaded from S3  with error : %v.", key, bufferError))
        }
    }

1 answer

dpdfh60088 2017-06-26 17:18
This code:

    lenghts3:= int64(len(buffer.Bytes()))

is a concurrency problem: two goroutines may read the length at the same time, get the same start position, and both write to the buffer at that position, stepping on each other's toes so that one file's rows overwrite the other's. Reading the length and then writing at that offset is not a single atomic operation.
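To make the failure mode concrete, here is a minimal sketch of what an atomic "read length, then write" would look like. It does not use `aws.WriteAtBuffer`; `sharedBuffer`, `appendAtEnd`, and `fill` are hypothetical names invented for illustration, and the point is only that the offset lookup and the write happen under one lock.

```go
package main

import (
	"fmt"
	"sync"
)

// sharedBuffer is a hypothetical stand-in for the shared pingsBuffer.
type sharedBuffer struct {
	mu  sync.Mutex
	buf []byte
}

// appendAtEnd reads the current length and writes at that offset while
// holding the lock, so no two goroutines can observe the same offset.
func (s *sharedBuffer) appendAtEnd(p []byte) int64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	offset := int64(len(s.buf))
	s.buf = append(s.buf, p...)
	return offset
}

// fill appends chunk from n goroutines and returns the final length.
func fill(n int, chunk []byte) int {
	var s sharedBuffer
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.appendAtEnd(chunk)
		}()
	}
	wg.Wait()
	return len(s.buf)
}

func main() {
	fmt.Println(fill(100, []byte("row\n")))
}
```

With the lock covering both steps, every chunk lands at its own offset and nothing is overwritten, regardless of how the goroutines interleave.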

Since you're already retrieving whole objects in memory and not streaming to the combined buffer, you may as well just send the full contents of each file on a channel, and have a receiver on that channel append each result to a shared byte buffer as they come in, synchronously.
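The channel approach described above might look roughly like the following minimal sketch. `fetch` is a hypothetical stand-in for the S3 download (in the question it would wrap `downloader.Download` with its own per-goroutine `aws.WriteAtBuffer`), and `consolidate` and `result` are illustrative names, not part of the AWS SDK.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// fetch is a hypothetical stand-in for downloading one S3 object.
// It returns the full contents of the object identified by key.
func fetch(key string) ([]byte, error) {
	return []byte(key + ",row\n"), nil
}

// consolidate downloads every key concurrently and appends the results
// to one buffer from a single goroutine, so no write offset is shared.
func consolidate(keys []string) ([]byte, []error) {
	type result struct {
		data []byte
		err  error
	}
	results := make(chan result)

	var wg sync.WaitGroup
	for _, key := range keys {
		wg.Add(1)
		go func(key string) {
			defer wg.Done()
			data, err := fetch(key)
			results <- result{data: data, err: err}
		}(key)
	}

	// Close the channel once every downloader has sent its result.
	go func() {
		wg.Wait()
		close(results)
	}()

	// Only this goroutine touches combined, so appends cannot collide.
	var combined bytes.Buffer
	var errs []error
	for r := range results {
		if r.err != nil {
			errs = append(errs, r.err)
			continue
		}
		combined.Write(r.data)
	}
	return combined.Bytes(), errs
}

func main() {
	out, errs := consolidate([]string{"a", "b", "c"})
	fmt.Printf("%d bytes, %d errors\n", len(out), len(errs))
}
```

The downloads still run in parallel; only the append is serialized. Files are appended in the order they finish, which is not necessarily the order of the index list, so this suits cases where row order across files does not matter.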
This answer was selected as the best answer by the asker.
