The aws s3 sync command in the CLI can download a large collection of files very quickly, but I cannot achieve the same performance with the AWS Go SDK. I have millions of files in the bucket, so this is critical for me. I also need to list objects under a prefix, which the sync CLI command does not support well, so I am using the paginated list call.
I have tried using multiple goroutines (from 10 up to 1,000) to make requests to the server, but my code is still much slower than the CLI. Each call to the Go GetObject function takes about 100 ms, which is unacceptable for the number of files I have. I know the AWS CLI uses the Python SDK under the hood, so how does it achieve so much better performance? (I tried my script with boto as well as Go.)
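One thing I have wondered about, but not confirmed, is Go's default HTTP transport: net/http keeps only two idle connections per host by default (DefaultMaxIdleConnsPerHost is 2), so with tens or hundreds of goroutines most requests might be paying for a fresh TLS handshake. A sketch of how I could raise that limit when building the client follows; newS3Client is my own helper name and the region string is a placeholder.

import (
    "net/http"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3"
)

// newS3Client builds an S3 client whose transport allows many keep-alive
// connections, so worker goroutines can reuse connections instead of
// opening a new TLS connection per request.
func newS3Client() *s3.S3 {
    httpClient := &http.Client{
        Transport: &http.Transport{
            MaxIdleConns:        100,
            MaxIdleConnsPerHost: 100,
        },
    }
    sess := session.Must(session.NewSession(&aws.Config{
        Region:     aws.String("us-east-1"), // placeholder region
        HTTPClient: httpClient,
    }))
    return s3.New(sess)
}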
I am using ListObjectsV2Pages and GetObject, and my client runs in the same AWS region as the bucket. Here is the relevant part of my code:
// s3c is a configured *s3.S3 client, bucket is a *string, and check(err)
// aborts on a non-nil error.
logMtx := &sync.Mutex{}
logBuf := bytes.NewBuffer(make([]byte, 0, 100000000))
err = s3c.ListObjectsV2Pages(
    &s3.ListObjectsV2Input{
        Bucket:  bucket,
        Prefix:  aws.String("2019-07-21-01"),
        MaxKeys: aws.Int64(1000),
    },
    func(page *s3.ListObjectsV2Output, lastPage bool) bool {
        fmt.Println("Received", len(page.Contents), "objects in page")
        // Channel used as a semaphore: at most 10 downloads in flight.
        worker := make(chan bool, 10)
        for i := 0; i < cap(worker); i++ {
            worker <- true
        }
        wg := &sync.WaitGroup{}
        wg.Add(len(page.Contents))
        objIdx := 0
        objIdxMtx := sync.Mutex{}
        for {
            <-worker
            objIdxMtx.Lock()
            if objIdx == len(page.Contents) {
                objIdxMtx.Unlock() // don't exit the loop holding the lock
                break
            }
            go func(idx int, obj *s3.Object) {
                gs := time.Now()
                resp, err := s3c.GetObject(&s3.GetObjectInput{
                    Bucket: bucket,
                    Key:    obj.Key,
                })
                check(err)
                fmt.Println("Get: ", time.Since(gs))
                rs := time.Now()
                logMtx.Lock()
                _, err = logBuf.ReadFrom(resp.Body)
                check(err)
                logMtx.Unlock()
                fmt.Println("Read: ", time.Since(rs))
                err = resp.Body.Close()
                check(err)
                worker <- true // hand the semaphore slot back
                wg.Done()
            }(objIdx, page.Contents[objIdx])
            objIdx++
            objIdxMtx.Unlock()
        }
        fmt.Println("ok")
        wg.Wait()
        return true
    },
)
check(err)
Most of the results look like this:
Get: 153.380727ms
Read: 51.562µs
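For completeness: the SDK also ships s3manager.Downloader, which parallelizes ranged GETs, but as far as I can tell it speeds up a single large object rather than many small ones, so I am not sure it helps here. A minimal sketch of how I would use it for one key, assuming an existing *session.Session named sess (the destination path and key are placeholders):

// import "os" and "github.com/aws/aws-sdk-go/service/s3/s3manager"
downloader := s3manager.NewDownloader(sess, func(d *s3manager.Downloader) {
    d.Concurrency = 10 // concurrent ranged GETs within a single object
})
f, err := os.Create("/tmp/example.log") // placeholder destination
check(err)
_, err = downloader.Download(f, &s3.GetObjectInput{
    Bucket: bucket,
    Key:    aws.String("2019-07-21-01/example.log"), // placeholder key
})
check(err)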