douren7921 2017-04-24 19:06
Viewed 155 times
Accepted

How do I save a data stream to S3? Why doesn't the aws-sdk-go example work?

I am trying to persist a stream of data to an S3-compatible storage service. The size is not known until the stream ends and can vary from 5 MB to ~500 GB.

I tried several approaches but did not find a better solution than implementing the sharding myself. My best guess is to fill a fixed-size buffer from the stream and write each full buffer to S3 (sketched below). Is there a better solution? Ideally one that is transparent to me and does not hold the whole stream in memory.
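
Concretely, the sketch below is what I have in mind, using the low-level multipart API of aws-sdk-go. It is untested; the bucket, key and the 20 MB part size are placeholders, and errors simply panic instead of aborting the multipart upload:

    package main
    
    import (
        "bytes"
        "io"
        "os"
    
        "github.com/aws/aws-sdk-go/aws"
        "github.com/aws/aws-sdk-go/aws/session"
        "github.com/aws/aws-sdk-go/service/s3"
    )
    
    func main() {
        bucket, key := "my-bucket", "my-key" // placeholders
    
        sess := session.Must(session.NewSession())
        svc := s3.New(sess)
    
        // Start a multipart upload and remember its id.
        mpu, err := svc.CreateMultipartUpload(&s3.CreateMultipartUploadInput{
            Bucket: aws.String(bucket),
            Key:    aws.String(key),
        })
        if err != nil {
            panic(err)
        }
    
        buf := make([]byte, 20<<20) // fixed-size 20MB buffer
        var parts []*s3.CompletedPart
        for partNum := int64(1); ; partNum++ {
            // Fill the buffer; the final part may be shorter than the buffer.
            n, rerr := io.ReadFull(os.Stdin, buf)
            if n > 0 {
                out, err := svc.UploadPart(&s3.UploadPartInput{
                    Bucket:     aws.String(bucket),
                    Key:        aws.String(key),
                    UploadId:   mpu.UploadId,
                    PartNumber: aws.Int64(partNum),
                    Body:       bytes.NewReader(buf[:n]),
                })
                if err != nil {
                    panic(err)
                }
                parts = append(parts, &s3.CompletedPart{
                    ETag:       out.ETag,
                    PartNumber: aws.Int64(partNum),
                })
            }
            if rerr == io.EOF || rerr == io.ErrUnexpectedEOF {
                break // stream ended
            }
            if rerr != nil {
                panic(rerr)
            }
        }
    
        // Stitch the uploaded parts together into the final object.
        _, err = svc.CompleteMultipartUpload(&s3.CompleteMultipartUploadInput{
            Bucket:          aws.String(bucket),
            Key:             aws.String(key),
            UploadId:        mpu.UploadId,
            MultipartUpload: &s3.CompletedMultipartUpload{Parts: parts},
        })
        if err != nil {
            panic(err)
        }
    }

This only ever holds one buffer in memory, but I would rather not manage the multipart bookkeeping myself.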

The aws-sdk-go readme has an example program that takes data from stdin and writes it to S3: https://github.com/aws/aws-sdk-go#using-the-go-sdk

When I try to pipe data in with |, I get the following error:

    failed to upload object, SerializationError: failed to compute request body size
    caused by: seek /dev/stdin: illegal seek

Am I doing something wrong, or is the example just not working the way I expect?

I also tried minio-go, with PutObject() and client.PutObjectStreaming(). Both work, but they consume as much memory as the data being stored.

  1. Is there a better solution?
  2. Is there a small example program that can pipe arbitrary data into S3?

1 answer

  • dongsu2807 2017-04-24 23:55

    You can use the SDK's Uploader to handle uploads of unknown size, but you'll need to make os.Stdin "unseekable" by wrapping it in a plain io.Reader. The reason is that while the Uploader only requires an io.Reader as the input body, under the hood it checks whether the body also implements io.Seeker and, if it does, calls Seek on it. Since os.Stdin is just an *os.File, which implements the Seeker interface, you would by default get the same error you got from PutObjectWithContext.

    The Uploader also uploads the data in parts whose size you can configure (PartSize), and you can configure how many of those parts are uploaded concurrently (Concurrency).

    Here's a modified version of the linked example, stripped of the code that can remain unchanged.

    package main
    
    import (
        // ...
        "io"
        "github.com/aws/aws-sdk-go/service/s3/s3manager"
    )
    
    // reader wraps an io.Reader so that the wrapped value no longer
    // satisfies io.Seeker, which keeps the Uploader from calling Seek on it.
    type reader struct {
        r io.Reader
    }
    
    func (r *reader) Read(p []byte) (int, error) {
        return r.r.Read(p)
    }
    
    func main() {
        // ... parse flags
    
        sess := session.Must(session.NewSession())
        uploader := s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
            u.PartSize = 20 << 20 // 20MB
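        u.Concurrency = 5     // number of parts uploaded in parallel (5 is the SDK default)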
            // ... more configuration
        })
    
        // ... context stuff
    
        _, err := uploader.UploadWithContext(ctx, &s3manager.UploadInput{
            Bucket: aws.String(bucket),
            Key:    aws.String(key),
            Body:   &reader{os.Stdin},
        })
    
        // ... handle error
    }
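
    Because the Uploader streams each part from the wrapped reader as it reads it, memory usage should stay at roughly PartSize times Concurrency rather than growing with the size of the input, so you can pipe arbitrarily large amounts of data in from stdin.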
    

    As to whether this is a better solution than minio-go, I do not know; you'll have to test that yourself.

    This answer was selected by the asker as the best answer.
