douren7921
2017-04-24 19:06
Viewed 147 times
Accepted

How to persist a data stream to S3? aws-sdk-go example not working?

I am trying to persist a given stream of data to an S3-compatible storage. The size is not known before the stream ends and can vary from 5MB to ~500GB.

I tried different possibilities but did not find a better solution than implementing the sharding myself. My best guess is to allocate a fixed-size buffer, fill it with my stream, and write each full buffer to S3 (see the sketch below). Is there a better solution? Maybe a way where this is transparent to me, without holding the whole stream in memory?
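To make that concrete, here is roughly what I have in mind. uploadPart is just a placeholder for whatever call would actually PUT one shard; it is not a real SDK function:

    package main
    
    import (
        "io"
        "log"
        "os"
    )
    
    // uploadPart stands in for whatever call would PUT one shard to S3;
    // it is a placeholder, not a real SDK function.
    func uploadPart(partNum int, data []byte) error {
        log.Printf("would upload part %d (%d bytes)", partNum, len(data))
        return nil
    }
    
    func main() {
        buf := make([]byte, 20<<20) // fixed-size 20MB buffer
        for partNum := 1; ; partNum++ {
            n, err := io.ReadFull(os.Stdin, buf)
            if n > 0 {
                if uerr := uploadPart(partNum, buf[:n]); uerr != nil {
                    log.Fatal(uerr)
                }
            }
            if err == io.EOF || err == io.ErrUnexpectedEOF {
                return // stream ended
            }
            if err != nil {
                log.Fatal(err)
            }
        }
    }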

The aws-sdk-go README has an example program that takes data from stdin and writes it to S3: https://github.com/aws/aws-sdk-go#using-the-go-sdk

When I try to pipe data in with a pipe (|), I get the following error:

    failed to upload object, SerializationError: failed to compute request body size
    caused by: seek /dev/stdin: illegal seek

Am I doing something wrong, or is the example not working as I expect it to?

I also tried minio-go, with PutObject() and client.PutObjectStreaming(). Both work, but consume as much memory as the data to be stored.

  1. Is there a better solution?
  2. Is there a small example program that can pipe arbitrary data into S3?


1 Answer

  • dongsu2807 2017-04-24 23:55
    Accepted answer

    You can use the SDK's Uploader to handle uploads of unknown size, but you'll need to make os.Stdin "unseekable" by wrapping it in a plain io.Reader. The reason: although the Uploader requires only an io.Reader as the input body, under the hood it checks whether the body also implements io.Seeker, and if it does, it calls Seek on it. Since os.Stdin is an *os.File, which implements the Seeker interface, you would by default get the same error you got from PutObjectWithContext.
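    To illustrate that check, here is a simplified sketch (not the SDK's actual code) of how a type assertion distinguishes the bare os.Stdin from the wrapped one:

    package main
    
    import (
        "fmt"
        "io"
        "os"
    )
    
    // reader exposes only Read, hiding any Seek on the wrapped value.
    type reader struct{ r io.Reader }
    
    func (r *reader) Read(p []byte) (int, error) { return r.r.Read(p) }
    
    func main() {
        var direct io.Reader = os.Stdin
        var wrapped io.Reader = &reader{os.Stdin}
    
        // The SDK does a check much like this one on the request body:
        _, ok := direct.(io.Seeker)
        fmt.Println("os.Stdin is a Seeker:", ok) // true: Seek gets called and fails on a pipe
        _, ok = wrapped.(io.Seeker)
        fmt.Println("wrapped stdin is a Seeker:", ok) // false: treated as a plain stream
    }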

    The Uploader also uploads the data in parts; you can configure both the part size and how many parts are uploaded concurrently.

    Here's a modified version of the linked example, filled in as a small self-contained program.

    package main
    
    import (
        "context"
        "flag"
        "fmt"
        "io"
        "os"
        "time"
    
        "github.com/aws/aws-sdk-go/aws"
        "github.com/aws/aws-sdk-go/aws/session"
        "github.com/aws/aws-sdk-go/service/s3/s3manager"
    )
    
    // reader wraps an io.Reader so that only Read is exposed; any
    // Seek method on the underlying value (e.g. *os.File) is hidden.
    type reader struct {
        r io.Reader
    }
    
    func (r *reader) Read(p []byte) (int, error) {
        return r.r.Read(p)
    }
    
    func main() {
        // Flag names follow the linked example; adjust as needed.
        var bucket, key string
        var timeout time.Duration
        flag.StringVar(&bucket, "b", "", "Bucket name.")
        flag.StringVar(&key, "k", "", "Object key name.")
        flag.DurationVar(&timeout, "d", 0, "Upload timeout.")
        flag.Parse()
    
        sess := session.Must(session.NewSession())
        uploader := s3manager.NewUploader(sess, func(u *s3manager.Uploader) {
            u.PartSize = 20 << 20 // 20MB parts
            u.Concurrency = 5     // upload up to 5 parts in parallel
        })
    
        ctx := context.Background()
        var cancel context.CancelFunc
        if timeout > 0 {
            ctx, cancel = context.WithTimeout(ctx, timeout)
            defer cancel()
        }
    
        _, err := uploader.UploadWithContext(ctx, &s3manager.UploadInput{
            Bucket: aws.String(bucket),
            Key:    aws.String(key),
            Body:   &reader{os.Stdin},
        })
        if err != nil {
            fmt.Fprintf(os.Stderr, "failed to upload object: %v\n", err)
            os.Exit(1)
        }
    }
    

    As to whether this is a better solution than minio-go, I don't know; you'll have to test that yourself.
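    To try it from a shell (the binary name s3upload and the flags are just the ones from the sketch above, not anything prescribed by the SDK): build the program, then run, e.g., cat bigfile | ./s3upload -b mybucket -k bigfile -d 10m. The data is streamed through in PartSize-sized parts rather than buffered whole in memory.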

