doutongfu9484 2014-11-28 21:34
Accepted

Golang: reading a large amount of data from a pipe

I'm trying to read a tar archive that is streamed to stdin, but I somehow seem to be reading far more data from the pipe than tar is sending.

I run my command like this:

tar -cf - somefolder | ./my-go-binary

The source code is like this:

package main

import (
    "bufio"
    "io"
    "log"
    "os"
)

// Read from standard input
func main() {
    reader := bufio.NewReader(os.Stdin)
    // Read all data from stdin, processing subsequent reads as chunks.
    parts := 0
    for {
        parts++
        data := make([]byte, 4<<20) // Read 4MB at a time
        _, err := reader.Read(data)
        if err == io.EOF {
            break
        } else if err != nil {
            log.Fatalf("Problems reading from input: %s", err)
        }
    }
    log.Printf("Total parts processed: %d\n", parts)
}

For a 100MB tarred folder, I'm getting 1468 chunks of 4MB (that's 6.15GB)! Further, it doesn't seem to matter how large the data []byte array is: if I set the chunk size to 40MB, I still get ~1400 chunks of 40MB data, which makes no sense at all.

Is there something I need to do to read data from os.Stdin properly with Go?

2 answers

  • doutui2016 2014-11-28 22:40

    Your code is inefficient. It's allocating and initializing data each time through the loop.

    for {
        data := make([]byte, 4<<20) // Read 4MB at a time
    }
    

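    Your use of the io.Reader is also wrong. You discard the number of bytes read in _, err := reader.Read(data), and you don't handle err properly. Read may return fewer than len(data) bytes, so a call does not necessarily deliver a full 4MB chunk. Multiplying the number of calls by the buffer size therefore overstates how much data actually arrived.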

    Package io

    import "io" 
    

    type Reader

    type Reader interface {
            Read(p []byte) (n int, err error)
    }
    

    Reader is the interface that wraps the basic Read method.

    Read reads up to len(p) bytes into p. It returns the number of bytes read (0 <= n <= len(p)) and any error encountered. Even if Read returns n < len(p), it may use all of p as scratch space during the call. If some data is available but not len(p) bytes, Read conventionally returns what is available instead of waiting for more.

    When Read encounters an error or end-of-file condition after successfully reading n > 0 bytes, it returns the number of bytes read. It may return the (non-nil) error from the same call or return the error (and n == 0) from a subsequent call. An instance of this general case is that a Reader returning a non-zero number of bytes at the end of the input stream may return either err == EOF or err == nil. The next Read should return 0, EOF regardless.

    Callers should always process the n > 0 bytes returned before considering the error err. Doing so correctly handles I/O errors that happen after reading some bytes and also both of the allowed EOF behaviors.

    Implementations of Read are discouraged from returning a zero byte count with a nil error, except when len(p) == 0. Callers should treat a return of 0 and nil as indicating that nothing happened; in particular it does not indicate EOF.

    Implementations must not retain p.

    Here's a model program that reads standard input and follows the io.Reader contract:

    package main
    
    import (
        "bufio"
        "io"
        "log"
        "os"
    )
    
    func main() {
        nBytes, nChunks := int64(0), int64(0)
        r := bufio.NewReader(os.Stdin)
        buf := make([]byte, 0, 4*1024)
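        for {
            // Read up to cap(buf) bytes; n reports how many actually arrived.
            n, err := r.Read(buf[:cap(buf)])
            buf = buf[:n]
            if n == 0 {
                if err == nil {
                    // Nothing happened; per the io.Reader docs this is not EOF.
                    continue
                }
                if err == io.EOF {
                    break
                }
                log.Fatal(err)
            }
            nChunks++
            nBytes += int64(len(buf))
            // process buf
            // Handle a non-EOF error only after the n > 0 bytes are processed.
            if err != nil && err != io.EOF {
                log.Fatal(err)
            }
        }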
        log.Println("Bytes:", nBytes, "Chunks:", nChunks)
    }
    

    Output:

    2014/11/29 10:00:05 Bytes: 5589891 Chunks: 1365
    
    This answer was selected by the asker as the best answer.