dongque1958 2018-04-17 00:50
Accepted

Reading bytes from a file concurrently

I've written a program in Go that reads a single byte from a file and checks to see which bits are set. These files are usually pretty large (around 10 - 100 GB), so I don't want to read the entire file into memory. The program normally has to check millions of separate bytes.

Right now I perform these reads with os.File.ReadAt(). This turned out to be pretty slow, so I tried to use goroutines to speed it up. For example:

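var wg sync.WaitGroup
const threadCount = 8

for i := 0; i < threadCount; i++ {
    wg.Add(1)
    go func(id int) {
        defer wg.Done()
        myByte := make([]byte, 1) // one buffer per goroutine, nothing is shared

        // Goroutine id reads bytes id, id+threadCount, id+2*threadCount, ...
        // ReadAt takes an int64 offset, so the index is kept as an int64.
        for index := int64(id); index < int64(numBytesInFile); index += threadCount {
            if _, err := file.ReadAt(myByte, index); err != nil {
                fmt.Println(err)
                return
            }
            fmt.Println(myByte[0])
        }
    }(i)
}
wg.Wait()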

However, using goroutines here didn't speed the program up at all (in fact, it made it slightly slower due to overhead). I would have thought that a file on disk could be read concurrently as long as it is opened in read-only mode (which I do in my program). Is what I'm asking for impossible, or is there some way I can make concurrent reads to a file in Go?

1 answer

  • doucang8303 2018-04-18 06:27

    Your slowness is caused by I/O, not CPU, so adding more threads will not speed up your program. Read about Amdahl's law: https://en.wikipedia.org/wiki/Amdahl%27s_law
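As a rough illustration of the Amdahl's law point, here is a small sketch that computes the upper bound on speedup for a given number of workers. The 5% / 95% split between parallelisable work and time spent waiting on the disk is an assumed figure chosen only for the example, not a measurement of the program in the question, and `amdahlSpeedup` is a hypothetical helper name.

```go
package main

import "fmt"

// amdahlSpeedup returns the upper bound on overall speedup when a fraction p
// of the run time can be spread over n workers and the rest stays serial.
func amdahlSpeedup(p float64, n int) float64 {
	return 1 / ((1 - p) + p/float64(n))
}

func main() {
	// Assumed split, for illustration only: 5% of the time is parallelisable
	// work, 95% is spent waiting on I/O that does not get faster with threads.
	for _, n := range []int{1, 2, 8, 64} {
		fmt.Printf("%2d workers: at most %.2fx\n", n, amdahlSpeedup(0.05, n))
	}
}
```

Under that assumption, eight workers give a bound of roughly 1.05x, which would be consistent with seeing no real improvement from the goroutines.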

    If you do not want to read the full file into memory, you could use a buffered reader and read the file in parts (https://golang.org/pkg/bufio/#NewReader), or you could consider the experimental memory-mapped file package: https://godoc.org/golang.org/x/exp/mmap

    To learn more about memory-mapped files, see https://en.wikipedia.org/wiki/Memory-mapped_file

    This answer was selected by the asker as the best answer.