dongpao1905 2016-03-25 21:53
179 views

GoLang: decompress bz2 in one goroutine, consume it in another goroutine

I am a new-grad SWE learning Go (and loving it).

I am building a parser for Wikipedia dump files - basically a huge bzip2-compressed XML file (~50GB uncompressed).

I want to do both streaming decompression and parsing, which sounds simple enough. For decompression, I do:

inputFilePath := flag.Arg(0)
inputFile, err := os.Open(inputFilePath) // error check omitted for brevity
inputReader := bzip2.NewReader(inputFile)

And then pass the reader to the XML parser:

decoder := xml.NewDecoder(inputReader)

However, since both decompressing and parsing are expensive operations, I would like to run them on separate goroutines to make use of additional cores. How would I go about doing this in Go?

The only thing I can think of is wrapping the file in a chan []byte and implementing the io.Reader interface on top of it, but I presume there might be a built-in (and cleaner) way of doing it.

Has anyone ever done something like this?

Thanks! Manuel

2 answers

  • doukanwen4114 2016-03-25 22:46

    You can use io.Pipe, then use io.Copy to push the decompressed data into the pipe, and read it in another goroutine:

    package main
    
    import (
        "bytes"
        "encoding/json"
        "fmt"
        "io"
        "sync"
    )
    
    func main() {
    
        rawJson := []byte(`{
                "Foo": {
                    "Bar": "Baz"
                }
            }`)
    
        bzip2Reader := bytes.NewReader(rawJson) // this stands in for the bzip2.NewReader
    
        var wg sync.WaitGroup
        wg.Add(2)
    
        r, w := io.Pipe()
    
        go func() {
            defer wg.Done()
            // Decompression happens in this goroutine. CloseWithError forwards any
            // copy error to the reader; a nil error ends the stream with io.EOF.
            _, err := io.Copy(w, bzip2Reader)
            w.CloseWithError(err)
        }()
    
        decoder := json.NewDecoder(r)
    
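        go func() {
            defer wg.Done()
            for {
                t, err := decoder.Token()
                if err == io.EOF {
                    break // the writer closed the pipe: end of input
                }
                if err != nil {
                    fmt.Println("decode error:", err)
                    r.CloseWithError(err) // unblock the writer goroutine
                    break
                }
                fmt.Println(t)
            }
        }()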
    
        wg.Wait()
    }
    

    http://play.golang.org/p/fXLnfnaWYA
