dongwang788787 2019-01-03 19:30

How can I read a large file in Go with little RAM? [closed]

I have a set of files, each 5 GB in size, and I want to read each of them in a loop. I tried approaches such as file, err := ioutil.ReadFile(filename), but that loads the entire file into memory. I use this function to collect the file paths:

func visit(files *[]string) filepath.WalkFunc {
    return func(path string, info os.FileInfo, err error) error {
        if err != nil {
            log.Fatal(err)
        }
        *files = append(*files, path)
        return nil
    }
}

and to read a file I used:

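file, err := os.Open("file")
if err != nil {
    log.Fatal(err)
}
defer file.Close()

// Read the file in 10 KiB chunks instead of loading it all at once.
buf := make([]byte, 10*1024)
for {
    n, err := file.Read(buf)
    if n > 0 {
        fmt.Print(buf[:n])
    }
    if err == io.EOF {
        break
    }
    if err != nil {
        log.Fatal(err)
    }
}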

I want to parse the data from buf:

err = xml.Unmarshal(buf, &m)
if err != nil {
    log.Fatal(err)
}
fmt.Println(m)

m is:

type M struct {
    Mc []struct {
        Id   string `xml:"id"`
        NeId string `xml:"neid"`
    } `xml:"mc"`
    Mr struct {
        Mh []string `xml:"mh"`
    } `xml:"mr"`
}

and in func main:

func main() {
    var files []string
    root := "/folder/files"
    err := filepath.Walk(root, visit(&files))
    if err != nil {
        panic(err)
    }   
    for _, file := range files {

but it takes too long to execute. What should I do to speed up this process? I also get the error "XML syntax error on line 496: unexpected EOF". Would concurrency be useful in this case?


  • duangutian1426 2019-01-03 23:56

    Here are some reproducible benchmark results:

    SSD:

    $ echo 3 | sudo tee /proc/sys/vm/drop_caches
    3
    $ go build readfile.go && time ./readfile
    /home/peter/Downloads/ubuntu-mate-18.10-desktop-amd64.iso is 2103607296 bytes
    real    0m2.839s
    user    0m0.283s
    sys     0m1.064s
    $ 
    

    HDD:

    $ echo 3 | sudo tee /proc/sys/vm/drop_caches
    3
    $ go build readfile.go && time ./readfile
    /home/peter/Downloads/ubuntu-mate-18.10-desktop-amd64.iso is 2103607296 bytes
    real    0m14.194s
    user    0m0.627s
    sys     0m2.880s
    $ 
    

    HDD:

    $ echo 3 | sudo tee /proc/sys/vm/drop_caches
    3
    $ go build readfile.go && time ./readfile
    /home/peter/Downloads/ubuntu-mate-18.10-desktop-amd64.iso is 2103607296 bytes
    real    0m16.627s
    user    0m0.431s
    sys     0m1.608s
    $ 
    

    package main
    
    import (
        "bufio"
        "fmt"
        "io"
        "os"
    )
    
    func readFile(fName string) (int64, error) {
        f, err := os.Open(fName)
        if err != nil {
            return 0, err
        }
        defer f.Close()
        r := bufio.NewReader(f)
    
        nr := int64(0)
        buf := make([]byte, 0, 4*1024)
        for {
            n, err := r.Read(buf[:cap(buf)])
            buf = buf[:n]
            if n == 0 {
                if err == nil {
                    continue
                }
                if err == io.EOF {
                    break
                }
                return nr, err
            }
    
            // Do something with buf
            nr += int64(len(buf))
    
            if err != nil && err != io.EOF {
                return nr, err
            }
        }
        return nr, nil
    }
    
    func main() {
        fName := `/home/peter/Downloads/ubuntu-mate-18.10-desktop-amd64.iso`
        if len(os.Args) > 1 {
            fName = os.Args[1]
        }
        nr, err := readFile(fName)
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            return
        }
        fmt.Printf("%s is %d bytes\n", fName, nr)
    }
    

    What are your reproducible benchmark results?
