doudu7626 2016-08-11 16:52

Does bufio.NewScanner in Golang read the entire file into memory instead of reading one line at a time?

I was trying to read a file line by line with the following function using bufio.NewScanner.

func TailFromStart(fd *os.File, wg *sync.WaitGroup)  {

    fd.Seek(0,0)
    scanner := bufio.NewScanner(fd)
    for scanner.Scan() {
        line := scanner.Text()
        offset, _ := fd.Seek(0, 1)
        fmt.Println(offset)
        fmt.Println(line)
        offsetreset, _ := fd.Seek(offset, 0)
        fmt.Println(offsetreset)
    }
    offset, err := fd.Seek(0, 1)
    CheckError(err)
    fmt.Println(offset)
    wg.Done()

}

I was expecting it to print the offset in increasing order. However, it prints the same value in every iteration until the file reaches EOF.

127.0.0.1 - - [11/Aug/2016:22:10:39 +0530] "GET /ttt HTTP/1.1" 404 437 "-" "curl/7.38.0"
613
613
127.0.0.1 - - [11/Aug/2016:22:10:42 +0530] "GET /qqq HTTP/1.1" 404 437 "-" "curl/7.38.0"
613

613 is the total number of bytes in the file (the third column of the wc output):

cat /var/log/apache2/access.log | wc
  7      84     613

Am I understanding it wrong, or does bufio.NewScanner read the entire file into memory and then iterate over it in memory? If so, is there a better way to read line by line?

1 answer

  • dongmeirang4679 2016-08-11 17:25

    See the docs for func (s *Scanner) Buffer(buf []byte, max int):

    Buffer sets the initial buffer to use when scanning and the maximum size of buffer that may be allocated during scanning. The maximum token size is the larger of max and cap(buf).
    If max <= cap(buf), Scan will use this buffer only and do no allocation.

    By default, Scan uses an internal buffer and sets the maximum token size to MaxScanTokenSize.

    Buffer panics if it is called after scanning has started.

    And:

    MaxScanTokenSize is the maximum size used to buffer a token unless the user provides an explicit buffer with Scanner.Buffer. The actual maximum token size may be smaller as the buffer may need to include, for instance, a newline.

    MaxScanTokenSize = 64 * 1024
    
    startBufSize = 4096 // Size of initial allocation for buffer.
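
To make the quoted limits concrete, here is a minimal sketch of the token size limit in action. The helper name firstLineLen and the 100000-byte test line are illustrative assumptions, not part of the original answer. With the default settings a line longer than MaxScanTokenSize makes Scan stop with bufio.ErrTooLong; raising the limit through Buffer lets the same line through.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strings"
)

// firstLineLen (hypothetical helper) returns the length of the first line of
// input. If maxToken is positive, the scanner's maximum token size is raised
// with Buffer; otherwise the default MaxScanTokenSize (64 KiB) applies.
func firstLineLen(input string, maxToken int) (int, error) {
	scanner := bufio.NewScanner(strings.NewReader(input))
	if maxToken > 0 {
		// Buffer must be called before the first Scan, or it panics.
		scanner.Buffer(make([]byte, 0, 4096), maxToken)
	}
	if scanner.Scan() {
		return len(scanner.Bytes()), nil
	}
	return 0, scanner.Err()
}

func main() {
	long := strings.Repeat("x", 100000) + "\n"

	// Default limit: the line does not fit in 64 KiB.
	_, err := firstLineLen(long, 0)
	fmt.Println(errors.Is(err, bufio.ErrTooLong)) // true

	// Limit raised to 1 MiB: the buffer grows as needed and the line is returned.
	n, err := firstLineLen(long, 1<<20)
	fmt.Println(n, err) // 100000 <nil>
}
```

Either way the Scanner reads its input in buffer-sized chunks, so the size of the file itself does not decide how much memory is used; the length of the longest line does.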
    

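    No. As @JimB said, the Scanner only reads up to its buffer size at a time. The offset returned by fd.Seek is the position of the underlying file, so it shows how far the Scanner has read ahead into its buffer, not where the current line ends. See this test sample:

    A file smaller than 4096 bytes (the initial buffer size) is read into the buffer in full, which is why you see 613 on every iteration.
    For a big file, the first Scan reads just 4096 bytes.
    Try this with a big file: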

    package main

    import (
        "bufio"
        "fmt"
        "io"
        "os"
    )

    func main() {
        fd, err := os.Open("big.txt")
        if err != nil {
            panic(err)
        }
        defer fd.Close()

        // Rewind to the start of the file.
        n, err := fd.Seek(0, io.SeekStart)
        if err != nil {
            panic(err)
        }
        fmt.Println("n =", n) // 0

        // Read only the first line.
        scanner := bufio.NewScanner(fd)
        if scanner.Scan() {
            fmt.Println(scanner.Text())
        }

        // The file position reflects what the Scanner has buffered,
        // not the end of the line that was just returned.
        offset, err := fd.Seek(0, io.SeekCurrent)
        if err != nil {
            panic(err)
        }
        fmt.Println("offset =", offset) // 4096

        // Seeking to the current position changes nothing.
        offsetreset, err := fd.Seek(offset, io.SeekStart)
        if err != nil {
            panic(err)
        }
        fmt.Println("offsetreset =", offsetreset) // 4096

        offset, err = fd.Seek(0, io.SeekCurrent)
        if err != nil {
            panic(err)
        }
        fmt.Println("offset =", offset) // 4096
    }
    

    output:

    n = 0
    
    offset = 4096
    offsetreset = 4096
    offset = 4096
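
If what you need is the byte offset of each line, one option is to count the bytes the Scanner consumes instead of asking the file for its position. The following is a minimal sketch under that assumption: it wraps bufio.ScanLines in a custom split function and adds up the advance values. The names lineEndOffsets and offsetsForString are hypothetical helpers made up for this illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// lineEndOffsets (hypothetical helper) scans r line by line and returns, for
// each line, the byte offset just past that line, terminator included.
func lineEndOffsets(r io.Reader) ([]int64, error) {
	var pos int64 // bytes consumed by the scanner so far
	scanner := bufio.NewScanner(r)
	scanner.Split(func(data []byte, atEOF bool) (int, []byte, error) {
		// Delegate to the standard line splitter and record how far it advances.
		advance, token, err := bufio.ScanLines(data, atEOF)
		pos += int64(advance)
		return advance, token, err
	})

	var offsets []int64
	for scanner.Scan() {
		offsets = append(offsets, pos)
	}
	return offsets, scanner.Err()
}

// offsetsForString is a small convenience wrapper used by the demo below.
func offsetsForString(s string) ([]int64, error) {
	return lineEndOffsets(strings.NewReader(s))
}

func main() {
	// Three lines: "ab\n" (3 bytes), "cd\r\n" (4 bytes), "ef" (2 bytes, no newline).
	offsets, err := offsetsForString("ab\ncd\r\nef")
	if err != nil {
		panic(err)
	}
	fmt.Println(offsets) // [3 7 9]
}
```

With an *os.File as the reader, these offsets are relative to the position where scanning started. To resume later, seek the file to a saved offset and create a new Scanner, because an existing Scanner keeps its own buffer and does not notice a Seek on the file.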
    