douchong4730 2014-07-06 15:00

Why isn't the buffer size always a multiple of 4096 when reading a file line by line?

The sample code is:

// test.go
package main

import (
    "bufio"
    "os"
)

func main() {
    if len(os.Args) != 2 {
        println("Usage:", os.Args[0], "")
        os.Exit(1)
    }
    fileName := os.Args[1]
    fp, err := os.Open(fileName)
    if err != nil {
        println(err.Error())
        os.Exit(2)
    }
    defer fp.Close()
    r := bufio.NewScanner(fp)
    var lines []string
    for r.Scan() {
        lines = append(lines, r.Text())
    }
}

c:\>go build test.go

c:\>test.exe test.txt

Then I monitored the process with Process Monitor while it ran. Part of the output is:

test.exe  ReadFile  SUCCESS      Offset: 4,692,375, Length: 8,056
test.exe  ReadFile  SUCCESS      Offset: 4,700,431, Length: 7,198
test.exe  ReadFile  SUCCESS      Offset: 4,707,629, Length: 8,134
test.exe  ReadFile  SUCCESS      Offset: 4,715,763, Length: 7,361
test.exe  ReadFile  SUCCESS      Offset: 4,723,124, Length: 8,056
test.exe  ReadFile  SUCCESS      Offset: 4,731,180, Length: 4,322
test.exe  ReadFile  END OF FILE  Offset: 4,735,502, Length: 8,192

The equivalent Java code is:

//Test.java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;

public class Test {
    public static void main(String[] args) {
        try {
            FileInputStream in = new FileInputStream("test.txt");
            BufferedReader br = new BufferedReader(new InputStreamReader(in));
            String strLine;
            while ((strLine = br.readLine()) != null) {
                // discard each line; only the read pattern matters here
            }
        } catch (Exception e) {
            System.out.println(e);
        }
    }
}

c:\>javac Test.java

c:\>java Test

Then part of the monitoring output is:

java.exe  ReadFile  SUCCESS       Offset: 4,694,016, Length: 8,192
java.exe  ReadFile  SUCCESS       Offset: 4,702,208, Length: 8,192
java.exe  ReadFile  SUCCESS       Offset: 4,710,400, Length: 8,192
java.exe  ReadFile  SUCCESS       Offset: 4,718,592, Length: 8,192
java.exe  ReadFile  SUCCESS       Offset: 4,726,784, Length: 8,192
java.exe  ReadFile  SUCCESS       Offset: 4,734,976, Length: 526
java.exe  ReadFile  END OF FILE   Offset: 4,735,502, Length: 8,192

As you can see, Java's buffer size is 8192 and it reads 8192 bytes each time. Why does the Length in Go change from one read to the next?
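The Go numbers can be checked with a little arithmetic. The sketch below uses values copied from the two traces above; the names goTrace, javaOffsets, contiguous, shortfall and aligned are made up for illustration. It confirms that Go's reads follow each other with no gap, that each one falls short of 8,192 by a different amount, and that every Java offset is an exact multiple of 8,192.

```go
package main

import "fmt"

// traceRead is one SUCCESS row of a Process Monitor ReadFile trace.
type traceRead struct {
	offset, length int
}

// goTrace holds the SUCCESS rows of the Go trace from the question.
var goTrace = []traceRead{
	{4692375, 8056},
	{4700431, 7198},
	{4707629, 8134},
	{4715763, 7361},
	{4723124, 8056},
	{4731180, 4322},
}

// javaOffsets holds the offsets of the SUCCESS rows of the Java trace.
var javaOffsets = []int{4694016, 4702208, 4710400, 4718592, 4726784, 4734976}

// eofOffset is where both END OF FILE rows start, i.e. the file size.
const eofOffset = 4735502

// contiguous reports whether each read starts exactly where the previous one
// ended and the last one ends at end.
func contiguous(rows []traceRead, end int) bool {
	for i := 0; i+1 < len(rows); i++ {
		if rows[i].offset+rows[i].length != rows[i+1].offset {
			return false
		}
	}
	last := rows[len(rows)-1]
	return last.offset+last.length == end
}

// shortfall returns how far below bufSize each read length is.
func shortfall(rows []traceRead, bufSize int) []int {
	out := make([]int, len(rows))
	for i, r := range rows {
		out[i] = bufSize - r.length
	}
	return out
}

// aligned reports whether every offset is a multiple of size.
func aligned(offsets []int, size int) bool {
	for _, off := range offsets {
		if off%size != 0 {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println("Go reads contiguous:", contiguous(goTrace, eofOffset)) // true
	// The last Go read ends at the end of the file, so leave it out.
	fmt.Println("Go short of 8192 by:", shortfall(goTrace[:len(goTrace)-1], 8192)) // [136 994 58 831 136]
	fmt.Println("Java offsets aligned:", aligned(javaOffsets, 8192)) // true
}
```

So Go is not skipping or re-reading data; each request is simply a different size, while Java always asks for a full, aligned 8,192-byte block.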

I have also tried bufio.Reader's ReadString('\n') and ReadBytes('\n'), and both of them have the same problem.

[Update] I have also tested the same thing in C:

//test.c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
        FILE * fp;
        char * line = NULL;
        size_t len = 0;
        ssize_t read;
        fp = fopen("test.txt", "r");
        if (fp == NULL)
                exit(EXIT_FAILURE);
        while ((read = getline(&line, &len, fp)) != -1) {
                /* getline returns ssize_t, so print it with %zd */
                printf("Retrieved line of length %zd :\n", read);
        }
        if (line)
                free(line);
        fclose(fp);
        return EXIT_SUCCESS;
}

The monitoring output is similar to the Java one (the buffer size is 65536 on my system). So why is Go so different here?


2 answers

  • dongqiao8421 2014-07-06 15:45

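    Reading the source of bufio.Scanner's Scan method shows that, although the initial buffer size is 4096 (and the buffer doubles whenever a single line does not fit), each Read only asks for the space that is still free in the buffer. The incomplete line left over from the previous read is first moved to the front of the buffer, so each request is the buffer size minus that leftover, which varies from read to read. Specifically, this part: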

    n, err := s.r.Read(s.buf[s.end:len(s.buf)])
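A small experiment makes this visible without Process Monitor. It is a sketch under some assumptions: an in-memory strings.Reader stands in for the file, the hypothetical recordingReader logs len(p) for every Read (roughly what the Length column shows), and the other helper names are invented for illustration. It also runs bufio.Reader.ReadString('\n'), which the question reports behaves the same way; bufio.Reader likewise slides its unread bytes to the front of its buffer before reading into the rest.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// recordingReader is a hypothetical helper: it wraps an io.Reader and records
// the size of every Read request (len(p)), roughly what Process Monitor shows.
type recordingReader struct {
	r        io.Reader
	requests []int
}

func (rr *recordingReader) Read(p []byte) (int, error) {
	rr.requests = append(rr.requests, len(p))
	return rr.r.Read(p)
}

// sampleText builds n short lines of varying length: "line 0\n", "line 1\n", ...
func sampleText(n int) string {
	var b strings.Builder
	for i := 0; i < n; i++ {
		fmt.Fprintf(&b, "line %d\n", i)
	}
	return b.String()
}

// scannerRequests reads text line by line with bufio.Scanner and returns the
// sizes of the Read calls the Scanner issued.
func scannerRequests(text string) []int {
	rr := &recordingReader{r: strings.NewReader(text)}
	sc := bufio.NewScanner(rr)
	for sc.Scan() {
		// discard each line; only the read pattern matters
	}
	return rr.requests
}

// readStringRequests does the same with bufio.Reader.ReadString('\n').
func readStringRequests(text string) []int {
	rr := &recordingReader{r: strings.NewReader(text)}
	br := bufio.NewReader(rr) // default buffer size is 4096 bytes
	for {
		if _, err := br.ReadString('\n'); err != nil {
			break // io.EOF once the input is used up
		}
	}
	return rr.requests
}

func main() {
	text := sampleText(10000)
	// The first request fills the whole 4096-byte buffer; later requests are
	// typically smaller by the size of the incomplete line carried over.
	fmt.Println("Scanner:   ", scannerRequests(text)[:6])
	fmt.Println("ReadString:", readStringRequests(text)[:6])
}
```

Both readers issue requests below 4096 after the first one, for the same reason as in the trace: the unfinished line stays in the buffer and only the remaining space is requested.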
    

    Performance-wise, I'm almost certain that whatever file system you're using is smart enough to read ahead and cache the data, so the buffer size shouldn't make much of a difference.

    This answer was accepted by the asker as the best answer.