douguachan2879 2017-01-01 21:07

golang (*os.File).Readdir uses lstat on all files. Can it be optimized?

I am writing a program that finds all sub-directories of a parent directory containing a huge number of files, using (*os.File).Readdir. Running strace to count the system calls showed that the Go version calls lstat() on every file and directory in the parent directory. (I am testing this with /usr/bin for now.)

Go code:

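package main

import (
    "fmt"
    "os"
)

func main() {
    dir, err := os.Open("/usr/bin")
    if err != nil {
        panic(err)
    }

    // Readdir returns one os.FileInfo per entry; the strace output
    // below shows one lstat call for each of them.
    infos, err := dir.Readdir(0)
    if err != nil {
        panic(err)
    }
    for _, info := range infos {
        fmt.Println(info)
    }
}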

Strace on the program (without following threads):

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 93.62    0.004110           2      2466           write
  3.46    0.000152           7        22           getdents64
  2.92    0.000128           0      2466           lstat // this increases with increase in no. of files.
  0.00    0.000000           0        11           mmap
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0       114           rt_sigaction
  0.00    0.000000           0         8           rt_sigprocmask
  0.00    0.000000           0         1           sched_yield
  0.00    0.000000           0         3           clone
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         2           sigaltstack
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         1           gettid
  0.00    0.000000           0        57           futex
  0.00    0.000000           0         1           sched_getaffinity
  0.00    0.000000           0         1           openat
------ ----------- ----------- --------- --------- ----------------
100.00    0.004390                  5156           total

I tested the same thing with C's readdir() and did not see this behaviour.

C code:

#include <stdio.h>
#include <dirent.h>

int main (void) {
    DIR* dir_p;
    struct dirent* dir_ent;

    dir_p = opendir ("/usr/bin");

    if (dir_p != NULL) {
        // The readdir() function returns a pointer to a dirent structure representing the next
        // directory entry in the directory stream pointed to by dirp.
        // It returns NULL on reaching the end of the directory stream or if an error occurred.
        while ((dir_ent = readdir (dir_p)) != NULL) {
            // printf("%s", dir_ent->d_name);
            // printf("%d", dir_ent->d_type);
            if (dir_ent->d_type == DT_DIR) {
                printf("%s is a directory", dir_ent->d_name);
            } else {
                printf("%s is not a directory", dir_ent->d_name);
            }

            printf("\n");
        }
        (void) closedir(dir_p);

    }
    else
        perror ("Couldn't open the directory");

    return 0;
}

Strace on the program:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.000128           0      2468           write
  0.00    0.000000           0         1           read
  0.00    0.000000           0         3           open
  0.00    0.000000           0         3           close
  0.00    0.000000           0         4           fstat
  0.00    0.000000           0         8           mmap
  0.00    0.000000           0         3           mprotect
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         3         3 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         4           getdents
  0.00    0.000000           0         1           arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.000128                  2503         3 total

I am aware that the only fields in the dirent structure that are mandated by POSIX.1 are d_name and d_ino, but I am writing this for a specific filesystem.

I tried (*os.File).Readdirnames(), which does not call lstat and returns the names of all files and directories, but checking whether each returned name is a file or a directory ends up calling lstat again.

  • I was wondering whether the Go program can be rewritten to avoid the unnecessary lstat() on every file. The C program makes the following system calls:

        open("/usr/bin", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
        fstat(3, {st_mode=S_IFDIR|0755, st_size=69632, ...}) = 0
        brk(NULL) = 0x1098000
        brk(0x10c1000) = 0x10c1000
        getdents(3, /* 986 entries */, 32768) = 32752

  • Is this a premature optimisation that I shouldn't worry about? I ask because the monitored directory will contain a huge number of small archived files, and the Go version makes almost twice as many system calls as the C version, and those calls will hit the disk.

1 answer

  • dpdhnd3577 2017-01-02 12:20

    The package dirent looks like it accomplishes what you are looking for. Below is your C example written in Go:

    package main
    
    import (
        "bytes"
        "fmt"
        "io"
    
        "github.com/EricLagergren/go-gnulib/dirent"
        "golang.org/x/sys/unix"
    )
    
    func int8ToString(s []int8) string {
        var buff bytes.Buffer
        for _, chr := range s {
            if chr == 0x00 {
                break
            }
            buff.WriteByte(byte(chr))
        }
        return buff.String()
    }
    
    func main() {
        stream, err := dirent.Open("/usr/bin")
        if err != nil {
            panic(err)
        }
        defer stream.Close()
        for {
            entry, err := stream.Read()
            if err != nil {
                if err == io.EOF {
                    break
                }
                panic(err)
            }
    
            name := int8ToString(entry.Name[:])
            if entry.Type == unix.DT_DIR {
                fmt.Printf("%s is a directory\n", name)
            } else {
                fmt.Printf("%s is not a directory\n", name)
            }
        }
    }
    