dpjw67160 2017-12-29 01:05

How can I use filepath.Walk() to find only text files?

I'm using filepath.Walk() to search through all the files in a directory. I'm implementing a search tool, so I'm only interested in opening files that contain text. Is there a way to skip things like binary files that I wouldn't want to search through? I'm trying to minimize OS calls, so it would be great if this could be done with just os.FileInfo.


1 answer

  • dongsu4345 2017-12-29 15:45

    The only way to know if a file (or any byte stream) contains only "text" is to read the entire contents of the stream and determine if every rune is a "text" character according to your definition.

    For example, one might consider a file "ASCII text" if every rune has an integer value in [0, 127] and is either whitespace or not a control character:

    import (
        "bufio"
        "io"
        "unicode"
    )

    // isASCIITextStream reports whether every rune read from rd is ASCII text.
    func isASCIITextStream(rd io.Reader) (bool, error) {
        reader := bufio.NewReader(rd)
        for {
            r, _, err := reader.ReadRune()
            if err == io.EOF {
                return true, nil // Every rune was text.
            }
            if err != nil {
                return false, err // Unexpected error.
            }
            if !isASCIIText(r) {
                return false, nil // At least one rune was not text.
            }
        }
    }

    // isASCIIText reports whether r is a 7-bit ASCII rune that is either
    // whitespace or not a control character.
    func isASCIIText(r rune) bool {
        return r >= 0 && r <= unicode.MaxASCII && (!unicode.IsControl(r) || unicode.IsSpace(r))
    }
    

    Of course, most people would consider many other Unicode character classes as containing "text", so whatever your approach is, the unicode package will likely be helpful for classifying runes.

    The asker selected this answer as the best answer.