I'm using filepath.Walk() to search through all the files in a directory. I'm implementing a search tool, so I'm only interested in opening files with text in them. I'm wondering if there's a way to ignore stuff like binary files that I wouldn't want to search through. I'm trying to minimize OS calls, so it would be great if this could be done with just os.FileInfo.
How can I use filepath.Walk() to find only text files?
1 answer
- dongsu4345 2017-12-29 15:45
The only way to know if a file (or any byte stream) contains only "text" is to read the entire contents of the stream and determine if every rune is a "text" character according to your definition.
For example, one might consider a file "ASCII text" if every rune has an integer value in [0,127] and is either not a control character or is whitespace:

    import (
        "bufio"
        "io"
        "unicode"
    )

    // isASCIITextStream reads rd to EOF and reports whether every rune is ASCII text.
    func isASCIITextStream(rd io.Reader) (bool, error) {
        reader := bufio.NewReader(rd)
        for {
            r, _, err := reader.ReadRune()
            if err == io.EOF {
                return true, nil // Every rune was text.
            }
            if err != nil {
                return false, err // Unexpected error.
            }
            if !isASCIIText(r) {
                return false, nil // At least one rune was not text.
            }
        }
    }

    // isASCIIText reports whether r is an ASCII rune that is either
    // a non-control character or whitespace (tab, newline, and so on).
    func isASCIIText(r rune) bool {
        return r >= 0 && r <= unicode.MaxASCII &&
            (!unicode.IsControl(r) || unicode.IsSpace(r))
    }
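To tie this back to the question, here is a rough sketch of how a content check like the one above could be plugged into filepath.Walk(). The names findTextFiles and readerIsASCIIText are made up for this illustration, and the ASCII-only definition of "text" is just an assumption. Note that os.FileInfo only carries metadata, so the most it can do is let you skip directories and other non-regular files; each remaining file still has to be opened and read.

```go
package main

import (
	"bufio"
	"io"
	"os"
	"path/filepath"
	"unicode"
)

// readerIsASCIIText is a hypothetical stand-in for the content check:
// it reports whether every rune in rd is ASCII and is either a
// non-control character or whitespace.
func readerIsASCIIText(rd io.Reader) (bool, error) {
	br := bufio.NewReader(rd)
	for {
		r, _, err := br.ReadRune()
		if err == io.EOF {
			return true, nil // Reached the end without seeing a non-text rune.
		}
		if err != nil {
			return false, err
		}
		if r > unicode.MaxASCII || (unicode.IsControl(r) && !unicode.IsSpace(r)) {
			return false, nil
		}
	}
}

// findTextFiles (hypothetical name) walks root and returns the paths of
// regular files whose contents pass readerIsASCIIText.
func findTextFiles(root string) ([]string, error) {
	var matches []string
	err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err // Stop on the first error reported by Walk.
		}
		// FileInfo is metadata only: it can rule out directories, symlinks,
		// devices, and so on, but it says nothing about binary content.
		if !info.Mode().IsRegular() {
			return nil
		}
		f, err := os.Open(path)
		if err != nil {
			return err
		}
		defer f.Close() // Runs when this callback returns, once per file.
		ok, err := readerIsASCIIText(f)
		if err != nil {
			return err
		}
		if ok {
			matches = append(matches, path)
		}
		return nil
	})
	return matches, err
}
```

These are helpers meant to be called from your own main, for example findTextFiles("."). Because the check returns at the first non-text rune, most binary files should be rejected after reading only a small prefix, although a genuine text file is still read to the end.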
Of course, most people would consider many other Unicode character classes as containing "text", so whatever your approach is, the unicode package will likely be helpful for classifying runes.

(The asker selected this answer as the best answer.)
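As a hedged illustration of the point about broader Unicode classes, a looser definition of "text" might accept anything the unicode package treats as printable or as whitespace. The helper names isTextRune and isUnicodeText are invented for this sketch, and the choice of unicode.IsPrint plus unicode.IsSpace is only one possible definition, not a standard one.

```go
package main

import (
	"unicode"
	"unicode/utf8"
)

// isTextRune is a hypothetical helper: it treats a rune as "text" if Go
// considers it printable (letters, marks, numbers, punctuation, symbols,
// and the ASCII space) or if it is whitespace such as tab or newline.
func isTextRune(r rune) bool {
	return unicode.IsPrint(r) || unicode.IsSpace(r)
}

// isUnicodeText (also hypothetical) reports whether s is valid UTF-8 and
// consists only of runes accepted by isTextRune.
func isUnicodeText(s string) bool {
	// Invalid byte sequences would otherwise decode to U+FFFD, which is
	// itself printable, so reject them up front.
	if !utf8.ValidString(s) {
		return false
	}
	for _, r := range s {
		if !isTextRune(r) {
			return false
		}
	}
	return true
}
```

Under this definition a string such as "héllo, 世界\n" counts as text, while one containing a NUL byte or an invalid UTF-8 sequence does not. The sketch works on a whole string; if you apply it to chunks of a stream, keep in mind that a chunk boundary can split a multi-byte rune.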