I'm trying to figure out the fastest way to read a large file line by line and check whether each line contains a given string. The file I'm testing on is about 680 MB.
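package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("./crackstation-human-only.txt")
	// Check the error before using f.
	if err != nil {
		panic(err)
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		if strings.Contains(scanner.Text(), "Iforgotmypassword") {
			fmt.Println(scanner.Text())
		}
	}
	// Scan returns false on both EOF and failure, so report any read error.
	if err := scanner.Err(); err != nil {
		panic(err)
	}
}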
After building the program and timing it on my machine, it runs for over 3 seconds:
./speed 3.13s user 1.25s system 122% cpu 3.563 total
After increasing the scanner's buffer:
buf := make([]byte, 64*1024)
scanner.Buffer(buf, bufio.MaxScanTokenSize)
It gets a little better:
./speed 2.47s user 0.25s system 104% cpu 2.609 total
I know it can get better, because other tools manage to do it in under a second without any kind of indexing. What seems to be the bottleneck with this approach?
0.33s user 0.14s system 94% cpu 0.501 total