I am converting a text pattern scanner from Python 3 to Go 1.10, but I was surprised to find it is actually about 2 times slower. Profiling points at strings.Contains() as the culprit. See the simple benchmarks below. Did I miss anything? Could you recommend a faster pattern-search approach for Go that would perform better in this case? Startup time does not matter here; the same pattern will be used to scan millions of files.
Py3 benchmark:
import time

RUNS = 10000

if __name__ == '__main__':
    with open('data.php') as fh:
        testString = fh.read()

    def do():
        return "576ad4f370014dfb1d0f17b0e6855f22" in testString

    start = time.time()
    for i in range(RUNS):
        _ = do()
    duration = time.time() - start
    print("Python: %.2fs" % duration)
Go1.10 benchmark:
package main

import (
    "fmt"
    "io/ioutil"
    "log"
    "strings"
    "time"
)

const (
    runs = 10000
)

func main() {
    fname := "data.php"
    testdata := readFile(fname)
    needle := "576ad4f370014dfb1d0f17b0e6855f22"
    start := time.Now()
    for i := 0; i < runs; i++ {
        _ = strings.Contains(testdata, needle)
    }
    duration := time.Since(start)
    fmt.Printf("Go: %.2fs\n", duration.Seconds())
}

func readFile(fname string) string {
    data, err := ioutil.ReadFile(fname)
    if err != nil {
        log.Fatal(err)
    }
    return string(data)
}
data.php is a 528 KB file that can be found here.
Output:
Go: 1.98s
Python: 0.84s
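For completeness, I know the idiomatic way to measure this in Go would be a testing.B benchmark rather than a hand-rolled loop; a sketch of what I have in mind (benchmarkContains is my name, and the haystack is synthetic rather than data.php so it is self-contained):

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

func benchmarkContains(b *testing.B) {
	// Synthetic 528 KB haystack standing in for data.php.
	haystack := strings.Repeat("a", 528*1024)
	needle := "576ad4f370014dfb1d0f17b0e6855f22"
	for i := 0; i < b.N; i++ {
		_ = strings.Contains(haystack, needle)
	}
}

func main() {
	// testing.Benchmark runs the function with an auto-scaled b.N
	// and reports ns/op, like `go test -bench` would.
	r := testing.Benchmark(benchmarkContains)
	fmt.Println(r)
}
```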