I have the following Go program to process TSV input, but it is slower than awk and cut. I know cut uses string manipulation tricks to achieve fast performance.
https://github.com/coreutils/coreutils/blob/master/src/cut.c
Is it possible to achieve the same performance as cut with Go (or at least better than awk)? What should be used in Go to achieve better performance?
$ ./main_.sh | indent.sh
time ./main.go 10 < "$infile" > /dev/null
real 0m1.431s
user 0m0.978s
sys 0m0.436s
time cut -f 10 < "$infile" > /dev/null
real 0m0.252s
user 0m0.225s
sys 0m0.025s
time awk -v FS='\t' -v OFS='\t' -e '{ print $10 }' < "$infile" > /dev/null
real 0m1.134s
user 0m1.108s
sys 0m0.024s
$ cat.sh main_.sh
#!/usr/bin/env bash
# vim: set noexpandtab tabstop=2:
infile=$(mktemp)
seq 10000000 | paste -s -d $'\t\t\t\t\t\t\t\t\t\n' > "$infile"
set -v
time ./main.go 10 < "$infile" > /dev/null
time cut -f 10 < "$infile" > /dev/null
time awk -v FS='\t' -v OFS='\t' -e '{ print $10 }' < "$infile" > /dev/null
$ cat main.go
#!/usr/bin/env gorun
// vim: set noexpandtab tabstop=2:
package main
import (
	"bufio"
	"fmt"
	"os"
	"strings"
	"strconv"
)
func main() {
	idx, _ := strconv.Atoi(os.Args[1])
	col := idx - 1
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		line := strings.TrimRight(scanner.Text(), "\n")
		fields := strings.Split(line, "\t")
		fmt.Printf("%s\n", fields[col])
	}
}