doude5860 2018-01-16 19:12

How to make a Go program that processes TSV as efficiently as `cut`?

I have the following Go program to process TSV input, but it is slower than both awk and cut. I know cut uses string-manipulation tricks to achieve fast performance.

https://github.com/coreutils/coreutils/blob/master/src/cut.c

Is it possible to achieve the same performance as cut with Go (or at least better than awk)? What should be used in Go to achieve better performance?

$ ./main_.sh | indent.sh 
time ./main.go 10 < "$infile" > /dev/null

real    0m1.431s
user    0m0.978s
sys     0m0.436s
time cut -f 10 < "$infile" > /dev/null

real    0m0.252s
user    0m0.225s
sys     0m0.025s
time awk -v FS='\t' -v OFS='\t' -e '{ print $10 }' < "$infile" > /dev/null

real    0m1.134s
user    0m1.108s
sys     0m0.024s

$ cat.sh main_.sh 
#!/usr/bin/env bash
# vim: set noexpandtab tabstop=2:

infile=$(mktemp)
seq 10000000 | paste -s -d $'\t\t\t\t\t\t\t\t\t\n' > "$infile"
set -v
time ./main.go 10 < "$infile" > /dev/null
time cut -f 10 < "$infile" > /dev/null
time awk -v FS='\t' -v OFS='\t' -e '{ print $10 }' < "$infile" > /dev/null

$ cat main.go
#!/usr/bin/env gorun
// vim: set noexpandtab tabstop=2:
package main

import (
    "bufio"
    "fmt"
    "os"
    "strconv"
    "strings"
)

func main() {
    idx, _ := strconv.Atoi(os.Args[1])
    col := idx - 1

    scanner := bufio.NewScanner(os.Stdin)
    for scanner.Scan() {
        line := strings.TrimRight(scanner.Text(), "\n")
        fields := strings.Split(line, "\t")
        fmt.Printf("%s\n", fields[col])
    }
}

1 answer

  • drbz99867 2018-01-16 22:54

    If you profile the application, it will show that most of the time is spent in

    fmt.Printf("%s\n", fields[col])
    

    The main cost is the roughly 1,000,000 write syscalls (one per output line, since the 10,000,000 numbers are pasted into 10 columns) that you're making to stdout, so buffering stdout will significantly reduce the execution time. Removing the overhead of the fmt calls will help even further.

    The next step is to reduce allocations, which you can do by working with byte slices rather than strings: scanner.Bytes() returns the line without the string copy that scanner.Text() makes. Combining these leads to something like:

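    package main
    
    import (
        "bufio"
        "bytes"
        "os"
        "strconv"
    )
    
    func main() {
        idx, _ := strconv.Atoi(os.Args[1])
        col := idx - 1
        // Buffer stdout so each line no longer costs its own write syscall.
        stdout := bufio.NewWriter(os.Stdout)
        defer stdout.Flush()
        scanner := bufio.NewScanner(os.Stdin)
        for scanner.Scan() {
            line := scanner.Bytes() // no per-line string copy, unlike Text()
            fields := bytes.Split(line, []byte{'\t'})
            stdout.Write(fields[col])
            stdout.WriteByte('\n')
        }
    }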
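
bytes.Split still allocates a new [][]byte for every line. A possible further step, in the spirit of cut's byte scanning (this is my own sketch, not a port of cut.c), is to locate the wanted field with bytes.IndexByte and write a sub-slice of the scanner's buffer directly, so the loop does no per-line allocation. The field helper and the 64 KiB output buffer size are assumptions chosen for illustration.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"os"
	"strconv"
)

// field returns the n-th (0-based) tab-separated field of line by
// re-slicing it, without allocating. ok is false if line has too few fields.
func field(line []byte, n int) (f []byte, ok bool) {
	for ; n > 0; n-- {
		i := bytes.IndexByte(line, '\t')
		if i < 0 {
			return nil, false
		}
		line = line[i+1:] // skip past the next tab
	}
	if i := bytes.IndexByte(line, '\t'); i >= 0 {
		line = line[:i] // cut off the following fields
	}
	return line, true
}

func main() {
	idx, err := strconv.Atoi(os.Args[1])
	if err != nil || idx < 1 {
		fmt.Fprintln(os.Stderr, "usage: main N (N >= 1)")
		os.Exit(2)
	}
	col := idx - 1

	// 64 KiB is an arbitrary choice; larger buffers mean fewer syscalls.
	stdout := bufio.NewWriterSize(os.Stdout, 64*1024)
	defer stdout.Flush()

	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		if f, ok := field(scanner.Bytes(), col); ok {
			stdout.Write(f)
		}
		// Assumption: a line with too few fields yields an empty output line.
		stdout.WriteByte('\n')
	}
}
```

One behavioral difference from cut: here a line with too few fields produces an empty output line, whereas cut prints a line that contains no tab at all unchanged (unless -s is given).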
    
    This answer was accepted by the asker as the best answer.
