duanboxue3422 2017-02-26 00:06
浏览 49

如何对修改输入的函数进行基准测试?

When I'm benchmarking a function that modifies its input, I have to copy the test data for each loop of the benchmark, and pause the timer while I'm doing so. This can mean that if I run go test -bench MyTest -benchtime 1s the test can take 2 full minutes rather than 1 second.

Am I doing something wrong or will I just have to live with this?


More context:

I'm writing a program for reading syslog logs. Part of my logging paradigm is that the first line of a logged message contains readable text, and following lines contain "extra information", like a stack trace. My log reader therefore (among other things) splits the message on the first line break, which is escaped to #012 by rsyslog.

Here is the code for that:

// Splits the main line from extra information
func splitMessageExtra(line *string) string {
    var prev rune

    for i, char := range *line {
        if prev == 0 && char == '#' {
            prev = char
            continue
        }

        if prev == '#' && char == '0' {
            prev = char
            continue
        }

        if prev == '0' && char == '1' {
            prev = char
            continue
        }

        if prev == '1' && char == '2' {
            extra := (*line)[i+1:]
            *line = (*line)[0 : i-3]

            return extra
        }

        prev = 0
    }

    return ""
}

It originally used strings.Split and returned new strings, but cpu profiling showed that it was way too slow.

Here is the benchmark function:

var testMessage = `Feb 10 15:16:20 foo_stats[-] (warning): [foo_stats.postfix, line 166, thread "processor_mta03"]: Skipped line because there is no context:#012Feb 10 15:16:20 mta03 postfix/qmgr[7419]: ABCDEF123: from=<>, size=24431, nrcpt=1 (queue active)`

func BenchmarkSplitMessageExtra(b *testing.B) {
    for i := 0; i < b.N; i++ {
        b.StopTimer()
        msg := string([]byte(testMessage))
        b.StartTimer()

        splitMessageExtra(&msg)
    }
}

Here's a run without pausing the timer:

$ go test -bench SplitMessageExtra -benchtime 1s
BenchmarkSplitMessageExtra-8     3000000           434 ns/op
PASS
ok      github.com/Hubro/logreader  1.730s

And here's a run with the exact benchmark function above:

$ go test -bench SplitMessageExtra -benchtime 1s
BenchmarkSplitMessageExtra-8     5000000           385 ns/op
PASS
ok      github.com/Hubro/logreader  100.563s

Notice it takes AGES to run.

  • 写回答

2条回答 默认 最新

  • duanrong3308 2017-02-26 09:20
    关注

    Your code and benchmark do seem slow. Here's a faster version.

    package main
    
    import (
        "strings"
        "testing"
    )
    
    // Splits the main line from extra information
    func splitMessageExtra(line *string) string {
        const newline = "#012"
        i := strings.Index(*line, newline)
        if i < 0 {
            return ""
        }
        extra := (*line)[i+len(newline):]
        *line = (*line)[0:i]
        return extra
    }
    
    var testMessage = `Feb 10 15:16:20 foo_stats[-] (warning): [foo_stats.postfix, line 166, thread "processor_mta03"]: Skipped line because there is no context:#012Feb 10 15:16:20 mta03 postfix/qmgr[7419]: ABCDEF123: from=<>, size=24431, nrcpt=1 (queue active)`
    
    func BenchmarkSplitMessageExtra(b *testing.B) {
        for i := 0; i < b.N; i++ {
            msg := testMessage
            splitMessageExtra(&msg)
        }
    }
    

    Output:

    $ go test -bench=.
    goos: linux
    goarch: amd64
    pkg: extra
    BenchmarkSplitMessageExtra-4    50000000            32.2 ns/op
    PASS
    ok      extra   1.647s
    

    For comparison, here are the results from your code and benchmark. Your code and benchmark are slower than mine: 968 ns/op and 50.184s versus 32.2 ns/op and 1.647s respectively.

    package main
    
    import (
        "testing"
    )
    
    // Splits the main line from extra information
    func splitMessageExtra(line *string) string {
        var prev rune
        for i, char := range *line {
            if prev == 0 && char == '#' {
                prev = char
                continue
            }
            if prev == '#' && char == '0' {
                prev = char
                continue
            }
            if prev == '0' && char == '1' {
                prev = char
                continue
            }
            if prev == '1' && char == '2' {
                extra := (*line)[i+1:]
                *line = (*line)[0 : i-3]
    
                return extra
            }
            prev = 0
        }
        return ""
    }
    
    var testMessage = `Feb 10 15:16:20 foo_stats[-] (warning): [foo_stats.postfix, line 166, thread "processor_mta03"]: Skipped line because there is no context:#012Feb 10 15:16:20 mta03 postfix/qmgr[7419]: ABCDEF123: from=<>, size=24431, nrcpt=1 (queue active)`
    
    func BenchmarkSplitMessageExtra(b *testing.B) {
        for i := 0; i < b.N; i++ {
            b.StopTimer()
            msg := string([]byte(testMessage))
            b.StartTimer()
            splitMessageExtra(&msg)
        }
    }
    

    Output:

    $ go test -bench=.
    goos: linux
    goarch: amd64
    pkg: extra
    BenchmarkSplitMessageExtra-4     2000000           968 ns/op    
    PASS
    ok      extra   50.184s
    

    Some of your code is unnecessary; it uses CPU time and triggers allocations. For example, converting to utf-8 bytes to runes,for i, char := range *line {}, and converting string to []byte to string, string([]byte(testMessage)) . Some algorithms could be improved. For example, searching for a newline.

    评论

报告相同问题?

悬赏问题

  • ¥15 HFSS 中的 H 场图与 MATLAB 中绘制的 B1 场 部分对应不上
  • ¥15 如何在scanpy上做差异基因和通路富集?
  • ¥20 关于#硬件工程#的问题,请各位专家解答!
  • ¥15 关于#matlab#的问题:期望的系统闭环传递函数为G(s)=wn^2/s^2+2¢wn+wn^2阻尼系数¢=0.707,使系统具有较小的超调量
  • ¥15 FLUENT如何实现在堆积颗粒的上表面加载高斯热源
  • ¥30 截图中的mathematics程序转换成matlab
  • ¥15 动力学代码报错,维度不匹配
  • ¥15 Power query添加列问题
  • ¥50 Kubernetes&Fission&Eleasticsearch
  • ¥15 報錯:Person is not mapped,如何解決?