The Go Programming Language Specification, "Defer statements":
A "defer" statement invokes a function whose execution is deferred to
the moment the surrounding function returns, either because the
surrounding function executed a return statement, reached the end of
its function body, or because the corresponding goroutine is
panicking.
go/src/runtime/stack.go:
func adjustdefers(gp *g, adjinfo *adjustinfo) {
	// Adjust defer argument blocks the same way we adjust active stack frames.
	tracebackdefers(gp, adjustframe, noescape(unsafe.Pointer(adjinfo)))

	// Adjust pointers in the Defer structs.
	// Defer structs themselves are never on the stack.
	for d := gp._defer; d != nil; d = d.link {
		adjustpointer(adjinfo, unsafe.Pointer(&d.fn))
		adjustpointer(adjinfo, unsafe.Pointer(&d.sp))
		adjustpointer(adjinfo, unsafe.Pointer(&d._panic))
	}
}
go/src/runtime/stack.go:
// Copies gp's stack to a new stack of a different size.
// Caller must have changed gp status to Gcopystack.
//
// If sync is true, this is a self-triggered stack growth and, in
// particular, no other G may be writing to gp's stack (e.g., via a
// channel operation). If sync is false, copystack protects against
// concurrent channel operations.
func copystack(gp *g, newsize uintptr, sync bool) {
	// ...

	// allocate new stack
	new := stackalloc(uint32(newsize))
	if stackPoisonCopy != 0 {
		fillstack(new, 0xfd)
	}

	// ...

	// Compute adjustment.
	var adjinfo adjustinfo
	adjinfo.old = old
	adjinfo.delta = new.hi - old.hi

	// ...

	// Adjust remaining structures that have pointers into stacks.
	// We have to do most of these before we traceback the new
	// stack because gentraceback uses them.
	adjustctxt(gp, &adjinfo)
	adjustdefers(gp, &adjinfo)
	adjustpanics(gp, &adjinfo)
	if adjinfo.sghi != 0 {
		adjinfo.sghi += adjinfo.delta
	}

	// ...
}
From my reading of the code, when a goroutine's stack is resized, copystack calls adjustdefers to fix up the pointers in the pending deferred-function records so they remain valid on the new stack.
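You can exercise this path from ordinary Go code. The sketch below (my own illustration, not runtime code) recurses deeply enough that the runtime almost certainly has to grow the goroutine stack while many defers are pending; adjustdefers is what keeps the captured frame data valid across the copy. Whether `local` actually lives on the stack depends on the compiler's escape analysis, so treat this as an illustration rather than a guarantee:

```go
package main

import "fmt"

// grow recurses with a sizeable stack frame so the runtime must
// copy the goroutine stack (copystack) mid-recursion. Each level
// leaves a pending defer that captures its own frame's local.
func grow(n int) {
	var local [256]byte
	local[0] = byte(n)
	defer func() {
		// Still reads the correct value after any stack moves,
		// because pending defers are adjusted during the copy.
		if local[0] != byte(n) {
			panic("defer saw stale frame data")
		}
	}()
	if n > 0 {
		grow(n - 1)
	}
}

func main() {
	grow(1000) // deep enough to trigger at least one stack growth
	fmt.Println("ok")
}
```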
You say that you are "running a Go program that spent most of the time doing GC" and that the second-highest package in your profile is github.com/pelletier/go-buffruneio. That code looks inefficient. Here's a simple benchmark that compares reading runes with buffruneio against the standard library's bufio.Reader.
package main

import (
	"bufio"
	"bytes"
	"io"
	"testing"

	"github.com/pelletier/go-buffruneio"
)

var buf = make([]byte, 64*1024)

func BenchmarkBuffruneio(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		r := buffruneio.NewReader(bytes.NewBuffer(buf[:cap(buf)]))
		for {
			ru, _, err := r.ReadRune()
			if err == io.EOF || ru == buffruneio.EOF {
				break
			}
		}
	}
}

func BenchmarkBufio(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		r := bufio.NewReader(bytes.NewBuffer(buf[:cap(buf)]))
		for {
			_, _, err := r.ReadRune()
			if err == io.EOF {
				break
			}
		}
	}
}
Output:
$ go test -v -bench=.
goos: linux
goarch: amd64
pkg: so/runes
BenchmarkBuffruneio-2    200    9395482 ns/op    4198721 B/op    131078 allocs/op
BenchmarkBufio-2        3000     333731 ns/op       4208 B/op         2 allocs/op
PASS
ok so/runes 3.878s
$

For the same 64 KiB input, bufio is roughly 28x faster (333731 ns/op versus 9395482 ns/op) and makes 2 allocations per run versus about 131 thousand, so switching to bufio.Reader should eliminate most of the GC pressure you're seeing.