dtkyayvldeaqhl7151
2019-04-01 18:26
浏览 261
已采纳

去SQL驱动程序+上下文取消似乎“泄漏” goroutines?

Background

When I run a load test on the application that I'm building everything is working smoothly, until the machine almost runs out of memory and the application runs into problems. Suddenly all the contexts get cancelled (most likely by the hystrix library) because it takes too long to handle requests. So far, nothing really surprising.

The problem

What I do find strange is that it seems to start leaking go routines when contexts get cancelled.

In healthy conditions there are around 20 goroutines running all the time. But after this situation happens it keeps more than 20 goroutines around: 98 to be precise. This number does not drop with time. I use http.Server with read/write timeouts, go-sql-driver/mysql and hystrix with context. Below a dump of goroutines that are running after way longer than all the timeouts. Almost all of them are from or are invoked by the go-sql-driver. There are no queries left in the MySQL processlist. Curious if someone else experienced this and know how to solve it.

The application is still runs smoothly after this and can handle over 2000r/s with 100% success rate, but after the test finishes the previous hanging goroutines are still there.

Here is a dump of goroutines (not all but it is more of the same). The runtime_pollWait is from the go-mysql-driver:

goroutine 1 [IO wait]:
internal/poll.runtime_pollWait(0x7fd0fbc007b8, 0x72, 0x0)

goroutine 18 [sleep, 43 minutes]:
runtime.goparkunlock(...)

goroutine 19 [select, 2 minutes]:
database/sql.(*DB).connectionOpener(0xc000188240, 0xb835a0, 0xc00017b2c0)

goroutine 20 [select]:
database/sql.(*DB).connectionResetter(0xc000188240, 0xb835a0, 0xc00017b2c0)

goroutine 34 [select, 2 minutes]:
database/sql.(*DB).connectionOpener(0xc0001883c0, 0xb835a0, 0xc00016a140)

goroutine 35 [select]:
database/sql.(*DB).connectionResetter(0xc0001883c0, 0xb835a0, 0xc00016a140)

goroutine 8 [chan receive]:
github.com/go-redis/redis/internal/pool.(*ConnPool).reaper(0xc0000ba370, 0xdf8475800)

goroutine 11 [chan receive]:
github.com/go-redis/redis/internal/pool.(*ConnPool).reaper(0xc0000ba420, 0xdf8475800)

goroutine 14 [chan receive]:
github.com/go-redis/redis/internal/pool.(*ConnPool).reaper(0xc0000ba4d0, 0xdf8475800)

goroutine 98 [sleep, 43 minutes]:
runtime.goparkunlock(...)

goroutine 87 [chan receive]:
github.com/afex/hystrix-go/hystrix.(*metricExchange).Monitor(0xc002a1b5c0)

goroutine 99 [syscall]:
syscall.Syscall6(0xe8, 0xd, 0xc0002b9a58, 0x64, 0x64, 0x0, 0x0, 0x42bfcf, 0x8, 0xacdd80)

goroutine 100 [sleep]:
runtime.goparkunlock(...)

goroutine 88 [chan receive]:
github.com/afex/hystrix-go/hystrix.(*poolMetrics).Monitor(0xc003ac08a0)

goroutine 91 [chan receive]:
github.com/afex/hystrix-go/hystrix.(*metricExchange).Monitor(0xc002a1b680)

goroutine 92 [chan receive]:
github.com/afex/hystrix-go/hystrix.(*poolMetrics).Monitor(0xc003ac0f00)

goroutine 25921733 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa06f298, 0x72, 0xffffffffffffffff)

goroutine 25921439 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0f983fe18, 0x72, 0xffffffffffffffff)

goroutine 25927120 [select, 3 minutes]:
github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher.func1(0xc003aa7620, 0xc01c2ef980, 0xc01a841bc0)

goroutine 25928175 [select, 3 minutes]:
github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher.func1(0xc012abbf80, 0xc01c2ee600, 0xc01144f8c0)

goroutine 25927727 [select, 3 minutes]:
github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher.func1(0xc014ae4300, 0xc0166a3c80, 0xc0030e8a80)

goroutine 24567262 [select]:
database/sql.(*DB).connectionCleaner(0xc000188240, 0x2540be400)

goroutine 25927990 [select, 3 minutes]:
github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher.func1(0xc012ababa0, 0xc0166a3ec0, 0xc01144f380)

goroutine 25921926 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa0702d8, 0x72, 0xffffffffffffffff)

goroutine 25921958 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa06eb48, 0x72, 0xffffffffffffffff)

goroutine 24567381 [select]:
database/sql.(*DB).connectionCleaner(0xc0001883c0, 0x2540be400)

goroutine 25921798 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0f9882038, 0x72, 0xffffffffffffffff)

goroutine 25921751 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa6b6820, 0x72, 0xffffffffffffffff)

goroutine 25921757 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa6b6410, 0x72, 0xffffffffffffffff)

goroutine 25927113 [select, 3 minutes]:
github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher.func1(0xc003aa7320, 0xc01c2efbc0, 0xc01a8419e0)

goroutine 25921966 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa06e8d8, 0x72, 0xffffffffffffffff)

goroutine 25921964 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa06e9a8, 0x72, 0xffffffffffffffff)

goroutine 25921323 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa06f0f8, 0x72, 0xffffffffffffffff)

goroutine 25928466 [select, 3 minutes]:
github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher.func1(0xc00e552240, 0xc01c2ef740, 0xc01144f920)

goroutine 25928011 [select, 3 minutes]:
github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher.func1(0xc0156599e0, 0xc01c2ee3c0, 0xc01baa8fc0)

goroutine 25919812 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa06ea78, 0x72, 0xffffffffffffffff)

goroutine 25928014 [select, 3 minutes]:
github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher.func1(0xc015659b00, 0xc01177e900, 0xc01baa90e0)

goroutine 25922001 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa070548, 0x72, 0xffffffffffffffff)

goroutine 25925675 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa6b8150, 0x72, 0xffffffffffffffff)

goroutine 25928529 [select, 3 minutes]:
github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher.func1(0xc003aa7680, 0xc01c2ef8c0, 0xc01a841c20)

goroutine 25928467 [select, 3 minutes]:
github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher.func1(0xc00e552360, 0xc01c2ef680, 0xc01144f980)

goroutine 25928468 [select, 3 minutes]:
github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher.func1(0xc00e552480, 0xc01c2ef5c0, 0xc01144f9e0)

goroutine 25919817 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa06e808, 0x72, 0xffffffffffffffff)

goroutine 25925511 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0f983e4e8, 0x72, 0xffffffffffffffff)

goroutine 25920658 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa070208, 0x72, 0xffffffffffffffff)

goroutine 25920661 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa070138, 0x72, 0xffffffffffffffff)

goroutine 25920667 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa06e738, 0x72, 0xffffffffffffffff)

goroutine 25920860 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa06f368, 0x72, 0xffffffffffffffff)

goroutine 25920790 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa070478, 0x72, 0xffffffffffffffff)

goroutine 25920793 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa06fec8, 0x72, 0xffffffffffffffff)

goroutine 25928006 [select, 3 minutes]:
github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher.func1(0xc015659800, 0xc01c1603c0, 0xc01baa8de0)

goroutine 25921085 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa6b7fb0, 0x72, 0xffffffffffffffff)

.....

goroutine 25925779 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa6b7ba0, 0x72, 0xffffffffffffffff)

goroutine 25926131 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa6b69c0, 0x72, 0xffffffffffffffff)

goroutine 25927988 [select, 3 minutes]:
github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher.func1(0xc012abaae0, 0xc01c2ee000, 0xc01144f2c0)

goroutine 25928470 [select, 3 minutes]:
github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher.func1(0xc00e5526c0, 0xc01c2ef440, 0xc01144faa0)

goroutine 25928471 [select, 3 minutes]:
github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher.func1(0xc00e5527e0, 0xc01c2ef380, 0xc01144fb00)

goroutine 25928472 [select, 3 minutes]:
github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher.func1(0xc00e552900, 0xc01c2ef2c0, 0xc01144fb60)

goroutine 25928473 [select, 3 minutes]:
github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher.func1(0xc00e552a20, 0xc01c2ef200, 0xc01144fbc0)

goroutine 26252345 [IO wait]:
internal/poll.runtime_pollWait(0x7fd0e7054fe8, 0x72, 0xffffffffffffffff)
  • 点赞
  • 写回答
  • 关注问题
  • 收藏
  • 邀请回答

1条回答 默认 最新

  • douzong7283 2019-04-04 10:38
    已采纳

    To answer my own question; the problem is most likely the system being overloaded. I rate limited the number of concurrent connections and use a lot less MySQL connections 25%. That seems to solve the problem. I have less throughput now, but a more stable system.

    It still doesn't answer why the netpoller keeps blocking even while the system calmed down again. They just never recover, which I think is not good. But probably there is a good reason for it. I saw this symptoms multiple times on Stackoverflow with network connections, like for example http.Client.

    点赞 评论

相关推荐 更多相似问题