dtkyayvldeaqhl7151 2019-04-01 18:26
浏览 329
已采纳

去SQL驱动程序+上下文取消似乎“泄漏” goroutines?

Background

When I run a load test on the application that I'm building everything is working smoothly, until the machine almost runs out of memory and the application runs into problems. Suddenly all the contexts get cancelled (most likely by the hystrix library) because it takes too long to handle requests. So far, nothing really surprising.

The problem

What I do find strange is that it seems to start leaking go routines when contexts get cancelled.

In healthy conditions there are around 20 goroutines running all the time. But after this situation happens it keeps more than 20 goroutines around: 98 to be precise. This number does not drop with time. I use http.Server with read/write timeouts, go-sql-driver/mysql and hystrix with context. Below a dump of goroutines that are running after way longer than all the timeouts. Almost all of them are from or are invoked by the go-sql-driver. There are no queries left in the MySQL processlist. Curious if someone else experienced this and know how to solve it.

The application is still runs smoothly after this and can handle over 2000r/s with 100% success rate, but after the test finishes the previous hanging goroutines are still there.

Here is a dump of goroutines (not all but it is more of the same). The runtime_pollWait is from the go-mysql-driver:

goroutine 1 [IO wait]:
internal/poll.runtime_pollWait(0x7fd0fbc007b8, 0x72, 0x0)

goroutine 18 [sleep, 43 minutes]:
runtime.goparkunlock(...)

goroutine 19 [select, 2 minutes]:
database/sql.(*DB).connectionOpener(0xc000188240, 0xb835a0, 0xc00017b2c0)

goroutine 20 [select]:
database/sql.(*DB).connectionResetter(0xc000188240, 0xb835a0, 0xc00017b2c0)

goroutine 34 [select, 2 minutes]:
database/sql.(*DB).connectionOpener(0xc0001883c0, 0xb835a0, 0xc00016a140)

goroutine 35 [select]:
database/sql.(*DB).connectionResetter(0xc0001883c0, 0xb835a0, 0xc00016a140)

goroutine 8 [chan receive]:
github.com/go-redis/redis/internal/pool.(*ConnPool).reaper(0xc0000ba370, 0xdf8475800)

goroutine 11 [chan receive]:
github.com/go-redis/redis/internal/pool.(*ConnPool).reaper(0xc0000ba420, 0xdf8475800)

goroutine 14 [chan receive]:
github.com/go-redis/redis/internal/pool.(*ConnPool).reaper(0xc0000ba4d0, 0xdf8475800)

goroutine 98 [sleep, 43 minutes]:
runtime.goparkunlock(...)

goroutine 87 [chan receive]:
github.com/afex/hystrix-go/hystrix.(*metricExchange).Monitor(0xc002a1b5c0)

goroutine 99 [syscall]:
syscall.Syscall6(0xe8, 0xd, 0xc0002b9a58, 0x64, 0x64, 0x0, 0x0, 0x42bfcf, 0x8, 0xacdd80)

goroutine 100 [sleep]:
runtime.goparkunlock(...)

goroutine 88 [chan receive]:
github.com/afex/hystrix-go/hystrix.(*poolMetrics).Monitor(0xc003ac08a0)

goroutine 91 [chan receive]:
github.com/afex/hystrix-go/hystrix.(*metricExchange).Monitor(0xc002a1b680)

goroutine 92 [chan receive]:
github.com/afex/hystrix-go/hystrix.(*poolMetrics).Monitor(0xc003ac0f00)

goroutine 25921733 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa06f298, 0x72, 0xffffffffffffffff)

goroutine 25921439 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0f983fe18, 0x72, 0xffffffffffffffff)

goroutine 25927120 [select, 3 minutes]:
github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher.func1(0xc003aa7620, 0xc01c2ef980, 0xc01a841bc0)

goroutine 25928175 [select, 3 minutes]:
github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher.func1(0xc012abbf80, 0xc01c2ee600, 0xc01144f8c0)

goroutine 25927727 [select, 3 minutes]:
github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher.func1(0xc014ae4300, 0xc0166a3c80, 0xc0030e8a80)

goroutine 24567262 [select]:
database/sql.(*DB).connectionCleaner(0xc000188240, 0x2540be400)

goroutine 25927990 [select, 3 minutes]:
github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher.func1(0xc012ababa0, 0xc0166a3ec0, 0xc01144f380)

goroutine 25921926 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa0702d8, 0x72, 0xffffffffffffffff)

goroutine 25921958 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa06eb48, 0x72, 0xffffffffffffffff)

goroutine 24567381 [select]:
database/sql.(*DB).connectionCleaner(0xc0001883c0, 0x2540be400)

goroutine 25921798 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0f9882038, 0x72, 0xffffffffffffffff)

goroutine 25921751 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa6b6820, 0x72, 0xffffffffffffffff)

goroutine 25921757 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa6b6410, 0x72, 0xffffffffffffffff)

goroutine 25927113 [select, 3 minutes]:
github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher.func1(0xc003aa7320, 0xc01c2efbc0, 0xc01a8419e0)

goroutine 25921966 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa06e8d8, 0x72, 0xffffffffffffffff)

goroutine 25921964 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa06e9a8, 0x72, 0xffffffffffffffff)

goroutine 25921323 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa06f0f8, 0x72, 0xffffffffffffffff)

goroutine 25928466 [select, 3 minutes]:
github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher.func1(0xc00e552240, 0xc01c2ef740, 0xc01144f920)

goroutine 25928011 [select, 3 minutes]:
github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher.func1(0xc0156599e0, 0xc01c2ee3c0, 0xc01baa8fc0)

goroutine 25919812 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa06ea78, 0x72, 0xffffffffffffffff)

goroutine 25928014 [select, 3 minutes]:
github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher.func1(0xc015659b00, 0xc01177e900, 0xc01baa90e0)

goroutine 25922001 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa070548, 0x72, 0xffffffffffffffff)

goroutine 25925675 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa6b8150, 0x72, 0xffffffffffffffff)

goroutine 25928529 [select, 3 minutes]:
github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher.func1(0xc003aa7680, 0xc01c2ef8c0, 0xc01a841c20)

goroutine 25928467 [select, 3 minutes]:
github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher.func1(0xc00e552360, 0xc01c2ef680, 0xc01144f980)

goroutine 25928468 [select, 3 minutes]:
github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher.func1(0xc00e552480, 0xc01c2ef5c0, 0xc01144f9e0)

goroutine 25919817 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa06e808, 0x72, 0xffffffffffffffff)

goroutine 25925511 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0f983e4e8, 0x72, 0xffffffffffffffff)

goroutine 25920658 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa070208, 0x72, 0xffffffffffffffff)

goroutine 25920661 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa070138, 0x72, 0xffffffffffffffff)

goroutine 25920667 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa06e738, 0x72, 0xffffffffffffffff)

goroutine 25920860 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa06f368, 0x72, 0xffffffffffffffff)

goroutine 25920790 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa070478, 0x72, 0xffffffffffffffff)

goroutine 25920793 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa06fec8, 0x72, 0xffffffffffffffff)

goroutine 25928006 [select, 3 minutes]:
github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher.func1(0xc015659800, 0xc01c1603c0, 0xc01baa8de0)

goroutine 25921085 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa6b7fb0, 0x72, 0xffffffffffffffff)

.....

goroutine 25925779 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa6b7ba0, 0x72, 0xffffffffffffffff)

goroutine 25926131 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x7fd0fa6b69c0, 0x72, 0xffffffffffffffff)

goroutine 25927988 [select, 3 minutes]:
github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher.func1(0xc012abaae0, 0xc01c2ee000, 0xc01144f2c0)

goroutine 25928470 [select, 3 minutes]:
github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher.func1(0xc00e5526c0, 0xc01c2ef440, 0xc01144faa0)

goroutine 25928471 [select, 3 minutes]:
github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher.func1(0xc00e5527e0, 0xc01c2ef380, 0xc01144fb00)

goroutine 25928472 [select, 3 minutes]:
github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher.func1(0xc00e552900, 0xc01c2ef2c0, 0xc01144fb60)

goroutine 25928473 [select, 3 minutes]:
github.com/go-sql-driver/mysql.(*mysqlConn).startWatcher.func1(0xc00e552a20, 0xc01c2ef200, 0xc01144fbc0)

goroutine 26252345 [IO wait]:
internal/poll.runtime_pollWait(0x7fd0e7054fe8, 0x72, 0xffffffffffffffff)
  • 写回答

1条回答 默认 最新

  • douzong7283 2019-04-04 10:38
    关注

    To answer my own question; the problem is most likely the system being overloaded. I rate limited the number of concurrent connections and use a lot less MySQL connections 25%. That seems to solve the problem. I have less throughput now, but a more stable system.

    It still doesn't answer why the netpoller keeps blocking even while the system calmed down again. They just never recover, which I think is not good. But probably there is a good reason for it. I saw this symptoms multiple times on Stackoverflow with network connections, like for example http.Client.

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥120 计算机网络的新校区组网设计
  • ¥20 完全没有学习过GAN,看了CSDN的一篇文章,里面有代码但是完全不知道如何操作
  • ¥15 使用ue5插件narrative时如何切换关卡也保存叙事任务记录
  • ¥20 海浪数据 南海地区海况数据,波浪数据
  • ¥20 软件测试决策法疑问求解答
  • ¥15 win11 23H2删除推荐的项目,支持注册表等
  • ¥15 matlab 用yalmip搭建模型,cplex求解,线性化处理的方法
  • ¥15 qt6.6.3 基于百度云的语音识别 不会改
  • ¥15 关于#目标检测#的问题:大概就是类似后台自动检测某下架商品的库存,在他监测到该商品上架并且可以购买的瞬间点击立即购买下单
  • ¥15 神经网络怎么把隐含层变量融合到损失函数中?