牛牛 2019-09-03 17:12 Acceptance rate: 0%
Views: 2183

Help with a Docker daemon that keeps dying for no apparent reason; containerd is failing

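CentOS 7, a Docker cluster built from several virtual machines. All servers are configured identically, but on one of them the Docker daemon keeps dying for no apparent reason. The messages log shows the following: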

Sep  3 12:30:03 sup-svc-70 systemd: Unit containerd.service entered failed state.
Sep  3 12:30:03 sup-svc-70 containerd: github.com/containerd/containerd/metrics/cgroups.(*oomCollector).start(0xc0003146c0)
Sep  3 12:30:03 sup-svc-70 containerd: /go/src/github.com/containerd/containerd/metrics/cgroups/oom.go:114 +0x7d
Sep  3 12:30:03 sup-svc-70 containerd: created by github.com/containerd/containerd/metrics/cgroups.newOOMCollector
Sep  3 12:30:03 sup-svc-70 containerd: /go/src/github.com/containerd/containerd/metrics/cgroups/oom.go:50 +0xed
Sep  3 12:30:03 sup-svc-70 containerd: goroutine 15 [IO wait, 1615 minutes]:
Sep  3 12:30:03 sup-svc-70 containerd: internal/poll.runtime_pollWait(0x7f906759ff00, 0x72, 0x0)
Sep  3 12:30:03 sup-svc-70 containerd: /.GOROOT/src/runtime/netpoll.go:173 +0x68
Sep  3 12:30:03 sup-svc-70 containerd: internal/poll.(*pollDesc).wait(0xc00036af98, 0x72, 0xc0001b6800, 0x0, 0x0)
Sep  3 12:30:03 sup-svc-70 containerd: /.GOROOT/src/internal/poll/fd_poll_runtime.go:85 +0x9c
Sep  3 12:30:03 sup-svc-70 containerd: internal/poll.(*pollDesc).waitRead(0xc00036af98, 0xffffffffffffff00, 0x0, 0x0)
Sep  3 12:30:03 sup-svc-70 containerd: /.GOROOT/src/internal/poll/fd_poll_runtime.go:90 +0x3f
Sep  3 12:30:03 sup-svc-70 containerd: internal/poll.(*FD).Accept(0xc00036af80, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
Sep  3 12:30:03 sup-svc-70 containerd: /.GOROOT/src/internal/poll/fd_unix.go:384 +0x1a2
Sep  3 12:30:03 sup-svc-70 containerd: net.(*netFD).accept(0xc00036af80, 0xc000326058, 0x0, 0x0)
Sep  3 12:30:03 sup-svc-70 containerd: /.GOROOT/src/net/fd_unix.go:238 +0x44
Sep  3 12:30:03 sup-svc-70 containerd: net.(*UnixListener).accept(0xc0000df890, 0xc0003d9dc8, 0xc0003d9dd0, 0x18)
Sep  3 12:30:03 sup-svc-70 containerd: /.GOROOT/src/net/unixsock_posix.go:162 +0x34
Sep  3 12:30:03 sup-svc-70 containerd: net.(*UnixListener).Accept(0xc0000df890, 0x55d3f39bdd58, 0xc0000c2000, 0x55d3f39eb6c0, 0xc000326058)
Sep  3 12:30:03 sup-svc-70 containerd: /.GOROOT/src/net/unixsock.go:257 +0x49
Sep  3 12:30:03 sup-svc-70 containerd: github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).Serve(0xc0000c2000, 0x55d3f39e1960, 0xc0000df890, 0x0, 0x0)
Sep  3 12:30:03 sup-svc-70 containerd: /go/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.go:544 +0x212
Sep  3 12:30:03 sup-svc-70 containerd: github.com/containerd/containerd/services/server.(*Server).ServeGRPC(0xc00026e7b0, 0x55d3f39e1960, 0xc0000df890, 0x18, 0xc0004b2738)
Sep  3 12:30:03 sup-svc-70 containerd: /go/src/github.com/containerd/containerd/services/server/server.go:167 +0x6b
Sep  3 12:30:03 sup-svc-70 containerd: github.com/containerd/containerd/services/server.(*Server).ServeGRPC-fm(0x55d3f39e1960, 0xc0000df890,0xc0000df890, 0x0)
Sep  3 12:30:03 sup-svc-70 containerd: /go/src/github.com/containerd/containerd/cmd/containerd/command/main.go:171 +0x40
Sep  3 12:30:03 sup-svc-70 containerd: github.com/containerd/containerd/cmd/containerd/command.serve.func1(0x55d3f39e1960, 0xc0000df890, 0xc000249f70, 0x55d3f39e2f20, 0xc00003c018, 0xc0001e9640, 0x1f)
Sep  3 12:30:03 sup-svc-70 dockerd: time="2019-09-03T12:30:03.279896469+08:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=moby
Sep  3 12:30:03 sup-svc-70 dockerd: time="2019-09-03T12:30:03.279979412+08:00" level=error msg="failed to get event" error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=moby

What could be causing this, how can it be fixed, or in which direction should I investigate?
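
One possible starting point, going only by the excerpt: it begins partway through a goroutine dump, so the header line that the Go runtime prints before the stacks (commonly starting with `panic:`, `fatal error:`, or a signal name such as `SIGABRT:`) is not shown. The following is a minimal sketch for pulling such header lines out of a syslog-style file. The function name `find_crash_headers`, the prefix list, and the assumption that every line looks like `Sep  3 12:30:03 host tag: message` are illustrative choices, not something taken from the post.

```python
import re

# Prefixes that typically open a Go runtime crash dump (assumed list, extend as needed).
CRASH_HEADER = re.compile(r"^(panic:|fatal error:|SIG[A-Z]+:|unexpected signal)")

# Syslog-style line: "Sep  3 12:30:03 host tag: message" or "... tag[pid]: message".
SYSLOG_LINE = re.compile(
    r"^(\w{3}\s+\d+\s[\d:]{8})\s(\S+)\s([^:\s\[]+)(?:\[\d+\])?:\s?(.*)$"
)


def find_crash_headers(lines, tag="containerd"):
    """Return (timestamp, message) pairs for lines logged by `tag`
    whose message starts with one of the crash-header prefixes."""
    hits = []
    for line in lines:
        m = SYSLOG_LINE.match(line.rstrip("\n"))
        if not m:
            continue  # not a syslog-style line, skip it
        timestamp, _host, line_tag, message = m.groups()
        message = message.strip()
        if line_tag == tag and CRASH_HEADER.match(message):
            hits.append((timestamp, message))
    return hits
```

To use it, open the log file (the path depends on the system) and pass the file object as `lines`. Each returned pair gives the time and the first line of a dump, which usually says more about the failure than the stacks that follow.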

1 answer

  • weixin_43975295 2019-09-04 10:03

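    Is the clock on the problem server in sync with the clocks on the other, healthy servers? Try resynchronizing the time on all of them and see whether that helps.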
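
A rough way to compare clocks across the machines, as a minimal sketch: it assumes passwordless `ssh` access to each host and GNU `date` on the remote side, and the function names are made up for illustration. The remote reading is assumed to fall midway through the round trip, so the result is only accurate to about half the `ssh` round-trip time. That is enough to spot an offset of seconds or more, not a precise measurement.

```python
import subprocess
import time


def estimate_offset(t_send, t_remote, t_recv):
    """Remote clock minus local clock, in seconds, assuming the remote
    reading was taken midway between t_send and t_recv."""
    return t_remote - (t_send + t_recv) / 2.0


def remote_clock_offset(host, run=subprocess.check_output):
    """Read a remote host's clock over ssh and estimate its offset from
    the local clock. `run` is injectable so the logic can be tested
    without a network."""
    t_send = time.time()
    out = run(["ssh", host, "date", "+%s.%N"])  # epoch seconds with nanoseconds
    t_recv = time.time()
    return estimate_offset(t_send, float(out.decode().strip()), t_recv)
```

Calling `remote_clock_offset(host)` for each server in the cluster from one machine gives comparable numbers. A host whose value stands far apart from the rest would be the one to resynchronize first.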
