weixin_39724889 2020-11-19 12:02

CI frequently fails to start containers

Bug Report

example logs: https://internal.pingcap.net/idc-jenkins/blue/organizations/jenkins/operatorghpre2etestkind/detail/operatorghpre2etestkind/2484/pipeline/72

example failures:


[2020-01-27T14:16:18.322Z] docker run error: command "docker run --hostname tidb-operator-worker3 --name tidb-operator-worker3 --label io.x-k8s.kind.role=worker --privileged --security-opt seccomp=unconfined --security-opt apparmor=unconfined --tmpfs /tmp --tmpfs /run --volume /var --volume /lib/modules:/lib/modules:ro --detach --tty --label io.x-k8s.kind.cluster=tidb-operator kindest/node:v1.12.10:68a6581f64b54994b824708286fafc37f1227b7b54cbb8865182ce1e036ed1cc" failed with error: exit status 125
[2020-01-27T14:16:19.690Z] ERROR: failed to create cluster: docker run error: command "docker run --hostname tidb-operator-worker3 --name tidb-operator-worker3 --label io.x-k8s.kind.role=worker --privileged --security-opt seccomp=unconfined --security-opt apparmor=unconfined --tmpfs /tmp --tmpfs /run --volume /var --volume /lib/modules:/lib/modules:ro --detach --tty --label io.x-k8s.kind.cluster=tidb-operator kindest/node:v1.12.10:68a6581f64b54994b824708286fafc37f1227b7b54cbb8865182ce1e036ed1cc" failed with error: exit status 125
[2020-01-27T14:16:19.690Z]
[2020-01-27T14:16:19.690Z] Output:
[2020-01-27T14:16:19.690Z] f57dba5be9f8b69d9f6fc887d7a12ba5032ea26fb452a8f544ea64adea31e1f0
[2020-01-27T14:16:19.691Z] docker: Error response from daemon: OCI runtime create failed: container_linux.go:346: starting container process caused "process_linux.go:297: applying cgroup configuration for process caused "mkdir /sys/fs/cgroup/memory/docker/f57dba5be9f8b69d9f6fc887d7a12ba5032ea26fb452a8f544ea64adea31e1f0: cannot allocate memory"": unknown.
[2020-01-27T14:16:19.691Z]
[2020-01-27T14:16:19.691Z] Stack Trace:
[2020-01-27T14:16:19.691Z] sigs.k8s.io/kind/pkg/errors.WithStack
[2020-01-27T14:16:19.691Z]  /src/pkg/errors/errors.go:51
[2020-01-27T14:16:19.691Z] sigs.k8s.io/kind/pkg/exec.(*LocalCmd).Run
[2020-01-27T14:16:19.691Z]  /src/pkg/exec/local.go:116
[2020-01-27T14:16:19.691Z] sigs.k8s.io/kind/pkg/cluster/internal/providers/docker.createContainer
[2020-01-27T14:16:19.691Z]  /src/pkg/cluster/internal/providers/docker/provision.go:99
[2020-01-27T14:16:19.691Z] sigs.k8s.io/kind/pkg/cluster/internal/providers/docker.planCreation.func3
[2020-01-27T14:16:19.691Z]  /src/pkg/cluster/internal/providers/docker/provision.go:89
[2020-01-27T14:16:19.691Z] sigs.k8s.io/kind/pkg/errors.UntilErrorConcurrent.func1
[2020-01-27T14:16:19.691Z]  /src/pkg/errors/concurrent.go:30
[2020-01-27T14:16:19.691Z] runtime.goexit
[2020-01-27T14:16:19.691Z]  /usr/local/go/src/runtime/asm_amd64.s:1357

example logs from kubectl describe pods:


Warning  FailedCreatePodContainer  14m (x10 over 16m)  kubelet, 172.16.5.64  unable to ensure pod container exists: failed to create container for [kubepods podd1e2a265-410f-11ea-b31c-d0946604d177] : mkdir /sys/fs/cgroup/memory/kubepods/podd1e2a265-410f-11ea-b31c-d0946604d177: cannot allocate memory
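
Both the docker error and the kubelet event above end in the same message: a `mkdir` under `/sys/fs/cgroup/memory/` that fails with "cannot allocate memory". A CI job could scan its logs for that signature to tell this failure apart from other container start errors. The following is only a sketch under that assumption; the function name `find_cgroup_enomem` and the regular expression are illustrative and not part of kind, docker, or kubelet.

```python
import re
from typing import Iterable, List

# Signature shared by the failures quoted in this report (assumed pattern,
# derived only from the log lines shown here).
CGROUP_ENOMEM = re.compile(
    r"mkdir (/sys/fs/cgroup/memory/\S*?): cannot allocate memory"
)


def find_cgroup_enomem(lines: Iterable[str]) -> List[str]:
    """Return the cgroup paths whose creation failed with ENOMEM."""
    return [
        match.group(1)
        for line in lines
        for match in CGROUP_ENOMEM.finditer(line)
    ]


if __name__ == "__main__":
    sample = [
        "policy_none.go:43] [cpumanager] none policy: Start",
        "kubelet.go:1380] Failed to start ContainerManager "
        "mkdir /sys/fs/cgroup/memory/kubepods: cannot allocate memory",
    ]
    for path in find_cgroup_enomem(sample):
        print("cgroup memory allocation failure:", path)
```

Matching on the path prefix rather than on "cannot allocate memory" alone keeps ordinary out-of-memory messages from being counted as this failure.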

related issue:

  • https://github.com/docker/for-linux/issues/841

possible solutions:

  • upgrade the kernel to kernel-3.10.0-1062.4.1 or later (https://github.com/docker/for-linux/issues/841#issuecomment-573518203)
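
To see whether a CI host already runs the suggested kernel, its `uname -r` string can be compared with 3.10.0-1062.4.1. The sketch below assumes RHEL/CentOS-style release strings such as `3.10.0-1062.4.1.el7.x86_64` and answers only for the 3.10.0 series; for anything else it returns `None` because the threshold does not apply. The helper names are hypothetical, and whether the upgrade fully resolves the failure is taken from the linked comment, not verified here.

```python
import platform
import re
from typing import Optional, Tuple

# Threshold quoted in this report: kernel-3.10.0-1062.4.1
SUGGESTED = (3, 10, 0, 1062, 4, 1)

# Leading "X.Y.Z-build" part of a release string (assumed RHEL/CentOS layout).
_RELEASE = re.compile(r"(\d+)\.(\d+)\.(\d+)-(\d+(?:\.\d+)*)")


def parse_release(release: str) -> Optional[Tuple[int, ...]]:
    """Turn '3.10.0-1062.4.1.el7.x86_64' into (3, 10, 0, 1062, 4, 1)."""
    match = _RELEASE.match(release)
    if match is None:
        return None
    base = tuple(int(match.group(i)) for i in (1, 2, 3))
    build = tuple(int(part) for part in match.group(4).split("."))
    return base + build


def meets_suggested_kernel(release: str) -> Optional[bool]:
    """True/False for 3.10.0 kernels, None when the threshold does not apply."""
    version = parse_release(release)
    if version is None or version[:3] != SUGGESTED[:3]:
        return None
    return version >= SUGGESTED


if __name__ == "__main__":
    release = platform.release()  # same string as `uname -r` on Linux
    print(release, meets_suggested_kernel(release))
```

Tuple comparison treats a shorter build number such as `1062` as older than `1062.4.1`, which matches how the errata releases are ordered.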


  • weixin_39724889 2020-11-19 12:02

    logs of kubelet:

    
    ...
    Jan 29 14:14:16 tidb-operator-control-plane kubelet[2082]: I0129 14:14:16.196046    2082 cpu_manager.go:173] [cpumanager] starting with none policy
    Jan 29 14:14:16 tidb-operator-control-plane kubelet[2082]: I0129 14:14:16.196057    2082 cpu_manager.go:174] [cpumanager] reconciling every 10s
    Jan 29 14:14:16 tidb-operator-control-plane kubelet[2082]: I0129 14:14:16.196067    2082 policy_none.go:43] [cpumanager] none policy: Start
    Jan 29 14:14:16 tidb-operator-control-plane kubelet[2082]: E0129 14:14:16.197921    2082 node_container_manager_linux.go:50] Failed to create ["kubepods"] cgroup
    Jan 29 14:14:16 tidb-operator-control-plane kubelet[2082]: F0129 14:14:16.197938    2082 kubelet.go:1380] Failed to start ContainerManager mkdir /sys/fs/cgroup/memory/kubepods: cannot allocate memory
    Jan 29 14:14:16 tidb-operator-control-plane systemd[1]: kubelet.service: Main process exited, code=exited, status=255/EXCEPTION
    Jan 29 14:14:16 tidb-operator-control-plane systemd[1]: kubelet.service: Failed with result 'exit-code'.
    
