weixin_40007515
2021-01-06 05:17

kubeadm join control-plane node times out (etcd timeout)

What keywords did you search in kubeadm issues before filing this one?

etcd join timeout kubeadm join timeout

Is this a BUG REPORT or FEATURE REQUEST?

BUG REPORT

Versions

kubeadm version (use kubeadm version): kubeadm version: &version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.2", GitCommit:"f6278300bebbb750328ac16ee6dd3aa7d3549568", GitTreeState:"clean", BuildDate:"2019-08-05T09:20:51Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}

Environment: - Kubernetes version (use kubectl version): v1.15.2 - Cloud provider or hardware configuration: Openstack - OS (e.g. from /etc/os-release): Container Linux by CoreOS 2135.5.0 (Rhyolite) - Kernel (e.g. uname -a): Linux os1pi019-kube-master01 4.19.50-coreos-r1 #1 SMP Mon Jul 1 19:07:03 -00 2019 x86_64 Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz GenuineIntel GNU/Linux

  • Others:

What happened?

kubeadm join was invoked and failed. The etcd container only started up 7 seconds after kubeadm had timed out and exited with failure. See the following logs (these include kubeadm logs and timestamps for pod-manifest starts):


09:30:27 kubeadm service starts
09:30:27 kubeadm[2025]: [preflight] Reading configuration from the cluster...                                                                                                                                                                                                                                                                                   
09:30:27 kubeadm[2025]: [preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'                                                                                                                                                                                                                            
09:30:27 kubeadm[2025]: [control-plane] Using manifest folder "/etc/kubernetes/manifests"                                                                                                                                                                                                                                                                       
09:30:27 kubeadm[2025]: [control-plane] Creating static Pod manifest for "kube-apiserver"                                                                                                                                                                                                                                                                       
09:30:27 kubeadm[2025]: [controlplane] Adding extra host path mount "audit-policy" to "kube-apiserver"                                                                                                                                                                                                                                                          
09:30:27 kubeadm[2025]: [controlplane] Adding extra host path mount "policy-controller" to "kube-apiserver"                                                                                                                                                                                                                                                     
09:30:27 kubeadm[2025]: [controlplane] Adding extra host path mount "audit-log" to "kube-apiserver"                                                                                                                                                                                                                                                             
09:30:27 kubeadm[2025]: [controlplane] Adding extra host path mount "scheduler-policy" to "kube-scheduler"                                                                                                                                                                                                                                                      
09:30:27 kubeadm[2025]: [control-plane] Creating static Pod manifest for "kube-controller-manager"                                                                                                                                                                                                                                                              
09:30:27 kubeadm[2025]: [controlplane] Adding extra host path mount "audit-policy" to "kube-apiserver"                                                                                                                                                                                                                                                          
09:30:27 kubeadm[2025]: [controlplane] Adding extra host path mount "policy-controller" to "kube-apiserver"                                                                                                                                                                                                                                                     
09:30:27 kubeadm[2025]: [controlplane] Adding extra host path mount "audit-log" to "kube-apiserver"                                                                                                                                                                                                                                                             
09:30:27 kubeadm[2025]: [controlplane] Adding extra host path mount "scheduler-policy" to "kube-scheduler"                                                                                                                                                                                                                                                      
09:30:27 kubeadm[2025]: [control-plane] Creating static Pod manifest for "kube-scheduler"                                                                                                                                                                                                                                                                       
09:30:27 kubeadm[2025]: [controlplane] Adding extra host path mount "audit-policy" to "kube-apiserver"                                                                                                                                                                                                                                                          
09:30:27 kubeadm[2025]: [controlplane] Adding extra host path mount "policy-controller" to "kube-apiserver"                                                                                                                                                                                                                                                     
09:30:27 kubeadm[2025]: [controlplane] Adding extra host path mount "audit-log" to "kube-apiserver"                                                                                                                                                                                                                                                             
09:30:27 kubeadm[2025]: [controlplane] Adding extra host path mount "scheduler-policy" to "kube-scheduler"                                                                                                                                                                                                                                                      
09:30:27 kubeadm[2025]: [check-etcd] Checking that the etcd cluster is healthy                                                                                                                                                                                                                                                                                  
09:30:27 kubeadm[2025]: [kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.15" ConfigMap in the kube-system namespace                                                                                                                                                                                                         
09:30:27 kubeadm[2025]: [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"                                                                                                                                                                                                                                                    
09:30:27 kubeadm[2025]: [kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"                                                                                                                                                                                                                                
09:30:27 kubeadm[2025]: [kubelet-start] Activating the kubelet service                                                                                                                                                                                                                                                                                          
09:30:27 kubeadm[2025]: [kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...                                                                                                                                                                                                                                                                 
09:30:29 kubeadm[2025]: [etcd] Announced new etcd member joining to the existing etcd cluster                                                                                                                                                                                                                                                                   
09:30:29 kubeadm[2025]: [etcd] Wrote Static Pod manifest for a local etcd member to "/etc/kubernetes/manifests/etcd.yaml"                                                                                                                                                                                                                                       
09:30:29 kubeadm[2025]: [etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s                                                                                                                                                                                                                                                     
09:30:38 etcd pause shim create
09:30:30 kube-scheduler pause shim create
09:30:30 kube-controller-manager pause shim create
09:30:34 kube-scheduler shim create
09:30:35 kube-scheduler first logs
09:30:36 kube-apiserver pause shim create
09:31:07 kubeadm[2025]: [kubelet-check] Initial timeout of 40s passed.                                                                                                                                                                                                                                                                                          
09:31:25 kube-controller-manager shim create
09:31:25 kube-controller-manager first logs
09:31:43 kube-apiserver shim create
09:31:44 kubeadm[2025]: error execution phase control-plane-join/etcd: error creating local etcd static pod manifest file: timeout waiting for etcd cluster to be available                                                                                                                                                                                     
09:31:44 systemd[1]: kubeadm.service: Main process exited, code=exited, status=1/FAILURE                                                                                                                                                                                                                                                                        
09:31:44 kube-apiserver first logs
09:31:51 etcd shim create
09:31:52.081609 etcd first logs

The timeout we hit here is this one, which uses hardcoded values (8 retries × 5 seconds -> 40s).
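For illustration, here is a minimal sketch of such a fixed retry budget. The constant names and structure below are illustrative rather than the actual kubeadm code, but 8 attempts spaced 5 seconds apart add up to the roughly 40-second window visible in the log above.

package main

import (
    "errors"
    "fmt"
    "time"
)

// Illustrative constants: 8 attempts with 5 s between them gives ~40 s in total.
const (
    healthCheckRetries  = 8
    healthCheckInterval = 5 * time.Second
)

// waitForEtcdAvailable polls the given health probe until it succeeds or the
// fixed retry budget is exhausted; there is no configuration knob to extend it.
func waitForEtcdAvailable(probe func() error) error {
    for attempt := 1; attempt <= healthCheckRetries; attempt++ {
        if err := probe(); err == nil {
            return nil
        }
        time.Sleep(healthCheckInterval)
    }
    return errors.New("timeout waiting for etcd cluster to be available")
}

func main() {
    start := time.Now()
    // Simulate an etcd member that never becomes healthy within the budget.
    err := waitForEtcdAvailable(func() error { return errors.New("not healthy yet") })
    fmt.Printf("%v after %s\n", err, time.Since(start).Round(time.Second))
}

An etcd member that only becomes healthy a few seconds after this budget is exhausted, as in the log above, fails the join even though nothing is actually broken.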

What you expected to happen?

The etcd member gets joined to the etcd cluster of the existing control-plane node and kubeadm succeeds.

How to reproduce it (as minimally and precisely as possible)?

Hard to say. Try lots of kubeadm joins of control-plane nodes.

Anything else we need to know?

In kubeadm init there is a similar-looking parameter called TimeoutForControlPlane, which defaults to 4 minutes and is used here to wait for the API server.

This looks similar to me because both the problem described here and the code in the kubeadm init phase wait for a specific pod that is started by the kubelet via a static pod manifest.

I see three options:

  • increase the hardcoded values
  • use the same parameter as already used during init (TimeoutForControlPlane), which would result in no change to the kubeadm specs (see the sketch below)
  • add an additional parameter to the kubeadm spec
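To make the second option more concrete, here is a minimal, standard-library-only sketch. The structs below only mirror the shape of the relevant part of the v1beta2 ClusterConfiguration (they are not the real kubeadm types); the point is where timeoutForControlPlane sits and its 4-minute default.

package main

import (
    "encoding/json"
    "fmt"
    "time"
)

// Loose mirror of the config section discussed above; the real types live in
// the kubeadm API packages.
type apiServer struct {
    TimeoutForControlPlane string `json:"timeoutForControlPlane,omitempty"`
}

type clusterConfiguration struct {
    APIVersion string    `json:"apiVersion"`
    Kind       string    `json:"kind"`
    APIServer  apiServer `json:"apiServer"`
}

func main() {
    cfg := clusterConfiguration{
        APIVersion: "kubeadm.k8s.io/v1beta2",
        Kind:       "ClusterConfiguration",
        // kubeadm init defaults this to 4 minutes when it is left unset.
        APIServer: apiServer{TimeoutForControlPlane: (4 * time.Minute).String()},
    }
    out, _ := json.MarshalIndent(cfg, "", "  ")
    fmt.Println(string(out))
}

As described above, the etcd wait during control-plane join currently does not read any value from this configuration; it only uses the hardcoded retry budget.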

This question originates from the open-source project: kubernetes/kubeadm


27 replies

  • weixin_39624864 (4 months ago)

    could it be that the retry of ~12 seconds between joining the second and third etcd member is not enough in your case?

  • weixin_39622084 (4 months ago)

    could it be that the retry of ~12 seconds between joining the second and third etcd member is not enough in your case?

    There is currently no third etcd member in my setup. The issue already occurs when I try adding a second master node (with stacked control plane nodes) to an existing single-master cluster.

    Where do the 12 seconds that you cite come from? Is this a configurable timeout that I could increase?

  • weixin_39624864 (4 months ago)

    you can try building kubeadm from source:

    
    cd kubernetes
    git checkout v1.15.2
    
    <apply patch>
    
    make all WHAT=cmd/kubeadm

    the timeout is here: https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/util/etcd/etcd.go#L41-L48

    but i don't think this will solve the problem. seems to me something else is at play.

    do you have the option to try 1.16.2?

  • weixin_39622084 (4 months ago)

    Thx for the exact pointer into the source code.

    Yes, the option exists. I anticipate upgrading the cluster to v1.16.2 and then adding a third master/etcd. (Other tasks first on my list, though.)

  • weixin_39565390 (4 months ago)

    I believe that I have found the cause of this issue.

    When observing /etc/kubernetes/manifests/etcd.yaml on the backup master that is trying to join, you will see that it advertises on a different IP range than the primary master.

    To avoid this, you must specify the advertise address manually when joining:

    
    kubeadm join .. --control-plane --apiserver-advertise-address <ip>

    Where <ip> is an address in the same subnet as the control plane.
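
    As an aside, here is a quick way to see which local address the node's default route resolves to, which roughly approximates the address kubeadm falls back to when --apiserver-advertise-address is omitted. This is a small Go sketch and only an approximation of kubeadm's actual interface-selection logic.

    package main

    import (
        "fmt"
        "log"
        "net"
    )

    // Prints the local IP associated with the default route by "dialing" a UDP
    // address; no packets are actually sent. On a multi-homed node this roughly
    // matches the address that gets advertised when none is given explicitly.
    func main() {
        conn, err := net.Dial("udp", "192.0.2.1:9") // TEST-NET-1, never contacted
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()
        fmt.Println(conn.LocalAddr().(*net.UDPAddr).IP)
    }

    If the printed address is not in the control-plane subnet, passing --apiserver-advertise-address explicitly, as suggested above, is the way out.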

  • weixin_39710249 (4 months ago)

    I believe that I have found the cause of this issue.

    When observing /etc/kubernetes/manifests/etcd.yaml on the backup master that is trying to join, you will see that it advertises on a different IP range than the primary master.

    To avoid this, you must specify the advertise address manually when joining:

    
    kubeadm join .. --control-plane --apiserver-advertise-address <ip>

    Where <ip> is an address in the same subnet as the control plane.

    It works! You saved my life! Thanks so much.

  • weixin_39949776 (4 months ago)

    Several questions:

    • Do you have the docker logs detailing what happened?
    • Did the etcd container start?
    • What version of docker are you running?
    • Does this happen consistently or intermittently?

  • weixin_40007515 (4 months ago)

    Several questions:

    • Do you have the docker logs detailing what happened?

    Not anymore but we are running some builds every night and I will catch the logs on the next occurrence.

    • Did the etcd container start?

    Yes, the etcd container was running from the docker perspective. The kubernetes cluster has already been deleted, so I don't know its exact state. I will also try to get more information on the next occurrence.

    • What version of docker are you running?

    We've got the coreos built-in docker version, which is 18.06.3.

    • Does this happen consistently or intermittently?

    I currently cannot reproduce it reliably. I hope to have it occur again to get all the details and more information.

  • weixin_39624864 (4 months ago)

    are you joining concurrently btw?

  • weixin_39734458 (4 months ago)

    /assign

  • weixin_40007515 (4 months ago)

    are you joining concurrently btw?

    No, only one control-plane node or worker node at a time / sequentially.

  • weixin_39624864 (4 months ago)

    the same report here: https://github.com/kubernetes/website/issues/15637

    i changed the priority and we possibly need to increase the timeout and backport to 1.15.

  • weixin_40007515 (4 months ago)

    the same report here: kubernetes/website#15637

    i changed the priority and we possibly need to increase the timeout and backport to 1.15.

    Let me know if I can help on this :-)

  • weixin_39884270 (4 months ago)

    Any ETA for this? I am currently blocked with my multi-master setup.

  • weixin_39624864 (4 months ago)

    are you reproducing this consistently? i tried today running inside VMs and i couldn't.

    also our CI is consistently green and we are not seeing the same timeouts. (consistently, minus some other aspects)

    09:31:44 kubeadm[2025]: error execution phase control-plane-join/etcd: error creating local etcd static pod manifest file: timeout waiting for etcd cluster to be available

    40 seconds should be more than enough for the etcd cluster to report healthy endpoints. what are you seeing with kubeadm ... --v=2?

  • weixin_39624864 (4 months ago)

    in terms of making this user controllable we have a field in v1beta2 and v1beta1 https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2

    called timeoutForControlPlane, but it's under the apiServer config. it feels to me that etcd timeouts need a new field, except that we cannot add this field or backport it to v1beta1 and v1beta2, so it has to wait for v1beta3.

    the alternative is to just increase the hardcoded timeouts, but this ticket needs more evidence that it's a consistent bug.
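
    Purely as an illustration of the new-field idea above (nothing like this exists in v1beta1 or v1beta2, and the type and field names here are invented), such a timeout could sit next to the local etcd settings in a future config version:

    package main

    import (
        "fmt"
        "time"
    )

    // Hypothetical shape only: a local-etcd section carrying its own timeout,
    // so the join-time wait would no longer be hardcoded.
    type localEtcd struct {
        DataDir        string
        TimeoutForEtcd time.Duration // hypothetical field
    }

    func main() {
        cfg := localEtcd{DataDir: "/var/lib/etcd", TimeoutForEtcd: 2 * time.Minute}
        fmt.Printf("join would wait up to %s for the new etcd member\n", cfg.TimeoutForEtcd)
    }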

  • weixin_39842955 (4 months ago)

    He is currently on leave but he can provide some more details of our problems next Tuesday. Afaik we ended up patching the hard-coded timeout because we couldn't get our nightly installs consistently green without patching it.

  • weixin_39624864 (4 months ago)

    please do. my only explanation here would be slow hardware or networking and i would like to get the exact causes.

  • weixin_40007515 (4 months ago)

    I've got some more data :-)

    Maybe the timeout does not need to be increased in our case. We had problems with our load balancers becoming active and routing traffic to the still-offline API server, which caused the timeouts here.

    I will need to retest with our improved load balancer setup (activating backends only after kubeadm init has gone through) to see if we are still hitting this issue.

  • weixin_39734458 (4 months ago)

    We had problems with our load balancers becoming active and routing traffic to the still-offline API server, which caused the timeouts here.

    Thank you for the feedback.

    I will need to retest with our improved load balancer setup (activating backends only after kubeadm init has gone through) to see if we are still hitting this issue.

    +1

  • weixin_39624864 (4 months ago)

    adding back "awaiting more evidence"

  • weixin_40007515 (4 months ago)

    As of now I'm not able to reproduce the problem anymore in our deployment pipelines using upstream v1.15.3 kubeadm and v1.16.0 kubeadm. I propose we close this one?

  • weixin_39734458 (4 months ago)

    Let's close this issue and reopen if we see it bubbling up again. Thank you for your feedback.

    /close

  • weixin_39878401 (4 months ago)

    Closing this issue.

    In response to [this](https://github.com/kubernetes/kubeadm/issues/1712#issuecomment-534013037):

    > Let's close this issue and reopen if we see it bubbling up again. Thank you for your feedback.
    >
    > /close

    Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
  • weixin_39622084 (4 months ago)

    I am facing the same issue when attempting to add a second master to a v1.15.2 cluster with kubeadm join .... An etcd snapshot taken on the first master is 19M in size. kubelet.service and the static pods (etcd, kube-apiserver, kube-controller-manager, kube-scheduler) came up on the second master, and an etcd snapshot taken on the second master is also 19M in size. (So maybe the error indicates a non-event?)

  • weixin_39624864 (4 months ago)

    hi, are you also getting:

    error execution phase control-plane-join/etcd: error creating local etcd static pod manifest file: timeout waiting for etcd cluster to be available

    ?

  • weixin_39622084 (4 months ago)

    hi, are you also getting:

    error execution phase control-plane-join/etcd: error creating local etcd static pod manifest file: timeout waiting for etcd cluster to be available

    ?

    Yes, exactly.

