weixin_39562197
2021-01-07 10:10

'Insufficient vpc.amazonaws.com/PrivateIPv4Address' for windows workloads when using existing VPC/Subnets

What happened?

I created a cluster using an existing VPC and subnets, with a Windows node group, and ran eksctl utils install-vpc-controllers according to the documentation. After deploying the IIS test pod, the result I get is the following:

FailedScheduling ... Insufficient vpc.amazonaws.com/PrivateIPv4Address

full log

21:43 $ kubectl describe pod windows-server-iis-66bf9745b-sjsjq
Name:           windows-server-iis-66bf9745b-sjsjq
Namespace:      default
Priority:       0
Node:           <none>
Labels:         app=windows-server-iis
                pod-template-hash=66bf9745b
                tier=backend
                track=stable
Annotations:    kubernetes.io/psp: eks.privileged
Status:         Pending
IP:
IPs:            <none>
Controlled By:  ReplicaSet/windows-server-iis-66bf9745b
Containers:
  windows-server-iis:
    Image:      mcr.microsoft.com/windows/servercore:1809
    Port:       80/TCP
    Host Port:  0/TCP
    Command:
      powershell.exe
      -command
      Add-WindowsFeature Web-Server; Invoke-WebRequest -UseBasicParsing -Uri 'https://dotnetbinaries.blob.core.windows.net/servicemonitor/2.0.1.6/ServiceMonitor.exe' -OutFile 'C:\ServiceMonitor.exe'; echo '<br><br><marquee><h1>Hello EKS!!!<h1><marquee>' > C:\inetpub\wwwroot\default.html; C:\ServiceMonitor.exe 'w3svc';
    Limits:
      vpc.amazonaws.com/PrivateIPv4Address:  1
    Requests:
      vpc.amazonaws.com/PrivateIPv4Address:  1
    Environment:                             <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-fwkxs (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  default-token-fwkxs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-fwkxs
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  beta.kubernetes.io/os=windows
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  18s (x5 over 6m10s)  default-scheduler  0/3 nodes are available: 2 node(s) didn't match node selector, 3 Insufficient vpc.amazonaws.com/PrivateIPv4Address.

Checking the VPC controller logs, I found the following:

failed to find the route table for subnet subnet-12345678

full log

I1028 16:09:23.456376       1 watcher.go:178] Node watcher processing node ip-10-10-92-124.ec2.internal.
I1028 16:09:23.456411       1 manager.go:109] Node manager adding node ip-10-10-92-124.ec2.internal with instanceID i-0c1544862a42d8885.
E1028 16:09:23.686677       1 manager.go:126] Node manager failed to get resource vpc.amazonaws.com/CIDRBlock pool on node ip-10-10-92-124.ec2.internal: failed to find the route table for subnet subnet-12345678
E1028 16:09:23.686702       1 watcher.go:183] Node watcher failed to add node ip-10-10-92-124.ec2.internal: failed to find the route table for subnet subnet-12345678
E1028 16:09:23.686716       1 watcher.go:266] failed to find the route table for subnet subnet-12345678
W1028 16:09:23.686882       1 watcher.go:267] Node watcher dropping key "ip-10-10-92-124.ec2.internal" out of queue: failed to find the route table for subnet subnet-12345678

My guess is that the controller is not able to find the route table (which is indeed there, as I checked in the web console) because eksctl did not create the VPC and subnets, or some permission it would otherwise create is not present.
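One way to check whether a subnet has an explicit route table association is the AWS CLI; the controller error suggests it only looks at explicit associations, so a subnet that relies implicitly on the VPC's main route table could trigger this. (The subnet ID below is the placeholder from the logs, not a real one.)

```shell
# Look for a route table explicitly associated with the subnet from the
# error message (subnet-12345678 is the placeholder ID from the logs).
aws ec2 describe-route-tables \
  --filters "Name=association.subnet-id,Values=subnet-12345678" \
  --query "RouteTables[].RouteTableId" \
  --output text
```

An empty result means the subnet has no explicit association and falls back to the VPC's main route table.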

What did you expect to happen? The IIS pod should be scheduled correctly.

How to reproduce it?

1. Create a cluster with the following config:


apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: visagio-internal-apps
  region: us-east-1

vpc:
  id: vpc-33f15f56
  subnets:
    public:
      us-east-1a:
        id: subnet-1111
      us-east-1b:
        id: subnet-2222
      us-east-1c:
        id: subnet-3333
      us-east-1d:
        id: subnet-4444

nodeGroups:
- name: linux-default-1
  instanceType: t3.large
  availabilityZones: ["us-east-1b", "us-east-1d"]
  desiredCapacity: 2
  minSize: 1
  maxSize: 4
  volumeSize: 50
  ssh:
    publicKeyName: bruno.medeiros-notebook
  iam:
    withAddonPolicies:
      ebs: true
      efs: true
- name: windows-default-1
  amiFamily: WindowsServer2019CoreContainer
  instanceType: t3.medium
  availabilityZones: ["us-east-1b"]
  desiredCapacity: 1
  minSize: 1
  maxSize: 2
2. Install the VPC controller with eksctl utils install-vpc-controllers ...

3. Try to schedule any Windows workload.

Anything else we need to know? Nothing special; everything was set up based on the docs, and the cluster was created from scratch on 1.14.

Versions

Please paste in the output of these commands:


$ eksctl version
$ kubectl version

22:00 $ eksctl version
[ℹ]  version.Info{BuiltAt:"", GitCommit:"", GitTag:"0.7.0"}

11:46 $ kubectl version
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.2", GitCommit:"c97fe5036ef3df2967d086711e6c0c405941e14b", GitTreeState:"clean", BuildDate:"2019-10-15T23:42:50Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.6-eks-5047ed", GitCommit:"5047edce664593832e9b889e447ac75ab104f527", GitTreeState:"clean", BuildDate:"2019-08-21T22:32:40Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}

Logs

Include the output of the command line when running eksctl. If possible, eksctl should be run with debug logs. For example: eksctl get clusters -v 4. Make sure you redact any sensitive information before posting. If the output is long, please consider a Gist.


11:30 $ eksctl get clusters -v 4
2019-10-31T11:47:15+08:00 [▶]  role ARN for the current session is "arn:aws:iam::12345678:user/bruno.medeiros"
NAME            REGION
internal-apps   us-east-1

This question comes from the open-source project: weaveworks/eksctl


5 replies

  • weixin_39954908 · 4 months ago

    Hi,

    I have the same issue. Did you find anything? It makes me think something is wrong with my VPC/Subnets.

    Thanks,
    Sergiu Plotnicu

  • weixin_39954908 · 4 months ago

    Try tagging your subnets/route tables with "kubernetes.io/cluster/{cluster_name}" = shared.

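    For reference, applying that tag with the AWS CLI would look roughly like this (the subnet IDs are the ones from the config above; the route table ID is a placeholder, and the tag key must use your own cluster name):

```shell
# Tag the subnets and route tables used by the cluster so EKS components
# can discover them (rtb-0abc1234 is a placeholder ID).
aws ec2 create-tags \
  --resources subnet-1111 subnet-2222 rtb-0abc1234 \
  --tags Key=kubernetes.io/cluster/visagio-internal-apps,Value=shared
```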
  • weixin_39562197 · 4 months ago

    I've just tried that; same problem.

    What seems to have worked for me was to create a separate route table and explicitly associate my EKS subnets with it, rather than leaving them on the default (main) route table. The error changed; I'm recreating my Windows worker group, and if I don't post here again, that was probably enough :)
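    A sketch of that workaround with the AWS CLI (the VPC ID is the one from the config above; the internet gateway ID and the single subnet shown are placeholders — repeat the association for each EKS subnet):

```shell
# Create a dedicated route table and explicitly associate the EKS subnets
# with it, instead of relying on the VPC's main route table.
RTB_ID=$(aws ec2 create-route-table --vpc-id vpc-33f15f56 \
  --query 'RouteTable.RouteTableId' --output text)
aws ec2 create-route --route-table-id "$RTB_ID" \
  --destination-cidr-block 0.0.0.0/0 --gateway-id igw-0123456789abcdef0
aws ec2 associate-route-table --route-table-id "$RTB_ID" --subnet-id subnet-1111
```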

  • weixin_39865061 · 4 months ago

    I am having the same issue. Cluster on 1.16.

  • weixin_39842237 · 4 months ago

    Could you elaborate on what route table you're referring to? Also, did this end up resolving your problem?

