ocmwx 2025-06-22 16:37 Acceptance rate: 0%

Failover configuration in an Istio multi-control-plane deployment on non-flat networks

Environment:

[Nanjing cluster]

Kubernetes version
Client Version: v1.32.3
Kustomize Version: v5.5.0
Server Version: v1.32.3
Kubernetes host subnet: 192.168.110.0/24
Service CIDR = 10.96.0.0/12
Pod CIDR = 10.244.0.0/16

Istio version
client version: 1.25.2
control plane version: 1.25.2
data plane version: 1.25.2 (4 proxies)

helloworld Pod labels in the Nanjing cluster

kubectl get pods -n sample -o wide --show-labels
NAME                             READY   STATUS    RESTARTS   AGE   IP              NODE                        NOMINATED NODE   READINESS GATES   LABELS
helloworld-v1-86f57ccb45-mjwnf   2/2     Running   0          53m   10.244.134.55   k8s-node-01.k8s.cluster     <none>           <none>            app=helloworld,pod-template-hash=86f57ccb45,security.istio.io/tlsMode=istio,service.istio.io/canonical-name=helloworld,service.istio.io/canonical-revision=v1,topology.istio.io/network=nj-k8s-cluster-network-01,topology.istio.io/subzone=qinhuai,topology.kubernetes.io/region=china-jiangsu,topology.kubernetes.io/zone=nanjing,version=v1
test-source-869888dfdc-9k6bt     2/2     Running   0          4d    10.244.220.54   k8s-master-01.k8s.cluster   <none>           <none>            app=test-source,pod-template-hash=869888dfdc,security.istio.io/tlsMode=istio,service.istio.io/canonical-name=test-source,service.istio.io/canonical-revision=latest,topology.istio.io/network=nj-k8s-cluster-network-01,topology.istio.io/subzone=qinhuai,topology.kubernetes.io/region=china-jiangsu,topology.kubernetes.io/zone=nanjing

helloworld Service in the Nanjing cluster

kubectl get service -n sample
NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
curl         ClusterIP   10.96.18.117   <none>        80/TCP     36d
helloworld   ClusterIP   10.108.70.77   <none>        5000/TCP   2d20h

[Beijing cluster configuration]

Kubernetes version
Client Version: v1.32.3
Kustomize Version: v5.5.0
Server Version: v1.32.3
Kubernetes host subnet: 192.168.110.0/24
Service CIDR = 10.112.0.0/12
Pod CIDR = 10.245.0.0/16

Istio version
client version: 1.25.2
control plane version: 1.25.2
data plane version: 1.25.2 (4 proxies)

Configuration 1

[Configuration applied on the Nanjing cluster]

cat <<EOF | kubectl apply -f -
---
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: helloworld-vs
  namespace: sample
spec:
  gateways:
    - mesh
    - istio-system/cross-network-gateway
  hosts:
    - "helloworld.sample.svc.cluster.local"
  http:
    - match:
        - port: 5000
      route:
        - destination:
            host: helloworld.sample.svc.cluster.local
            subset: to-nanjing-local-subsets
          weight: 50
        - destination:
            host: eastwestgateway.remote.cluster.global
            subset: to-beijing-eastwestgateway-subsets
          weight: 50
EOF
cat <<EOF | kubectl apply -f -
---
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: helloworld-dr
  namespace: sample
spec:
  host: "*"
  subsets:
    - name: to-nanjing-local-subsets
      labels:
        app: helloworld
        version: v1
        topology.istio.io/network: nj-k8s-cluster-network-01
      trafficPolicy:
        portLevelSettings:
          - port:
              number: 5000
            tls:
              mode: DISABLE
        loadBalancer:
          simple: ROUND_ROBIN  
          localityLbSetting:
            enabled: true
            distribute:
              - from: "*"
                to:
                  "china-beijing/beijing/eastwestgateway": 50
                  "china-jiangsu/nanjing/qinhuai": 50
    - name: to-beijing-eastwestgateway-subsets
      labels:
        region: china-beijing-beijing-eastwestgateway
        topology.istio.io/network: bj-k8s-cluster-network-01
      trafficPolicy:
        portLevelSettings:
          - port:
              number: 5000
            tls:
              mode: ISTIO_MUTUAL
              sni: helloworld.sample.svc.cluster.local
        loadBalancer:
          simple: ROUND_ROBIN  
          localityLbSetting:
            enabled: true
            distribute:
              - from: "*"
                to:
                  "china-beijing/beijing/eastwestgateway": 50
                  "china-jiangsu/nanjing/qinhuai": 50
  trafficPolicy:
    outlierDetection:
      consecutiveGatewayErrors: 3
      consecutive5xxErrors: 3
      interval: 10s
      baseEjectionTime: 240s
      maxEjectionPercent: 100
EOF
cat <<EOF | kubectl apply -f -
---
apiVersion: networking.istio.io/v1
kind: ServiceEntry
metadata:
  name: to-bj-eastwest-gateway-se
  namespace: sample
spec:
  hosts:
    - "eastwestgateway.remote.cluster.global"
  ports:
    - number: 5000
      name: https-5000
      protocol: HTTPS
  resolution: STATIC
  location: MESH_EXTERNAL
  endpoints:
    - address: 192.168.110.230
      ports:
        https-5000: 35443
      locality: "china-beijing/beijing/eastwestgateway"
      labels:
        region: china-beijing-beijing-eastwestgateway
        topology.istio.io/network: bj-k8s-cluster-network-01
EOF

[Configuration applied on the Beijing cluster]

cat <<EOF | kubectl apply -f -
---
apiVersion: networking.istio.io/v1
kind: Gateway
metadata:
  name: cross-network-gateway
  namespace: istio-system
spec:
  selector:
    app: istio-eastwestgateway
    topology.istio.io/network: bj-k8s-cluster-network-01
  servers:
    - port:
        number: 15443
        name: https-15443
        protocol: HTTPS
      tls:
        mode: MUTUAL
        credentialName: istio-eastwestgateway-certs
        minProtocolVersion: TLSV1_2
      hosts:
        - "helloworld.sample.svc.cluster.local"
        - "192.168.110.210"
EOF
cat <<EOF | kubectl apply -f -
---
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: helloworld-vs
  namespace: istio-system
spec:
  hosts:
    - "helloworld.sample.svc.cluster.local"
    - "192.168.110.210"
  gateways:
    - istio-system/cross-network-gateway
  http:
    - route:
        - destination:
            host: helloworld.sample.svc.cluster.local
            port:
                number: 5000
EOF

With the configuration above in place, I send test requests to http://helloworld.sample.svc.cluster.local:5000/hello from the test Pod test-source-869888dfdc-9k6bt in the Nanjing cluster, as follows:

while true; do
  kubectl exec "$(kubectl get pods -n sample -l app=test-source -o jsonpath='{.items[0].metadata.name}')" -n sample -c test-source -- curl -s http://helloworld.sample.svc.cluster.local:5000/hello
  sleep 1
done

Output:

Hello version: v2, instance: helloworld-v2-6746879bdd-hpftf
Hello version: v1, instance: helloworld-v1-86f57ccb45-mjwnf
Hello version: v2, instance: helloworld-v2-6746879bdd-hpftf
Hello version: v2, instance: helloworld-v2-6746879bdd-hpftf
Hello version: v1, instance: helloworld-v1-86f57ccb45-mjwnf
Hello version: v2, instance: helloworld-v2-6746879bdd-hpftf
Hello version: v1, instance: helloworld-v1-86f57ccb45-mjwnf
Hello version: v2, instance: helloworld-v2-6746879bdd-hpftf

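The requests are split roughly 50/50, matching the weights in the VirtualService named helloworld-vs.

Question 1: Suppose I simulate a failure of helloworld in the Nanjing cluster. How do I get the failed endpoints excluded from routing?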
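
To check the split over a fixed number of requests instead of reading the loop output by eye, the responses can be tallied. This is a minimal sketch: the function name `tally_responses` is hypothetical, and it assumes the response format shown in the output above.

```shell
# tally_responses (hypothetical helper): read responses on stdin, one per line,
# and print how many times each distinct result occurred.
tally_responses() {
  # Drop empty lines, reduce "Hello version: v1, instance: ..." to "v1",
  # and leave any other line (for example an error body) unchanged.
  sed -E '/^$/d; s/^Hello version: ([^,]+),.*$/\1/' | sort | uniq -c | sort -rn
}

# Usage against the cluster (commented out because it needs kubectl access):
# pod="$(kubectl get pods -n sample -l app=test-source -o jsonpath='{.items[0].metadata.name}')"
# for i in $(seq 1 100); do
#   kubectl exec "$pod" -n sample -c test-source -- \
#     curl -s -w '\n' http://helloworld.sample.svc.cluster.local:5000/hello
# done | tally_responses
```

The `-w '\n'` in the commented usage makes curl end every response with a newline, so error bodies that lack one still land on their own line.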

kubectl scale deployment helloworld-v1 -n sample --replicas=0

The test output is now:
no healthy upstream
Hello version: v2, instance: helloworld-v2-6746879bdd-hpftf
no healthy upstream
Hello version: v2, instance: helloworld-v2-6746879bdd-hpftf
no healthy upstream
Hello version: v2, instance: helloworld-v2-6746879bdd-hpftf
no healthy upstream
no healthy upstream
no healthy upstream
no healthy upstream
Hello version: v2, instance: helloworld-v2-6746879bdd-hpftf

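In other words, when helloworld in the Nanjing cluster fails, subsequent requests should no longer be routed to it, and all traffic should go to Beijing: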

        - destination:
            host: eastwestgateway.remote.cluster.global
            subset: to-beijing-eastwestgateway-subsets
          weight: 50
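
A likely reason for the alternating `no healthy upstream` responses: VirtualService weights are static, so Envoy keeps sending about half of the requests to the `to-nanjing-local-subsets` destination even when it has no endpoints. Outlier detection only ejects hosts inside one upstream cluster and does not move traffic between weighted route destinations. Below is a minimal sketch of a different approach. It assumes, and the question does not show this, that the two clusters are joined with Istio's standard multi-primary, multi-network installation, so the Beijing endpoints of `helloworld.sample.svc.cluster.local` are discovered through the east-west gateway and appear under the same host as the Nanjing ones. Under that assumption the weighted VirtualService and the manual ServiceEntry would be dropped, and locality failover combined with outlier detection lets Envoy prefer Nanjing and fall back to Beijing. The file name and the resource name `helloworld-failover` are hypothetical; the region names are taken from the Pod labels and endpoint locality above.

```shell
# Write the manifest to a file for review (file name is hypothetical).
cat > helloworld-failover-dr.yaml <<'EOF'
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: helloworld-failover   # hypothetical name
  namespace: sample
spec:
  # Target the service host itself rather than "*".
  host: helloworld.sample.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      simple: ROUND_ROBIN
      localityLbSetting:
        enabled: true
        # failover and distribute cannot be combined; failover is used here.
        failover:
          - from: china-jiangsu
            to: china-beijing
    # Locality failover only takes effect when outlier detection is configured.
    outlierDetection:
      consecutive5xxErrors: 3
      interval: 10s
      baseEjectionTime: 240s
      maxEjectionPercent: 100
EOF

# After review, apply it on the Nanjing cluster:
# kubectl apply -f helloworld-failover-dr.yaml
```

With this layout, scaling helloworld-v1 to zero leaves no local endpoints in the highest-priority locality, so requests should go to the Beijing endpoints rather than failing. This is untested against the environment described here.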


4 answers

  • 阿里嘎多学长 2025-06-22 16:38

    Compiled by 阿里嘎多学长; AIGC-generated.

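    Failover configuration in an Istio multi-control-plane deployment on non-flat networks

    You have run into a failover configuration problem with Istio in a non-flat-network, multi-control-plane deployment. Possible steps:

    1. Check the control plane configuration: make sure each control plane is configured with the correct information about its peer cluster. Render the manifest for each control plane with istioctl manifest generate and review those settings.
    2. Check connectivity: make sure the control planes can reach each other, and use kubectl get pod -n istio-system to confirm that the control plane Pods in each cluster are running.
    3. Re-apply the configuration: use istioctl install (the successor to the older istioctl manifest apply) to apply the reviewed configuration to each control plane.
    4. Check the Istio version: make sure every control plane runs the same Istio version; istioctl version reports it.
    5. Inspect logs and proxy state: use kubectl logs on the istiod Pods, together with istioctl analyze and istioctl proxy-config, to look for errors during failover.

    If these steps do not resolve the problem, please provide more logs and configuration files so it can be debugged further.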
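
The inspection steps above can be sketched with standard `kubectl` and `istioctl` subcommands. The helper function names are hypothetical; the namespace, label, port, and host come from the question. `show_helloworld_endpoints` lists the endpoints the test-source sidecar currently holds for helloworld, which shows whether any Beijing endpoint is known to it and whether a host has been ejected.

```shell
# Hypothetical helper names; the kubectl/istioctl subcommands are standard.

# Name of the first test-source Pod in the sample namespace.
source_pod() {
  kubectl get pods -n sample -l app=test-source \
    -o jsonpath='{.items[0].metadata.name}'
}

# Endpoints (with health status and outlier check) that the test-source
# sidecar holds for helloworld on port 5000.
show_helloworld_endpoints() {
  istioctl proxy-config endpoints "$(source_pod)" -n sample \
    --cluster "outbound|5000||helloworld.sample.svc.cluster.local"
}

# Validate the Istio resources in the sample namespace.
analyze_sample_namespace() {
  istioctl analyze -n sample
}

# Usage (needs cluster access):
# show_helloworld_endpoints
# analyze_sample_namespace
```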

