ocmwx 2025-07-21 11:49

How do I implement failover in an Istio multi-control-plane setup on a non-flat network?

Environment
【Nanjing cluster】
Kubernetes version
Client Version: v1.32.3
Kustomize Version: v5.5.0
Server Version: v1.32.3
Node (host) CIDR: 192.168.110.0/24
Service CIDR: 10.96.0.0/12
Pod CIDR: 10.244.0.0/16
Istio version
client version: 1.25.2
control plane version: 1.25.2
data plane version: 1.25.2 (4 proxies)
Nanjing cluster node labels

kubectl get node --show-labels
NAME                                STATUS     ROLES    AGE   VERSION   LABELS
nanjing-k8s-master-01.k8s.cluster   Ready      <none>   9d    v1.32.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=nanjing-k8s-master-01.k8s.cluster,kubernetes.io/os=linux,topology.istio.io/subzone=qinhuai,topology.kubernetes.io/region=cn-east-jiangsu-1,topology.kubernetes.io/zone=cn-east-jiangsu-1-nanjing
nanjing-k8s-master-02.k8s.cluster   NotReady   <none>   9d    v1.32.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=nanjing-k8s-master-02.k8s.cluster,kubernetes.io/os=linux,topology.istio.io/subzone=qinhuai,topology.kubernetes.io/region=cn-east-jiangsu-1,topology.kubernetes.io/zone=cn-east-jiangsu-1-nanjing
nanjing-k8s-node-01.k8s.cluster     Ready      <none>   9d    v1.32.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=nanjing-k8s-node-01.k8s.cluster,kubernetes.io/os=linux,topology.istio.io/subzone=qinhuai,topology.kubernetes.io/region=cn-east-jiangsu-1,topology.kubernetes.io/zone=cn-east-jiangsu-1-nanjing
nanjing-k8s-node-02.k8s.cluster     Ready      <none>   9d    v1.32.3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=nanjing-k8s-node-02.k8s.cluster,kubernetes.io/os=linux,topology.istio.io/subzone=qinhuai,topology.kubernetes.io/region=cn-east-jiangsu-1,topology.kubernetes.io/zone=cn-east-jiangsu-1-nanjing

Nanjing cluster: labels of the helloworld pod and the test pod

kubectl get pods -n sample -o wide --show-labels
NAME                            READY   STATUS    RESTARTS   AGE     IP              NODE                              NOMINATED NODE   READINESS GATES   LABELS
helloworld-v1-897f85bff-mzmc4   2/2     Running   0          4m12s   10.244.156.26   nanjing-k8s-node-01.k8s.cluster   <none>           <none>            app=helloworld,pod-template-hash=897f85bff,security.istio.io/tlsMode=istio,service.istio.io/canonical-name=helloworld,service.istio.io/canonical-revision=v1,topology.istio.io/network=nanjing-k8s-cluster-network-01,topology.istio.io/subzone=qinhuai,topology.kubernetes.io/region=cn-east-jiangsu-1,topology.kubernetes.io/zone=cn-east-jiangsu-1-nanjing,version=v1
test-source-746ff5d774-zxtcl    2/2     Running   0          41h     10.244.218.19   nanjing-k8s-node-02.k8s.cluster   <none>           <none>            app=test-source,pod-template-hash=746ff5d774,security.istio.io/tlsMode=istio,service.istio.io/canonical-name=test-source,service.istio.io/canonical-revision=latest,topology.istio.io/network=nanjing-k8s-cluster-network-01,topology.istio.io/subzone=qinhuai,topology.kubernetes.io/region=cn-east-jiangsu-1,topology.kubernetes.io/zone=cn-east-jiangsu-1-nanjing

Nanjing cluster: helloworld service

kubectl get service -n sample
NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
helloworld   ClusterIP   10.97.80.235   <none>        5000/TCP   8d

【Xuzhou cluster】
Kubernetes version
Client Version: v1.32.3
Kustomize Version: v5.5.0
Server Version: v1.32.3
Node (host) CIDR: 192.168.110.0/24
Service CIDR: 10.112.0.0/12
Pod CIDR: 10.245.0.0/16
Istio version
client version: 1.25.2
control plane version: 1.25.2
data plane version: 1.25.2 (4 proxies)
Xuzhou cluster: Istio pods

kubectl get pods -n istio-system -o wide
NAME                                     READY   STATUS    RESTARTS   AGE   IP              NODE                               NOMINATED NODE   READINESS GATES
istio-eastwestgateway-79d7d7f4b6-rsm7k   1/1     Running   0          40h   10.245.204.30   xuzhou-k8s-node-01.k8s.cluster     <none>           <none>
istio-ingressgateway-748b74db4-6cvtz     1/1     Running   0          35h   10.245.146.65   xuzhou-k8s-master-01.k8s.cluster   <none>           <none>
istiod-679d98dfc8-cl6pj                  1/1     Running   0          40h   10.245.146.64   xuzhou-k8s-master-01.k8s.cluster   <none>           <none>

Xuzhou cluster: Istio services

kubectl get service -n istio-system 
NAME                    TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                                                           AGE
istio-eastwestgateway   NodePort       10.124.167.125   <none>        15021:35021/TCP,15443:35443/TCP,15012:35012/TCP,15017:35017/TCP   8d
istio-ingressgateway    LoadBalancer   10.116.241.102   <pending>     15021:31410/TCP,80:31740/TCP,443:35680/TCP                        8d
istiod                  ClusterIP      10.122.12.160    <none>        15010/TCP,15012/TCP,443/TCP,15014/TCP  

Xuzhou cluster: helloworld pod and service

kubectl get pods,service -n sample -o wide --show-labels
NAME                                 READY   STATUS    RESTARTS   AGE   IP              NODE                             NOMINATED NODE   READINESS GATES   LABELS
pod/helloworld-v2-7b5ccff4cd-q8dgn   2/2     Running   0          44h   10.245.204.31   xuzhou-k8s-node-01.k8s.cluster   <none>           <none>            app=helloworld,pod-template-hash=7b5ccff4cd,security.istio.io/tlsMode=istio,service.istio.io/canonical-name=helloworld,service.istio.io/canonical-revision=v2,topology.istio.io/network=xuzhou-k8s-cluster-network-01,topology.istio.io/subzone=yunlong,topology.kubernetes.io/region=cn-east-jiangsu-2,topology.kubernetes.io/zone=cn-east-1-jiangsu-2-xuzhou,version=v2
pod/test-source-64c49bc79c-xf7w6     2/2     Running   0          44h   10.245.204.32   xuzhou-k8s-node-01.k8s.cluster   <none>           <none>            app=test-source,pod-template-hash=64c49bc79c,security.istio.io/tlsMode=istio,service.istio.io/canonical-name=test-source,service.istio.io/canonical-revision=latest,topology.istio.io/network=xuzhou-k8s-cluster-network-01,topology.istio.io/subzone=yunlong,topology.kubernetes.io/region=cn-east-jiangsu-2,topology.kubernetes.io/zone=cn-east-1-jiangsu-2-xuzhou

NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE   SELECTOR         LABELS
service/helloworld   ClusterIP   10.113.28.83   <none>        5000/TCP   8d    app=helloworld   app=helloworld,service=helloworld

###############################################################################################################################################################################################################################################

【Configuration on the Nanjing side】

cat <<EOF | kubectl apply -f -
---
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: helloworld-vs
  namespace: sample
spec:
  gateways:
    - mesh
    - istio-system/cross-network-gateway
  hosts:
    - "helloworld.sample.svc.cluster.local"
  http:
    - match:
        - port: 5000
      route:
        - destination:
            host: helloworld.sample.svc.cluster.local
            subset: to-nanjing-local-subsets
          weight: 50
        - destination:
            host: eastwestgateway.remote.cluster.global
            subset: to-xuzhou-eastwestgateway-subsets
          weight: 50
EOF
cat <<EOF | kubectl apply -f -
---
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: helloworld-dr-to-nanjing-local
  namespace: sample
spec:
  host: "helloworld.sample.svc.cluster.local"
  subsets:
    - name: to-nanjing-local-subsets
      labels:
        topology.kubernetes.io/region: cn-east-jiangsu-1
        topology.kubernetes.io/zone: cn-east-jiangsu-1-nanjing
        topology.istio.io/network: nanjing-k8s-cluster-network-01
        app: helloworld
        version: v1
      trafficPolicy:
        portLevelSettings:
          - port:
              number: 5000
            tls:
              mode: DISABLE
        loadBalancer:
          simple: ROUND_ROBIN  
          localityLbSetting:
            enabled: true
            failover:
              - from: "cn-east-jiangsu-1"
                to:
                  "cn-east-jiangsu-2"
  trafficPolicy:
    outlierDetection:
      splitExternalLocalOriginErrors: true
      consecutiveLocalOriginFailures: 3
      consecutiveGatewayErrors: 3
      interval: 10s
      baseEjectionTime: 3600s
      maxEjectionPercent: 100
EOF
cat <<EOF | kubectl apply -f -
---
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: helloworld-dr-to-xuzhou-eastwestgateway
  namespace: sample
spec:
  host: "eastwestgateway.remote.cluster.global"
  subsets:
    - name: to-xuzhou-eastwestgateway-subsets
      labels:
        topology.kubernetes.io/region: cn-east-jiangsu-2
        topology.kubernetes.io/zone: cn-east-jiangsu-2-xuzhou
        topology.istio.io/subzone: eastwestgateway
        topology.istio.io/network: xuzhou-k8s-cluster-network-01
        app: istio-eastwestgateway-xuzhou
      trafficPolicy:
        portLevelSettings:
          - port:
              number: 5000
            tls:
              mode: ISTIO_MUTUAL
              sni: helloworld.sample.svc.cluster.local
        loadBalancer:
          simple: ROUND_ROBIN
          localityLbSetting:
            enabled: true
            failover:
              - from: "cn-east-jiangsu-2"
                to:
                  "cn-east-jiangsu-1"
  trafficPolicy:
    outlierDetection:
      splitExternalLocalOriginErrors: true
      consecutiveLocalOriginFailures: 3
      consecutiveGatewayErrors: 3
      interval: 10s
      baseEjectionTime: 3600s
      maxEjectionPercent: 100
EOF
cat <<EOF | kubectl apply -f -
---
apiVersion: networking.istio.io/v1
kind: ServiceEntry
metadata:
  name: to-xuzhou-eastwest-gateway-se
  namespace: sample
spec:
  hosts:
    - "eastwestgateway.remote.cluster.global"
  ports:
    - number: 5000
      name: https-5000
      protocol: HTTPS
  resolution: STATIC
  location: MESH_EXTERNAL
  endpoints:
    - address: 192.168.110.230
      ports:
        https-5000: 35443
      locality: "cn-east-jiangsu-2/cn-east-jiangsu-2-xuzhou/eastwestgateway"
      labels:
        topology.kubernetes.io/region: cn-east-jiangsu-2
        topology.kubernetes.io/zone: cn-east-jiangsu-2-xuzhou
        topology.istio.io/subzone: eastwestgateway
        topology.istio.io/network: xuzhou-k8s-cluster-network-01
        app: istio-eastwestgateway-xuzhou
EOF

【Configuration on the Xuzhou side】

cat <<EOF | kubectl apply -f -
---
apiVersion: networking.istio.io/v1
kind: Gateway
metadata:
  name: cross-network-gateway
  namespace: istio-system
spec:
  selector:
    app: istio-eastwestgateway
    topology.istio.io/network: xuzhou-k8s-cluster-network-01
  servers:
    - port:
        number: 15443
        name: https-15443
        protocol: HTTPS
      tls:
        mode: MUTUAL
        credentialName: istio-eastwestgateway-certs
        minProtocolVersion: TLSV1_2
      hosts:
        - "helloworld.sample.svc.cluster.local"
        - "192.168.110.210"
EOF
cat <<EOF | kubectl apply -f -
---
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: helloworld-vs
  namespace: istio-system
spec:
  hosts:
    - "helloworld.sample.svc.cluster.local"
    - "192.168.110.210"
  gateways:
    - istio-system/cross-network-gateway
  http:
    - route:
        - destination:
            host: helloworld.sample.svc.cluster.local
            port:
                number: 5000
EOF

#################################################################################################################################################################################################################

That completes the configuration; on to testing.

Test script

while true; do   kubectl exec "$(kubectl get pods -n sample -l app=test-source -o jsonpath='{.items[0].metadata.name}')" -n sample -c test-source -- curl -s http://helloworld.sample.svc.cluster.local:5000/hello;   sleep 1; done

Hello version: v2, instance: helloworld-v2-7b5ccff4cd-q8dgn
Hello version: v1, instance: helloworld-v1-897f85bff-mzmc4
Hello version: v2, instance: helloworld-v2-7b5ccff4cd-q8dgn
Hello version: v1, instance: helloworld-v1-897f85bff-mzmc4
Hello version: v1, instance: helloworld-v1-897f85bff-mzmc4
Hello version: v1, instance: helloworld-v1-897f85bff-mzmc4
Hello version: v2, instance: helloworld-v2-7b5ccff4cd-q8dgn
Hello version: v2, instance: helloworld-v2-7b5ccff4cd-q8dgn
Hello version: v2, instance: helloworld-v2-7b5ccff4cd-q8dgn
Hello version: v1, instance: helloworld-v1-897f85bff-mzmc4
Hello version: v1, instance: helloworld-v1-897f85bff-mzmc4
Hello version: v2, instance: helloworld-v2-7b5ccff4cd-q8dgn
Hello version: v1, instance: helloworld-v1-897f85bff-mzmc4
Hello version: v1, instance: helloworld-v1-897f85bff-mzmc4

Simulate a failure of the helloworld v1 pod in the Nanjing cluster

kubectl scale deployment helloworld-v1 -n sample --replicas=0

Run the test script again

while true; do   kubectl exec "$(kubectl get pods -n sample -l app=test-source -o jsonpath='{.items[0].metadata.name}')" -n sample -c test-source -- curl -s http://helloworld.sample.svc.cluster.local:5000/hello;   sleep 1; done

Hello version: v2, instance: helloworld-v2-7b5ccff4cd-q8dgn
Hello version: v2, instance: helloworld-v2-7b5ccff4cd-q8dgn
Hello version: v2, instance: helloworld-v2-7b5ccff4cd-q8dgn
no healthy upstreamHello version: v2, instance: helloworld-v2-7b5ccff4cd-q8dgn
no healthy upstreamHello version: v2, instance: helloworld-v2-7b5ccff4cd-q8dgn
no healthy upstreamHello version: v2, instance: helloworld-v2-7b5ccff4cd-q8dgn
no healthy upstreamHello version: v2, instance: helloworld-v2-7b5ccff4cd-q8dgn
Hello version: v2, instance: helloworld-v2-7b5ccff4cd-q8dgn
Hello version: v2, instance: helloworld-v2-7b5ccff4cd-q8dgn
no healthy upstreamno healthy upstreamHello version: v2, instance: helloworld-v2-7b5ccff4cd-q8dgn

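Why does "no healthy upstream" appear here?
It looks as if the failover configured in the helloworld-dr-to-nanjing-local DestinationRule is not switching over correctly.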


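What I need
With the configuration above, how can I make a request to http://helloworld.sample.svc.cluster.local:5000/hello go to the helloworld pods in the Nanjing cluster first? Only when the Nanjing helloworld pods fail should it fail over to the Xuzhou east-west gateway and reach the helloworld pod in the Xuzhou cluster.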



  • 檀越@新空间 2025-07-21 11:50

    Good morning ☀️☀️☀️
    This answer draws on Tongyi Qianwen (Qwen).

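    Implementing failover in an Istio setup with multiple control planes on a non-flat network combines Istio's multi-cluster architecture, gateway configuration, service discovery, traffic management and network policy. A detailed solution follows: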


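    1. Environment overview

    • Nanjing cluster

      • K8s version: v1.32.3
      • Pod CIDR: 10.244.0.0/16
      • Service CIDR: 10.96.0.0/12
      • Istio version: 1.25.2
      • Node labels include topology.kubernetes.io/region, topology.kubernetes.io/zone and topology.istio.io/subzone; the pods also carry topology.istio.io/network
    • Xuzhou cluster

      • K8s version: v1.32.3
      • Pod CIDR: 10.245.0.0/16
      • Service CIDR: 10.112.0.0/12
      • Istio version: 1.25.2
      • Runs an east-west gateway for cross-cluster traffic, exposed as a NodePort Service (15443:35443)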

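    2. Failover goals

    • Send requests to the Nanjing helloworld Pods first. Only when they fail, switch traffic automatically to the Xuzhou cluster through its east-west gateway.
    • Keep the service highly available and avoid a single point of failure.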

    3. Implementation steps

    3.1 Configure the multi-cluster Istio architecture

    Key components:

    • East-west gateway (cross-cluster traffic)
    • Shared mesh settings across both clusters
    • Cross-cluster service discovery plus DestinationRule-based routing

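    Steps:

    1. Deploy an Istio control plane in each cluster

      • Nanjing and Xuzhou each run their own control plane (istiod). For mTLS to work across clusters, both must trust a common root CA, and each istiod must be able to reach the other cluster's API server.
    2. Configure the east-west gateway

      • In the Xuzhou cluster, deploy the east-west gateway and expose services on port 15443 in AUTO_PASSTHROUGH mode. The gateway then routes on the SNI of the sidecar's mTLS connection without terminating TLS. This is what Istio's multi-network installation uses, rather than a MUTUAL-mode gateway with its own credentialName.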
      apiVersion: networking.istio.io/v1
      kind: Gateway
      metadata:
        name: cross-network-gateway
        namespace: istio-system
      spec:
        selector:
          istio: eastwestgateway   # must match the labels on the east-west gateway Pods
        servers:
        - port:
            number: 15443
            name: tls
            protocol: TLS
          tls:
            mode: AUTO_PASSTHROUGH   # pass mTLS through, route on SNI
          hosts:
          - "*.local"
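In the multi-network model, istiod also has to know which network each cluster belongs to. Istio's multi-network install guide does this by labeling the istio-system namespace, which is equivalent to running kubectl label namespace on it. The sketch below assumes the network name from the question's pod labels and is only needed if the label is not already set. The Nanjing cluster would use nanjing-k8s-cluster-network-01 instead.

```yaml
# Sketch: mark the Xuzhou control-plane namespace with its network name.
# The value is taken from the topology.istio.io/network label on the Xuzhou pods.
apiVersion: v1
kind: Namespace
metadata:
  name: istio-system
  labels:
    topology.istio.io/network: xuzhou-k8s-cluster-network-01
```

The east-west gateway Service itself should carry the same topology.istio.io/network label. This is how istiod identifies it as the entry point for that network.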
      
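    3. Enable cross-cluster endpoint discovery instead of writing a ServiceEntry

      • A ServiceEntry that points at the Xuzhou Service IP cannot work on a non-flat network, because 10.112.0.0/12 addresses are not routable from Nanjing. Instead, give each istiod read access to the other cluster's API server with a remote secret. istiod then discovers the Xuzhou helloworld endpoints itself. Because they are on a different network, it replaces their Pod IPs with the Xuzhou east-west gateway address. CTX_NANJING and CTX_XUZHOU are placeholder kubeconfig context names, and --name should match each cluster's clusterName.
      istioctl create-remote-secret --context="${CTX_XUZHOU}" --name=xuzhou | kubectl apply -f - --context="${CTX_NANJING}"
      istioctl create-remote-secret --context="${CTX_NANJING}" --name=nanjing | kubectl apply -f - --context="${CTX_XUZHOU}"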
      

    3.2 Configure the failover policy

    Steps:

    1. Define locality failover in a DestinationRule

      apiVersion: networking.istio.io/v1
      kind: DestinationRule
      metadata:
        name: helloworld-destination
        namespace: sample
      spec:
        host: helloworld.sample.svc.cluster.local
        trafficPolicy:            # applies to the whole host; no subsets
          loadBalancer:
            simple: ROUND_ROBIN
            localityLbSetting:
              enabled: true
              failover:
              - from: cn-east-jiangsu-1   # Nanjing region
                to: cn-east-jiangsu-2     # Xuzhou region
          outlierDetection:       # required for locality failover to take effect
            consecutive5xxErrors: 3
            interval: 10s
            baseEjectionTime: 30s
            maxEjectionPercent: 100
      

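      Note: Locality failover happens inside a single Envoy cluster. The Nanjing endpoints (region cn-east-jiangsu-1) and the Xuzhou endpoints (region cn-east-jiangsu-2, reached through the east-west gateway) must therefore belong to the same host and must not be split into different subsets. Envoy sends traffic to the caller's own region while it has healthy endpoints. Once those are gone or ejected by outlierDetection, it moves on to cn-east-jiangsu-2. Istio only enables locality failover when outlierDetection is set. A failover rule cannot switch between two different hosts, such as helloworld.sample.svc.cluster.local and eastwestgateway.remote.cluster.global in the question. This rule therefore takes the place of the question's helloworld-dr-to-nanjing-local.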
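The same preference can also be expressed with endpoint labels instead of region names, using failoverPriority. This sketch assumes the topology.istio.io/network labels shown in the question. Endpoints on the caller's own network (Nanjing) get the highest priority, and the Xuzhou endpoints behind the east-west gateway are used only when none of those are healthy. outlierDetection is still required, and the thresholds are illustrative.

```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: helloworld-destination
  namespace: sample
spec:
  host: helloworld.sample.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      simple: ROUND_ROBIN
      localityLbSetting:
        enabled: true
        # Endpoints whose labels match the calling proxy's, in this order, are preferred.
        failoverPriority:
          - "topology.istio.io/network"
          - "topology.kubernetes.io/region"
    outlierDetection:
      consecutive5xxErrors: 3
      interval: 10s
      baseEjectionTime: 30s
      maxEjectionPercent: 100
```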

    2. Set up health checks (liveness/readiness probes) for each service

      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 10
        periodSeconds: 5
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 5
      

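      Effect: if the liveness probe fails, the kubelet restarts the container. If the readiness probe fails, the Pod is removed from the Service endpoints and Istio stops sending it traffic. The paths and port above are placeholders; use ones the application actually serves.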
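For the helloworld sample in the question, which answers on port 5000 at /hello (the path the test script calls), a readiness probe might look like this sketch. The timing values are assumptions. With it, a v1 Pod that stops answering drops out of the endpoints the Nanjing sidecars route to.

```yaml
# Fragment of the helloworld-v1 container spec (sketch; timings are illustrative).
readinessProbe:
  httpGet:
    path: /hello
    port: 5000
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 3
```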


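    3.3 Multi-cluster traffic management in Istio

    Steps:

    1. Give both control planes the same mesh ID plus their own cluster name and network

      • These are install-time values (IstioOperator or Helm). MeshConfig is not a standalone Kubernetes resource kind, and ServiceMeshMemberRoll belongs to Red Hat OpenShift Service Mesh rather than upstream Istio. Example for the Nanjing cluster (the mesh ID and cluster name are placeholders):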
      apiVersion: install.istio.io/v1alpha1
      kind: IstioOperator
      spec:
        values:
          global:
            meshID: mesh1                              # placeholder; must be identical in both clusters
            multiCluster:
              clusterName: nanjing                     # placeholder; Xuzhou uses its own name
            network: nanjing-k8s-cluster-network-01    # matches the topology.istio.io/network pod label
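The Xuzhou east-west gateway in the question is a NodePort Service with no external IP (EXTERNAL-IP is empty). istiod may therefore fail to discover a routable gateway address on its own. If so, one option is to declare the gateway explicitly in meshNetworks. This sketch makes three assumptions: that 192.168.110.230 is a Xuzhou node reachable from Nanjing, as the question's ServiceEntry implies; that the NodePort 35443 maps to 15443; and that the hypothetical cluster name xuzhou matches the Xuzhou clusterName.

```yaml
# Sketch: tell istiod where the Xuzhou network's gateway can be reached.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    global:
      meshNetworks:
        xuzhou-k8s-cluster-network-01:
          endpoints:
            - fromRegistry: xuzhou      # hypothetical clusterName of the Xuzhou cluster
          gateways:
            - address: 192.168.110.230  # Xuzhou node IP, from the question's ServiceEntry
              port: 35443               # NodePort mapped to the gateway's 15443
```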
      
    2. Configure the VirtualService

      apiVersion: networking.istio.io/v1
      kind: VirtualService
      metadata:
        name: helloworld-virtualservice
        namespace: sample
      spec:
        hosts:
        - "helloworld.sample.svc.cluster.local"
        # no "gateways" field, so the rules apply to sidecar (mesh) traffic
        http:
        - route:
          - destination:
              host: helloworld.sample.svc.cluster.local
              port:
                number: 5000
          timeout: 10s
          retries:
            attempts: 3
            perTryTimeout: 2s
            retryOn: connect-failure,refused-stream,5xx
      

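      Note: retries add fault tolerance. Fault injection is a testing tool for deliberately injecting delays or errors, and a zero-percent fault block injects nothing, so it has been left out. Binding the route only to public-gateway would also keep it from applying to sidecar traffic, so the gateways field is omitted.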
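Applied to the question, this is also why "no healthy upstream" shows up. The question's helloworld-vs splits traffic 50/50 between the subset to-nanjing-local-subsets and the gateway host, and a weighted split is not failover. Once v1 is scaled to zero, the half of the requests sent to the local subset lands in a cluster with no endpoints. The following sketch of helloworld-vs has a single destination and leaves the Nanjing-first, Xuzhou-second ordering to the DestinationRule's locality settings. retryRemoteLocalities lets a retry move to an endpoint in another locality. The values are illustrative.

```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: helloworld-vs
  namespace: sample
spec:
  hosts:
    - helloworld.sample.svc.cluster.local
  # No "gateways" field: the route applies to sidecar (mesh) traffic.
  http:
    - route:
        # One destination, no subset and no weights; locality failover in the
        # DestinationRule decides between Nanjing and Xuzhou endpoints.
        - destination:
            host: helloworld.sample.svc.cluster.local
            port:
              number: 5000
      retries:
        attempts: 3
        perTryTimeout: 2s
        retryOn: connect-failure,refused-stream,5xx
        retryRemoteLocalities: true
```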


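    3.4 Network isolation and recovery

    Steps:

    1. Restrict traffic with a NetworkPolicy

      • Use a Kubernetes NetworkPolicy to limit which peers can reach the Pods. Kubernetes sets the kubernetes.io/metadata.name label on every namespace automatically; a plain name label usually does not exist.
      apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      metadata:
        name: allow-helloworld
        namespace: sample
      spec:
        podSelector: {}
        policyTypes:
        - Ingress
        ingress:
        - from:
          - namespaceSelector:
              matchLabels:
                kubernetes.io/metadata.name: sample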
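On the Xuzhou side, a policy that only admits the sample namespace would also block requests arriving from Nanjing. This is because the east-west gateway that forwards them runs in istio-system. NetworkPolicies are additive, so a second policy can admit the gateway Pods. This sketch assumes the app: istio-eastwestgateway label used in the question's Gateway selector.

```yaml
# Sketch for the Xuzhou cluster: also admit traffic that the east-west
# gateway (running in istio-system) forwards from Nanjing.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-eastwest-gateway
  namespace: sample
spec:
  podSelector:
    matchLabels:
      app: helloworld
  policyTypes:
    - Ingress
  ingress:
    - from:
        # namespaceSelector and podSelector in one entry: both must match.
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: istio-system
          podSelector:
            matchLabels:
              app: istio-eastwestgateway
```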
      
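    2. Configure automatic recovery (HPA or VPA)

      • Use the Kubernetes Horizontal Pod Autoscaler (HPA) or Vertical Pod Autoscaler (VPA) to scale resources automatically. This keeps enough local replicas running but does not replace cross-cluster failover.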
      apiVersion: autoscaling/v2
      kind: HorizontalPodAutoscaler
      metadata:
        name: helloworld-autoscaler
        namespace: sample
      spec:
        scaleTargetRef:
          apiVersion: apps/v1
          kind: Deployment
          name: helloworld-v1
        minReplicas: 2
        maxReplicas: 10
        metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 80
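The HPA above targets 80% average CPU utilization. Kubernetes computes that percentage relative to each container's CPU request, so helloworld-v1 must declare one for the HPA to work. Here is a fragment with illustrative values, which are assumptions and not sized for any real load:

```yaml
# Fragment of the helloworld-v1 container spec; values are illustrative.
resources:
  requests:
    cpu: 100m
    memory: 64Mi
  limits:
    cpu: 500m
    memory: 128Mi
```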
      

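    4. Verify failover

    Steps:

    1. Simulate a failure in the Nanjing cluster

      • Scale the Nanjing helloworld-v1 Deployment to zero. Deleting a single Pod is not enough, because the Deployment recreates it right away.
      kubectl scale deployment helloworld-v1 -n sample --replicas=0
      
    2. Check that traffic moves to the Xuzhou cluster

      • Call the service from the test Pod with kubectl exec and curl. All responses should now come from helloworld-v2, with no "no healthy upstream" errors.
      kubectl exec -n sample test-source-746ff5d774-zxtcl -c test-source -- curl -s http://helloworld.sample.svc.cluster.local:5000/hello
      
    3. Inspect the proxy configuration

      • Use istioctl to list the helloworld endpoints that the Nanjing sidecar sees, along with their health. The Xuzhou endpoint should appear with the east-west gateway address rather than a 10.245.x.x Pod IP.
      istioctl proxy-config endpoint test-source-746ff5d774-zxtcl.sample --cluster "outbound|5000||helloworld.sample.svc.cluster.local"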
      

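    5. Summary

    | Step | Content |
    |------|---------|
    | 1 | Configure the multi-cluster Istio architecture (east-west gateway) |
    | 2 | Configure failover in a DestinationRule |
    | 3 | Control traffic with a VirtualService |
    | 4 | Isolate traffic with a NetworkPolicy |
    | 5 | Scale automatically with HPA/VPA |
    | 6 | Verify the failover behavior |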

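    Key points:

    • Istio's multi-cluster support is the foundation for cross-cluster failover;
    • DestinationRule plus VirtualService provide flexible traffic control, and the failover itself comes from localityLbSetting together with outlierDetection;
    • Health checks plus HPA/VPA are key to high availability;
    • The east-west gateway is the core component for cross-cluster traffic.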

    If you have more questions about Istio multi-cluster configuration, network policy or recovery mechanisms, feel free to ask!



Question history

  • Question edited Jul 21
  • Question created Jul 21