weixin_39728572
weixin_39728572
2021-01-08 08:01

Zipkin trace headers contain multiple spans

Describe the bug When enabling the tracing services in versions > 0.50 the Zipkin headers X-B3-Sampled, X-B3-Spanid, and X-B3-Traceid all appear to have multiple values concatenated into them.

Example responses from httpbin container:

On 0.40.2:


$ curl http://localhost:8080/httpbin/get
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Content-Length": "0",
    "Host": "localhost:8080",
    "User-Agent": "curl/7.54.0",
    "X-B3-Parentspanid": "5efc216a052d8f53",
    "X-B3-Sampled": "1",
    "X-B3-Spanid": "022c8f7c67c05af4",
    "X-B3-Traceid": "5efc216a052d8f53",
    "X-Envoy-Expected-Rq-Timeout-Ms": "3000",
    "X-Envoy-Original-Path": "/httpbin/get"
  },
  "origin": "172.21.0.3",
  "url": "http://localhost:8080/get"
}

On 0.51.2:


$ curl http://localhost:8080/httpbin/get
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Content-Length": "0",
    "Host": "localhost:8080",
    "User-Agent": "curl/7.54.0",
    "X-B3-Parentspanid": "31708df6d4f1b3a8",
    "X-B3-Sampled": "1,1",
    "X-B3-Spanid": "31708df6d4f1b3a8,ba3032afcb429cc9",
    "X-B3-Traceid": "31708df6d4f1b3a8,31708df6d4f1b3a8",
    "X-Envoy-Expected-Rq-Timeout-Ms": "3000",
    "X-Envoy-Internal": "true",
    "X-Envoy-Original-Path": "/httpbin/get"
  },
  "origin": "172.21.0.1",
  "url": "http://localhost:8080/get"
}

Trace JSON from zipkin from 0.51.2:

json
[
  {
    "traceId": "3d2e5cd9eadf6e66",
    "parentId": "3d2e5cd9eadf6e66",
    "id": "71f4e76b18114cbc",
    "kind": "CLIENT",
    "name": "router cluster_httpbin egress",
    "timestamp": 1553167127739170,
    "duration": 7395,
    "localEndpoint": {
      "serviceName": "ambassador-default",
      "ipv4": "172.21.0.4"
    },
    "tags": {
      "component": "proxy"
    }
  },
  {
    "traceId": "3d2e5cd9eadf6e66",
    "id": "3d2e5cd9eadf6e66",
    "kind": "CLIENT",
    "name": "localhost:8080",
    "timestamp": 1553167127738886,
    "duration": 7964,
    "localEndpoint": {
      "serviceName": "ambassador-default",
      "ipv4": "172.21.0.4"
    },
    "tags": {
      "component": "proxy",
      "downstream_cluster": "-",
      "guid:x-request-id": "e1891e18-ef0a-95a0-a85b-308bd5c67b62",
      "http.method": "GET",
      "http.protocol": "HTTP/1.1",
      "http.status_code": "200",
      "http.url": "http://localhost:8080/httpbin/get",
      "node_id": "test-id",
      "request_size": "0",
      "response_flags": "-",
      "response_size": "532",
      "upstream_cluster": "cluster_httpbin",
      "user_agent": "curl/7.54.0"
    }
  }
]

The X-B3-Spanid header contains the ids of both spans.

Confirmed the behavior is preset on the 0.50 releases and the 0.52 rc.

To Reproduce Can be reproduced in docker-compose setup as described here:

docker-compose.yaml:

yaml
version: '3.5'
services:
  ambassador:
    image: quay.io/datawire/ambassador:0.51.2
    ports:
    - 8080:80
    volumes:
    - ./config:/ambassador/ambassador-config
    environment:
    - AMBASSADOR_NO_KUBEWATCH=no_kubewatch
  httpbin:
    image: kennethreitz/httpbin
  zipkin:
    image: openzipkin/zipkin
    ports:
    - 9411:9411

config/ambassador.yaml:

yaml
---
apiVersion: ambassador/v1
kind: Module
name: ambassador
config: {}
---
apiVersion: ambassador/v1
kind: Mapping
name: http-bin
prefix: /httpbin/
service: httpbin
---
apiVersion: ambassador/v1
kind: TracingService
name: tracing
service: zipkin:9411
driver: zipkin

Then run:


$ docker-compose up -d
$ curl http://localhost:8080/httpbin/get

Expected behavior The X-B3-Spanid header should contain only the id of the "router cluster_httpbin egress" span, and the other X-B3-* headers should contain only a single value.

Versions (please complete the following information): - Ambassador: >= 0.50 - Kubernetes environment: AWS EKS (Kubernetes 1.11) and docker-compose

该提问来源于开源项目:datawire/ambassador

  • 点赞
  • 写回答
  • 关注问题
  • 收藏
  • 复制链接分享
  • 邀请回答

9条回答

  • weixin_39722070 weixin_39722070 4月前

    I ran into this, too. For me, setting start_child_span to False prevented the double header phenomenon. It also prevented Envoy from generating an egress span (naturally).

    This is the config in question. Currently it is hard coded.

    
    $ ag start_child_span
    ambassador/ambassador/envoy/v2/v2listener.py
    254:        od['config'] = { 'start_child_span': True }
    
    ambassador/ambassador/ir/ir.py
    165:            router_config['start_child_span'] = True
    

    I can't tell yet if this is a bug in envoy, or a bug in the way that ambassador configures envoy (perhaps there's a combination of configs that causes this behavior?)

    点赞 评论 复制链接分享
  • weixin_39951378 weixin_39951378 4月前

    Thank you for reporting it. It could be both. I need to look into the tracing implementation for the filter.

    点赞 评论 复制链接分享
  • weixin_39728572 weixin_39728572 4月前

    Might also be worth noting, if the request to ambassador contains B3 headers their values are also included in the list of values in the header:

    
    $ curl http://localhost:8080/httpbin/get \
    >     -H "X-B3-Traceid: 48485a3953bb6124" \
    >     -H "X-B3-Spanid: 48485a3953bb6124" \
    >     -H "X-B3-Parentspanid: 48485a3953bb6124" \
    >     -H "X-B3-Sampled: 1"
    {
      "args": {},
      "headers": {
        "Accept": "*/*",
        "Content-Length": "0",
        "Host": "localhost:8080",
        "User-Agent": "curl/7.54.0",
        "X-B3-Parentspanid": "48485a3953bb6124,48485a3953bb6124,d52803b3c8d0d532",
        "X-B3-Sampled": "1,1,1",
        "X-B3-Spanid": "48485a3953bb6124,d52803b3c8d0d532,6464401de91c742d",
        "X-B3-Traceid": "48485a3953bb6124,48485a3953bb6124,48485a3953bb6124",
        "X-Envoy-Expected-Rq-Timeout-Ms": "3000",
        "X-Envoy-Internal": "true",
        "X-Envoy-Original-Path": "/httpbin/get"
      },
      "origin": "172.21.0.1",
      "url": "http://localhost:8080/get"
    }
    
    点赞 评论 复制链接分享
  • weixin_39728572 weixin_39728572 4月前

    I did some poking at reproducing this with Envoy itself stumbled upon https://github.com/envoyproxy/envoy/pull/5019 which appears to fix this issue. I can confirm that on the commit prior to this change I see the same behavior as I do from Ambassador, and the behavior is fixed after that commit. It looks like there are ~25 commits in between that and the version Ambassador is using. I wonder if it would be preferable to point to v1.9.0 of envoy to get that change plus any other fixes made before the release.

    Let me know if you'd like the configuration for reproducing with Envoy.

    点赞 评论 复制链接分享
  • weixin_39728572 weixin_39728572 4月前

    Side note: It's currently not clear where the source for the version of Envoy being bundled is. I gather its somewhere in https://github.com/datawire/envoy, but I'm not sure what branch or which commit it's at as it appears you all stopped doing tags/releases at some point. It'd be nice if I could trace back to where the image was built.

    点赞 评论 复制链接分享
  • weixin_39722070 weixin_39722070 4月前

    Yeah, oddly enough, I'm currently setting up a local Envoy to make sense of this behavior. On the master branch, I can't reproduce.

    I saw issue 5019 too, but quickly/incorrectly assumed it was probably present in Ambassador's build, given the version string (1.9 dev something). But indeed it is unclear which commit this actually built from. I'm asking in slack.

    点赞 评论 复制链接分享
  • weixin_39896256 weixin_39896256 4月前

    Was running into this as well.

    点赞 评论 复制链接分享
  • weixin_39728572 weixin_39728572 4月前

    This appears to be fixed in 0.53.1 👍

    点赞 评论 复制链接分享
  • weixin_39951378 weixin_39951378 4月前

    Awesome, glad that it's fixed.

    点赞 评论 复制链接分享

相关推荐