weixin_39835965
2020-12-05 20:22

Use bucket histograms as well for collecting latency metrics

Bucket histograms give less accurate percentiles, but they aggregate much better across fleets. The Atlas metrics system inside Netflix also supports aggregating bucketed metrics, which we currently can't take advantage of because our data is not placed into fixed buckets.

This would help us get fleet-wide percentiles that are still approximations, but at least have some basis in sound math and statistics instead of our current average-of-95th method.
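As a rough illustration of why fixed buckets aggregate well: per-host bucket counts can simply be summed, and a percentile can then be read off the merged counts. The sketch below is not rend's or Atlas's implementation. The bucket bounds and the names `bucketBounds`, `mergeCounts`, and `percentile` are hypothetical, and it assumes every host reports counts against the same bounds.

```go
package main

import (
	"fmt"
	"math"
)

// bucketBounds holds hypothetical inclusive upper bounds (in microseconds).
// Every host must use the same bounds for the counts to be mergeable.
var bucketBounds = []uint64{1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024}

// mergeCounts sums per-bucket counts reported by any number of hosts.
func mergeCounts(hosts ...[]uint64) []uint64 {
	total := make([]uint64, len(bucketBounds))
	for _, h := range hosts {
		for i := range total {
			if i < len(h) {
				total[i] += h[i]
			}
		}
	}
	return total
}

// percentile returns the upper bound of the bucket that contains the
// p-th percentile (0 < p <= 100), or 0 if there are no samples.
// The result is an approximation limited by the bucket width.
func percentile(counts []uint64, p float64) uint64 {
	var n uint64
	for _, c := range counts {
		n += c
	}
	if n == 0 {
		return 0
	}
	rank := uint64(math.Ceil(p * float64(n) / 100))
	if rank == 0 {
		rank = 1
	}
	var seen uint64
	for i := range bucketBounds {
		if i < len(counts) {
			seen += counts[i]
		}
		if seen >= rank {
			return bucketBounds[i]
		}
	}
	return bucketBounds[len(bucketBounds)-1]
}

func main() {
	hostA := []uint64{0, 0, 90, 10, 0, 0, 0, 0, 0, 0, 0}
	hostB := []uint64{0, 0, 0, 0, 0, 0, 50, 40, 10, 0, 0}
	fleet := mergeCounts(hostA, hostB)
	fmt.Println("fleet p95 upper bound:", percentile(fleet, 95))
}
```

Because merging is just addition, the fleet-wide figure is computed from all samples at once rather than by averaging each host's own 95th percentile.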

This question comes from the open-source project: Netflix/rend


4 answers

  • weixin_39765290 5 months ago

    Fast binary linear-log bucketing: https://github.com/dgryski/go-linlog

  • weixin_39835965 5 months ago

    I'm considering using a count of leading zeros to compute the bucket number, so that bucket boundaries grow by powers of two. This will be fast as long as the count of leading zeros itself is fast. It's likely I'll use this assembly implementation to do so:

    https://github.com/dgryski/go-bits/blob/master/clz_amd64.s

  • weixin_39835965 5 months ago

    Actually, LZCNT (https://en.wikipedia.org/wiki/SSE4#POPCNT_and_LZCNT) might be simpler, since it does exactly what I need. The implementation uses only a single instruction (though still with the overhead of a function call), and the hardware we run on supports it.

  • weixin_39835965 5 months ago

    Turns out the processors we run on have POPCNT but not LZCNT (we apparently aren't on Haswell yet, boo), so BSRQ it is.

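The power-of-two bucketing discussed in the answers above can be sketched roughly as follows. This is not the code that went into rend: instead of the linked assembly routine or a hand-written BSRQ, it swaps in `bits.LeadingZeros64` from Go's standard `math/bits` package (available since Go 1.9), and the names `bucketIndex` and `Histogram` are hypothetical. For a non-zero value, BSRQ yields the index of the highest set bit, which equals `bits.Len64(v) - 1`. BSR leaves its result undefined for a zero input, whereas the standard library call is defined for zero.

```go
package main

import (
	"fmt"
	"math/bits"
)

// bucketIndex maps a value to a power-of-two bucket.
// Bucket 0 holds only the value 0; bucket i (i >= 1) holds
// values in the range [2^(i-1), 2^i).
func bucketIndex(v uint64) int {
	// 64 minus the leading-zero count is the bit length of v,
	// i.e. the same as bits.Len64(v).
	return 64 - bits.LeadingZeros64(v)
}

// Histogram is a hypothetical fixed-size latency histogram with
// one counter per possible bit length of a uint64 (0 through 64).
type Histogram struct {
	counts [65]uint64
}

// Observe records one sample. It is not safe for concurrent use.
func (h *Histogram) Observe(v uint64) {
	h.counts[bucketIndex(v)]++
}

func main() {
	var h Histogram
	for _, latency := range []uint64{0, 1, 3, 4, 1000} {
		h.Observe(latency)
	}
	for i, c := range h.counts {
		if c > 0 {
			fmt.Printf("bucket %d: %d\n", i, c)
		}
	}
}
```

Power-of-two buckets are coarse (each bucket is twice as wide as the previous one), which is the trade-off that the linear-log scheme linked in the first answer is meant to address.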
