weixin_39606118
weixin_39606118
2020-11-22 04:36

Modify unsafe string to byte slice conversion in Sum64String

Hello, I think there might be a race condition in Sum64String which could be mitigated with a minor change. The gist of the issue is that Go's garbage collector can collect objects that are still in scope but no longer referenced (see, for example, this issue and this post on reddit). Consequently, Sum64String may suffer from the following race condition:


func Sum64String(s string) uint64 {
    var b []byte
    sh := (*reflect.StringHeader)(unsafe.Pointer(&s))
>>>
Go's garbage collector runs here and interrupts this function after the previous line.
It sees that `s` is not used after this point and determines that it is eligible for collection.
`sh` contains a `uintptr`, so the garbage collector does not consider this pointer to be a
live reference to an object on the heap. Consequently, the garbage collector determines
that there are no live references to the bytes that `s` pointed to and frees them.
>>>
    bh := (*reflect.SliceHeader)(unsafe.Pointer(&b))
    bh.Data = uintptr(unsafe.Pointer(sh.Data))
    bh.Len = sh.Len
    bh.Cap = sh.Len
    return Sum64(b)
}

I believe we can get around this problem by using s after we set b to point to the same bytes with the line:


l := len(s)

In this way the garbage collector will see that s is used after b's Data field is updated to point to the underlying bytes and the bytes must be live until them.

该提问来源于开源项目:cespare/xxhash

  • 点赞
  • 写回答
  • 关注问题
  • 收藏
  • 复制链接分享
  • 邀请回答

8条回答

  • weixin_39687189 weixin_39687189 5月前

    Hi ! Thanks for your concern :)

    Have you seen memory corruption, or is this just speculative? I'm pretty sure this code is fine, because there are some particular idioms around using SliceHeader that are treated specially. In particular, see rule 6 at https://golang.org/pkg/unsafe/#Pointer. So the compiler should understand the "pointer-ness" of the Data field and keep the string live even though it's only a uintptr reference.

    We can also try to force a collection and see if we get memory corruption; I was not able to show any: https://play.golang.org/p/2OYbE_HHrL (Of course this doesn't prove the code always works or will keep working with a future compiler version.)

    To be sure, I asked about this on an existing, related golang-nuts thread: https://groups.google.com/d/msg/golang-nuts/dcjzJy-bSpw/pf4tDTJ9BQAJ

    P.S. Thanks to this PR I noticed the unnecessary uintptr->unsafe.Pointer->uintptr conversion; fixed in 2a89cb3dcf30ef71636c5715ac65f385a27b54a3.

    P.P.S. For unsafe code where you need to keep a particular pointer live, you don't need to restructure the code at all; you can just be explicit and use runtime.KeepAlive.

    点赞 评论 复制链接分享
  • weixin_39606118 weixin_39606118 5月前

    Hi ! This is all speculative, my primary motivation in opening the PR was just to start the discussion :)

    I don't think the compiler recognizes the "pointerness" of uintptr though as the documentation for the unsafe package states:

    
    A uintptr is an integer, not a reference. Converting a Pointer to a uintptr creates an integer value with no pointer semantics. Even if a uintptr holds the address of some object, the garbage collector will not update that uintptr's value if the object moves, nor will that uintptr keep the object from being reclaimed.
    

    I've also taken the approach of trying to force the corruption by calling runtime.GC() in my own tests without any luck. Part of the problem I think is that for the corruption to manifest itself we would need the memory to be reallocated as I don't believe the GC zeroes memory that it marks as free (I'd have to dig up the code to support this but I feel that I recall seeing this somewhere).

    It's great that you asked on the thread though, I'll certainly be following along.

    Also, runtime.KeepAlive() is certainly an option, it requires an interface conversion though, so I figured we could save a few cycles by just getting the length of the string. I should probably add a comment though just to highlight the significance of the line.

    点赞 评论 复制链接分享
  • weixin_39687189 weixin_39687189 5月前

    To be clear, I'm not claiming it's safe to store pointers in uintptrs in general (it's certainly not). I'm saying that the Data field of reflect.StringHeader and reflect.SliceHeader is a special case, as outlined at https://golang.org/pkg/unsafe/#Pointer.

    点赞 评论 复制链接分享
  • weixin_39687189 weixin_39687189 5月前

    Heh, seems I was mistaken, given Ian's comments :)

    点赞 评论 复制链接分享
  • weixin_39606118 weixin_39606118 5月前

    Cool! Seems its a moot point at the moment since there is no point for the goroutine to be preempted, but interesting nonetheless.

    点赞 评论 复制链接分享
  • weixin_39606118 weixin_39606118 5月前

    I'm going to close this PR in favor of the solution offered on the thread. Thanks for following up!

    点赞 评论 复制链接分享
  • weixin_39687189 weixin_39687189 5月前

    I did end up going with your approach in 0c3aa314b221a64bec7eb9539116fb7bc12cbb18 since the KeepAlive approach was slow.

    Thanks very much for bringing this to my attention.

    点赞 评论 复制链接分享
  • weixin_39606118 weixin_39606118 5月前

    No problem!

    点赞 评论 复制链接分享

相关推荐