duanbo6482 2018-01-23 08:32
Views: 81
Accepted

Hash collisions with Go's built-in map and string keys?

I wrote this function to generate unique IDs for my test cases:

func uuid(t *testing.T) string {
    uidCounterLock.Lock()
    defer uidCounterLock.Unlock()

    uidCounter++
    //return "[" + t.Name() + "|" + strconv.FormatInt(uidCounter, 10) + "]"
    return "[" + t.Name() + "|" + string(uidCounter) + "]"
}

var uidCounter int64 = 1
var uidCounterLock sync.Mutex

To test it, I generate a bunch of values from it in different goroutines and send them to the main goroutine, which tallies them in a map[string]int by doing seen[v] = seen[v] + 1. There is no concurrent access to this map; only the main goroutine touches it.

var seen = make(map[string]int)
for v := range ch {
    seen[v] = seen[v] + 1
    if count := seen[v]; count > 1 {
        fmt.Printf("Generated the same uuid %d times: %#v\n", count, v)
    }
}

When I just convert uidCounter to a string with string(uidCounter), I get a ton of collisions on a single key. When I use strconv.FormatInt, I get no collisions at all.

When I say a ton, I mean I just got 1115919 collisions for the value [TestUuidIsUnique|�] out of 2227980 generated values, i.e. 50% of the values collide on the same key. The values are not equal. I do always get the same number of collisions for the same source code, so at least it's somewhat deterministic, i.e. probably not related to race conditions.

I'm not surprised integer overflow in a rune would be an issue, but I'm nowhere near 2^31, and that wouldn't explain why the map thinks 50% of the values have the same key. Also, I wouldn't expect a hash collision to impact correctness, just performance, since I can iterate over the keys in a map, so the values are stored there somewhere.

In the output, all runes printed are 0xEFBFBD. It's the same number of bits as the highest valid unicode code point, but that doesn't really match either.

Generated the same uuid 2 times: "[TestUuidIsUnique|�]"
Generated the same uuid 3 times: "[TestUuidIsUnique|�]"
Generated the same uuid 4 times: "[TestUuidIsUnique|�]"
Generated the same uuid 5 times: "[TestUuidIsUnique|�]"
...
Generated the same uuid 2047 times: "[TestUuidIsUnique|�]"
Generated the same uuid 2048 times: "[TestUuidIsUnique|�]"
Generated the same uuid 2049 times: "[TestUuidIsUnique|�]"
...

What's going on here? Did the go authors assume that hash(a) == hash(b) implies a == b for strings? Or am I just missing something silly? go test -race isn't complaining either.

I'm on macOS 10.13.2, and go version go1.9.2 darwin/amd64.

1 answer

  • Anonymous user 2018-01-23 08:45

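    Converting an integer to a string with string(n) does not format the number; it treats n as a Unicode code point and returns that code point's UTF-8 encoding. For an invalid code point (a surrogate in U+D800..U+DFFF or anything above U+10FFFF), the result is a string containing the Unicode replacement character "�" (U+FFFD, encoded as the bytes EF BF BD). Every such counter value therefore yields exactly the same string, so the map is counting genuinely equal keys rather than hash collisions. With about 2.2 million counter values and valid code points ending at U+10FFFF (1,114,111), roughly half of the values are invalid, which matches the 50% in the question.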
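The conversion behaviour described above can be checked with a small program. This is only a sketch: asCodePoint is a hypothetical helper, not part of the question's code, and it goes through rune explicitly because recent versions of go vet flag a direct string(int64) conversion.

```go
package main

import (
	"fmt"
	"strconv"
)

// asCodePoint (hypothetical helper) mirrors string(uidCounter) from the
// question: the integer is interpreted as a Unicode code point rather than
// being formatted as decimal digits.
func asCodePoint(n int64) string {
	return string(rune(n))
}

func main() {
	// 65 and 0x10FFFF are valid code points; 0xD800 is a surrogate, and
	// 0x110000 and 2000000 lie above the highest valid code point.
	for _, n := range []int64{65, 0xD800, 0x10FFFF, 0x110000, 2000000} {
		fmt.Printf("%8d  as code point: %+q  via strconv: %q\n",
			n, asCodePoint(n), strconv.FormatInt(n, 10))
	}

	// Two different invalid values convert to the same string.
	fmt.Println(asCodePoint(0x110000) == asCodePoint(2000000)) // true
}
```

Any two invalid values compare equal after conversion, which is why the map reports the same key again and again.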

    Use the strconv package to convert an integer to text.
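A minimal sketch of the strconv-based version follows, assuming the same counter and mutex as in the question. uniqueID and countDuplicates are hypothetical names; uniqueID takes the test name as a plain string instead of a *testing.T so the example can run outside go test.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

var (
	uidCounter     int64
	uidCounterLock sync.Mutex
)

// uniqueID is a stand-in for uuid(t) from the question.
func uniqueID(name string) string {
	uidCounterLock.Lock()
	defer uidCounterLock.Unlock()

	uidCounter++
	// FormatInt renders the counter as decimal digits, so distinct
	// counter values always produce distinct strings.
	return "[" + name + "|" + strconv.FormatInt(uidCounter, 10) + "]"
}

// countDuplicates generates perWorker IDs in each of workers goroutines and
// returns how many received IDs had already been seen.
func countDuplicates(workers, perWorker int) int {
	ch := make(chan string)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perWorker; j++ {
				ch <- uniqueID("TestUuidIsUnique")
			}
		}()
	}
	// Close the channel once every producer has finished.
	go func() {
		wg.Wait()
		close(ch)
	}()

	// Only this goroutine touches the map, as in the question.
	seen := make(map[string]int)
	dups := 0
	for v := range ch {
		seen[v]++
		if seen[v] > 1 {
			dups++
		}
	}
	return dups
}

func main() {
	fmt.Println("duplicates:", countDuplicates(8, 1000))
}
```

fmt.Sprint or strconv.Itoa would work equally well here; the point is only that the counter is formatted as text instead of being reinterpreted as a code point.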

    This answer was accepted by the asker as the best answer.