duandong1869 2018-02-02 12:21
浏览 95
已采纳

将Kafka的murmur2实现移植到Go

JVM clients for Kafka are using custom implementation of murmur2 hash for their default partitioner.

None of Kafka clients for Go implement this hashing algorithm, which bring all sorts of problems when you need to keep consistent partitioning between different clients on different platforms.

I'm trying to port this code to Go, and it seems to work for some of the values, but not for the others.

Here is the Java code (source is here: https://github.com/apache/kafka/blob/1.0.0/clients/src/main/java/org/apache/kafka/common/utils/Utils.java#L353 ):

public static int murmur2(final byte[] data) {
    int length = data.length;
    int seed = 0x9747b28c;
    // 'm' and 'r' are mixing constants generated offline.
    // They're not really 'magic', they just happen to work well.
    final int m = 0x5bd1e995;
    final int r = 24;

    // Initialize the hash to a random value
    int h = seed ^ length;
    int length4 = length / 4;

    for (int i = 0; i < length4; i++) {
        final int i4 = i * 4;
        int k = (data[i4 + 0] & 0xff) + ((data[i4 + 1] & 0xff) << 8) + ((data[i4 + 2] & 0xff) << 16) + ((data[i4 + 3] & 0xff) << 24);
        k *= m;
        k ^= k >>> r;
        k *= m;
        h *= m;
        h ^= k;
    }

    // Handle the last few bytes of the input array
    switch (length % 4) {
        case 3:
            h ^= (data[(length & ~3) + 2] & 0xff) << 16;
        case 2:
            h ^= (data[(length & ~3) + 1] & 0xff) << 8;
        case 1:
            h ^= data[length & ~3] & 0xff;
            h *= m;
    }

    h ^= h >>> 13;
    h *= m;
    h ^= h >>> 15;

    return h;
}

Here is the Go code (playground link: https://play.golang.org/p/K4VooLZ4Mp7):

package main

import "fmt"

func main() {
    cases := []struct {
        Input    []byte
        Expected int32
    }{
        {[]byte("21"), -973932308},
        {[]byte("foobar"), -790332482}, // outputs: 1518714010
        {[]byte("a-little-bit-long-string"), -985981536}, // outputs 2068422364
        {[]byte("a-little-bit-longer-string"), -1486304829}, // outputs 1797390322
        {[]byte("lkjh234lh9fiuh90y23oiuhsafujhadof229phr9h19h89h8"), -58897971}, // outputs -1332218133
        {[]byte{'a', 'b', 'c'}, 479470107},
    }

    for _, c := range cases {
        if res := murmur2(c.Input); res != c.Expected {
            fmt.Printf("input: %q, expected: %d, got: %d
", c.Input, c.Expected, res)
        }
    }
}

func murmur2(data []byte) int32 {
    length := int32(len(data))
    seed := uint32(0x9747b28c)
    m := int32(0x5bd1e995)
    r := uint32(24)

    h := int32(seed ^ uint32(length))
    length4 := length / 4

    for i := int32(0); i < length4; i++ {
        i4 := i * 4
        k := int32(data[i4+0]&0xff) + (int32(data[i4+1]&0xff) << 8) + (int32(data[i4+2]&0xff) << 16) + (int32(data[i4+3]&0xff) << 24)
        k ^= int32(uint32(k) >> r)
        k *= m
        h *= m
        h ^= k
    }

    switch length % 4 {
    case 3:
        h ^= int32(data[(length & ^3)+2]&0xff) << 16
        fallthrough
    case 2:
        h ^= int32(data[(length & ^3)+1]&0xff) << 8
        fallthrough
    case 1:
        h ^= int32(data[length & ^3] & 0xff)
        h *= m
    }

    h ^= int32(uint32(h) >> 13)
    h *= m
    h ^= int32(uint32(h) >> 15)

    return h
}

I generated expected values for Go tests from Java, using mentioned Utils class like so:

System.out.println(Utils.murmur2("a-little-bit-long-string".getBytes("UTF-8")))

None of the existing murmur2 implementations for Go I've seen are generating the same results as mentioned Java code.

The question is, how can I port mentioned code in Java to Go so that the result will be the same between both?

  • 写回答

1条回答 默认 最新

  • doushi5117 2018-02-02 13:21
    关注

    As pointed by @IskanderSharipov:

    Go version misses one multiplication statement: k *= m inside the loop

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥15 微带串馈天线阵列每个阵元宽度计算
  • ¥15 keil的map文件中Image component sizes各项意思
  • ¥30 BC260Y用MQTT向阿里云发布主题消息一直错误
  • ¥20 求个正点原子stm32f407开发版的贪吃蛇游戏
  • ¥15 划分vlan后,链路不通了?
  • ¥20 求各位懂行的人,注册表能不能看到usb使用得具体信息,干了什么,传输了什么数据
  • ¥15 Vue3 大型图片数据拖动排序
  • ¥15 Centos / PETGEM
  • ¥15 划分vlan后不通了
  • ¥20 用雷电模拟器安装百达屋apk一直闪退