duandong1869 2018-02-02 12:21
浏览 95
已采纳

将Kafka的murmur2实现移植到Go

JVM clients for Kafka are using custom implementation of murmur2 hash for their default partitioner.

None of Kafka clients for Go implement this hashing algorithm, which bring all sorts of problems when you need to keep consistent partitioning between different clients on different platforms.

I'm trying to port this code to Go, and it seems to work for some of the values, but not for the others.

Here is the Java code (source is here: https://github.com/apache/kafka/blob/1.0.0/clients/src/main/java/org/apache/kafka/common/utils/Utils.java#L353 ):

public static int murmur2(final byte[] data) {
    int length = data.length;
    int seed = 0x9747b28c;
    // 'm' and 'r' are mixing constants generated offline.
    // They're not really 'magic', they just happen to work well.
    final int m = 0x5bd1e995;
    final int r = 24;

    // Initialize the hash to a random value
    int h = seed ^ length;
    int length4 = length / 4;

    for (int i = 0; i < length4; i++) {
        final int i4 = i * 4;
        int k = (data[i4 + 0] & 0xff) + ((data[i4 + 1] & 0xff) << 8) + ((data[i4 + 2] & 0xff) << 16) + ((data[i4 + 3] & 0xff) << 24);
        k *= m;
        k ^= k >>> r;
        k *= m;
        h *= m;
        h ^= k;
    }

    // Handle the last few bytes of the input array
    switch (length % 4) {
        case 3:
            h ^= (data[(length & ~3) + 2] & 0xff) << 16;
        case 2:
            h ^= (data[(length & ~3) + 1] & 0xff) << 8;
        case 1:
            h ^= data[length & ~3] & 0xff;
            h *= m;
    }

    h ^= h >>> 13;
    h *= m;
    h ^= h >>> 15;

    return h;
}

Here is the Go code (playground link: https://play.golang.org/p/K4VooLZ4Mp7):

package main

import "fmt"

func main() {
    cases := []struct {
        Input    []byte
        Expected int32
    }{
        {[]byte("21"), -973932308},
        {[]byte("foobar"), -790332482}, // outputs: 1518714010
        {[]byte("a-little-bit-long-string"), -985981536}, // outputs 2068422364
        {[]byte("a-little-bit-longer-string"), -1486304829}, // outputs 1797390322
        {[]byte("lkjh234lh9fiuh90y23oiuhsafujhadof229phr9h19h89h8"), -58897971}, // outputs -1332218133
        {[]byte{'a', 'b', 'c'}, 479470107},
    }

    for _, c := range cases {
        if res := murmur2(c.Input); res != c.Expected {
            fmt.Printf("input: %q, expected: %d, got: %d
", c.Input, c.Expected, res)
        }
    }
}

func murmur2(data []byte) int32 {
    length := int32(len(data))
    seed := uint32(0x9747b28c)
    m := int32(0x5bd1e995)
    r := uint32(24)

    h := int32(seed ^ uint32(length))
    length4 := length / 4

    for i := int32(0); i < length4; i++ {
        i4 := i * 4
        k := int32(data[i4+0]&0xff) + (int32(data[i4+1]&0xff) << 8) + (int32(data[i4+2]&0xff) << 16) + (int32(data[i4+3]&0xff) << 24)
        k ^= int32(uint32(k) >> r)
        k *= m
        h *= m
        h ^= k
    }

    switch length % 4 {
    case 3:
        h ^= int32(data[(length & ^3)+2]&0xff) << 16
        fallthrough
    case 2:
        h ^= int32(data[(length & ^3)+1]&0xff) << 8
        fallthrough
    case 1:
        h ^= int32(data[length & ^3] & 0xff)
        h *= m
    }

    h ^= int32(uint32(h) >> 13)
    h *= m
    h ^= int32(uint32(h) >> 15)

    return h
}

I generated expected values for Go tests from Java, using mentioned Utils class like so:

System.out.println(Utils.murmur2("a-little-bit-long-string".getBytes("UTF-8")))

None of the existing murmur2 implementations for Go I've seen are generating the same results as mentioned Java code.

The question is, how can I port mentioned code in Java to Go so that the result will be the same between both?

  • 写回答

1条回答 默认 最新

  • doushi5117 2018-02-02 13:21
    关注

    As pointed by @IskanderSharipov:

    Go version misses one multiplication statement: k *= m inside the loop

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥15 如何在scanpy上做差异基因和通路富集?
  • ¥20 关于#硬件工程#的问题,请各位专家解答!
  • ¥15 关于#matlab#的问题:期望的系统闭环传递函数为G(s)=wn^2/s^2+2¢wn+wn^2阻尼系数¢=0.707,使系统具有较小的超调量
  • ¥15 FLUENT如何实现在堆积颗粒的上表面加载高斯热源
  • ¥30 截图中的mathematics程序转换成matlab
  • ¥15 动力学代码报错,维度不匹配
  • ¥15 Power query添加列问题
  • ¥50 Kubernetes&Fission&Eleasticsearch
  • ¥15 報錯:Person is not mapped,如何解決?
  • ¥15 c++头文件不能识别CDialog