JVM clients for Kafka are using custom implementation of murmur2 hash for their default partitioner.
None of Kafka clients for Go implement this hashing algorithm, which bring all sorts of problems when you need to keep consistent partitioning between different clients on different platforms.
I'm trying to port this code to Go, and it seems to work for some of the values, but not for the others.
Here is the Java code (source is here: https://github.com/apache/kafka/blob/1.0.0/clients/src/main/java/org/apache/kafka/common/utils/Utils.java#L353 ):
public static int murmur2(final byte[] data) {
int length = data.length;
int seed = 0x9747b28c;
// 'm' and 'r' are mixing constants generated offline.
// They're not really 'magic', they just happen to work well.
final int m = 0x5bd1e995;
final int r = 24;
// Initialize the hash to a random value
int h = seed ^ length;
int length4 = length / 4;
for (int i = 0; i < length4; i++) {
final int i4 = i * 4;
int k = (data[i4 + 0] & 0xff) + ((data[i4 + 1] & 0xff) << 8) + ((data[i4 + 2] & 0xff) << 16) + ((data[i4 + 3] & 0xff) << 24);
k *= m;
k ^= k >>> r;
k *= m;
h *= m;
h ^= k;
}
// Handle the last few bytes of the input array
switch (length % 4) {
case 3:
h ^= (data[(length & ~3) + 2] & 0xff) << 16;
case 2:
h ^= (data[(length & ~3) + 1] & 0xff) << 8;
case 1:
h ^= data[length & ~3] & 0xff;
h *= m;
}
h ^= h >>> 13;
h *= m;
h ^= h >>> 15;
return h;
}
Here is the Go code (playground link: https://play.golang.org/p/K4VooLZ4Mp7):
package main
import "fmt"
func main() {
cases := []struct {
Input []byte
Expected int32
}{
{[]byte("21"), -973932308},
{[]byte("foobar"), -790332482}, // outputs: 1518714010
{[]byte("a-little-bit-long-string"), -985981536}, // outputs 2068422364
{[]byte("a-little-bit-longer-string"), -1486304829}, // outputs 1797390322
{[]byte("lkjh234lh9fiuh90y23oiuhsafujhadof229phr9h19h89h8"), -58897971}, // outputs -1332218133
{[]byte{'a', 'b', 'c'}, 479470107},
}
for _, c := range cases {
if res := murmur2(c.Input); res != c.Expected {
fmt.Printf("input: %q, expected: %d, got: %d
", c.Input, c.Expected, res)
}
}
}
func murmur2(data []byte) int32 {
length := int32(len(data))
seed := uint32(0x9747b28c)
m := int32(0x5bd1e995)
r := uint32(24)
h := int32(seed ^ uint32(length))
length4 := length / 4
for i := int32(0); i < length4; i++ {
i4 := i * 4
k := int32(data[i4+0]&0xff) + (int32(data[i4+1]&0xff) << 8) + (int32(data[i4+2]&0xff) << 16) + (int32(data[i4+3]&0xff) << 24)
k ^= int32(uint32(k) >> r)
k *= m
h *= m
h ^= k
}
switch length % 4 {
case 3:
h ^= int32(data[(length & ^3)+2]&0xff) << 16
fallthrough
case 2:
h ^= int32(data[(length & ^3)+1]&0xff) << 8
fallthrough
case 1:
h ^= int32(data[length & ^3] & 0xff)
h *= m
}
h ^= int32(uint32(h) >> 13)
h *= m
h ^= int32(uint32(h) >> 15)
return h
}
I generated expected values for Go tests from Java, using mentioned Utils
class like so:
System.out.println(Utils.murmur2("a-little-bit-long-string".getBytes("UTF-8")))
None of the existing murmur2 implementations for Go I've seen are generating the same results as mentioned Java code.
The question is, how can I port mentioned code in Java to Go so that the result will be the same between both?