I have a MySQL database containing Murmur2 hashes (stored as unsigned 64-bit integers) that were generated with the murmur_hash UDF that ships with Percona's distribution of MySQL. Its source is here: https://github.com/percona/build-test/blob/master/plugin/percona-udf/murmur_udf.cc
My problem is that I now need to generate the same hashes on the PHP side, but I can't find an existing implementation, or tweak one, that produces the same output for the same input.
Things I've tried:
- Copying the C++ function from the Percona UDF into my fork of this PHP extension, which originally produced 32-bit integer hashes: https://github.com/StirlingMarketingGroup/php_murmurhash. This almost worked in that it compiled, but when I call the function from PHP, the Apache server crashes with a segfault, and I'm not familiar enough with C++ or PHP extensions to debug it.
The segfault is triggered by this call:
var_dump(murmurhash('Hello World'));
The same call works fine with the original extension, https://github.com/kibae/php_murmurhash (which produces 32-bit hashes), installed by following its instructions. But once I replaced the hash function (my only edit was to MurmurHash2.cpp, see https://github.com/StirlingMarketingGroup/php_murmurhash/blob/master/MurmurHash2.cpp), the same call crashes the PHP script.
- Porting the Percona UDF C++ function to PHP. I'm not sure my PHP function is fully accurate in how it accounts for the pointer incrementing, but I suspect the main reason I get entirely different output from the PHP version is that PHP does not support unsigned integers.
Here is the PHP function I wrote as a port of the Percona C++ function:
function murmurhash2(string $s) : int {
    $len = strlen($s);
    $seed = 0;
    $m = 0x5bd1e995;
    $r = 24;
    $h1 = $seed ^ $len;
    $h2 = 0;
    $i = 0;
    while ($len >= 8) {
        $k1 = ord($s[$i++]);
        $k1 *= $m; $k1 ^= $k1 >> $r; $k1 *= $m;
        $h1 *= $m; $h1 ^= $k1;
        $len -= 4;
        $k2 = ord($s[$i++]);
        $k2 *= $m; $k2 ^= $k2 >> $r; $k2 *= $m;
        $h2 *= $m; $h2 ^= $k2;
        $len -= 4;
    }
    if ($len >= 4) {
        $k1 = ord($s[$i++]);
        $k1 *= $m; $k1 ^= $k1 >> $r; $k1 *= $m;
        $h1 *= $m; $h1 ^= $k1;
        $len -= 4;
    }
    switch ($len) {
        case 3: $h2 ^= ord($s[2]) << 16;
        case 2: $h2 ^= ord($s[1]) << 8;
        case 1: $h2 ^= ord($s[0]);
                $h2 *= $m;
    };
    $h1 ^= $h2 >> 18; $h1 *= $m;
    $h2 ^= $h1 >> 22; $h2 *= $m;
    $h1 ^= $h2 >> 17; $h1 *= $m;
    $h = $h1;
    $h = ($h << 32) | $h2;
    return $h;
}
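To make the unsigned-integer suspicion concrete, here is a minimal sketch of the two building blocks I think a faithful port would need. It assumes a 64-bit PHP build, and it assumes the C++ code reads the key through an `unsigned int *`, so that each `*data++` consumes four bytes in little-endian order on x86. The helper names `mul32` and `read32le` are my own and hypothetical. This is not a working port, only the pieces my function above does not have.

```php
<?php
// Sketch only: assumes a 64-bit PHP build (PHP_INT_SIZE === 8).
// mul32() and read32le() are hypothetical helper names, not part of any library.

// Multiply two values modulo 2^32, like `unsigned int` multiplication in C++.
// A plain $a * $b can exceed PHP_INT_MAX and silently become a float,
// so the product is built from 16-bit halves that cannot overflow.
function mul32(int $a, int $b): int {
    $a &= 0xFFFFFFFF;
    $b &= 0xFFFFFFFF;
    $lo = ($a & 0xFFFF) * $b;            // stays below 2^48
    $hi = (($a >> 16) * $b) & 0xFFFF;    // only the low 16 bits survive the shift
    return ($lo + ($hi << 16)) & 0xFFFFFFFF;
}

// Read 4 bytes at $offset as a little-endian unsigned 32-bit integer.
// Assumption: this mirrors `*data++` through an `unsigned int *` on x86.
function read32le(string $s, int $offset): int {
    return unpack('V', substr($s, $offset, 4))[1];
}

var_dump(mul32(0xFFFFFFFF, 0xFFFFFFFF));      // int(1)
printf("%08x\n", read32le('Hello World', 0)); // 6c6c6548
```

By comparison, my function reads a single byte with `ord()` and multiplies without any masking, so either difference could explain the mismatch.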
Within MySQL I get this:
select murmur_hash('Hello World'), cast(murmur_hash('Hello World')as unsigned), CONV(cast(murmur_hash('Hello World')as unsigned), 10, 16);
-- -8846466548632298438 9600277525077253178 853B098B6B655C3A
And in PHP I get:
var_dump(murmurhash2('Hello World'));
// int(5969224437940092928)
So neither the signed nor the unsigned MySQL result matches my PHP output.
Is there something I can fix in either of these two approaches, or is there an already-working approach I could use instead?
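To rule out a pure display issue, here is a small sketch (assuming a 64-bit PHP build) showing that the three MySQL columns are the same 64 bits, and that PHP can render the signed value in the other two forms. A correct port would therefore only need to return the signed value.

```php
<?php
// Sketch only: assumes a 64-bit PHP build, where int is a signed 64-bit value.
$signed = -8846466548632298438; // first column of the MySQL result above

// %u and %X reinterpret the same 64 bits as unsigned.
$unsignedDecimal = sprintf('%u', $signed);
$hex = sprintf('%016X', $signed);

echo $unsignedDecimal, "\n"; // 9600277525077253178
echo $hex, "\n";             // 853B098B6B655C3A
```

My PHP result, 5969224437940092928, matches none of these, so the difference is in the hash computation itself rather than in signedness.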