dtwzwmv87399 2018-07-25 18:35

How do I generate a 64-bit Murmur hash (v2) in PHP 7.2?

I have a MySQL database containing Murmur2 hashes (stored as unsigned 64-bit integers) that were generated with the Percona UDF that ships with Percona's build of MySQL, found here: https://github.com/percona/build-test/blob/master/plugin/percona-udf/murmur_udf.cc

My problem is that I now need to generate these same hashes on the PHP side, but I can't find, or tweak, any existing code that produces the same output for the same input.

Things I've tried:

  1. Copying the C++ function from the Percona UDF into my fork of this PHP extension, which originally produced 32-bit integer hashes: https://github.com/StirlingMarketingGroup/php_murmurhash. This almost worked, in that it compiled, but when I call the function from PHP the Apache server crashes with a segfault, and I'm not familiar enough with C++ or PHP extensions to debug it.

The segfault is triggered by this call:

var_dump(murmurhash('Hello World'));

The same call works fine with the original extension, https://github.com/kibae/php_murmurhash (which produces 32-bit hashes), when I follow its instructions. But once I replaced the hash function (my only edit was to the MurmurHash2.cpp file, see https://github.com/StirlingMarketingGroup/php_murmurhash/blob/master/MurmurHash2.cpp), the same call crashes the PHP script.

  2. Porting the Percona UDF C++ function to PHP. I'm not sure my PHP function is 100% accurate in how it accounts for the pointer incrementing, but I suspect the main reason I get entirely different output from the PHP version is that PHP doesn't support unsigned integers.

Here is the PHP function that I've written as a port from the Percona C++ function

function murmurhash2(string $s) : int {
    $len = strlen($s);
    $seed = 0;

    $m = 0x5bd1e995;
    $r = 24;

    $h1 = $seed ^ $len;
    $h2 = 0;

    $i = 0;

    while ($len >= 8) {
        $k1 = ord($s[$i++]);
        $k1 *= $m; $k1 ^= $k1 >> $r; $k1 *= $m;
        $h1 *= $m; $h1 ^= $k1;
        $len -= 4;

        $k2 = ord($s[$i++]);
        $k2 *= $m; $k2 ^= $k2 >> $r; $k2 *= $m;
        $h2 *= $m; $h2 ^= $k2;
        $len -= 4;
    }

    if ($len >= 4) {
        $k1 = ord($s[$i++]);
        $k1 *= $m; $k1 ^= $k1 >> $r; $k1 *= $m;
        $h1 *= $m; $h1 ^= $k1;
        $len -= 4;
    }

    switch ($len) {
        case 3: $h2 ^= ord($s[2]) << 16;
        case 2: $h2 ^= ord($s[1]) << 8;
        case 1: $h2 ^= ord($s[0]);
                $h2 *= $m;
    };

    $h1 ^= $h2 >> 18; $h1 *= $m;
    $h2 ^= $h1 >> 22; $h2 *= $m;
    $h1 ^= $h2 >> 17; $h1 *= $m;

    $h = $h1;

    $h = ($h << 32) | $h2;
    return $h;
}

Within MySQL I get this

select murmur_hash('Hello World'), cast(murmur_hash('Hello World')as unsigned), CONV(cast(murmur_hash('Hello World')as unsigned), 10, 16);
-- -8846466548632298438 9600277525077253178 853B098B6B655C3A

And in PHP I get

var_dump(murmurhash2('Hello World'));
// int(5969224437940092928)

So neither the signed nor the unsigned MySQL value matches my PHP output.
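For reference, the PHP port above differs from typical C MurmurHash2 code in a few visible ways: it reads one byte per step with ord($s[$i++]) instead of a four-byte word, it never masks its products back to 32 bits (so repeated multiplications can overflow PHP's signed 64-bit int and turn into floats), and its tail indexes $s[0] to $s[2] rather than the bytes at the current offset. The sketch below shows one way to emulate the C behavior in PHP. It assumes 64-bit PHP and assumes the Percona function walks the input as little-endian 32-bit words with uint32 arithmetic; the helper names mul32 and read_u32le are made up for illustration, and this has not been checked against the UDF's output.

```php
<?php
// Sketch only: assumes 64-bit PHP (PHP_INT_SIZE === 8).
// mul32() and read_u32le() are hypothetical helper names.

// Multiply as C would for two uint32_t values: keep the low 32 bits.
// $a is split into 16-bit halves so no partial product exceeds
// PHP_INT_MAX and gets converted to a float.
function mul32(int $a, int $b): int
{
    $a &= 0xFFFFFFFF;
    $b &= 0xFFFFFFFF;
    $lo = ($a & 0xFFFF) * $b;   // below 2^48
    $hi = ($a >> 16) * $b;      // below 2^48
    return ($lo + (($hi & 0xFFFF) << 16)) & 0xFFFFFFFF;
}

// Read the 32-bit little-endian word at byte offset $i.
// Assumption: this mirrors a uint32 pointer read on a little-endian host.
// The caller must make sure at least 4 bytes remain at $i.
function read_u32le(string $s, int $i): int
{
    return unpack('V', substr($s, $i, 4))[1];
}

var_dump(0xFFFFFFFF * 0xFFFFFFFF);        // plain * overflows into a float
var_dump(mul32(0xFFFFFFFF, 0xFFFFFFFF));  // int(1)
var_dump(ord('Hello World'[0]));          // int(72), a single byte
var_dump(read_u32le('Hello World', 0));   // int(1819043144), i.e. 0x6c6c6548
```

With helpers like these, each `x *= m` in the C code would become `$x = mul32($x, $m)`, and each word read would advance the offset by 4 rather than 1.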

Is there something that can be fixed with either of my previous two approaches, or maybe an already working approach that I can use instead?

1 answer

  • dongwu8653 2018-07-25 21:43

    I solved this myself by essentially porting the Percona hashing function directly to a PHP extension.

    Installation and usage instructions are posted here https://github.com/StirlingMarketingGroup/php-murmur-hash


    Example output

    In MySQL, the Percona extension is used like

    select`murmur_hash`('Yeet')
    -- -7850704420789372250
    

    And in PHP

    php -r 'echo murmur_hash("Yeet");'
    // -7850704420789372250
    

    Note that both environments treat these values as signed integers. In MySQL you can work around that with cast(`murmur_hash`('Yeet') as unsigned), but PHP doesn't support unsigned integers.
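If the unsigned form is only needed for display or comparison, PHP can print the same 64 bits as an unsigned decimal or hex string even though it has no unsigned integer type. A minimal sketch, assuming 64-bit PHP and using the signed 'Hello World' value quoted in the question:

```php
<?php
// Assumes 64-bit PHP. The sample value is the signed result for
// 'Hello World' quoted in the question.
$signed = -8846466548632298438;

// %u formats the integer as an unsigned decimal number (a string).
$unsigned = sprintf('%u', $signed);

// %X prints the two's-complement bit pattern in uppercase hex,
// zero-padded here to 16 digits.
$hex = sprintf('%016X', $signed);

echo $unsigned, PHP_EOL;  // 9600277525077253178
echo $hex, PHP_EOL;       // 853B098B6B655C3A
```

The results are strings, so they suit output and equality checks against MySQL's cast(... as unsigned) or CONV(..., 10, 16) values, not further integer arithmetic.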
