dtwzwmv87399 2018-07-25 18:35
Accepted

How to generate a 64-bit Murmur hash v2 in PHP 7.2?

I've got a MySQL database containing some Murmur2 hashes (stored as unsigned 64-bit ints) that were generated with the Percona UDF that ships with Percona's distribution of MySQL, found here: https://github.com/percona/build-test/blob/master/plugin/percona-udf/murmur_udf.cc

My problem is that I now need to generate the same hashes on the PHP side, but I can't find anything existing, or tweak anything, to produce the same output for the same input.

Things I've tried:

  1. Copying the C++ function from the Percona UDF into my fork (https://github.com/StirlingMarketingGroup/php_murmurhash) of a PHP extension that originally produced 32-bit int hashes. This almost worked, in that it compiled, but when I execute the function within PHP the Apache server crashes with a segfault, and I'm not familiar enough with C++ and PHP extensions to debug it.

The segfault is triggered when I run this call:

var_dump(murmurhash('Hello World'));

This call works fine with https://github.com/kibae/php_murmurhash (the original extension, which produces 32-bit hashes) when I download it and follow the instructions. But once I replace the function (my only edit is to MurmurHash2.cpp, see https://github.com/StirlingMarketingGroup/php_murmurhash/blob/master/MurmurHash2.cpp), the same call crashes the PHP script.

  2. Porting the Percona UDF C++ function to PHP. I'm not sure my PHP function accurately accounts for the pointer incrementing, but I suspect the main reason I get entirely different output from the PHP version is that PHP doesn't support unsigned integers.

Here is the PHP function that I've written as a port of the Percona C++ function:

function murmurhash2(string $s) : int {
    $len = strlen($s);
    $seed = 0;

    $m = 0x5bd1e995;
    $r = 24;

    $h1 = $seed ^ $len;
    $h2 = 0;

    $i = 0;

    while ($len >= 8) {
        $k1 = ord($s[$i++]);
        $k1 *= $m; $k1 ^= $k1 >> $r; $k1 *= $m;
        $h1 *= $m; $h1 ^= $k1;
        $len -= 4;

        $k2 = ord($s[$i++]);
        $k2 *= $m; $k2 ^= $k2 >> $r; $k2 *= $m;
        $h2 *= $m; $h2 ^= $k2;
        $len -= 4;
    }

    if ($len >= 4) {
        $k1 = ord($s[$i++]);
        $k1 *= $m; $k1 ^= $k1 >> $r; $k1 *= $m;
        $h1 *= $m; $h1 ^= $k1;
        $len -= 4;
    }

    switch ($len) {
        case 3: $h2 ^= ord($s[2]) << 16;
        case 2: $h2 ^= ord($s[1]) << 8;
        case 1: $h2 ^= ord($s[0]);
                $h2 *= $m;
    };

    $h1 ^= $h2 >> 18; $h1 *= $m;
    $h2 ^= $h1 >> 22; $h2 *= $m;
    $h1 ^= $h2 >> 17; $h1 *= $m;

    $h = $h1;

    $h = ($h << 32) | $h2;
    return $h;
}

Within MySQL I get this

select murmur_hash('Hello World'), cast(murmur_hash('Hello World')as unsigned), CONV(cast(murmur_hash('Hello World')as unsigned), 10, 16);
-- -8846466548632298438 9600277525077253178 853B098B6B655C3A

And in PHP I get

var_dump(murmurhash2('Hello World'));
// int(5969224437940092928)

So looking at the MySQL and PHP results, neither signed nor unsigned match my PHP output.
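
Two differences between the C++ and the PHP port above may explain the mismatch, though neither has been checked against the Percona UDF here. Assuming the UDF follows the reference MurmurHash64B, C++ reads a 4-byte word through a uint32_t pointer and wraps every multiplication at 32 bits, whereas ord($s[$i++]) reads a single byte, and PHP ints are 64-bit and become floats on overflow. A minimal sketch of both operations, assuming 64-bit PHP and a little-endian host such as x86; mul32, word32, and mixK are hypothetical helper names, not part of any extension:

```php
<?php
// Hypothetical helpers for illustration; not taken from the Percona UDF.

/** Multiply two values and keep only the low 32 bits, like uint32_t in C++. */
function mul32(int $a, int $b): int {
    $a &= 0xFFFFFFFF;
    $b &= 0xFFFFFFFF;
    // Split $a into 16-bit halves so no intermediate product reaches 2^63,
    // which would make PHP convert the overflowing int to a float.
    $low  = ($a & 0xFFFF) * $b;
    $high = ((($a >> 16) * $b) & 0xFFFF) << 16;
    return ($low + $high) & 0xFFFFFFFF;
}

/** Read the 4-byte little-endian word at byte offset $i (like *data++ on x86). */
function word32(string $s, int $i): int {
    return unpack('V', substr($s, $i, 4))[1];
}

/** One key-mixing step: k *= m; k ^= k >> r; k *= m; all modulo 2^32. */
function mixK(int $k): int {
    $m = 0x5bd1e995;
    $r = 24;
    $k = mul32($k, $m);
    $k ^= $k >> $r;
    return mul32($k, $m);
}

printf("%08x\n", word32('Hello World', 0)); // 6c6c6548 (bytes "Hell" as one word)
var_dump(ord('Hello World'[0]));            // int(72)   (only the byte "H")
```

The last two lines show how differently the two reads behave on the first four bytes of the input: the word read consumes "Hell" at once, while ord() sees only "H".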

Is there something that can be fixed with either of my previous two approaches, or maybe an already working approach that I can use instead?

1 answer

  • dongwu8653 2018-07-25 21:43

    I've solved this myself by essentially porting the Percona hashing function directly to a PHP extension.

    Installation and usage instructions are posted here: https://github.com/StirlingMarketingGroup/php-murmur-hash


    Example output

    In MySQL, the Percona extension is used like this:

    select`murmur_hash`('Yeet')
    -- -7850704420789372250
    

    And in PHP:

    php -r 'echo murmur_hash("Yeet");'
    // -7850704420789372250
    

    Note that both environments treat the result as a signed integer. In MySQL you can work around this with cast(`murmur_hash`('Yeet')as unsigned), but PHP has no unsigned integer type.
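
If the unsigned form is only needed for display or comparison, the signed value can be formatted as an unsigned string on the PHP side. A minimal sketch, assuming 64-bit PHP, using the signed value from the MySQL output in the question:

```php
<?php
// Signed value taken from the MySQL output in the question.
$signed = -8846466548632298438;

// %u presents the same 64 bits as an unsigned decimal string.
$unsigned = sprintf('%u', $signed);
// %016X presents the same bits as 16 upper-case hex digits.
$hex = sprintf('%016X', $signed);

echo $unsigned, "\n"; // 9600277525077253178
echo $hex, "\n";      // 853B098B6B655C3A
```

The results are strings, not ints, so they suit output and equality checks against MySQL's unsigned column rather than further arithmetic.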
