dreamact3026 2012-03-21 21:33
Answer accepted

Fastest way to fingerprint an array (compute a unique hash from a data array)

I use a lot of caching and buffering of API calls in my WWW Framework, and one thing I end up using everywhere is 'fingerprinting' data, both to match cache filenames and to detect API calls that have already been made.

A lot of data is moved around in arrays, such as GET, POST and so on, so the uniqueness of an API call depends on that data.

I therefore need to fingerprint this information: serialize the data array and hash the result into a string that I can store and compare against.

For array serialization PHP offers serialize() and json_encode(). After various benchmarks I consider json_encode() the faster method for serializing an array and am quite happy with it.

For hashing there are the md5() and sha1() functions, of which md5() is faster according to my benchmarks.

So my current fingerprint algorithm is:

$fingerprint=md5(json_encode($array));
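
As a rough sketch of how that one-liner could be used for cache filenames: the helper names `fingerprint()` and `cachePath()`, the `/tmp/cache` directory and the `.tmp` suffix are all hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper: fingerprint an array with md5(json_encode(...)).
function fingerprint(array $data): string
{
    // json_encode() returns false on failure (for example invalid UTF-8).
    $json = json_encode($data);
    if ($json === false) {
        throw new InvalidArgumentException('Data could not be JSON-encoded');
    }
    return md5($json);
}

// Hypothetical helper: map request data to a cache filename.
function cachePath(array $data, string $dir = '/tmp/cache'): string
{
    return $dir . '/' . fingerprint($data) . '.tmp';
}

$a = fingerprint(['user' => 1, 'page' => 2]);
$b = fingerprint(['page' => 2, 'user' => 1]);
// Key order changes the JSON string, so $a and $b are different fingerprints.
```

One thing this makes visible: two arrays with the same keys in a different order produce different fingerprints, because the JSON strings differ.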

But I have doubts about whether this is the 'fastest possible' method for fingerprinting an array in PHP. I have searched Google and StackOverflow and have not found good alternatives. Am I on the right track, or do I need to do something different?

3 answers

  • duangou1551 2012-03-22 16:20

    Once you've got your array JSON-encoded, you should probably go with a non-cryptographic hash function if you're primarily concerned with speed. Different hash functions are good for different things. MD5 and SHA-1 are called cryptographic because they are hard to reverse (note that both are widely considered deprecated for security purposes due to vulnerabilities). CRC (cyclic redundancy check) functions are error-detecting codes and would be ill-suited for uniqueness anyway.

    Wikipedia is a decent place to start for this, if only because contributions there generally have external links to library implementations: see "List of hash functions". I would recommend reading up on a few of the non-cryptographic libraries there and benchmarking them. The non-cryptographic functions are written more for speed and a reasonable degree of uniqueness, sacrificing security, error detection and other interesting properties, which from your description is exactly what you want.
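
    A minimal benchmarking sketch along those lines, using PHP's built-in `hash()` extension rather than an external library. The payload, the candidate list and the iteration count are arbitrary assumptions; some candidates (`xxh3`, `murmur3a`) only exist in newer PHP builds, so the list is filtered through `hash_algos()` first.

    ```php
    <?php
    // Sample payload standing in for GET/POST data (made up for this sketch).
    $payload = json_encode(['action' => 'list', 'page' => 3, 'filters' => ['a', 'b', 'c']]);

    // Candidate algorithms; keep only those this PHP build supports.
    $candidates = ['md5', 'sha1', 'fnv1a32', 'fnv1a64', 'joaat', 'murmur3a', 'xxh3'];
    $algos = array_values(array_intersect($candidates, hash_algos()));

    $timings = [];
    foreach ($algos as $algo) {
        $start = microtime(true);
        for ($i = 0; $i < 100000; $i++) {
            hash($algo, $payload);
        }
        $timings[$algo] = microtime(true) - $start;
    }

    asort($timings); // fastest first
    foreach ($timings as $algo => $seconds) {
        printf("%-9s %.4f s\n", $algo, $seconds);
    }
    ```

    Results depend heavily on payload size and PHP version, so it is worth running this with data that resembles real requests.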

    One final note, if you're mainly concerned about speed: consider how you are going to store and compare the fingerprints themselves. MD5 outputs 128 bits of data, which won't fit into a numeric type in PHP without some extra library calls and overhead. For my money, the best speed of comparison and storage would come from a hash function that can output 64-bit numbers directly. Note that to get 64-bit numbers natively in PHP, you need 64-bit hardware and PHP configured/installed in 64-bit mode. I have some code around here somewhere that I used to test our staging and prod environments; I could probably dig it up if you're interested.
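
    One way this could look, as a sketch only: it assumes the `fnv1a64` algorithm from PHP's `hash()` extension as the 64-bit hash (any other 64-bit function would do), and the function name `fingerprint64()` is hypothetical. `PHP_INT_SIZE` is used to check for a 64-bit build.

    ```php
    <?php
    // Hypothetical helper: fingerprint an array as a native 64-bit integer.
    function fingerprint64(array $data): int
    {
        // PHP_INT_SIZE is 8 on 64-bit builds and 4 on 32-bit builds.
        if (PHP_INT_SIZE < 8) {
            throw new RuntimeException('This sketch needs a 64-bit PHP build');
        }
        $json = json_encode($data);
        if ($json === false) {
            throw new InvalidArgumentException('Data could not be JSON-encoded');
        }
        // Third argument true: return the raw 8-byte digest instead of hex.
        $raw = hash('fnv1a64', $json, true);
        // 'J' reads 8 bytes as a big-endian 64-bit integer. PHP integers are
        // signed, so the result is negative when the top bit is set.
        return unpack('J', $raw)[1];
    }

    $id = fingerprint64(['user' => 1, 'page' => 2]);
    ```

    The resulting integer can be compared with `===` and stored in a 64-bit integer column, at the cost of a higher collision probability than a 128-bit digest.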

    Btw, I don't think you're going to get any faster stringification of an array than json_encode(). The heart of that problem is array walking and string manipulation, so the speed is essentially proportional to the verbosity of the output. JSON is very terse compared to PHP's serialize() or export functions (such as var_export()). I bet if you looked through enough comments on the PHP documentation pages, you could find someone who wrote a hash function that takes an array as input directly, but it would be a gamble whether it was any good at all.
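
    To see the verbosity difference, a small sketch that compares output lengths for one sample array (the array itself is made up for illustration):

    ```php
    <?php
    // Sample data, invented for this comparison.
    $data = ['id' => 42, 'tags' => ['a', 'b'], 'active' => true];

    // Length in bytes of each string representation.
    $sizes = [
        'json_encode' => strlen(json_encode($data)),
        'serialize'   => strlen(serialize($data)),
        'var_export'  => strlen(var_export($data, true)),
    ];

    foreach ($sizes as $name => $bytes) {
        printf("%-12s %d bytes\n", $name, $bytes);
    }
    ```

    Shorter output means less data to feed into the hash function afterwards, which matters as much as the encoding time itself.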

    Feel free to ask questions if I was unclear on anything.

    This answer was selected by the asker as the best answer.