duanqilupinf67040
duanqilupinf67040
2010-08-24 07:28

如何在不使用文本字段的情况下在MySQL数据库中存储哈希值

已采纳

I'm storing unique user-agents in a MySQL MyISAM table so when I have to look if it exists in the table, I check the md5 hash that is stored next to the TEXT field.

User-Agents 
{
    id         - INT
    user-agent - TEXT
    hash       - VARCHAR(32) // md5
}

There is any way to do the same but using a 32-bit integer and not a text hash? Maybe the md5 in raw format will be faster? That will requiere a binary search.

[EDIT]

MySQL don't handle hash searches for complete case-sensitive strings?

  • 点赞
  • 写回答
  • 关注问题
  • 收藏
  • 复制链接分享
  • 邀请回答

9条回答

  • douhuo0884 douhuo0884 11年前

    Let MySQL do the hard work for you. Use a CHAR column and create an index on that column. You could convert and store the hash as an integer, but there's absolutely no benefit, and it may actually cause problems.

    点赞 3 评论 复制链接分享
  • doujiabing1228 doujiabing1228 11年前

    You can't store an MD5 hash in a 32-bit int: it simply won't fit. (It's 32 characters when written in hex, but it's 128-bits of data)

    You could look at MySQL's BINARY and VARBINARY types. See http://dev.mysql.com/doc/refman/5.1/en/binary-varbinary.html. These types store binary data. In your case, BINARY(16) or VARBINARY(16), but since MD5 hashes are always 16 bytes, the latter seems a bit pointless.

    点赞 10 评论 复制链接分享
  • dongyuntao2000 dongyuntao2000 10年前

    Store the UNHEX(MD5($value)) in a BINARY(16).

    点赞 7 评论 复制链接分享
  • duane9322 duane9322 11年前

    try MurmurHash. Its a fast hashing algo thats been translated to multiple languages. It takes your input and translates it into a 32/64 bit integer hash.

    点赞 6 评论 复制链接分享
  • dpka7974 dpka7974 11年前

    You can store MD5 hash in char(32) which is a bit faster than varchar(32). It's also possible to make two BIGINT fields and keep first half of md5 hash in first field and second part in second field.

    点赞 3 评论 复制链接分享
  • dongzhiman2162 dongzhiman2162 11年前

    You could do this instead:

    User-Agents 
    {
        id         - INT
        user-agent - TEXT
        hash       - UNSIGNED INT (CRC32, indexed)
    }
    
    
    $crc32 = sprintf("%u", crc32($user_agent));
    
    SELECT * FROM user_agents WHERE hash=$crc32 AND user_agent='$user_agent';
    

    It's unlikely that you'll get collisions with crc32 for this kind of data.

    To guarantee that collisions will not cause problems, add a secondary search parameter. MySQL will be able to use the index to quickly find the record. Then it can do a simple string search to guarantee that match is correct.

    PS: The sprintf() is there to work around signed 32-bit integers. Should be unnecessary on 64-bit systems.

    点赞 1 评论 复制链接分享
  • douping3860 douping3860 11年前

    Have you tried creating a BINARY(16) field, and storing the result of md5($plaintext, true); in it? That might work, make sure you index that field as well.

    Because trying to fit a 128-bit value in 32 bits doesn't make any sense...

    点赞 评论 复制链接分享
  • duanjin9035 duanjin9035 11年前

    If your field hash is always an MD5 value generated by PHP, then you can safely set it to CHAR(32). This should not impact the response time to your queries, unless you plan to have millions+ of rows, or even worst! JOIN other tables with this field. The bottom line is that fixed width column is better than variable ones, so if you can optimize do it.

    Regarding changing MD5 into int values, see this question; the conclusion to this is that if you really want to change your MD5 into a 128-bit int value, you might as well use a random number instead of an MD5!

    点赞 评论 复制链接分享
  • doutang2017 doutang2017 11年前

    Are you REALLY sure the hashes are only 32-bit? MD5 is 128-bit. Cropping the hash to first 4 or 8 bytes would greatly increase risk of collisions.

    点赞 评论 复制链接分享

相关推荐