VARCHAR is for strings whose length you don't know in advance; you trade some performance for that flexibility. Since you know the field will always be 32 characters long, you should use CHAR(32) (a fixed-length type that expects exactly 32 characters) instead.
As for the possibility of collisions: yes, two different inputs can have identical MD5 hashes. An easy way to convince yourself of this is to md5sum a 33-character hexadecimal number. If this works, you know that there are more possible inputs (16^33) than outputs (16^32, since an MD5 hash is 32 hexadecimal digits), so at least two inputs must map to the same output.
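The counting argument above can be sketched in Python. This is only an illustration: the sample string is an arbitrary value chosen here, and the variable names are made up for the example.

```python
import hashlib

# Any 33-character hexadecimal string will do; this one is arbitrary.
sample = "0123456789abcdef0123456789abcdef0"
digest = hashlib.md5(sample.encode("ascii")).hexdigest()

# Count how many distinct strings exist at each length.
possible_inputs = 16 ** len(sample)    # all 33-digit hexadecimal strings
possible_outputs = 16 ** len(digest)   # all 32-digit hexadecimal digests

print("input length:", len(sample))
print("digest length:", len(digest))
print("more inputs than outputs:", possible_inputs > possible_outputs)
```

Because the inputs outnumber the outputs, at least two of those inputs must share a digest, even though this snippet does not find which ones.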
These two blocks (courtesy of this link), shown here as hexadecimal dumps of 128 bytes each, famously both give an md5sum of 79054025255fb1a26e4bc422aef54eb4:
BLOCK 1:
d131dd02c5e6eec4693d9a0698aff95c 2fcab58712467eab4004583eb8fb7f89
55ad340609f4b30283e488832571415a 085125e8f7cdc99fd91dbdf280373c5b
d8823e3156348f5bae6dacd436c919c6 dd53e2b487da03fd02396306d248cda0
e99f33420f577ee8ce54b67080a80d1e c69821bcb6a8839396f9652b6ff72a70
BLOCK 2:
d131dd02c5e6eec4693d9a0698aff95c 2fcab50712467eab4004583eb8fb7f89
55ad340609f4b30283e4888325f1415a 085125e8f7cdc99fd91dbd7280373c5b
d8823e3156348f5bae6dacd436c919c6 dd53e23487da03fd02396306d248cda0
e99f33420f577ee8ce54b67080280d1e c69821bcb6a8839396f965ab6ff72a70
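If you want to check the collision yourself, a minimal Python sketch follows. It assumes the blocks are hexadecimal dumps that must be decoded to raw bytes before hashing (hashing the hexadecimal text itself gives different results); the variable names are chosen here for illustration.

```python
import hashlib

# Decode each hexadecimal dump into its 128 raw bytes.
# bytes.fromhex ignores the spaces between the two halves of each row.
block1 = bytes.fromhex(
    "d131dd02c5e6eec4693d9a0698aff95c 2fcab58712467eab4004583eb8fb7f89"
    "55ad340609f4b30283e488832571415a 085125e8f7cdc99fd91dbdf280373c5b"
    "d8823e3156348f5bae6dacd436c919c6 dd53e2b487da03fd02396306d248cda0"
    "e99f33420f577ee8ce54b67080a80d1e c69821bcb6a8839396f9652b6ff72a70"
)
block2 = bytes.fromhex(
    "d131dd02c5e6eec4693d9a0698aff95c 2fcab50712467eab4004583eb8fb7f89"
    "55ad340609f4b30283e4888325f1415a 085125e8f7cdc99fd91dbd7280373c5b"
    "d8823e3156348f5bae6dacd436c919c6 dd53e23487da03fd02396306d248cda0"
    "e99f33420f577ee8ce54b67080280d1e c69821bcb6a8839396f965ab6ff72a70"
)

digest1 = hashlib.md5(block1).hexdigest()
digest2 = hashlib.md5(block2).hexdigest()

print("blocks differ:", block1 != block2)
print("digest 1:", digest1)
print("digest 2:", digest2)
print("digests match:", digest1 == digest2)
```

The two byte strings differ in only a handful of positions, yet the digests come out identical.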
This website provides a visualization of why those particular strings collide, if you're interested (it's quite technical, but very interesting if you want to better understand message digests and hashing).