IMAP messages have a UID, for which we all rejoice. However, I'm trying to figure out how to generate a unique ID for a POP3 message, and I'm having trouble (old systems like hotmail.com only allow POP3).
Available messages to the client are fixed when a POP session opens the maildrop, and are identified by message-number local to that session or, optionally, by a unique identifier assigned to the message by the POP server. This unique identifier is permanent and unique to the maildrop and allows a client to access the same message in different POP sessions. Mail is retrieved and marked for deletion by message-number. When the client exits the session, the mail marked for deletion is removed from the maildrop. (Wikipedia)
It seems, however, that the basic LIST command simply returns a list of temporary, session-local message numbers that let you fetch each email. Those numbers are in no way unique across sessions, though, so another command, UIDL, seems to have been added; servers advertise support for it through CAPA (POP3 Extension Mechanism).
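To make the LIST/UIDL distinction concrete, here is a minimal sketch using Python's standard-library `poplib`. `POP3.uidl()` with no argument returns `(response, lines, octets)`, where each line has the form `b"<msg-number> <unique-id>"`. The helper names `parse_uidl_listing` and `fetch_uidls` are hypothetical, and the host and credentials are placeholders you would supply.

```python
import poplib


def parse_uidl_listing(lines):
    """Map session-local message numbers to server-assigned unique-ids.

    `lines` is the second element returned by POP3.uidl(), e.g.
    [b"1 whqtswO00WBw418f9t5JxYwZ", b"2 QhdPYR:00WBw1Ph7x7"].
    """
    result = {}
    for raw in lines:
        number, unique_id = raw.decode("ascii").split(None, 1)
        result[int(number)] = unique_id.strip()
    return result


def fetch_uidls(host, user, password):
    """Log in over POP3S and return {message-number: unique-id}."""
    conn = poplib.POP3_SSL(host)
    try:
        conn.user(user)
        conn.pass_(password)
        _response, lines, _octets = conn.uidl()
        return parse_uidl_listing(lines)
    finally:
        conn.quit()
```

The message numbers (the dictionary keys) are only valid for the current session; the unique-ids (the values) are what the server promises to keep stable across sessions.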
The POP3 specification states that a UIDL is unique only as long as the message exists:
The unique-id of a message is an arbitrary server-determined string, consisting of one to 70 characters in the range 0x21 to 0x7E, which uniquely identifies a message within a maildrop and which persists across sessions. This persistence is required even if a session ends without entering the UPDATE state. The server should never reuse an unique-id in a given maildrop, for as long as the entity using the unique-id exists.
Note that messages marked as deleted are not listed.
While it is generally preferable for server implementations to store arbitrarily assigned unique-ids in the maildrop, this specification is intended to permit unique-ids to be calculated as a hash of the message. Clients should be able to handle a situation where two identical copies of a message in a maildrop have the same unique-id.
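The syntax rules in the quoted specification text are easy to check on the client side. This is a minimal sketch (the function name is hypothetical); note that passing this check says nothing about whether the server will reuse the id once the message is gone, or whether two identical copies share it.

```python
def is_valid_unique_id(unique_id):
    """Check the unique-id syntax quoted from the POP3 spec:
    1 to 70 characters, each in the printable range 0x21-0x7E
    (so no spaces and no control characters)."""
    return 1 <= len(unique_id) <= 70 and all(
        0x21 <= ord(ch) <= 0x7E for ch in unique_id
    )
```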
This makes me think I might download another message a year later (after the first one was deleted) that has the same UIDL and clashes with the old record in my system.
Should I just hash the whole message body and use that as an ID?
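Hashing the whole message could look roughly like the sketch below, which assumes an already-authenticated `poplib.POP3` connection. `poplib` strips the CRLF terminator from each line (and undoes leading-dot stuffing), so the helper re-adds CRLF to hash a canonical form. The function names are hypothetical.

```python
import hashlib
import poplib


def hash_message_lines(lines):
    """SHA-256 over the lines returned by POP3.retr().

    poplib removes the CRLF from each line, so it is re-added here
    to hash a consistent CRLF-delimited form of the message.
    """
    digest = hashlib.sha256()
    for line in lines:
        digest.update(line)
        digest.update(b"\r\n")
    return digest.hexdigest()


def hash_whole_message(conn: poplib.POP3, which):
    """Download message number `which` with RETR and hash all of it."""
    _response, lines, _octets = conn.retr(which)
    return hash_message_lines(lines)
```

The cost is downloading every message in full, and two identical copies will hash to the same value, which is the same situation the specification already allows for UIDL.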
Rather than fetching the whole email to hash it, perhaps I should just use TOP [id] 1 to hash the headers (plus the first line of the body). Those shouldn't ever match an existing email, since the receiving server always adds some kind of trace information, correct? So an attacker could never cause a collision, because the Received header (or something similar) would have been modified, right?
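The TOP-based variant only changes which command fetches the lines. `POP3.top(which, howmuch)` returns the headers, a blank line, and then `howmuch` body lines. This sketch (hypothetical names, authenticated connection assumed) hashes everything TOP returns; it shows the mechanics only and does not by itself settle whether trace headers such as `Received` rule out collisions. Also keep in mind that TOP is an optional command in POP3, so not every server is guaranteed to support it.

```python
import hashlib
import poplib


def hash_top_lines(lines):
    """SHA-256 over the header lines plus any body lines TOP returned.

    Each line is re-terminated with CRLF, since poplib strips it.
    """
    digest = hashlib.sha256()
    for line in lines:
        digest.update(line + b"\r\n")
    return digest.hexdigest()


def hash_headers_via_top(conn: poplib.POP3, which, body_lines=1):
    """Fetch the headers and `body_lines` body lines with TOP, then hash them."""
    _response, lines, _octets = conn.top(which, body_lines)
    return hash_top_lines(lines)
```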
The MDaemon server seems to tackle the issue by building the UIDL from a mix of message metadata:
MDaemon constructs the UIDL results using the message name, date stamp, size, and a few other details about the messages. As a result, if a message is modified on the server, it will appear as “new” to mail clients even if you don’t rename it.
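A client could borrow the same idea by combining the server's UIDL with other per-message data. The sketch below is a hypothetical client-side key (UIDL + size from LIST + SHA-256 of the header lines); it is not MDaemon's actual algorithm, which is not public in the quoted text. If a server reused a UIDL for a different message, the size or header hash would very likely differ, so the key would change as well.

```python
import hashlib


def size_from_list_response(response):
    """Extract the octet count from a single-message LIST reply.

    poplib's POP3.list(which) returns the raw status line, e.g. b"+OK 2 200".
    """
    return int(response.split()[2])


def composite_message_key(unique_id, size, header_lines):
    """Hypothetical key: "<uidl>:<size>:<sha256 of CRLF-joined headers>"."""
    digest = hashlib.sha256()
    for line in header_lines:
        digest.update(line + b"\r\n")
    return f"{unique_id}:{size}:{digest.hexdigest()}"
```

Like the other hashing approaches, identical copies of the same message still produce identical keys.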
What is the proper way to make an ID for a POP3 email?
Note: Emails often contain a Message-ID header, but I can't rely on it because it could be used as an attack vector to confuse my system. Some email clients also leave it out.