The title pretty much says it all, but I'll try to flesh out the issue a bit.
A PHP application of mine needs to read e-mails from a socket (this was a requirement) and then use some of those e-mails (the ones that carry an API token) as articles in the application (it's a CMS).
I've been able to get the reading part more or less working, but now I'm stuck on parsing them. Concretely, my issue is that an e-mail I receive will, 99% of the time, look like this:
MIME-Version: 1.0
Received: by {ip_number} with {protocol}; {iso_date}
Date: {iso_date}
Delivered-To: {destination}
Message-ID: {sample_message_id}
Subject: {subject}
From: {sender}
To: {destination}
Content-Type: multipart/mixed; boundary={sample_boundary}
--{sample_boundary}
Content-Type: multipart/alternative; boundary={sample_boundary_2}
--{sample_boundary_2}
Content-Type: text/plain; charset={charset}
{file_content}
--
{signature}
--{sample_boundary_2}
Content-Type: text/html; charset={charset}
{content_html}
{signature_html}
--{sample_boundary_2}--
--{sample_boundary}
Content-Type: image/jpeg; name="{file_name}"
Content-Disposition: attachment; filename="{file_name}"
Content-Transfer-Encoding: base64
X-Attachment-Id: {sample_attachment_id}
{quoted_printable_file_contents}
--{sample_boundary}--
And while I've been trying to regex the parts out, I simply haven't been able to. The fact that standard e-mails should end their lines in \r\n but some end them in \n instead, combined with the nesting, is too much for me to handle.
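For illustration, the line-ending half of the problem can probably be taken out of the picture before any parsing happens. This is a minimal sketch, assuming the whole message is already in a string; normalizeLineEndings and splitHeadersAndBody are hypothetical helper names, not part of any library:

```php
<?php
// Hypothetical helper: convert every line ending (\r\n, lone \r, lone \n) to \n.
function normalizeLineEndings(string $raw): string
{
    return preg_replace('/\r\n|\r|\n/', "\n", $raw);
}

// Hypothetical helper: split an entity into its header block and its body
// at the first blank line. Returns [headers, body].
function splitHeadersAndBody(string $entity): array
{
    $pieces = explode("\n\n", normalizeLineEndings($entity), 2);
    return [$pieces[0], $pieces[1] ?? ''];
}

// Mixed line endings on purpose.
[$headers, $body] = splitHeadersAndBody(
    "Subject: hi\r\nTo: a@example.com\r\n\r\nHello\nworld\r\n"
);
```

Once everything uses a single line ending, any later pattern only has to deal with "\n".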
There's a library on PHPClasses that splits e-mails into MIME parts (along with a bunch of other things), written by Manuel Lemos, who clearly knew what he was doing: it's really efficient and returns nicely formatted, parsed output. Still, it doesn't cut it for me.
The library itself consists of 2,500+ lines of unintelligible gibberish I can't make any sense of. It is written in three different camelCase styles with assorted indentation styles, and it mixes different forms of if (if():, if() and if(){}) and of loops (for(;;), for(){} and for():), which does not make it any simpler.
Could anyone please give me a hand here?
Thank you very much!
-- Edited to add
Following Sjoern's advice I started building a solution to my own question (thanks!!). I'm still open to more suggestions, though; surely there are better ways of doing it.
class MimePartsParser{
    // True when the chunk starts with a Content-Type header (header names are case-insensitive).
    protected function hasContentType($string){
        return stripos(ltrim($string), 'content-type:') === 0;
    }

    // True when the chunk mentions a Content-Transfer-Encoding header anywhere.
    protected function hasTransferEncoding($string){
        return stripos($string, 'Content-Transfer-Encoding') !== false;
    }

    // Returns the first boundary declared in $from (quoted or unquoted), or null if there is none.
    protected function getBoundary($from){
        if(preg_match('/boundary=(?|"([^"]+)"|([^\s;"]+))/i', $from, $matches)){
            return $matches[1];
        }
        return null;
    }

    // Strips the last three characters (the line break and "--" that preceded
    // the next boundary), then any leftover whitespace.
    protected function cleanMimePart($msg){
        $msg = trim($msg);
        return trim(substr($msg, 0, -3));
    }

    // Recursively splits $msg on the first boundary it declares; returns nested arrays of parts.
    protected function parseMessage($msg){
        $parts = array();
        $boundary = $this->getBoundary($msg);
        if($boundary !== null){
            foreach(explode($boundary, $msg) as $chunk){
                $parsed = $this->parseMessage($chunk);
                if($parsed){
                    $parts []= $parsed;
                }
            }
        }
        elseif($this->hasContentType($msg) && $this->hasTransferEncoding($msg)){
            $parts []= $this->cleanMimePart($msg);
        }
        return $parts;
    }

    // Collapses the nested arrays produced by parseMessage() into one flat list.
    protected function flattenArray($array){
        $flat = array();
        foreach(new RecursiveIteratorIterator(new RecursiveArrayIterator($array)) as $item){
            $flat []= $item;
        }
        return $flat;
    }

    public function parse($string){
        return $this->flattenArray($this->parseMessage($string));
    }
}
/*Usage example*/
$mimeParser = new MimePartsParser;
var_dump($mimeParser->parse(file_get_contents('sample.txt')));
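One possible alternative to exploding on the bare boundary string is to split on whole delimiter lines ("--" plus the boundary) and to keep each part's headers separate from its body. The following is only a sketch under a few assumptions: splitMimeParts and decodeBody are hypothetical names, every part is assumed to have at least one header followed by a blank line, and only base64 and quoted-printable are decoded.

```php
<?php
// Hypothetical sketch: returns the leaf parts of a message as
// ['headers' => string, 'body' => string] arrays, descending into nested multiparts.
function splitMimeParts(string $entity): array
{
    // Normalise line endings so the rest only has to deal with "\n".
    $entity = preg_replace('/\r\n|\r|\n/', "\n", $entity);

    // Headers end at the first blank line.
    $pieces  = explode("\n\n", $entity, 2);
    $headers = $pieces[0];
    $body    = $pieces[1] ?? '';

    // Unfold continuation lines, then look for a multipart boundary (quoted or not).
    $unfolded = preg_replace('/\n[ \t]+/', ' ', $headers);
    $pattern  = '/^Content-Type:\s*multipart\/[^\n]*?boundary=(?|"([^"]+)"|([^\s;"]+))/im';
    if (!preg_match($pattern, $unfolded, $m)) {
        return [['headers' => $headers, 'body' => $body]]; // leaf part
    }

    $delim = preg_quote('--' . $m[1], '/');
    // Drop the closing delimiter line and everything after it (the epilogue).
    $body = preg_split('/^' . $delim . '--[ \t]*$/m', $body, 2)[0];
    // Split on the opening delimiter lines; the first chunk is the preamble.
    $chunks = preg_split('/^' . $delim . '[ \t]*\n/m', $body);
    array_shift($chunks);

    $parts = [];
    foreach ($chunks as $chunk) {
        // The line break right before a delimiter belongs to the delimiter.
        $chunk = preg_replace('/\n\z/', '', $chunk);
        $parts = array_merge($parts, splitMimeParts($chunk));
    }
    return $parts;
}

// Hypothetical helper: decode a leaf body according to its Content-Transfer-Encoding.
function decodeBody(array $part): string
{
    if (preg_match('/^Content-Transfer-Encoding:\s*(\S+)/im', $part['headers'], $m)) {
        switch (strtolower($m[1])) {
            case 'base64':
                return (string) base64_decode($part['body']);
            case 'quoted-printable':
                return quoted_printable_decode($part['body']);
        }
    }
    return $part['body'];
}

// Made-up message with the same nesting as the sample above and mixed line endings.
$raw = "Subject: test\r\n"
     . "Content-Type: multipart/mixed; boundary=outer\r\n\r\n"
     . "--outer\r\n"
     . "Content-Type: multipart/alternative; boundary=\"inner\"\n\n"
     . "--inner\n"
     . "Content-Type: text/plain; charset=utf-8\n\n"
     . "Hello\n"
     . "--inner\n"
     . "Content-Type: text/html; charset=utf-8\n\n"
     . "<p>Hello</p>\n"
     . "--inner--\n"
     . "--outer\r\n"
     . "Content-Type: text/plain; name=\"a.txt\"\r\n"
     . "Content-Transfer-Encoding: base64\r\n\r\n"
     . "SGk=\r\n"
     . "--outer--\r\n";

$parts = splitMimeParts($raw);
```

Matching whole delimiter lines avoids cutting inside the header that declares the boundary, so there are no leftover quote or "--" fragments to clean up afterwards.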