douduan1953 2011-02-17 13:42

How can I extract e-mail attachments from a raw e-mail without the IMAP functions?

The title pretty much says it all, but I'll try to flesh out the issue a bit.

A PHP application of mine needs to read e-mails from a socket (this was a requirement) and then use some of those e-mails (the ones carrying an API token) as articles in the application (it's a CMS).

I've been able to get the reading part more or less working, but now we're stuck parsing the messages. Concretely, an e-mail I receive will, 99% of the time, look like this:

MIME-Version: 1.0
Received: by {ip_number} with {protocol}; {iso_date}
Date: {iso_date}
Delivered-To: {destination}
Message-ID: {sample_message_id}
Subject: {subject}
From: {sender}
To: {destination}
Content-Type: multipart/mixed; boundary={sample_boundary}

--{sample_boundary}
Content-Type: multipart/alternative; boundary={sample_boundary_2}

--{sample_boundary_2}
Content-Type: text/plain; charset={charset}

{file_content}
--
{signature}

--{sample_boundary_2}
Content-Type: text/html; charset={charset}

{content_html}
{signature_html}
--{sample_boundary_2}--
--{sample_boundary}
Content-Type: image/jpeg; name="{file_name}"
Content-Disposition: attachment; filename="{file_name}"
Content-Transfer-Encoding: base64
X-Attachment-Id: {sample_attachment_id}

{quoted_printable_file_contents}
--{sample_boundary}--

I've been trying to regex the parts out, but I simply haven't been able to. The fact that standard e-mails should end their lines in \r\n but some end them in \n, combined with the nesting, is too much for me to handle.
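One way to take the line-ending problem off the table is to normalize the raw text before any parsing. The sketch below is only an illustration: `normalizeLineEndings` and `splitHeadersAndBody` are hypothetical helper names, not part of any library, and it assumes the whole message is already in memory as a string.

```php
<?php
// Hypothetical helper: convert "\r\n" and lone "\r" to "\n" so that later
// parsing only has to deal with one line-ending convention.
function normalizeLineEndings(string $raw): string {
    return str_replace(array("\r\n", "\r"), "\n", $raw);
}

// Hypothetical helper: split a message (or a MIME part) into its header
// block and its body at the first blank line.
function splitHeadersAndBody(string $entity): array {
    $entity = normalizeLineEndings($entity);
    $pieces = explode("\n\n", $entity, 2);
    return array($pieces[0], $pieces[1] ?? '');
}
```

After this step a regex or `explode()` call only needs to handle "\n", whichever convention the sender used.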

There's a library on PHPClasses that splits e-mails into MIME parts (along with a bunch of other things), written by Manuel Lemos, who clearly knew what he was doing since it's really efficient and returns nicely formatted, parsed results, but it doesn't cut it for me.

The library itself consists of 2,500+ lines of unintelligible gibberish I can't make any sense of. It is written in three different camelCase styles with assorted indentation, and it mixes different kinds of conditionals (if():, if() and if(){}) and loops (for(;;), for(){} and for():), which does not make it any simpler.

Could anyone please give me a hand here?

Thank you very much!

-- Edited to add

Following Sjoern's advice, I started building a solution to my own question (thanks!). I'm still open to more suggestions, though; surely there are better ways of doing it.

class MimePartsParser{  
  protected function hasContentType($string){
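    // Case-insensitive check that the part starts with a Content-Type header,
    // whether the preceding line break was "\r\n" or "\n".
    return stripos(ltrim($string), 'content-type') === 0;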
  }
  protected function hasTransferEncoding($string){
     return strpos($string, 'Content-Transfer-Encoding')!==false;
  }
  protected function getBoundary($from){
    // Accept quoted and unquoted boundary parameters, case-insensitively.
    if(preg_match('/boundary="?(?P<boundary>[^";\s]+)"?/i', $from, $matches)){
      return $matches['boundary'];
    }
    return null;
  }
  protected function cleanMimePart($msg){
    $msg = trim($msg);
    return trim(substr(trim($msg),0,strlen(trim($msg))-3));
  }
  protected function parseMessage($msg){
    $parts = array(); 
    if($boundary = $this->getBoundary($msg)){
      $msgs = explode($boundary, $msg); 
      foreach($msgs as $msg){
        if($msg = $this->parseMessage($msg)){
          $parts []= $msg;
        }
      }
    }
    else{
      if($this->hasContentType($msg) AND $this->hasTransferEncoding($msg)){
        $parts []= $this->cleanMimePart($msg);
      }
    }
    return $parts;  
  }
  protected function flattenArray($array){
    $flat = array();
    foreach(new RecursiveIteratorIterator(new RecursiveArrayIterator($array)) as $key => $item){
      $flat []= $item;
    }
    return $flat;
  }
  public function parse($string){
    return $this->flattenArray($this->parseMessage($string));
  }
}
/*Usage example*/
$mimeParser = new MimePartsParser;
var_dump($mimeParser->parse(file_get_contents('sample.txt')));

3 answers

  • dpyu7978 2011-02-17 13:49

    Make a function which parses a message, and call it recursively.

    First, parse the whole message. If you encounter this:

    Content-Type: multipart/mixed; boundary={sample_boundary}
    

    Split the message on {sample_boundary}. Then parse each submessage.

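    function parseMessage($message) {
        // Determine the split: look for a boundary parameter.
        // If there is none, this is a leaf part and the recursion stops.
        if (!preg_match('/boundary="?([^";\s]+)"?/i', $message, $matches)) {
            return $message;
        }
        $boundary = $matches[1];
        $messages = explode($boundary, $message);
        $result = array();
        foreach ($messages as $message) {
            $result[] = parseMessage($message);
        }
        return $result;
    }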
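To make the recursive idea concrete, here is a fuller sketch that walks the nested parts and returns only the decoded attachments. It is not the answerer's code: the function names (`parseHeaders`, `splitEntity`, `extractAttachments`) and the shape of the returned array are assumptions made for illustration. It also assumes the whole message fits in memory and only handles the base64 and quoted-printable transfer encodings.

```php
<?php
// Parse a header block into a lower-cased name => value map, unfolding
// continuation lines (lines that start with a space or tab).
function parseHeaders(string $block): array {
    $headers = array();
    $block = preg_replace('/\n[ \t]+/', ' ', $block);
    foreach (explode("\n", $block) as $line) {
        if (strpos($line, ':') === false) {
            continue;
        }
        list($name, $value) = explode(':', $line, 2);
        $headers[strtolower(trim($name))] = trim($value);
    }
    return $headers;
}

// Split an entity (message or part) into header block and body.
function splitEntity(string $entity): array {
    if (substr($entity, 0, 1) === "\n") {
        return array('', substr($entity, 1)); // part without headers
    }
    $pieces = explode("\n\n", $entity, 2);
    return array($pieces[0], $pieces[1] ?? '');
}

// Recursively collect attachments from a raw message or MIME part.
function extractAttachments(string $entity): array {
    // Normalize line endings first so "\r\n" and "\n" mails behave alike.
    $entity = str_replace(array("\r\n", "\r"), "\n", $entity);
    list($headerBlock, $body) = splitEntity($entity);
    $headers = parseHeaders($headerBlock);
    $type = $headers['content-type'] ?? 'text/plain';

    if (stripos($type, 'multipart/') === 0
        && preg_match('/boundary="?([^";\s]+)"?/i', $type, $m)) {
        $quoted = preg_quote($m[1], '/');
        // Drop the epilogue: everything after the closing "--boundary--" line.
        $kept = preg_split('/^--' . $quoted . '--[ \t]*$/m', $body, 2);
        // Split on whole "--boundary" lines only, so a nested boundary that
        // merely starts with the same text is left alone.
        $sections = preg_split('/^--' . $quoted . '[ \t]*$/m', $kept[0]);
        array_shift($sections); // discard the preamble
        $found = array();
        foreach ($sections as $section) {
            // Each section starts with the line break that ended the delimiter line.
            $section = preg_replace('/^\n/', '', $section, 1);
            $found = array_merge($found, extractAttachments($section));
        }
        return $found;
    }

    // Leaf part: keep it only if it is declared as an attachment.
    $disposition = $headers['content-disposition'] ?? '';
    if (stripos($disposition, 'attachment') !== 0) {
        return array();
    }
    $name = preg_match('/filename="?([^";]+)"?/i', $disposition, $f) ? trim($f[1]) : 'unnamed';
    $encoding = strtolower($headers['content-transfer-encoding'] ?? '7bit');
    if ($encoding === 'base64') {
        $data = base64_decode($body);
    } elseif ($encoding === 'quoted-printable') {
        $data = quoted_printable_decode($body);
    } else {
        $data = $body;
    }
    return array(array('filename' => $name, 'content_type' => $type, 'data' => $data));
}
```

Calling `extractAttachments(file_get_contents('sample.txt'))` would then return one array entry per attachment, and each entry's `data` could be written to disk with `file_put_contents()`. Reading the boundary from the parsed Content-Type header, rather than searching the whole text, keeps body content that happens to contain "boundary=" from being mistaken for a header.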
    
    Selected as the best answer by the asker.