douduan1953 2011-02-17 13:42

How can I extract e-mail attachments from a raw e-mail without the IMAP functions?

The title pretty much says it all, but I'll try to flesh out the issue a bit.

A PHP application of mine needs to read e-mails from a socket (this was a requirement) and then use some of those e-mails (the ones carrying an API token) as articles in the application (it's a CMS).

I've been able to get the reading part more or less working, but now we're stuck on parsing them. Concretely, 99% of the e-mails I receive look like this:

MIME-Version: 1.0
Received: by {ip_number} with {protocol}; {iso_date}
Date: {iso_date}
Delivered-To: {destination}
Message-ID: {sample_message_id}
Subject: {subject}
From: {sender}
To: {destination}
Content-Type: multipart/mixed; boundary={sample_boundary}

--{sample_boundary}
Content-Type: multipart/alternative; boundary={sample_boundary_2}

--{sample_boundary_2}
Content-Type: text/plain; charset={charset}

{file_content}
--
{signature}

--{sample_boundary_2}
Content-Type: text/html; charset={charset}

{content_html}
{signature_html}
--{sample_boundary_2}--
--{sample_boundary}
Content-Type: image/jpeg; name="{file_name}"
Content-Disposition: attachment; filename="{file_name}"
Content-Transfer-Encoding: base64
X-Attachment-Id: {sample_attachment_id}

{quoted_printable_file_contents}
--{sample_boundary}--
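Most decisions in a message like this hinge on the parameters of a Content-Type header: boundary (where to split), charset, and name. In the sample the boundary is unquoted, while many mailers quote it, so a parser has to accept both. A minimal sketch of such a parameter parser follows; parseContentType is a made-up name, and backslash escapes inside quoted values as well as RFC 2231 continuations (name*0=...) are deliberately not handled.

```php
<?php
// Hypothetical helper (not from any library): split a Content-Type header value
// into its media type and a map of parameters.
// Limitations of this sketch: backslash escapes inside quoted values and
// RFC 2231 continuations (name*0=...) are not handled.
function parseContentType(string $value): array
{
    // The media type is everything before the first ';' (case-insensitive, so lower-cased)
    $semi = strpos($value, ';');
    $type = strtolower(trim($semi === false ? $value : substr($value, 0, $semi)));

    $params = [];
    if ($semi !== false) {
        // Each parameter is name=token or name="quoted value" (a quoted value may contain ';')
        preg_match_all('/([^\s=;]+)\s*=\s*(?:"([^"]*)"|([^\s;]+))/',
            substr($value, $semi + 1), $matches, PREG_SET_ORDER);
        foreach ($matches as $m) {
            // Group 3 is only reported when the unquoted alternative matched;
            // parameter names are case-insensitive, so they are lower-cased
            $params[strtolower($m[1])] = isset($m[3]) ? $m[3] : $m[2];
        }
    }
    return ['type' => $type, 'params' => $params];
}

$ct = parseContentType('multipart/mixed; boundary={sample_boundary}; Charset=UTF-8');
echo $ct['type'], ' ', $ct['params']['boundary'], "\n"; // multipart/mixed {sample_boundary}
```

Matching parameters with a single pattern instead of exploding on ';' keeps a quoted value such as name="a;b.jpg" in one piece.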

I've been trying to regex them apart, but I simply haven't managed it. Standard e-mails should end their lines in \r\n, but some use just \n, and that combined with the nesting is more than I can handle.
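Both problems can be handled without one big regex. The sketch below assumes that the whole message fits in memory and that each multipart body ends with its closing --boundary-- line. It normalises CRLF to LF up front, reads the boundary only from the current entity's own headers, splits only on lines that consist entirely of a delimiter, and recurses into each part. collectLeafParts is a hypothetical name, not an existing API.

```php
<?php
// Hypothetical helper: recursively split a raw message into its leaf MIME parts.
// Assumes the whole message is in memory and each multipart body is closed
// by its "--boundary--" line (a truncated message would lose its last part).
function collectLeafParts(string $entity): array
{
    // Normalise CRLF to LF so one set of patterns handles both line-ending styles
    $entity = str_replace("\r\n", "\n", $entity);

    // The entity's own headers end at the first blank line
    $headerEnd = strpos($entity, "\n\n");
    if ($headerEnd === false) {
        return [$entity]; // no body at all: treat it as a leaf
    }
    // Unfold continuation lines so a boundary on a folded line is still found
    $headers = preg_replace("/\n[ \t]+/", ' ', substr($entity, 0, $headerEnd));

    // Only a multipart Content-Type in *these* headers makes this entity a container
    if (!preg_match('/^content-type:\s*multipart\/[^\n]*?boundary="?([^";\n]+)"?/im', $headers, $m)) {
        return [$entity]; // leaf part: headers + blank line + body
    }
    $delimiter = preg_quote(rtrim($m[1]), '/');
    $body = substr($entity, $headerEnd + 2);

    // A delimiter is a whole line: "--boundary", or "--boundary--" for the closing one
    $chunks = preg_split('/^--' . $delimiter . '(?:--)?[ \t]*$/m', $body);
    // Drop the preamble (before the first delimiter) and the epilogue (after the last)
    $chunks = array_slice($chunks, 1, -1);

    $leaves = [];
    foreach ($chunks as $chunk) {
        // The newline after a delimiter line and the one before the next belong to the delimiters
        $chunk = preg_replace('/\A\n|\n\z/', '', $chunk);
        $leaves = array_merge($leaves, collectLeafParts($chunk));
    }
    return $leaves;
}

$raw = "Content-Type: multipart/mixed; boundary=\"B\"\r\n\r\n"
     . "--B\r\nContent-Type: text/plain\r\n\r\nhello\r\n"
     . "--B\r\nContent-Type: multipart/alternative; boundary=C\r\n\r\n"
     . "--C\r\nContent-Type: text/plain\r\n\r\nplain\r\n"
     . "--C\r\nContent-Type: text/html\r\n\r\n<b>html</b>\r\n--C--\r\n"
     . "--B--\r\n";
$parts = collectLeafParts($raw);
echo count($parts), "\n"; // 3
```

Each returned string is one leaf part (its headers, a blank line, then its still-encoded body), so attachments can then be picked out by their Content-Disposition header.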

There's a library on PHPClasses that splits e-mails into MIME parts (along with a bunch of other things), written by some Manuel Lemos guy who clearly knew what he was doing: it's really efficient and returns nicely formatted, parsed output. But it doesn't cut it for me.

The library itself is 2,500+ lines of unintelligible gibberish I can't make any sense of. It mixes three different camelCase styles, assorted indentation styles, different kinds of ifs (if():, if() and if(){}) and different kinds of loops (for(;;), for(){} and for():), which does not make it any simpler.

Could anyone please give me a hand here?

Thank you very much!

-- Edited to add

Following Sjoern's advice I started building a solution to my own question (thanks!!). I'm still open to more suggestions, though; surely there are better ways of doing it.

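class MimePartsParser {
  // True if the chunk, ignoring leading line breaks, starts with a Content-Type header
  protected function hasContentType($string) {
    return stripos(ltrim($string), 'content-type:') === 0;
  }
  // True if the chunk declares a Content-Transfer-Encoding header (header names are case-insensitive)
  protected function hasTransferEncoding($string) {
    return stripos($string, 'content-transfer-encoding:') !== false;
  }
  // Returns the first boundary parameter (quoted or unquoted) in $from, or null if there is none
  protected function getBoundary($from) {
    if (preg_match('/boundary="?([^";\n]+)"?/i', $from, $matches)) {
      return rtrim($matches[1]);
    }
    return null;
  }
  // Strips the line breaks left around a part after splitting on its delimiter
  protected function cleanMimePart($msg) {
    return trim($msg);
  }
  protected function parseMessage($msg) {
    $parts = array();
    $boundary = $this->getBoundary($msg);
    if ($boundary !== null) {
      // Split on the delimiter "--boundary"; chunk 0 is the headers/preamble, not a part
      $chunks = explode('--' . $boundary, $msg);
      array_shift($chunks);
      foreach ($chunks as $chunk) {
        // "--boundary--" closes this multipart body; anything after it is the epilogue
        if (strpos($chunk, '--') === 0) {
          break;
        }
        if ($subParts = $this->parseMessage($chunk)) {
          $parts[] = $subParts;
        }
      }
    }
    else {
      // Leaf part: keep it only if it has both headers, i.e. it looks like an encoded attachment
      if ($this->hasContentType($msg) AND $this->hasTransferEncoding($msg)) {
        $parts[] = $this->cleanMimePart($msg);
      }
    }
    return $parts;
  }
  // parseMessage() nests one array per multipart level; collapse them into a flat list
  protected function flattenArray($array) {
    $flat = array();
    foreach (new RecursiveIteratorIterator(new RecursiveArrayIterator($array)) as $item) {
      $flat[] = $item;
    }
    return $flat;
  }
  public function parse($string) {
    // Normalise CRLF to LF so a single code path handles both line-ending styles
    $string = str_replace("\r\n", "\n", $string);
    return $this->flattenArray($this->parseMessage($string));
  }
}
/*Usage example*/
$mimeParser = new MimePartsParser;
var_dump($mimeParser->parse(file_get_contents('sample.txt')));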
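To get from a raw part to an actual file, the filename has to be read from the part's headers and the body decoded according to its Content-Transfer-Encoding. A sketch, assuming a part string in the shape the parser above returns (headers, a blank line, then the body); decodeAttachment is a hypothetical helper, and only base64 and quoted-printable are decoded, anything else being returned as-is.

```php
<?php
// Hypothetical helper (not part of any library): turn one raw MIME part
// (headers, blank line, body) into [filename, decoded bytes].
function decodeAttachment(string $part): array
{
    // Normalise line endings, then split the part at its first blank line
    $part = str_replace("\r\n", "\n", $part);
    $pos = strpos($part, "\n\n");
    $headerBlock = $pos === false ? $part : substr($part, 0, $pos);
    $body = $pos === false ? '' : substr($part, $pos + 2);
    // Unfold continuation lines so each header sits on one line
    $headerBlock = preg_replace("/\n[ \t]+/", ' ', $headerBlock);

    // Prefer filename= (Content-Disposition), fall back to name= (Content-Type)
    $filename = null;
    if (preg_match('/\bfilename="?([^";\n]+)"?/i', $headerBlock, $m)
        || preg_match('/\bname="?([^";\n]+)"?/i', $headerBlock, $m)) {
        $filename = trim($m[1]);
    }

    // A missing header means no transfer encoding was applied
    $encoding = '';
    if (preg_match('/^content-transfer-encoding:\s*(\S+)/im', $headerBlock, $m)) {
        $encoding = strtolower($m[1]);
    }

    switch ($encoding) {
        case 'base64':
            // Non-strict base64_decode() discards the line breaks between encoded lines
            $data = base64_decode($body);
            break;
        case 'quoted-printable':
            $data = quoted_printable_decode($body);
            break;
        default:
            // 7bit, 8bit and binary bodies are already raw
            $data = $body;
    }
    return [$filename, $data];
}

$part = "Content-Type: text/plain; name=\"hi.txt\"\r\n"
      . "Content-Disposition: attachment; filename=\"hi.txt\"\r\n"
      . "Content-Transfer-Encoding: base64\r\n"
      . "\r\n"
      . "SGVsbG8s\r\nIHdvcmxk\r\n";
[$name, $bytes] = decodeAttachment($part);
echo $name, ': ', $bytes, "\n"; // hi.txt: Hello, world
// file_put_contents(basename($name), $bytes) would then write it to disk;
// basename() strips any directory part from the untrusted filename
```

Filenames sent as RFC 2047 encoded words (=?UTF-8?B?...?=) would still need decoding, for example with iconv_mime_decode().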

3 answers

  • dpyu7978 2011-02-17 13:49

    Make a function which parses a message and recursively call it.

    First, parse the whole message. If you encounter this:

    Content-Type: multipart/mixed; boundary={sample_boundary}
    

    Split the message on the delimiter line --{sample_boundary}, then parse each sub-message.

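    function parseMessage($message) {
        // Determine the split: read the boundary parameter, quoted or not
        if (!preg_match('/boundary="?([^";\r\n]+)"?/i', $message, $matches)) {
            return $message; // no boundary: a leaf part, so stop recursing
        }
        // Chunk 0 holds the headers/preamble; skip it so it is not parsed again
        $messages = array_slice(explode('--' . $matches[1], $message), 1);
        $result = array();
        foreach ($messages as $part) {
            $result[] = parseMessage($part);
        }
        return $result;
    }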
    
    This answer was selected by the asker as the best answer.