Reliably cleaning up email message body encoding

I am writing a small piece of software in PHP which connects to an IMAP mailbox and stores the messages it contains in a MySQL database for later processing and other goodness.

I have noticed that during testing I get some strange characters appearing in the message body when I attempt to save the message body raw. I am using imap_fetchbody() to extract the message body.

I noticed that when I use quoted_printable_decode() to clean up the message body, this helps! However, after a lot of research I have also learned that this will not always help, and that in other cases functions such as utf8_encode() and base64_decode() should be used instead to clean up the message body.

So, my question is: what is the best method for reliably cleaning an email message body with PHP to cover all encoding scenarios?

Accepted answer by douning5041, 2013-08-23 09:35
An "email body" is nowadays actually a tree of individual MIME parts. Sometimes there's just one of them, e.g. a text/plain mail. Sometimes there's a multipart/alternative which wraps two "equivalent" copies of the message, one as text/plain and the other as text/html. Sometimes the structure is much more complicated, with many levels of nesting. It is quite common for some of these parts to be binary content, like images, attached ZIP files and whatnot.

Each of these individual MIME parts can be encoded for transport; the encoding used is specified in the Content-Transfer-Encoding header of the corresponding MIME part. The two encoding schemes which you absolutely must support to interoperate are quoted-printable and base64. An important observation is that this encoding happens separately for each part, i.e. it's perfectly legal to have a multipart/alternative with a text/plain part encoded with quoted-printable and another part, text/html, encoded in base64.
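As a minimal sketch of that per-part step: the helper name decode_transfer_encoding() below is hypothetical, and it assumes you already have the raw bytes of one part (for example from imap_fetchbody()) together with that part's Content-Transfer-Encoding value as a string. If you read the structure with imap_fetchstructure() instead, the encoding arrives as an integer (3 for base64, 4 for quoted-printable) and would need mapping first.

```php
<?php
// Hypothetical helper: undo the Content-Transfer-Encoding of ONE MIME part.
// $raw is the part body as fetched; $encoding is that part's header value.
function decode_transfer_encoding(string $raw, string $encoding): string
{
    switch (strtolower(trim($encoding))) {
        case 'quoted-printable':
            return quoted_printable_decode($raw);
        case 'base64':
            // Non-strict mode: characters outside the base64 alphabet,
            // such as the line breaks used to wrap the body, are ignored.
            $decoded = base64_decode($raw);
            return $decoded === false ? $raw : $decoded;
        default:
            // 7bit, 8bit and binary mean "no transformation was applied".
            return $raw;
    }
}
```

The result is still a byte string in whatever character set the sender used; for binary parts such as images it is simply the file content.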

Once you have undone this transfer encoding, you still have to decode the text from its character encoding, i.e. turn the stream of bytes into Unicode text. To do that, consult the charset parameter of the Content-Type MIME header (again, the header of the part, not of the whole message, unless the message itself has only one part).
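A rough sketch of the charset step, assuming the mbstring extension is available and that you want UTF-8 in the database. The helper name to_utf8() is hypothetical, and the fallback for a missing or unrecognised charset (keep the bytes if they are already valid UTF-8, otherwise treat them as ISO-8859-1) is an assumption of this example, not something the MIME RFCs prescribe.

```php
<?php
// Hypothetical helper: convert the decoded bytes of a text part to UTF-8,
// using the charset parameter taken from that part's Content-Type header.
function to_utf8(string $bytes, ?string $charset): string
{
    if ($charset !== null && $charset !== '') {
        try {
            // PHP 7 warns and returns false on an unknown charset label;
            // PHP 8 throws a ValueError instead. Both cases fall through.
            $converted = @mb_convert_encoding($bytes, 'UTF-8', $charset);
            if (is_string($converted)) {
                return $converted;
            }
        } catch (\Throwable $e) {
            // Unknown charset label: use the fallback below.
        }
    }
    // Assumed fallback policy (not from any RFC): keep valid UTF-8 as-is,
    // otherwise interpret the bytes as ISO-8859-1.
    return mb_check_encoding($bytes, 'UTF-8')
        ? $bytes
        : mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');
}
```

This is also why utf8_encode() alone is not a general answer: it only converts from ISO-8859-1, so it damages text that was sent in any other charset.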

All details you need to know are in RFC 2045, RFC 2046, RFC 2047 and RFC 2048 (and their corresponding updates).

Finally, there's also the interesting question of what the "main part" of an e-mail is. Suppose you have something like this:

1 multipart/mixed
  + 1.1 text/plain: "Hi, I'm forwarding Jeff's message..."
  + 1.2 message/rfc822
      + 1.2.1 multipart/alternative
          + 1.2.1.1 text/plain "Hi colleagues, I'm sending the meeting notes from..."
          + 1.2.1.2 text/html "<p>Hi colleagues,..."

This is what you get when Fred forwards Jeff's message to you. What is the "main part" here?
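To make the ambiguity concrete, here is a small sketch that walks such a tree and lists every text part it finds. The nested-array shape and the function name list_text_parts() are hypothetical and chosen only for illustration; this is not the object layout returned by imap_fetchstructure(), and the paths follow the numbering used in the example above rather than IMAP section numbers.

```php
<?php
// Hypothetical tree walker: collect every text/* part with its path.
// Each node is an array with a 'type' and, optionally, 'children'.
function list_text_parts(array $node, string $path = '1'): array
{
    $found = [];
    if (strpos($node['type'], 'text/') === 0) {
        $found[$path] = $node['type'];
    }
    foreach ($node['children'] ?? [] as $i => $child) {
        // Children are numbered from 1 below their parent's path.
        $found += list_text_parts($child, $path . '.' . ($i + 1));
    }
    return $found;
}

// The forwarded-message example from above, as a nested array.
$message = [
    'type' => 'multipart/mixed',
    'children' => [
        ['type' => 'text/plain'],
        ['type' => 'message/rfc822', 'children' => [
            ['type' => 'multipart/alternative', 'children' => [
                ['type' => 'text/plain'],
                ['type' => 'text/html'],
            ]],
        ]],
    ],
];

// Yields three candidates: 1.1, 1.2.1.1 and 1.2.1.2.
$candidates = list_text_parts($message);
```

The walk only enumerates the candidates. Whether Fred's note, Jeff's plain text or Jeff's HTML counts as the body is a policy decision your application has to make.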