dongyu1983 2015-04-29 20:45

Random HTML characters encoded in email

I'm generating an email with PHP that outputs an HTML table. Most of the table comes through fine, but some of the < and > characters are randomly encoded to &lt; and &gt;. It doesn't always do it in the same place. Sometimes it just happens in one place, sometimes not at all, and sometimes in multiple places.

Here's a code snippet from the middle of my table as my email client sees it. Note the inserted &lt; /tr&gt; that should not be there:

<tr>  
  <td>SERVER_SOFTWARE</td>
  <td>Apache/2.2.29 (Red Hat)</td>
</tr>
<tr>
  <td>SERVER_PROTOCOL</td>
  <td>HTTP/1.1</td>
  &lt; /tr&gt;
</tr>
<tr>
  <td>REQUEST_METHOD</td>
  <td>POST</td>
</tr>

Here is the same segment in the plaintext part of the email (again, note the stray < /tr> that somehow gets inserted):

SERVER_SOFTWARE Apache/2.2.29 (Red Hat)
SERVER_PROTOCOL HTTP/1.1 < /tr>
REQUEST_METHOD POST

I'm setting the charset to UTF-8 in the headers before sending:

$headers  = "MIME-Version: 1.0
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable";

(P.S. I was having the exact same problem earlier using charset=ISO-8859-1.)

But despite this, it is somehow being displayed in US-ASCII:

Content-type: text/html;
    charset="US-ASCII"
Content-transfer-encoding: quoted-printable

The PHP script that's generating the email looks like this:

//generate $table
$indicesServer = array('PHP_SELF', 'argv', 'argc', 'GATEWAY_INTERFACE', 'SERVER_ADDR', 'SERVER_NAME', 'SERVER_SOFTWARE', 'SERVER_PROTOCOL', 'REQUEST_METHOD', 'REQUEST_TIME', 'REQUEST_TIME_FLOAT', 'QUERY_STRING', 'DOCUMENT_ROOT', 'HTTP_ACCEPT', 'HTTP_ACCEPT_CHARSET', 'HTTP_ACCEPT_ENCODING', 'HTTP_ACCEPT_LANGUAGE', 'HTTP_CONNECTION', 'HTTP_HOST', 'HTTP_REFERER', 'HTTP_USER_AGENT', 'HTTPS', 'REMOTE_ADDR', 'REMOTE_HOST', 'REMOTE_PORT', 'REMOTE_USER', 'REDIRECT_REMOTE_USER', 'SCRIPT_FILENAME', 'SERVER_ADMIN', 'SERVER_PORT', 'SERVER_SIGNATURE', 'PATH_TRANSLATED', 'SCRIPT_NAME', 'REQUEST_URI', 'PHP_AUTH_DIGEST', 'PHP_AUTH_USER', 'PHP_AUTH_PW', 'AUTH_TYPE', 'PATH_INFO', 'ORIG_PATH_INFO') ;
$table = '<table cellpadding="3" cellspacing="0" border="1" bordercolor="#bbb">';
foreach ($indicesServer as $arg) {
    if (isset($_SERVER[$arg])) {
        $table .= '<tr><td>'.$arg.'</td><td>' . $_SERVER[$arg] . '</td></tr>' ;
    } else {
        $table .= '<tr><td>'.$arg.'</td><td>-</td></tr>' ;
    }
}
$table .=  '</table>' ;

//set up email
$to = [redacted];
$subject = [redacted];
$email_body = "Here's data:" . $table;
$headers  = "MIME-Version: 1.0
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable";

//send email
mail($to, $subject, $email_body, $headers);

EDIT: I've noticed HTML attributes are getting messed up. It's related to the quoted-printable encoding of equals signs. = is encoded to =3D as expected, but then sometimes the next character is deleted! Thus the following is happening:

<a href="http://example.com"> becomes <a href=3D"ttp://example.com">

<table cellpadding=3 cellspacing=0 border=1> becomes <table cellpadding<ellspacingorder=3D"&lt;tr">
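
The behavior described in the EDIT can be reproduced in isolation. The following is a minimal sketch, not a confirmed diagnosis: it assumes the body is sent unencoded while the header declares quoted-printable, so that something downstream decodes =XX sequences that were never encoded. It uses only PHP's built-in quoted_printable_encode() and quoted_printable_decode(); the sample strings are illustrative.

```php
<?php
// Sample markup taken from the question.
$html = '<a href="http://example.com">link</a>';

// Properly encoded: every "=" becomes "=3D", so decoding restores the input.
$encoded = quoted_printable_encode($html);
$decoded = quoted_printable_decode($encoded);

echo $encoded, "\n";
echo ($decoded === $html ? 'round trip ok' : 'round trip failed'), "\n";

// Assumed failure mode: text that was never encoded gets decoded anyway.
// "=30" looks like an escape for the character "0", so the "=" and the
// two characters after it collapse into a single "0".
$mangled = quoted_printable_decode('cellpadding=30');
echo $mangled, "\n"; // cellpadding0
```

If this is what is happening, the damage would depend on which characters follow each "=", which would fit the observation that the corruption does not always appear in the same place.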

2 answers

  • dongsi3826 2015-04-29 20:47

    My guess is that, since that's a closing "tr" that shouldn't be there (you have another right after it), some friendly HTML parser is "helping" you by turning it from a tag into a plain string.

    Another thought:

    See here: https://support.sendgrid.com/hc/en-us/articles/200182068-HTML-Formatting-Issues

    1. Some mail clients, such as Outlook and Thunderbird, appear to insert double spacing line breaks at every line. The reason is that the 'content-transfer-encoding' in MIME is set to 'quoted-printable' which adds Carriage Return Line Feed (CRLF) line breaks to the source content of the email which are characters interpreted by these mail clients. To alleviate this problem, please do the following:

    a. If you can customize the MIME settings for your email, set the 'Content-Transfer-Encoding' to '7bit' instead of 'Quoted-Printable.'

    b. Ensure that your content follows the line length limits from item 2 above.

    I wonder if something is putting a line break inside your tag, making it unreadable, and the browser is then adding an extra closing tag as a replacement.

    Can you try this: change 'Content-Transfer-Encoding' to '7bit' or leave it out entirely?
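
A minimal sketch of that suggestion, under stated assumptions: build_html_mail() is a hypothetical helper (not from the question or from any library), the content is assumed to be ASCII-only for the 7bit case, and the headers are joined with CRLF. The idea is simply that the declared Content-Transfer-Encoding should match what is actually done to the body: either send it as 7bit with short lines, or really encode it with quoted_printable_encode().

```php
<?php
// Hypothetical helper: returns [headers, body] where the declared transfer
// encoding matches how the body was actually prepared.
function build_html_mail(string $html, bool $useQuotedPrintable): array
{
    if ($useQuotedPrintable) {
        // Declare quoted-printable only when the body really is encoded.
        $encoding = 'quoted-printable';
        $body = quoted_printable_encode($html);
    } else {
        // 7bit: the body goes out as-is, so keep lines short by adding a
        // CRLF after each closing </tr> (assumes ASCII-only content).
        $encoding = '7bit';
        $body = str_replace('</tr>', "</tr>\r\n", $html);
    }

    $headers = implode("\r\n", [
        'MIME-Version: 1.0',
        'Content-Type: text/html; charset=UTF-8',
        'Content-Transfer-Encoding: ' . $encoding,
    ]);

    return [$headers, $body];
}

$table = '<table cellpadding="3"><tr><td>REQUEST_METHOD</td><td>POST</td></tr></table>';
[$headers, $body] = build_html_mail($table, false);

// mail($to, $subject, $body, $headers); // $to and $subject as in the question
```

Passing true as the second argument keeps quoted-printable in the header but encodes the body to match, which may be the safer choice if the table can contain non-ASCII values.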

    This answer was selected by the asker as the best answer.