dongtuwu8548 2014-04-03 11:40

htmlspecialchars in PHP omits the data entirely from XML output

I have a MySQL query that returns data to be formatted as an XML file. One of the columns is a free-text field that can contain strange characters that "break" the XML with an encoding error. I believe these characters are curly quotes that made it into a record when the user originally pasted text from Microsoft Word. I do not have control over that process.

Strange Character example:

“TURN KEY – Totally Furnished†

I am using htmlspecialchars to "clean" this data, but it removes the field's content entirely from the XML record and leaves it blank. This fixes the encoding issue, but the record is now missing data for that field. I still want that data; I just want to omit the weird characters or change them to something like a dash.

$description  = htmlspecialchars($row['PropertyInformation'], ENT_QUOTES, 'UTF-8');

The XML output ends up like this in the records where the weird characters are occurring:

<DESCRIPTIF>
<![CDATA[ ]]>
</DESCRIPTIF>

3 answers

  • dongmin4052 2014-04-06 11:11

    The htmlspecialchars function returns an empty string if the input string contains an invalid code unit sequence within the given encoding, unless either the ENT_IGNORE or ENT_SUBSTITUTE flags are set.

    The ENT_IGNORE flag silently discards invalid code unit sequences instead of returning an empty string. Using this flag is discouraged as it may have security implications.

    The ENT_SUBSTITUTE flag replaces invalid code unit sequences with a Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD; (otherwise) instead of returning an empty string.

    You could try to set one of these flags.

    htmlspecialchars($string, ENT_QUOTES | ENT_SUBSTITUTE);
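
To make the difference between the flags concrete, here is a minimal sketch. It assumes the stored bytes are Windows-1252 curly quotes (0x93 and 0x94), which are not valid UTF-8; the sample string and variable names are made up for illustration and are not taken from the asker's database.

```php
<?php
// Assumption: the field holds raw Windows-1252 bytes.
// 0x93 and 0x94 are curly quotes there, but invalid as UTF-8.
$raw = "\x93TURN KEY\x94";

// No extra flag: the whole value is rejected and an empty string comes back.
$default = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

// ENT_IGNORE: the invalid bytes are silently dropped.
$ignored = htmlspecialchars($raw, ENT_QUOTES | ENT_IGNORE, 'UTF-8');

// ENT_SUBSTITUTE: each invalid byte becomes U+FFFD (bytes EF BF BD in UTF-8).
$substitute = htmlspecialchars($raw, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($default);
var_dump($ignored);
var_dump($substitute);
```

With ENT_SUBSTITUTE the rest of the text survives, so the XML field is no longer blank, although the original quote characters are still lost.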
    
    This answer was selected by the asker as the best answer.
  • dongqiangse6623 2014-04-03 11:47

    Looks like you forgot to capitalize utf-8

    $description = htmlspecialchars($row['PropertyInformation'], ENT_QUOTES, 'UTF-8');

  • duancheng3342 2014-04-07 07:21
    /**
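     * Strip ASCII control characters (including tab and newline) and every
     * byte in the range 0x80-0xFF, i.e. all non-ASCII characters, from a string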
     * 
     * @param string $string
     * @return string
     */
    function str_clean($string)
    {
        return preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $string);
    }
    
    
    $string = '“TURN KEY – Totally Furnishedâ€';
    echo htmlspecialchars(str_clean($string), ENT_QUOTES, 'UTF-8');
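
If the goal is to keep the quotes and dashes rather than strip them, one alternative is to convert the text to UTF-8 before escaping it. The sketch below is only an illustration: `to_utf8` is a hypothetical helper, and it assumes that any value which is not already valid UTF-8 is encoded as Windows-1252, which may not hold for your data.

```php
<?php
/**
 * Hypothetical helper: return $value as valid UTF-8.
 * Assumption: anything that is not valid UTF-8 is Windows-1252.
 */
function to_utf8($value)
{
    // preg_match with the /u modifier returns 1 only for valid UTF-8 input.
    if (preg_match('//u', $value) === 1) {
        return $value; // already valid, leave untouched
    }
    return iconv('Windows-1252', 'UTF-8', $value);
}

// Windows-1252 curly quotes (0x93, 0x94) and an en dash (0x96).
$raw = "\x93TURN KEY \x96 Totally Furnished\x94";

$description = htmlspecialchars(to_utf8($raw), ENT_QUOTES, 'UTF-8');
echo $description, "\n";
```

Because the converted string is valid UTF-8, htmlspecialchars no longer returns an empty string, and the curly quotes and the dash come through intact.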
    