dpql57753 2009-05-14 17:26

HTML Purifier selectively eats special characters

Using PHP against a UTF-8 compliant database. Here's how input goes in.

  1. user types input into textarea
  2. textarea encoded with JavaScript escape()
  3. passed via HTTP POST
  4. decoded with PHP rawurldecode()
  5. passed through HTMLPurifier with default settings
  6. escaped for MySQL and stored in database

And it comes out in the usual way, and I run unescape() on page load. This is to allow people to, say, copy and paste directly from a Word document and have the smart quotes show up.

But HTMLPurifier seems to be clobbering certain special characters: the ones that escape() turns into a simple %XX expression, like Ö, which becomes %D6. Smart quotes, by contrast, escape to something like %u201C and go into the database that way. It takes out both the special character and the one immediately following it.

I need to change something in this process. Perhaps I need to change multiple things.

What can I do to not get special characters clobbered?

1 answer

  • dsa456369 2009-05-14 17:53
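    2. textarea encoded with JavaScript escape()

    escape() isn't safe for non-ASCII characters: it turns Ö into %D6, which decodes to a single Latin-1 byte rather than valid UTF-8. Use encodeURIComponent() instead.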
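To make the difference concrete, here is a minimal sketch comparing the two functions on Ö and on a left smart quote. The helper names compareEncodings and isValidUtf8 are only illustrative, and the UTF-8 check is a JavaScript stand-in for what happens on the PHP side, not HTMLPurifier's actual code. It suggests a likely reason the characters vanish: the byte produced from %D6 is not well-formed UTF-8 on its own.

```javascript
// Sketch only: compareEncodings and isValidUtf8 are illustrative helper names.
function compareEncodings(text) {
  return {
    legacy: escape(text),            // deprecated: %XX below U+0100, %uXXXX above
    utf8: encodeURIComponent(text),  // percent-encoded UTF-8 bytes
  };
}

// True when the byte sequence is well-formed UTF-8.
function isValidUtf8(bytes) {
  try {
    new TextDecoder('utf-8', { fatal: true }).decode(Uint8Array.from(bytes));
    return true;
  } catch (err) {
    return false;
  }
}

console.log(compareEncodings('\u00D6'));  // Ö: legacy '%D6',    utf8 '%C3%96'
console.log(compareEncodings('\u201C'));  // “: legacy '%u201C', utf8 '%E2%80%9C'

// %D6 followed by the letter "b" decodes to the bytes D6 62: not valid UTF-8,
// because D6 announces a two-byte sequence and 62 is not a continuation byte.
console.log(isValidUtf8([0xD6, 0x62]));   // false
// %C3%96 decodes to the bytes C3 96: the UTF-8 encoding of Ö.
console.log(isValidUtf8([0xC3, 0x96]));   // true
```

Since 0xD6 is the lead byte of a two-byte sequence, a UTF-8 cleaner that drops the broken sequence may plausibly take the next byte with it, which would match the "eats the following character too" symptom.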

    3. passed via HTTP POST

    I assume that you use XMLHttpRequest? If not, make sure that the page containing the form is served as UTF-8.
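If you do go the XMLHttpRequest route, the request body could be built along these lines. This is only a sketch: the field name "comment" and the URL "/save.php" are made up, and buildPostBody is a hypothetical helper. URLSearchParams encodes values as UTF-8, so no manual escape() call is needed.

```javascript
// Sketch only: the field name "comment" and the URL "/save.php" are made up.
function buildPostBody(fields) {
  // URLSearchParams percent-encodes values as UTF-8 (spaces become "+").
  return new URLSearchParams(fields).toString();
}

const body = buildPostBody({ comment: '\u00D6 \u201Chi\u201D' }); // Ö “hi”
console.log(body); // comment=%C3%96+%E2%80%9Chi%E2%80%9D

// In the browser the body could then be sent like this:
//   const xhr = new XMLHttpRequest();
//   xhr.open('POST', '/save.php');
//   xhr.setRequestHeader('Content-Type',
//     'application/x-www-form-urlencoded; charset=UTF-8');
//   xhr.send(body);
```

On the PHP side the value would then arrive in $_POST already decoded.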

    4. decoded with PHP rawurldecode()

    If you access the value through $_POST, you should not decode it, since PHP has already done that for you. Decoding it a second time will corrupt the data.
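To see why a second decode is harmful, here is a small sketch. It uses JavaScript's decodeURIComponent() as a stand-in for the decoding PHP does, so it is an analogy rather than the exact PHP behaviour, and decodeTimes is a hypothetical helper. Any literal percent sequence the user typed gets rewritten on the second pass.

```javascript
// Sketch only: decodeTimes is an illustrative helper, and decodeURIComponent
// stands in for the server-side decoding step.
function decodeTimes(encoded, times) {
  let value = encoded;
  for (let i = 0; i < times; i++) {
    value = decodeURIComponent(value);
  }
  return value;
}

const typed = 'use %41 for A';          // what the user actually typed
const wire = encodeURIComponent(typed); // 'use%20%2541%20for%20A'

console.log(decodeTimes(wire, 1)); // 'use %41 for A'  (correct)
console.log(decodeTimes(wire, 2)); // 'use A for A'    (user's text altered)
```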

    6. escaped for MySQL and stored in database

    Make sure you don't have magic quotes turned on. Make sure that the database stores its tables as UTF-8 (both the character set and the collation must be UTF-8). Make sure that the connection between PHP and MySQL is UTF-8 (issue SET NAMES utf8 if you don't use PDO).

    Finally, make sure that the page is served as UTF-8 when you output the string again.

    Accepted by the asker as the best answer.
