dpql57753 2009-05-14 17:26

HTML Purifier selectively eats special characters

Using PHP against a UTF-8 compliant database. Here's how input goes in.

  1. user types input into textarea
  2. textarea encoded with javascript escape()
  3. passed via HTTP post
  4. decoded with PHP rawurldecode()
  5. passed through HTMLPurifier with default settings
  6. escaped for MySQL and stored in database

And it comes out in the usual way, and I run unescape() on page load. This is to allow people to, say, copy and paste directly from a Word document and have the smart quotes show up.

But HTMLPurifier seems to be clobbering non-UTF-8 special characters, the ones that escape() turns into a simple % expression, such as Ö, which escapes to %D6. Smart quotes, by contrast, escape to %u2024 or something and go into the database that way. For the clobbered characters, it takes out both the special character and the one immediately following it.

I need to change something in this process. Perhaps I need to change multiple things.

What can I do to not get special characters clobbered?


1 answer

  • dsa456369 2009-05-14 17:53
    2. textarea encoded with javascript escape()

    escape() isn't safe for non-ASCII characters. Use encodeURIComponent() instead.
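    A minimal sketch of the difference, assuming a JavaScript runtime that still ships the legacy escape() global and provides TextDecoder (current browsers and Node.js do). The helper isValidUtf8 and the sample byte arrays are made up for illustration. It shows that escape() emits a single Latin-1 style byte for Ö, and that this byte followed by an ordinary letter is not valid UTF-8, which would be consistent with a UTF-8 cleaning step dropping both the character and the one after it.

```javascript
// escape() encodes code points below 256 as a single %XX byte
// and everything else in the non-standard %uXXXX form.
const legacyLatin = escape("Ö");        // "%D6"
const legacyQuote = escape("\u201C");   // "%u201C" (left smart quote)

// encodeURIComponent() always percent-encodes the UTF-8 bytes.
const utf8Latin = encodeURIComponent("Ö");       // "%C3%96"
const utf8Quote = encodeURIComponent("\u201C");  // "%E2%80%9C"

// Hypothetical helper: true when the bytes form valid UTF-8.
function isValidUtf8(bytes) {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(Uint8Array.from(bytes));
    return true;
  } catch (err) {
    return false;
  }
}

// The bytes a server ends up with after decoding "%D6x": 0xD6 announces a
// two-byte sequence, but 0x78 ("x") is not a continuation byte.
const brokenBytes = [0xd6, 0x78];
// The UTF-8 form of "Öx".
const goodBytes = [0xc3, 0x96, 0x78];

console.log(legacyLatin, legacyQuote, utf8Latin, utf8Quote);
console.log(isValidUtf8(brokenBytes), isValidUtf8(goodBytes));
```

    With encodeURIComponent() on the client, the server receives UTF-8 bytes for both Ö and the smart quotes, so no separate %u handling is needed.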

    3. passed via HTTP post

    I assume that you use XMLHttpRequest? If not, make sure that the page containing the form is served as UTF-8.

    4. decoded with PHP rawurldecode()

    If you access the value through $_POST, you should not decode it, since PHP has already done that. Decoding it a second time will mess up the data.
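    The effect of decoding twice can be sketched in JavaScript, using decodeURIComponent() as a stand-in for the server-side decode. This is an analogy, not PHP's exact behaviour, and the sample input is hypothetical.

```javascript
// Hypothetical user input that happens to contain a percent sequence.
const typed = "100%41 sure";

// What travels over the wire once the client has percent-encoded it.
const onWire = encodeURIComponent(typed);        // "100%2541%20sure"

// One decode restores the original text.
const decodedOnce = decodeURIComponent(onWire);

// A second decode reinterprets "%41" as an encoded "A" and alters the data.
const decodedTwice = decodeURIComponent(decodedOnce);

console.log(onWire);
console.log(decodedOnce);
console.log(decodedTwice);
```

    The first decode round-trips the input, while the second silently changes it, which is why a value that has already been decoded should be left alone.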

    6. escaped for MySQL and stored in database

    Make sure you don't have magic quotes turned on. Make sure that the database stores tables as UTF-8 (both the character set and the collation must be UTF-8). Make sure that the connection between PHP and MySQL is UTF-8 (use SET NAMES utf8 if you don't use PDO).

    Finally, make sure that the page is served as UTF-8 when you output the string again.

    This answer was selected by the asker as the best answer.
