dsjhejw3232 2011-06-03 10:06
Accepted

Convert disallowed tags to entities and keep allowed tags in a post

I have a form where a user can post a global notice into the system (for other users to see).
The system outputs HTML directly from the DB (when a user wants to see a notice).
I'd like to leave some HTML tags intact and apply htmlspecialchars() to the rest.
I already tried the following strategy:

 str_replace($search, $replace, htmlspecialchars($str))

but it seems to be really slow. Too slow, actually. It is also not guaranteed to always work. Is there an alternative?
I want something that does the job of strip_tags(), except that instead of stripping the disallowed tags it applies htmlspecialchars() to them.

ADD(ed) info (by request):

$str can be any size you can think of. For testing, I used a big string (1M characters, generated randomly, with some allowed and some disallowed tags inside, all of them with attributes) as one of the worst-case scenarios, on the logic that if it works for this, it should work for simpler cases.
The server took 5 s to process the complete str_replace() call (with htmlspecialchars()). The test ran on my computer, which has a 2 GHz CPU and DDR3 RAM.
$search and $replace hold a total of 7 replacements. Even so, they do not always work: in some cases $search gives false positives or false negatives.
To clarify, I apply these changes while saving to the DB and not while retrieving from the DB.

2 answers

  • duanchi6377 2011-06-03 10:46

    You might try this code (should be improved):

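    // Decode a matched (still entity-encoded) allowed tag back to raw HTML.
    function callback(array $matches) {
        return htmlspecialchars_decode($matches[0]);
    }
    $str = 'some <i>string</i> <b>with</b> tags '
         . '<a href="#">some link</a> '
         . '<img alt="" src="http://sstatic.net/stackoverflow/img/favicon.ico"/><hr/>';
    // Encode everything first, then restore only the allowed tags.
    $str = htmlspecialchars($str);
    // \2 refers to the tag name captured by (i|a), so the closing tag must match the opening one.
    // (\1 would point at the enclosing group itself, which can never match.)
    $str = preg_replace_callback('#(&lt;(i|a)(?: .+?)?&gt;.*?&lt;/(\2)&gt;|&lt;(?:img)(?: .*?)?/&gt;)#', 'callback', $str);
    echo $str;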
    

    The regular expression looks (or should look) for two types of strings:

    • <tag attributes>content</tag>, with the tag part being the same for the opening and closing tags, and attributes and content being optional
    • <tag attributes/>, with attributes being optional

    Allowed tags are listed in the (i|a) part for <tag></tag>-style tags and in the (?:img) part for <tag/>-style tags.
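
As a rough sketch of how the two tag lists could be kept outside the pattern, the snippet below builds the regular expression from two arrays. The helper name buildAllowedTagPattern() and the sample input are made up for illustration and are not part of the answer's code. The sketch also drops the outer capturing group, so here the tag name is group 1 and the closing tag is matched with \1.

```php
<?php
// Hypothetical helper: builds the pattern from two lists of allowed tag names.
function buildAllowedTagPattern(array $pairedTags, array $voidTags): string
{
    // preg_quote() guards against regex metacharacters in a tag name.
    $quote = function (string $tag): string {
        return preg_quote($tag, '#');
    };
    $paired = implode('|', array_map($quote, $pairedTags));
    $void   = implode('|', array_map($quote, $voidTags));

    // Group 1 captures the paired tag name; \1 requires the same name in the closing tag.
    return '#&lt;(' . $paired . ')(?: .+?)?&gt;.*?&lt;/\1&gt;'
         . '|&lt;(?:' . $void . ')(?: .*?)?/&gt;#';
}

$pattern = buildAllowedTagPattern(['i', 'a'], ['img']);

// Encode everything, then decode only the spans that match an allowed tag.
$encoded = htmlspecialchars('<i>x</i> <b>y</b> <img alt="" src="a.png"/>');
$result  = preg_replace_callback($pattern, function (array $m): string {
    return htmlspecialchars_decode($m[0]);
}, $encoded);

echo $result, "\n";
// <i>x</i> &lt;b&gt;y&lt;/b&gt; <img alt="" src="a.png"/>
```

Adding a tag then only means extending one of the two arrays; <b> stays encoded in the example because it is in neither list.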

    If it finds a matching tag, it passes the match to the callback() function, which converts it back using htmlspecialchars_decode(). This is necessary to decode quotes and other encoded characters in the attribute list.
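
To see why restoring only the angle brackets would not be enough, here is a small illustration using the link from the example string. The variable names are just for this sketch.

```php
<?php
$tag     = '<a href="#">some link</a>';
$encoded = htmlspecialchars($tag);
echo $encoded, "\n";
// &lt;a href=&quot;#&quot;&gt;some link&lt;/a&gt;

// Restoring only < and > leaves &quot; inside the attribute list.
$bracketsOnly = str_replace(['&lt;', '&gt;'], ['<', '>'], $encoded);
echo $bracketsOnly, "\n";
// <a href=&quot;#&quot;>some link</a>

// htmlspecialchars_decode() also turns &quot; back into a double quote.
$decoded = htmlspecialchars_decode($encoded);
echo $decoded, "\n";
// <a href="#">some link</a>
```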

    I'm not sure whether it works in all cases, i.e., whether it matches all the necessary tags. If it works in general, the pattern and the callback() function should then be improved so that callback() decodes only the < and > characters and the attribute list; the content of the tags (i.e., the some link part in <a href='#'>some link</a>) must not be decoded.
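
One possible way to make that improvement, as a sketch only: capture the tag name, the attribute list and the content in separate groups, and decode nothing but the attribute list. The function name restoreAllowedTags() is made up here, the group layout differs from the pattern above, and the s modifier (so that . also matches newlines) is an extra assumption.

```php
<?php
// Hypothetical name; restores allowed tags but leaves their content encoded.
function restoreAllowedTags(string $encoded): string
{
    // Groups 1-3: name, attributes, content of a paired tag.
    // Groups 4-5: name and attributes of a self-closing tag.
    $pattern = '#&lt;(i|a)((?: .+?)?)&gt;(.*?)&lt;/\1&gt;'
             . '|&lt;(img)((?: .*?)?)/&gt;#s';

    $result = preg_replace_callback($pattern, function (array $m): string {
        if (!empty($m[4])) {
            // Self-closing form: only the attribute list is decoded.
            return '<' . $m[4] . htmlspecialchars_decode($m[5] ?? '') . '/>';
        }
        // Paired form: attributes are decoded, content ($m[3]) stays encoded.
        return '<' . $m[1] . htmlspecialchars_decode($m[2]) . '>'
             . $m[3] . '</' . $m[1] . '>';
    }, $encoded);

    // preg_replace_callback() returns null on a PCRE error; fall back to the encoded text.
    return $result ?? $encoded;
}

$raw = '<a href="#">a & b</a> <b>x</b> <img alt="" src="a.png"/>';
$out = restoreAllowedTags(htmlspecialchars($raw));
echo $out, "\n";
// <a href="#">a &amp; b</a> &lt;b&gt;x&lt;/b&gt; <img alt="" src="a.png"/>
```

Because the content is left untouched, an allowed tag nested inside another allowed tag stays encoded in this sketch. The attribute list is also restored unchecked, so on its own it does not filter out things like event-handler attributes.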

    Selected by the asker as the best answer.