dpkt17803 2014-08-09 21:36

Use htmlspecialchars() in PHP to replace all tags except certain HTML tags?

I would like to process my user input so that only certain HTML tags are allowed and all other tags are replaced by their HTML entities, along with special characters that are not part of a tag (such as a bare <). For example, if I only wanted to allow the <b> and <a> tags, then

allow_only("This is <b>bold</b> and this is <i>italic</i>.
            Moreover 2<3 and <a href='google.com'>this is a link</a>.","<b><a>");

should produce

This is <b>bold</b> and this is &lt;i&gt;italic&lt;/i&gt;.
Moreover 2&lt;3 and <a href='google.com'>this is a link</a>.

How can I do this in PHP? I am aware of strip_tags(), which removes unwanted tags completely, and of htmlspecialchars(), which replaces all tags by their HTML entities, but I know of no function that replaces only specific tags.

And if there is no 'common' way to do this, how should I, in general, go about processing user input that may contain valid regular HTML, but also stray < signs and potentially dangerous HTML code?


1 answer

  • dongxunhua2054 2014-08-09 21:57

    Apply htmlspecialchars() to the whole string, then turn the encoded brackets (&lt; and &gt;) back into literal < and > for each tag in a given array of allowed tags:

    function allow_only($str, $allowed){
        $str = htmlspecialchars($str);
        foreach( $allowed as $a ){
            $str = str_replace("&lt;".$a."&gt;", "<".$a.">", $str);
            $str = str_replace("&lt;/".$a."&gt;", "</".$a.">", $str);
        }
        return $str;
    }
    echo allow_only("This is <b>bold</b> and this is <i>italic</i>.", array("b"));
    

    That works for simple tags, returning "This is <b>bold</b> and this is &lt;i&gt;italic&lt;/i&gt;."
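
    To see where the simple version stops working, here is a minimal sketch that runs it on input closer to the question's example. The name allow_only_simple and the explicit ENT_QUOTES / 'UTF-8' arguments are assumptions of this sketch, not part of the answer; the logic is otherwise the same.

```php
<?php
// Hypothetical name, to keep it apart from the answer's allow_only().
function allow_only_simple(string $str, array $allowed): string {
    // Escape everything first, so nothing is trusted by default.
    $str = htmlspecialchars($str, ENT_QUOTES, 'UTF-8');
    foreach ($allowed as $tag) {
        // Restore only the exact, attribute-free forms <tag> and </tag>.
        $str = str_replace("&lt;" . $tag . "&gt;", "<" . $tag . ">", $str);
        $str = str_replace("&lt;/" . $tag . "&gt;", "</" . $tag . ">", $str);
    }
    return $str;
}

$input = 'This is <b>bold</b>, <i>italic</i>, 2<3 and <a href="x">a link</a>.';
echo allow_only_simple($input, ['b', 'a']), "\n";
// Prints:
// This is <b>bold</b>, &lt;i&gt;italic&lt;/i&gt;, 2&lt;3 and &lt;a href=&quot;x&quot;&gt;a link</a>.
```

    The bare < in 2<3 and the <i> tag are escaped as wanted, but the opening <a href="x"> stays escaped while its closing </a> is restored, which leaves an unbalanced tag in the output.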

    As it was pointed out, that doesn't work for tags with attributes, but this does:

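    // Rebuild an allowed opening tag; only &quot; is decoded inside its attributes.
    function fix_attributes($match){
        return "<".$match[1].str_replace('&quot;','"',$match[2]).">";
    }
    function allow_only($str, $allowed){
        $str = htmlspecialchars($str);
        foreach( $allowed as $a ){
            $tag = preg_quote($a, "/");
            // (?![\w-]) keeps an allowed "b" from also matching <br>, <body>, ...
            // The callback name must be passed as a string.
            $str = preg_replace_callback("/&lt;(".$tag.")(?![\w-])([\s\/\.\w=&;:#]*?)&gt;/", 'fix_attributes', $str);
            $str = str_replace("&lt;/".$a."&gt;", "</".$a.">", $str);
        }
        return $str;
    }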
    echo allow_only('This is <b>bold</b> and <a href="http://www.#links">this</a> is <i>italic</i>.', array("b","a"));
    

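    This handles more complex tags with certain attributes. Only the characters listed between [] in the pattern are allowed to appear in the attribute part of a tag. Unfortunately &quot; must be allowed within attributes or it won't work, and with it all other entities are allowed too; however, only &quot; in attributes will be decoded. Note that attribute names and values are not validated: anything built from the allowed characters, such as onclick=x, is restored along with the tag.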
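
    The effect of the character list is easier to judge on a few inputs. The sketch below mirrors the attribute-aware version under some assumptions of its own: the name allow_only_attrs is hypothetical, the callback is a closure, the tag name goes through preg_quote(), and a (?![\w-]) check is added so that allowing b does not also allow br.

```php
<?php
// Hypothetical name; same idea as the attribute-aware allow_only() above.
function allow_only_attrs(string $str, array $allowed): string {
    $str = htmlspecialchars($str, ENT_COMPAT, 'UTF-8');
    foreach ($allowed as $tag) {
        $t = preg_quote($tag, '/');
        // Same character list as the answer; (?![\w-]) stops "b" matching "br".
        $pattern = '/&lt;(' . $t . ')(?![\w-])([\s\/\.\w=&;:#]*?)&gt;/';
        $str = preg_replace_callback($pattern, function (array $m): string {
            // Only &quot; is turned back into a literal double quote.
            return '<' . $m[1] . str_replace('&quot;', '"', $m[2]) . '>';
        }, $str);
        $str = str_replace('&lt;/' . $tag . '&gt;', '</' . $tag . '>', $str);
    }
    return $str;
}

// Attribute made only of listed characters: the tag is restored.
echo allow_only_attrs('<a href="http://example.com/x">ok</a>', ['a']), "\n";
// Prints: <a href="http://example.com/x">ok</a>

// "(" and ")" are not in the list: the opening tag stays escaped.
echo allow_only_attrs('<a href="f(1)">x</a>', ['a']), "\n";
// Prints: &lt;a href=&quot;f(1)&quot;&gt;x</a>

// Attribute names are not checked: an event handler passes through.
echo allow_only_attrs('<b onclick=x>hi</b>', ['b']), "\n";
// Prints: <b onclick=x>hi</b>
```

    The last case is the main reason a character list alone is not a safe filter for untrusted input.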

    As was suggested, a much better (safer, cleaner) way to solve problems like this is to use a library such as HTML Purifier: http://htmlpurifier.org/demo.php
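
    For comparison, a rough sketch of the library route. It assumes HTML Purifier was installed with Composer (package ezyang/htmlpurifier) and loads through vendor/autoload.php; the helper name purify_allowing and the chosen allow-list are hypothetical. HTML.Allowed and Core.EscapeInvalidTags are HTML Purifier configuration directives; the exact output depends on the library version and is not shown here.

```php
<?php
// Assumption: HTML Purifier was installed with Composer (ezyang/htmlpurifier),
// so its classes load through vendor/autoload.php next to this script.
$autoload = __DIR__ . '/vendor/autoload.php';
if (is_file($autoload)) {
    require_once $autoload;
}

// Hypothetical helper: purify $dirty, keeping only what $allowed lists.
function purify_allowing(string $dirty, string $allowed = 'b,a[href]'): string {
    if (!class_exists('HTMLPurifier')) {
        throw new RuntimeException('HTML Purifier is not installed');
    }
    $config = HTMLPurifier_Config::createDefault();
    // Elements and attributes to keep, in HTML Purifier's own list syntax.
    $config->set('HTML.Allowed', $allowed);
    // Write disallowed tags back as escaped text instead of dropping them.
    $config->set('Core.EscapeInvalidTags', true);
    return (new HTMLPurifier($config))->purify($dirty);
}

// Only runs when the library is actually available.
if (class_exists('HTMLPurifier')) {
    $dirty = 'This is <b>bold</b> and <i>italic</i>, <a href="http://example.com/">a link</a>.';
    echo purify_allowing($dirty), "\n";
}
```

    Unlike the regex versions, the library parses the markup and checks attribute names and URL schemes against its own rules, so the allow-list names elements and attributes rather than characters.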

    This answer was selected by the asker as the best answer.