dpkt17803 2014-08-09 13:36
Views: 87
Accepted

How can I use PHP's htmlspecialchars() to escape every tag except certain HTML tags?

I would like to process my user input so that only certain HTML tags are allowed; all other tags should be replaced by their HTML entities, and special characters that are not part of a tag (such as a bare <) should be escaped too. For example, if I only wanted to allow the <b> and the <a> tag, then

allow_only("This is <b>bold</b> and this is <i>italic</i>.
            Moreover 2<3 and <a href='google.com'>this is a link</a>.","<b><a>");

should produce

This is <b>bold</b> and this is &lt;i&gt;italic&lt;/i&gt;.
Moreover 2&lt;3 and <a href='google.com'>this is a link</a>.

How can I do this in PHP? I am aware of strip_tags(), which removes the unwanted tags completely, and of htmlspecialchars(), which replaces all tags by their HTML entities, but I know of no function that escapes only specific tags.

And if there is no 'common' way to do this, how should I generally process user input that can contain valid regular HTML, but can also contain < signs and potentially dangerous HTML code?

1 answer

  • dongxunhua2054 2014-08-09 13:57

    Apply htmlspecialchars() first, then turn the encoded angle brackets back into literal ones for each tag in a given array of allowed tags:

    function allow_only($str, $allowed){
        $str = htmlspecialchars($str);
        foreach( $allowed as $a ){
            $str = str_replace("&lt;".$a."&gt;", "<".$a.">", $str);
            $str = str_replace("&lt;/".$a."&gt;", "</".$a.">", $str);
        }
        return $str;
    }
    echo allow_only("This is <b>bold</b> and this is <i>italic</i>.", array("b"));
    

    That works for simple tags, returning "This is <b>bold</b> and this is &lt;i&gt;italic&lt;/i&gt;."

    As was pointed out, that doesn't work for tags with attributes, but this does:

    function fix_attributes($match){
        return "<".$match[1].str_replace('&quot;','"',$match[2]).">";
    }
    function allow_only($str, $allowed){
        $str = htmlspecialchars($str);
        foreach( $allowed as $a ){
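            // The callback name must be a quoted string. Attributes must start with
            // whitespace or "/", so allowing "b" does not also let "<br>" through.
            $str = preg_replace_callback("/&lt;(".preg_quote($a, "/").")((?:[\s\/][\s\/\.\w=&;:#]*?)?)&gt;/", 'fix_attributes', $str);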
            $str = str_replace("&lt;/".$a."&gt;", "</".$a.">", $str);
        }
        return $str;
    }
    echo allow_only('This is <b>bold</b> and <a href="http://www.#links">this</a> is <i>italic</i>.', array("b","a"));
    

    That handles more complex tags with certain attributes. Only the characters listed between [] may appear in the attribute string. Unfortunately, &quot; has to be allowed within attributes or it won't work, and that lets every other entity through as well; however, only &quot; inside attributes is decoded.
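
    The pattern above restricts which characters may appear in the attribute string, but not which attribute names are used. A stricter variant is sketched below, assuming PHP 7 or later. The function name allow_only_strict is hypothetical, and the rule that only a double-quoted http(s) href on <a> is accepted is an assumption made for illustration, not part of the original answer.

```php
<?php
// Hypothetical stricter variant: escape everything, then re-enable only
// bare whitelisted tags, plus <a href="http(s)://..."> when "a" is allowed.
function allow_only_strict(string $str, array $allowed): string
{
    // Encode all special characters, including both kinds of quotes.
    $str = htmlspecialchars($str, ENT_QUOTES, 'UTF-8');

    foreach ($allowed as $tag) {
        $t = preg_quote($tag, '/');
        // Restore only attribute-free tags such as <b> and </b>.
        $str = preg_replace('/&lt;(\/?)' . $t . '&gt;/', '<${1}' . $tag . '>', $str);
    }

    if (in_array('a', $allowed, true)) {
        // Restore <a href="..."> only for http/https URLs. Inside the URL an
        // ampersand is accepted only as &amp;, so encoded quotes or angle
        // brackets can never be part of the match.
        $str = preg_replace(
            '/&lt;a\s+href=&quot;(https?:\/\/[^\s&]*(?:&amp;[^\s&]*)*)&quot;&gt;/',
            '<a href="${1}">',
            $str
        );
    }

    return $str;
}

echo allow_only_strict(
    'This is <b>bold</b> and <a href="http://www.#links">this</a> is <i>italic</i>.',
    array('b', 'a')
);
// Prints: This is <b>bold</b> and <a href="http://www.#links">this</a> is &lt;i&gt;italic&lt;/i&gt;.
```

    With this sketch, input such as <b onmouseover=x> or <a href="javascript:..."> stays escaped. The trade-off is that single-quoted or unquoted href values, as in the question's example, are escaped as well, and a closing </a> is restored even when its opening tag was rejected.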

    As was suggested, a much better (safer, cleaner) way to solve problems like this is to use a library such as HTML Purifier: http://htmlpurifier.org/demo.php
