douti8321 2013-07-15 02:47

PHP: How to use nl2br() with HTML Purifier to keep line breaks?

Issue: When using HTML Purifier to process user-inputted content, line-breaks are not being translated into <br /> tags.

Consider the following user-inputted content:

Lorem ipsum dolor sit amet.
This is another line.

<pre>
.my-css-class {
    color: blue;
}
</pre>

Lorem ipsum:

<ul>
<li>Lorem</li>
<li>Ipsum</li>
<li>Dolor</li>
</ul>

Dolor sit amet,
MyName

When processed using HTML Purifier, the above is being altered to the following:

Lorem ipsum dolor sit amet. This is another line.

.my-css-class {
    color: blue;  
} 

Lorem ipsum:

  • Lorem
  • Ipsum
  • Dolor
Dolor sit amet, MyName

As you can see, "MyName", which the user intended to be on a separate line, is displayed on the same line as the previous text.

How to fix?

Using the PHP nl2br() function, of course. However, new issues arise whether we use it before or after purifying the content.

Here is an example when using nl2br() before HTML Purifier:

Lorem ipsum dolor sit amet.
This is another line.

.my-css-class {

    color: blue; 

} 

Lorem ipsum:

  • Lorem
  • Ipsum
  • Dolor

Dolor sit amet,
MyName

What happens is that nl2br() adds <br /> for each line-break, therefore even the ones in the <pre> block are being processed, as well as the line-breaks after each <li> tag.
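To see the effect in isolation, here is a minimal sketch with a made-up input string (the sample text is an assumption, not the actual user content). nl2br() is not HTML-aware, so it inserts a <br /> before every newline regardless of the element the newline sits in:

```php
<?php
// Illustrative input: one <pre> block followed by a one-item list.
$input = "<pre>\na {\n}\n</pre>\n<ul>\n<li>Lorem</li>\n</ul>";

// All six newlines get a <br />, including the ones inside <pre>
// and the ones between the list tags.
$output = nl2br($input);

echo $output, PHP_EOL;
echo substr_count($output, '<br />'), PHP_EOL;
```

The unwanted tags show up as "<pre><br />" at the start of the block and "</li><br />" after the list item.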

What I tried

I tried a custom nl2br() function which replaces line-breaks with <br /> tags, and then removes all <br /> tags from <pre> blocks. It works great, however the issue remains for the <li> items.
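A rough sketch of that kind of function might look like the following. The name nl2br_except_pre is hypothetical, and it assumes plain <pre> tags without attributes; it is not the exact code I used.

```php
<?php
// Hypothetical helper: run nl2br() on everything, then undo it inside <pre>.
function nl2br_except_pre(string $html): string
{
    $html = nl2br($html);

    // The "s" modifier lets "." match newlines, so multi-line <pre> blocks are found.
    return preg_replace_callback(
        '/<pre>(.*?)<\/pre>/is',
        function (array $m): string {
            // Drop the <br /> tags that nl2br() added; the newlines themselves remain.
            return '<pre>' . str_replace('<br />', '', $m[1]) . '</pre>';
        },
        $html
    );
}

$result = nl2br_except_pre("a\nb\n<pre>\nx\ny\n</pre>");
echo $result, PHP_EOL;
```

The text outside <pre> keeps its <br /> tags, while the block itself comes back unchanged.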

Trying the same approach for <ul> blocks would also remove all <br /> tags from the <li> children, unless we used a more complex regex that removes only the <br /> tags inside <ul> elements but outside <li> elements. But then what about a <ul> nested within an <li> item? Handling all those situations would require an even more complex regex!

  • If this is the right approach, could you help me out with the regex?
  • If it's not the right approach, how could I solve this problem? I am also open to alternatives to HTML Purifier.

Other resources that I've already looked at:


2 answers

  • dongrong1856 2013-08-05 18:26

    This issue can be solved partially (if not completely) with a custom nl2br() function:

    function nl2br_special($string){
    
        // Step 1: Add <br /> tags for each line-break
        $string = nl2br($string); 
    
        // Step 2: Remove the actual line-breaks
        $string = str_replace("\n", "", $string);
        $string = str_replace("\r", "", $string);
    
        // Step 3: Restore the line-breaks that are inside <pre></pre> tags
        if(preg_match_all('/\<pre\>(.*?)\<\/pre\>/', $string, $match)){
            // $match[1] holds the text captured between each <pre></pre> pair
            foreach($match[1] as $b){
                $string = str_replace('<pre>'.$b.'</pre>', "<pre>".str_replace("<br />", PHP_EOL, $b)."</pre>", $string);
            }
        }
    
        // Step 4: Remove extra <br /> tags
    
        // Before <pre> tags
        $string = str_replace("<br /><br /><br /><pre>", '<br /><br /><pre>', $string);
        // After </pre> tags
        $string = str_replace("</pre><br /><br />", '</pre><br />', $string);
    
        // Around <ul></ul> tags
        $string = str_replace("<br /><br /><ul>", '<br /><ul>', $string);
        $string = str_replace("</ul><br /><br />", '</ul><br />', $string);
        // Inside <ul> </ul> tags
        $string = str_replace("<ul><br />", '<ul>', $string);
        $string = str_replace("<br /></ul>", '</ul>', $string);
    
        // Around <ol></ol> tags
        $string = str_replace("<br /><br /><ol>", '<br /><ol>', $string);
        $string = str_replace("</ol><br /><br />", '</ol><br />', $string);
        // Inside <ol> </ol> tags
        $string = str_replace("<ol><br />", '<ol>', $string);
        $string = str_replace("<br /></ol>", '</ol>', $string);
    
        // Around <li></li> tags
        $string = str_replace("<br /><li>", '<li>', $string);
        $string = str_replace("</li><br />", '</li>', $string);
    
        return $string;
    }
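To make Step 4 concrete for lists, here is a small sketch. The input string is an assumption: it is what Steps 1 and 2 would produce for a two-item list written with one tag per line. The four list-related replacements from the function above are passed to str_replace() as arrays, which applies them in order:

```php
<?php
// Assumed state of a two-item list after Steps 1 and 2
// (a <br /> for every original newline, newlines themselves removed).
$html = '<ul><br /><li>Lorem</li><br /><li>Ipsum</li><br /></ul>';

// Same search/replace pairs as the "Inside <ul>" and "Around <li>" steps.
$search  = array('<ul><br />', '<br /></ul>', '<br /><li>', '</li><br />');
$replace = array('<ul>',       '</ul>',       '<li>',       '</li>');

$clean = str_replace($search, $replace, $html);

echo $clean, PHP_EOL;
```

Because these are exact string matches, markup such as <ul class="x"> or list items indented with spaces would not be cleaned up by them.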
    

    This must be applied to the content before it is passed through HTML Purifier. Never re-process purified content unless you know what you're doing.

    Please note that because single and double line-breaks are already preserved as <br /> tags, you should not enable HTML Purifier's AutoFormat.AutoParagraph feature:

    // Process line-breaks
    $string = nl2br_special($string);
    
    // Initiate HTML Purifier config
    $purifier_config = HTMLPurifier_Config::createDefault();
    $purifier_config->set('HTML.Allowed', 'p,ul,ol,li,strong,b,em,i,u,a[href],code,pre,blockquote,cite,img[src|alt],br,hr,h3,h4');
    //$purifier_config->set('AutoFormat.AutoParagraph', true); // Make sure to NOT use this
    
    // Initiate HTML Purifier
    $purifier = new HTMLPurifier($purifier_config);
    
    // Purify the content!
    $string = $purifier->purify($string);
    

    That's it!


    Furthermore, because allowing basic HTML tags was originally intended to improve user experience by not adding another markup syntax, you might want to allow users to post code, and especially HTML code, which would not be interpreted/removed by HTML Purifier.

    HTML Purifier currently allows posting code, but it requires cumbersome CDATA markers:

    <![CDATA[
    Place code here
    ]]>
    

    These are hard to remember and to type. To simplify the user experience as much as possible, I believe it is best to let users add code by wrapping it in simple <code> (for inline code) and <pre> (for blocks of code) tags. Here is how to do that:

    function custom_code_tag_callback($code) {
    
        return '<code>'.trim(htmlspecialchars($code[1])).'</code>';
    }
    function custom_pre_tag_callback($code) {
    
        return '<pre><code>'.trim(htmlspecialchars($code[1])).'</code></pre>';
    }
    
    // Don't require HTMLPurifier's CDATA enclosing, instead allow simple <code> or <pre> tags
    $string = preg_replace_callback("/\<code\>(.*?)\<\/code\>/is", 'custom_code_tag_callback', $string);
    $string = preg_replace_callback("/\<pre\>(.*?)\<\/pre\>/is", 'custom_pre_tag_callback', $string);
    

    Note that, like the nl2br processing, this must be done before the content is passed through HTML Purifier. Also keep in mind that if users include <code> or <pre> tags in the code they post, the first closing tag will end the enclosing <code> or <pre> element early. This cannot be solved, and it applies equally to the original CDATA markers and to any other markup, even the one used on StackOverflow (for example, using the ` symbol in a code sample closes the code span).
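Both behaviours can be seen in a short standalone sketch. The sample strings are made up, and the closure mirrors custom_code_tag_callback rather than calling it, so that the snippet runs on its own:

```php
<?php
// Same logic as custom_code_tag_callback, inlined as a closure.
$escape = function (array $m): string {
    return '<code>' . trim(htmlspecialchars($m[1])) . '</code>';
};
$pattern = '/<code>(.*?)<\/code>/is';

// HTML inside <code> is escaped, so it is displayed instead of interpreted.
$escaped = preg_replace_callback($pattern, $escape, 'Use <code><b>bold</b></code> here');

// The limitation: the lazy match stops at the first </code>,
// so the rest of the intended snippet falls outside the element.
$broken = preg_replace_callback($pattern, $escape, '<code>a </code> b</code>');

echo $escaped, PHP_EOL;
echo $broken, PHP_EOL;
```

In the second case only "a" ends up inside the <code> element, and the stray closing tag is left for HTML Purifier to deal with.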

    Finally, for a great user experience there are other things we might want to automate, such as making links clickable. Luckily, HTML Purifier's AutoFormat.Linkify feature does this.

    Here is the final code that includes everything for an ultimate setup:

    // === Declare functions ===
    
    function nl2br_special($string){
    
        // Step 1: Add <br /> tags for each line-break
        $string = nl2br($string); 
    
        // Step 2: Remove the actual line-breaks
        $string = str_replace("\n", "", $string);
        $string = str_replace("\r", "", $string);
    
        // Step 3: Restore the line-breaks that are inside <pre></pre> tags
        if(preg_match_all('/\<pre\>(.*?)\<\/pre\>/', $string, $match)){
            // $match[1] holds the text captured between each <pre></pre> pair
            foreach($match[1] as $b){
                $string = str_replace('<pre>'.$b.'</pre>', "<pre>".str_replace("<br />", PHP_EOL, $b)."</pre>", $string);
            }
        }
    
        // Step 4: Remove extra <br /> tags
    
        // Before <pre> tags
        $string = str_replace("<br /><br /><br /><pre>", '<br /><br /><pre>', $string);
        // After </pre> tags
        $string = str_replace("</pre><br /><br />", '</pre><br />', $string);
    
        // Around <ul></ul> tags
        $string = str_replace("<br /><br /><ul>", '<br /><ul>', $string);
        $string = str_replace("</ul><br /><br />", '</ul><br />', $string);
        // Inside <ul> </ul> tags
        $string = str_replace("<ul><br />", '<ul>', $string);
        $string = str_replace("<br /></ul>", '</ul>', $string);
    
        // Around <ol></ol> tags
        $string = str_replace("<br /><br /><ol>", '<br /><ol>', $string);
        $string = str_replace("</ol><br /><br />", '</ol><br />', $string);
        // Inside <ol> </ol> tags
        $string = str_replace("<ol><br />", '<ol>', $string);
        $string = str_replace("<br /></ol>", '</ol>', $string);
    
        // Around <li></li> tags
        $string = str_replace("<br /><li>", '<li>', $string);
        $string = str_replace("</li><br />", '</li>', $string);
    
        return $string;
    }
    
    
    function custom_code_tag_callback($code) {
    
        return '<code>'.trim(htmlspecialchars($code[1])).'</code>';
    }
    
    function custom_pre_tag_callback($code) {
    
        return '<pre><code>'.trim(htmlspecialchars($code[1])).'</code></pre>';
    }
    
    
    
    // === Process user's input ===
    
    // Process line-breaks
    $string = nl2br_special($string);
    
    // Allow simple <code> or <pre> tags for posting code
    $string = preg_replace_callback("/\<code\>(.*?)\<\/code\>/is", 'custom_code_tag_callback', $string);
    $string = preg_replace_callback("/\<pre\>(.*?)\<\/pre\>/is", 'custom_pre_tag_callback', $string);
    
    
    // Initiate HTML Purifier config
    $purifier_config = HTMLPurifier_Config::createDefault();
    $purifier_config->set('HTML.Allowed', 'p,ul,ol,li,strong,b,em,i,u,a[href],code,pre,blockquote,cite,img[src|alt],br,hr,h3,h4');
    $purifier_config->set('AutoFormat.Linkify', true); // Make links clickable
    //$purifier_config->set('HTML.TargetBlank', true); // Uncomment if you want links to open new tabs
    //$purifier_config->set('AutoFormat.AutoParagraph', true); // Leave this commented as it conflicts with nl2br
    
    
    // Initiate HTML Purifier
    $purifier = new HTMLPurifier($purifier_config);
    
    // Purify the content!
    $string = $purifier->purify($string);
    

    Cheers!

    This answer was selected as the best answer by the asker.