dongmou5628 2016-01-07 16:27

PHP preg_split: splitting input on <br>, <br/>, and <p> into separate paragraphs

I am fetching a page with cURL, and its markup is very ill-formed. There is a particular snippet of the page that I am trying to parse into paragraphs. This snippet may be delimited by <p> and </p>, or separated by one or more <br> or <br/> tags. Where two <br> tags follow one another, I don't want them to produce two separate paragraphs.

The code I'm currently using to parse and display it is:

$paragraphs = preg_split('/(<\s*p\s*\/?>)|(<\s*br\s*\/?>)|(\s\s+)|(<\s*\/p\s*\/?>)/', $article, -1, PREG_SPLIT_NO_EMPTY);
$paragraphcount = count($paragraphs);
for($x = 1; $x <= $paragraphcount; $x++ )
    {
    echo "<p>".$paragraphs[$x-1]."</p>";
    }

However, this is not working as expected. An example input and the output it produces:

Input 1: first part </p> <p> second part </p> <p> third part </p> <p> fourth part <br/>

Output 1: <p>first part </p><p> </p><p>second part </p><p> </p><p> third part </p><p> </p><p>fourth part</p><p> </p>

My code is parsing the input into paragraphs; however, it's also adding extra paragraphs containing only a space.

Any help would be appreciated.

Input is UTF-8 if it makes a difference.


  • duanaoou4105 2016-01-07 17:57

    Here is a solution with preg_replace:

    $article = "first part </p> <p> second part </p> <p> third part </p> 
                <p> fourth part <br/> <br> fifth part";
    $healed = substr(
              preg_replace('/(\s*<(\/?p|br)\s*\/?>\s*)+/u', "</p><p>", "<p>$article<p>"),
              4, -3);
    

    It first wraps the string in <p> on both sides, then replaces every run of break variants (<p>, </p>, <br>, <br/>, together with the surrounding whitespace) with </p><p>, and finally strips the leading </p> and the trailing <p>. Note that this produces the final string directly, without an intermediate array.

    echo $healed;
    

    outputs:

    <p>first part</p><p>second part</p><p>third part</p><p>fourth part</p><p>fifth part</p>
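
    The wrap, replace, and trim steps described above can be sketched as a small function. The name healParagraphs is a hypothetical one chosen for illustration; the pattern and the substr offsets are taken from the answer's code.

    ```php
    <?php
    // Hypothetical helper name; the body follows the three steps from the answer.
    function healParagraphs(string $article): string
    {
        // Step 1: wrap the text so the first and last pieces also get a boundary.
        $wrapped = "<p>$article<p>";

        // Step 2: collapse each run of <p>, </p>, <br>, <br/> (plus surrounding
        // whitespace) into exactly one paragraph boundary.
        $joined = preg_replace('/(\s*<(\/?p|br)\s*\/?>\s*)+/u', '</p><p>', $wrapped);

        // Step 3: drop the leading "</p>" (4 bytes) and the trailing "<p>" (3 bytes).
        return substr($joined, 4, -3);
    }

    echo healParagraphs('first part </p> <p> second part <br/> <br> third part'), "\n";
    // prints: <p>first part</p><p>second part</p><p>third part</p>
    ```

    One limitation to keep in mind: the pattern only matches bare tags, so a tag with attributes such as <p class="x"> would pass through untouched and would need a broader pattern or an HTML parser.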
    

    Note that you need the u modifier at the end of the regular expression to get UTF-8 support.
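
    One side effect of the u modifier may matter for scraped pages: PCRE validates the subject as UTF-8, and on invalid input preg_replace returns null instead of a string. The following is a minimal sketch of checking for that case, assuming PHP 7 or later (for the "\u{...}" escapes); the sample strings are made up for illustration.

    ```php
    <?php
    $pattern = '/(\s*<(\/?p|br)\s*\/?>\s*)+/u';

    // Valid UTF-8: multi-byte characters are left alone, the tag is replaced.
    $ok = preg_replace($pattern, '|', "caf\u{00E9}<br>na\u{00EF}ve");

    // "\xE9" on its own is a Latin-1 byte, not valid UTF-8, so with /u the
    // call fails and returns null rather than a partially processed string.
    $bad = preg_replace($pattern, '|', "caf\xE9<br>naive");

    var_dump($ok);
    var_dump($bad === null && preg_last_error() === PREG_BAD_UTF8_ERROR);
    ```

    If the second check is true for real input, the page is probably not UTF-8 after all and would need converting (for example with mb_convert_encoding) before it is split.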

    If on the other hand you need the paragraphs in an array, then preg_split is better suited (using the same regular expression):

    $paragraphs = preg_split('/(\s*<(\/?p|br)\s*\/?>\s*)+/u',
                             $article, -1, PREG_SPLIT_NO_EMPTY);
    

    If you then write:

    foreach ($paragraphs as $paragraph) {
        echo "$paragraph\n";
    }
    

    You get:

    first part
    second part
    third part
    fourth part
    fifth part
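
    To tie this back to the question, the sketch below runs both patterns on Input 1 and wraps the result in <p> tags the way the question's loop does. The variable names are illustrative only. It shows where the space-only paragraphs came from: the question's pattern treats each tag as its own delimiter, and a single space between two tags is not matched by \s\s+, so it survives as a non-empty piece.

    ```php
    <?php
    // Input 1 from the question.
    $article = 'first part </p> <p> second part </p> <p> third part </p> <p> fourth part <br/>';

    // Pattern from the question: every tag is a separate delimiter.
    $original = '/(<\s*p\s*\/?>)|(<\s*br\s*\/?>)|(\s\s+)|(<\s*\/p\s*\/?>)/';
    // Pattern from the answer: a whole run of tags and whitespace is one delimiter.
    $grouped = '/(\s*<(\/?p|br)\s*\/?>\s*)+/u';

    $before = preg_split($original, $article, -1, PREG_SPLIT_NO_EMPTY);
    $after  = preg_split($grouped, $article, -1, PREG_SPLIT_NO_EMPTY);

    // Wrap each piece in <p>...</p>, as the question's loop does.
    $html = '';
    foreach ($after as $paragraph) {
        $html .= "<p>$paragraph</p>";
    }

    echo count($before), " pieces before, ", count($after), " after\n";
    echo $html, "\n";
    ```

    With the grouped pattern there is no need to trim the pieces afterwards, because the whitespace around each tag is consumed as part of the delimiter.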
    
