dqg95034 2011-08-13 22:06
Views: 91
Answer accepted

PHP: split a string of HTML into an array keyed by the tags' class names

I need to take a string of html text like:

<p>This is a line with no spans<br>
This is a line <span class="second">This is secondary</span><br>  
This is another line <span class="third">And this is third</span> <span class="four">this is four</span></p>

And have it end up as an array in PHP like:

array(
    "This is a line with no spans",
    array(
      "This is a line",
      "second" => "This is secondary",
    ),
    array(
      "This is another line",
      "third" => "And this is third",
      "four" => "this is four"
    )
);

Getting each line into its own value was easy: I just split the text on <br>, and that works fine. What I can't quite get is splitting each line further so that the span text is keyed by its class name. I feel like PHP's preg_split may hold the key, but I'm not good with regular expressions and I can't figure it out.

Any ideas?

3 answers

  • doujia7517 2011-08-13 22:42

    It's not a good idea to use regular expressions to parse HTML (cite). It's just not a suitable tool; see @JAAulde's answer.

    The best way is to do it purely with the DOM. Loop through all child nodes (including text nodes) to format the array the way you want. Like this:

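    // $p: the DOMElement for the paragraph tag (getting it is not shown here)
    $lines = array();
    $line = array(); // declared outside the loop so it accumulates across nodes
    $pChildren = $p->childNodes;
    for ($i = 0; $i < $pChildren->length; $i++) {
        $child = $pChildren->item($i);
        if ($child instanceof DOMText) {
            // skip whitespace-only text nodes, e.g. the space between two spans
            $text = trim($child->wholeText);
            if ($text !== '') {
                $line[] = $text;
            }
        } elseif ($child instanceof DOMElement) {
            if (strtolower($child->tagName) == 'br') {
                // a <br> ends the current line
                $lines[] = $line;
                $line = array();
            } elseif (strtolower($child->tagName) == 'span' && $child->hasAttribute('class')) {
                $line[$child->getAttribute('class')] = $child->nodeValue;
            }
        }
    }
    // the last line has no trailing <br>, so add it after the loop
    if ($line) {
        $lines[] = $line;
    }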
    

    Warning: treat the above as pseudo-code. It has not been tested at all; I'm just going from experience and the manual.
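
For reference, the DOM walk described above can be put together into a self-contained sketch. The function name splitParagraphByClass is hypothetical, and two details are assumptions rather than part of the answer: the markup is loaded with DOMDocument::loadHTML and only the first <p> is read, and a line that contains nothing but plain text is stored as a string (to match the array shape in the question) instead of a one-element array.

```php
<?php
// Hypothetical helper: the name and the return shape are assumptions.
function splitParagraphByClass(string $html): array
{
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Assumption: only the first <p> in the markup is of interest.
    $p = $doc->getElementsByTagName('p')->item(0);
    if ($p === null) {
        return [];
    }

    $lines = [];
    $line = [];

    // Push the current line (if any) onto $lines and start a new one.
    // A line holding only one plain-text entry is stored as a string.
    $flush = function () use (&$lines, &$line) {
        if ($line === []) {
            return;
        }
        $lines[] = (count($line) === 1 && isset($line[0])) ? $line[0] : $line;
        $line = [];
    };

    foreach ($p->childNodes as $child) {
        if ($child instanceof DOMText) {
            // Ignore whitespace-only nodes such as the gap between spans.
            $text = trim($child->nodeValue);
            if ($text !== '') {
                $line[] = $text;
            }
        } elseif ($child instanceof DOMElement) {
            $tag = strtolower($child->tagName);
            if ($tag === 'br') {
                $flush(); // a <br> ends the current line
            } elseif ($tag === 'span' && $child->hasAttribute('class')) {
                $line[$child->getAttribute('class')] = trim($child->nodeValue);
            }
        }
    }
    $flush(); // the last line has no trailing <br>

    return $lines;
}

// The markup from the question.
$html = "<p>This is a line with no spans<br>\n"
      . "This is a line <span class=\"second\">This is secondary</span><br>  \n"
      . "This is another line <span class=\"third\">And this is third</span> "
      . "<span class=\"four\">this is four</span></p>";

$result = splitParagraphByClass($html);
print_r($result);
```

Note that empty lines (two <br> tags in a row) are dropped in this sketch, and two spans with the same class on one line would overwrite each other; both would need handling if the real markup can contain them.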

    This answer was selected by the asker as the best answer.