dqg95034 2011-08-13 22:06

PHP: split a string of HTML into an array keyed by tag class name

I need to take a string of html text like:

<p>This is a line with no spans<br>
This is a line <span class="second">This is secondary</span><br>  
This is another line <span class="third">And this is third</span> <span class="four">this is four</span></p>

And have it end up as an array in PHP like:

array(
    "This is a line with no spans",
    array(
      "This is a line",
      "second" => "This is secondary",
    ),
    array(
      "This is another line",
      "third" => "And this is third",
      "four" => "this is four"
    )
);

Getting each line into its own value was easy: I just split the text on <br> and that works fine. What I can't quite get is splitting each line further so that span text is keyed by its class name. I feel like PHP's preg_split may hold the key, but I'm not good with regular expressions and I can't figure it out.

Any ideas?

3 answers

  • doujia7517 2011-08-13 22:42

    It's not a good idea to use regular expressions to parse HTML (cite). It's just not a suitable tool; see @JAAulde's answer.

    The best way is to do it purely with the DOM. Loop through all child nodes (including text nodes) to format the array the way you want. Like this:

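    // Load the markup and grab the first <p>; $html is assumed to hold the string from the question.
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $p = $doc->getElementsByTagName('p')->item(0);

    $lines = array();
    $line = array(); // declared outside the loop so it accumulates across child nodes
    foreach ($p->childNodes as $child) {
        if ($child instanceof DOMText) {
            // Skip whitespace-only text nodes (e.g. the space between two spans).
            $text = trim($child->nodeValue);
            if ($text !== '') {
                $line[] = $text;
            }
        } elseif ($child instanceof DOMElement) {
            $tag = strtolower($child->tagName);
            if ($tag == 'br') {
                // A <br> closes the current line.
                $lines[] = $line;
                $line = array();
            } elseif ($tag == 'span' && $child->hasAttribute('class')) {
                $line[$child->getAttribute('class')] = $child->nodeValue;
            }
        }
    }
    if ($line) {
        $lines[] = $line; // flush the last line, which has no trailing <br>
    }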
    

    Warning: treat the above as pseudo-code; it has not been tested at all, and I am just going from experience and the manual.
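For reference, the DOM walk described above can be wrapped into a small self-contained function. This is only a sketch under a few assumptions: the function name `splitParagraphByClass` is made up for illustration, only the first `<p>` in the input is processed, and a line that contains nothing but plain text is collapsed to a bare string so that the result has the shape asked for in the question.

```php
<?php
// Hypothetical helper (the name is not from the original answer).
// Splits the first <p> in $html into lines at each <br>; within a line,
// plain text gets a numeric key and span text is keyed by the span's class.
function splitParagraphByClass(string $html): array
{
    $doc = new DOMDocument();
    $doc->loadHTML($html); // the fragment is wrapped in <html><body> by the parser

    $p = $doc->getElementsByTagName('p')->item(0);
    if ($p === null) {
        return []; // no paragraph found
    }

    $lines = [];
    $line = [];
    foreach ($p->childNodes as $child) {
        if ($child instanceof DOMText) {
            // Ignore whitespace-only text nodes, such as the gap between two spans.
            $text = trim($child->nodeValue);
            if ($text !== '') {
                $line[] = $text;
            }
        } elseif ($child instanceof DOMElement) {
            $tag = strtolower($child->tagName);
            if ($tag === 'br') {
                // A <br> ends the current line.
                $lines[] = $line;
                $line = [];
            } elseif ($tag === 'span' && $child->hasAttribute('class')) {
                $line[$child->getAttribute('class')] = $child->textContent;
            }
        }
    }
    if ($line !== []) {
        $lines[] = $line; // the last line has no trailing <br>
    }

    // Assumption: a line holding a single plain-text entry becomes a bare string.
    return array_map(function (array $l) {
        return (count($l) === 1 && isset($l[0])) ? $l[0] : $l;
    }, $lines);
}

// Usage with the markup from the question.
$html = '<p>This is a line with no spans<br>' . "\n"
      . 'This is a line <span class="second">This is secondary</span><br>' . "\n"
      . 'This is another line <span class="third">And this is third</span> '
      . '<span class="four">this is four</span></p>';

var_export(splitParagraphByClass($html));
```

Two spans with the same class on one line would overwrite each other here, since the class is used as the array key; if that can happen in your data, collecting the values into a nested array per class would be one way around it.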

    This answer was selected by the asker as the best answer.
