dtcuv8044 2018-05-20 10:36

Using XPath to correctly sort DOM content into an array

Example HTML:

<div class="classX">
<a href="#" class="aClass">Link Text 1</a>
<span class="sClass"><p>Text #1</p></span>
</div>

<div class="classX">
<a href="#" class="aClass">Link Text 2</a>
</div>

<div class="classX">
<a href="#" class="aClass">Link Text 3</a>
</div>

<div class="classX">
<a href="#" class="aClass">Link Text 4</a>
<span class="sClass"><p>Text #4</p></span>
</div>

<div class="classX">
<a href="#" class="aClass">Link Text 5</a>
<span class="sClass"><p>Text #5</p></span>
</div>

I'm trying to build an array that will look like:

    [0] => Array
        (
            [link_text] => Link Text 1
            [span_text] => Text #1
        )

    [1] => Array
        (
            [link_text] => Link Text 2
        )

    [2] => Array
        (
            [link_text] => Link Text 3
        )

    [3] => Array
        (
            [link_text] => Link Text 4
            [span_text] => Text #4
        )

    [4] => Array
        (
            [link_text] => Link Text 5
            [span_text] => Text #5
        )

But using a foreach loop with a $key value organizes the output incorrectly and instead, I get an array that looks like this:

    [0] => Array
        (
            [link_text] => Link Text 1
            [span_text] => Text #1
        )

    [1] => Array
        (
            [link_text] => Link Text 2
            [span_text] => Text #4
        )

    [2] => Array
        (
            [link_text] => Link Text 3
            [span_text] => Text #5
        )

    [3] => Array
        (
            [link_text] => Link Text 4
        )

    [4] => Array
        (
            [link_text] => Link Text 5
        )

I understand why this happens: I'm using the index from the link_text list to access the span_text list, and the two lists have different lengths. What I don't know is how to build the array so that each link is paired with the span from its own div.

PHP:

$finder = new DomXPath($dom);
$link_texts = $finder->query("//a[contains(@class, normalize-space('aClass'))]");
$span_text = $finder->query("//span[contains(@class,'sClass')]/@data-html");

foreach ($link_texts as $key => $link_text) {

    if (empty($span_text[$key]->textContent)) {
        $link_text = trim($link_text->textContent);
        $dataArr[] = str_replace("\n", " ", $link_text);
        $data[] = array("link_text" => str_replace("\n", " ", $link_text));
    } else {
        $span_text = str_replace("\n", " ", $span_text[$key]->textContent);
        $span_text = preg_replace('~</?p[^>]*>~', '', $span_text);
        $link_text = trim($link_text->textContent);
        $data[] = array("link_text" => str_replace("\n", " ", $link_text), "span_text" => $span_text);
    }

}
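
The mismatch described above can be reproduced in isolation. The following is a minimal sketch, not the code from the question: it assumes the sample HTML is loaded with DOMDocument::loadHTML, queries the spans directly rather than a data-html attribute, and uses illustrative variable names ($links, $spans, $mismatched). It shows that two document-wide queries return lists of different lengths, so a shared index pairs the wrong nodes.

```php
<?php
// Sample markup from the question (first div's class attribute corrected).
$html = '<div class="classX"><a href="#" class="aClass">Link Text 1</a><span class="sClass"><p>Text #1</p></span></div>'
      . '<div class="classX"><a href="#" class="aClass">Link Text 2</a></div>'
      . '<div class="classX"><a href="#" class="aClass">Link Text 3</a></div>'
      . '<div class="classX"><a href="#" class="aClass">Link Text 4</a><span class="sClass"><p>Text #4</p></span></div>'
      . '<div class="classX"><a href="#" class="aClass">Link Text 5</a><span class="sClass"><p>Text #5</p></span></div>';

libxml_use_internal_errors(true); // keep HTML parser notices quiet
$dom = new DOMDocument();
$dom->loadHTML($html);
$finder = new DOMXPath($dom);

// Two independent queries over the whole document.
$links = $finder->query("//a[contains(@class, 'aClass')]");
$spans = $finder->query("//span[contains(@class, 'sClass')]");

// The lists differ in length: 5 links but only 3 spans.
echo $links->length, " links, ", $spans->length, " spans\n";

// Index 1 in $spans is the second span in the document, which lives in the
// fourth div, so pairing by index attaches it to the second link.
$mismatched = array(
    "link_text" => trim($links->item(1)->textContent),
    "span_text" => trim($spans->item(1)->textContent),
);
print_r($mismatched);
```

Because the span list has no entries for the divs that lack a span, every later span shifts up to an earlier index, which matches the incorrect output shown earlier.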

1 answer

  • dowaw80220 2018-05-20 11:24

    I think it would be easier to start by selecting all the parent <div class="classX"> elements. Then we can select the nested a and span elements for each div.

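    $finder = new DomXPath($dom);
    // Select each container div first, so every link stays paired with its own span.
    $divs = $finder->query("//div[@class='classX']");
    $data = array();
    
    foreach($divs as $div) {
        // Relative queries ("./") are evaluated with the current div as context node.
        $link = $finder->query("./a[@class='aClass']", $div)->item(0);
        $span = $finder->query("./span[@class='sClass']", $div)->item(0);
        $items = array(
            "link_text" => $link ? $link->textContent : null,
            "span_text" => $span ? $span->textContent : null
        );
        // array_filter() without a callback drops the null entries.
        $data[] = array_filter($items);
    }
    
    print_r($data);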
    

    This produces a $data array with all the link_text and span_text items in the correct order.

    Null values are removed by array_filter, so nested arrays for divs without a span have no span_text key.
    If every nested array must have the same keys, then don't filter the $items array.
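
The two behaviours described above (dropping missing keys versus keeping a fixed shape) can be compared side by side. This is a minimal, self-contained sketch under a few assumptions: collect_rows and its $fixedShape flag are hypothetical names introduced here, the HTML is loaded with DOMDocument::loadHTML, and the filter callback is a small variation on the answer that removes only null values, since array_filter without a callback would also drop values such as "" and "0".

```php
<?php
/**
 * Hypothetical helper (not part of the answer): collect link/span text per div.
 * With $fixedShape = true every row keeps both keys; otherwise nulls are dropped.
 */
function collect_rows(string $html, bool $fixedShape): array
{
    libxml_use_internal_errors(true); // keep HTML parser notices quiet
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $finder = new DOMXPath($dom);

    $rows = array();
    foreach ($finder->query("//div[@class='classX']") as $div) {
        // Relative queries, scoped to the current div.
        $link = $finder->query("./a[@class='aClass']", $div)->item(0);
        $span = $finder->query("./span[@class='sClass']", $div)->item(0);
        $row = array(
            "link_text" => $link ? trim($link->textContent) : null,
            "span_text" => $span ? trim($span->textContent) : null,
        );
        if (!$fixedShape) {
            // Remove only nulls, so legitimate values like "0" survive.
            $row = array_filter($row, function ($value) {
                return $value !== null;
            });
        }
        $rows[] = $row;
    }
    return $rows;
}

// Two of the divs from the question: one with a span, one without.
$html = '<div class="classX"><a href="#" class="aClass">Link Text 1</a><span class="sClass"><p>Text #1</p></span></div>'
      . '<div class="classX"><a href="#" class="aClass">Link Text 2</a></div>';

print_r(collect_rows($html, false)); // second row has only link_text
print_r(collect_rows($html, true));  // second row keeps span_text, set to null
```

The fixed-shape variant is usually easier to consume when the rows are later written to a table or CSV, while the filtered variant matches the array layout asked for in the question.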

    This answer was selected by the asker as the best answer.