douzhuo2722 2016-12-27 01:01
76 views
Accepted

How can I get the next node after the current parent node with the Symfony Crawler?

Example HTML5 to parse:

<div id="orderDetails">
    <div> ... any number of blocks with unnecessary stuff ... </div>
    <div>Label for important info</div>
    <table> ... some other block type ... </table>
    <div>Some very important info here</div>
    <div> ... any number of blocks with unnecessary stuff ... </div>
</div>

My PHP code looks like this:

$crawler = new \Symfony\Component\DomCrawler\Crawler($html);
$label = $crawler->filter('#orderDetails div:contains("Label for important info")');
$info = $label->parent()->next('div');
assert('Some very important info here' === $info->text(), 'Important info must be grabbed from HTML');

Unfortunately, Crawler has no parent() or next() methods. It does have parents(), but that gives me all ancestor nodes, i.e. all the divs, which I cannot tell apart.

So I have two questions in this case:

  1. How do I get the parent of the current node? Not all ancestors, just the direct one!
  2. How do I traverse the DOM horizontally with some analogue of next/prev?

Thanks.

1 answer

  • dongqing5575 2016-12-27 18:03

    Story

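    After some digging into the source code, I found that nextAll() does not start from every node in the current selection but from just one, the first ($node = $this->getNode(0);). It returns the element siblings that follow that node, and methods such as text() read only the first node of the result.

    That means if I need the third sibling after the current node, I can write $node->nextAll()->nextAll()->nextAll() and read the first node of the result; $node->nextAll()->eq(2) selects the same node.

    The naming is rather confusing (0_0).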

    Answers

    1. How to get parent of current node? Not all nodes but "actual" one!
    // parents() returns every ancestor, nearest first, so first() is the direct parent
    $parent = $node->parents()->first();
    
    2. How to traverse dom horizontally with some analogue of next/prev?
    // Following siblings of the current node; the first one is the next sibling
    $next = $node->nextAll()->first();
    // Preceding siblings of the current node, nearest first; the first one is the previous sibling
    $prev = $node->previousAll()->first();
    // The third sibling after the current node
    $nextAfterTwo = $node->nextAll()->nextAll()->nextAll()->first();
    

    Concrete code solution

    Since the needed building blocks do exist, a function that solves the question looks like this:

    use Symfony\Component\DomCrawler\Crawler;

    /**
     * Returns the first node after the current one that matches the selector
     *
     * @param Crawler $start    Node from which to start traversing
     * @param string  $selector CSS selector, as in `Crawler::filter($selector)`
     *
     * @return Crawler Found node wrapped in a Crawler
     *
     * @throws \InvalidArgumentException When no node is found
     */
    function getNextFiltered(Crawler $start, string $selector) : Crawler
    {
        // nextAll() already holds every following sibling, so no loop is needed.
        // Note: filter() also matches descendants of those siblings, not only the siblings themselves.
        $filtered = $start->nextAll()->filter($selector);
        if ($filtered->count()) {
            return $filtered->first();
        }

        throw new \InvalidArgumentException('No node found');
    }
    

    And in my example:

    $crawler = new Crawler($html);
    $label   = $crawler->filter('#orderDetails div:contains("Label for important info")');
    $info    = getNextFiltered($label, 'div');
    assert('Some very important info here' === $info->text(), 'Important info must be grabbed from HTML');
    
    Accepted by the asker as the best answer.
