douzhuo2722 2016-12-27 01:01
76 views
Accepted

How do I get the next node after the current one, via its parent, with the Symfony Crawler?

Example HTML5 to parse:

<div id="orderDetails">
    <div> ... any number of blocks with unnecessary stuff ... </div>
    <div>Label for important info</div>
    <table> ... some other block type ... </table>
    <div>Some very important info here</div>
    <div> ... any number of blocks with unnecessary stuff ... </div>
</div>

My PHP code looks like this:

$crawler = new \Symfony\Component\DomCrawler\Crawler($html);
$label = $crawler->filter('#orderDetails div:contains("Label for important info")');
$info = $label->parent()->next('div');
assert('Some very important info here' === $info->text(), 'Important info must be grabbed from HTML');

Unfortunately, Crawler has no parent() or next() methods. It does have parents(), but that gives me all ancestor nodes, i.e. all the divs, which I cannot tell apart.

So I have two questions:

  1. How do I get the parent of the current node? Not all ancestors, just the immediate one.
  2. How do I traverse the DOM horizontally, with some analogue of next/prev?

Thanks.


1 answer

  • dongqing5575 2016-12-27 18:03

    Story

    After some digging into the source code, I found that nextAll() uses only the first node of the current selection as its starting point ($node = $this->getNode(0);) and returns every element sibling that follows it, in document order.

    That means the node right after the current one is $node->nextAll()->first(), and the third one after it is $node->nextAll()->eq(2). There is no need to chain several nextAll() calls.

    previousAll() works the same way in the opposite direction, with the nearest sibling first.

    Answers

    1. How to get parent of current node? Not all nodes but "actual" one!
    // parents() returns all ancestors, nearest first, so the first one is the immediate parent
    // (Symfony 5.3 and later offer ancestors() as the replacement for parents())
    $parent = $node->parents()->first();
    
    2. How to traverse dom horizontally with some analogue of next/prev?
    // Next element sibling after the current node
    $next = $node->nextAll()->first();
    // Previous element sibling before the current node
    $prev = $node->previousAll()->first();
    // Third element sibling after the current node (skipping two)
    $nextAfterTwo = $node->nextAll()->eq(2);
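
    To make the sibling walk concrete, here is a rough sketch of it using only PHP's DOM extension. It is a simplified re-implementation for illustration, not Symfony's actual source; the helper name followingElementSiblings() and the sample markup are assumptions made for this example.

    ```php
    <?php
    // Simplified sketch of a "collect all following siblings" walk.
    // Not Symfony's code: followingElementSiblings() is a hypothetical helper.

    /**
     * Collects every element sibling that follows $node, in document order.
     *
     * @return DOMElement[]
     */
    function followingElementSiblings(DOMNode $node): array
    {
        $siblings = [];
        while ($node = $node->nextSibling) {
            // Skip whitespace text nodes and comments; keep elements only.
            if ($node instanceof DOMElement) {
                $siblings[] = $node;
            }
        }

        return $siblings;
    }

    // Sample markup modelled on the question (contents are made up).
    $html = '<div id="orderDetails">'
        . '<div>Unrelated block</div>'
        . '<div>Label for important info</div>'
        . '<table><tr><td>Other block</td></tr></table>'
        . '<div>Some very important info here</div>'
        . '<div>Another unrelated block</div>'
        . '</div>';

    $document = new DOMDocument();
    $document->loadHTML($html);

    // Locate the label div with plain XPath.
    $xpath = new DOMXPath($document);
    $label = $xpath
        ->query('//div[@id="orderDetails"]/div[contains(., "Label for important info")]')
        ->item(0);

    $siblings = followingElementSiblings($label);

    // Tag names of everything after the label: the table, then two divs.
    $names = array_map(fn (DOMElement $element) => $element->nodeName, $siblings);
    ```

    The walk starts from a single node and keeps going until there are no more siblings, which is why one call is enough to reach any later sibling by position.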
    

    Concrete code solution

    Since the needed functionality already exists, a helper function that solves the question looks like this:

    use Symfony\Component\DomCrawler\Crawler;

    /**
     * Returns the first node after the current one that matches the selector
     *
     * @param Crawler $start    Node from which to start traversing
     * @param string  $selector CSS selector, as in `Crawler::filter($selector)`
     *
     * @return Crawler Found node wrapped in a Crawler
     *
     * @throws \InvalidArgumentException When no node is found
     */
    function getNextFiltered(Crawler $start, string $selector) : Crawler
    {
        // nextAll() returns every element sibling after $start, in document order.
        // filter() searches those siblings and their descendants,
        // so the first match may be a node nested inside a sibling.
        $filtered = $start->nextAll()->filter($selector);
        if (0 === $filtered->count()) {
            throw new \InvalidArgumentException('No node found');
        }

        return $filtered->first();
    }
    

    And in my example:

    $crawler = new Crawler($html);
    $label   = $crawler->filter('#orderDetails div:contains("Label for important info")');
    $info    = getNextFiltered($label, 'div');
    assert('Some very important info here' === $info->text(), 'Important info must be grabbed from HTML');
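
    One caveat: filter() also looks inside the following siblings, so a div nested in the table would be returned first. If only direct siblings should count, a variant along these lines may help. It is a sketch under assumptions: getNextSiblingMatching() is a hypothetical name, the markup is made up, Crawler::matches() requires Symfony 4.4 or later, and both symfony/dom-crawler and symfony/css-selector are assumed to be installed via Composer.

    ```php
    <?php
    // Assumption: dependencies were installed with Composer in this directory.
    require __DIR__ . '/vendor/autoload.php';

    use Symfony\Component\DomCrawler\Crawler;

    /**
     * Hypothetical variant of getNextFiltered() that only considers the
     * siblings themselves, never their descendants.
     *
     * @throws \InvalidArgumentException When no sibling matches
     */
    function getNextSiblingMatching(Crawler $start, string $selector): Crawler
    {
        // Keep only those following siblings that match the selector themselves.
        $matching = $start->nextAll()->reduce(
            function (Crawler $sibling) use ($selector) {
                return $sibling->matches($selector);
            }
        );

        if (0 === $matching->count()) {
            throw new \InvalidArgumentException('No node found');
        }

        return $matching->first();
    }

    // Made-up markup where the table contains a nested div.
    $html = '<div id="orderDetails">'
        . '<div>Label for important info</div>'
        . '<table><tr><td><div>Nested block</div></td></tr></table>'
        . '<div>Some very important info here</div>'
        . '</div>';

    $crawler = new Crawler($html);
    $label   = $crawler->filter('#orderDetails div:contains("Label for important info")');

    // The nested div inside the table is skipped; the sibling div is returned.
    $info = getNextSiblingMatching($label, 'div');
    ```

    On Symfony versions older than 4.4, where matches() is not available, the filter()-based helper is the simpler choice as long as the siblings in between contain no matching descendants.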
    
    This answer was accepted by the asker as the best answer.
