duanjianfu1398 2018-05-19 20:25
Answer accepted

Accessing child paragraph content with XPath

HTML:

<div class="b-list-fact__item-explanation js-fact-explanation">
    <p>Text 1 Text 1 Text 1 Text 1 Text 1 Text 1</p>
    <p>Text 2 Text 2 Text 2 Text 2 Text 2 Text 2 </p>
</div>

I'm trying to access the text inside the paragraphs and combine all the p elements into one string.

I've tried a number of variations, such as:

PHP (running on 7.1.11):

    $html = file_get_contents('https://...');
    $html = mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8');
    $dom = new DOMDocument;
    @$dom->loadHTML($html);

    $finder = new DOMXPath($dom);
    $facts = $finder->query("//a[contains(@class, normalize-space('b-list-fact__item-text'))]");
    // One node per <p> element (not one per explanation div)
    $long_fact = $finder->query("//*[contains(@class, 'b-list-fact__item-explanation js-fact-explanation')]/p");

    $dataArr = [];
    foreach ($facts as $key => $fact) {
        $fact_description = $long_fact[$key]->textContent;
        $fact = trim($fact->textContent);
        $dataArr[] = str_replace("\n", " ", $fact);
        array_push($dataArr, $fact_description);
    }

    // Other queries tried in place of the one above:
    $long_fact = $finder->query("//*[contains(@class, 'b-list-fact__item-explanation js-fact-explanation')]/p");

    $long_fact = $finder->query("//*[contains(@class, 'b-list-fact__item-explanation js-fact-explanation')]//p[1]");

    $long_fact = $finder->query("//*[contains(@class, 'b-list-fact__item-explanation js-fact-explanation')]/p/text()");

    // Ways of reading the result that were tried:
    if ($long_fact->length) {
        var_dump($long_fact[0]->textContent);
    }

    if ($long_fact->length) {
        var_dump($long_fact->textContent);
    }

    if ($long_fact->length) {
        var_dump($long_fact->nodeValue);
    }

...and about 30 other variations.

I'm completely lost as to why this happens; other variations that don't involve the p tags work just fine.


1 answer

  • dqed19166 2018-05-19 21:24
    // Select every <p> that is a direct child of the explanation div
    $ptext = $finder->query('//div[contains(@class, "b-list-fact__item-explanation js-fact-explanation")]/p');
    $paragraphs = [];
    foreach ($ptext as $paragraph) {
        $paragraphs[] = $paragraph->textContent;
    }
    // Join the paragraph texts, one per line
    $combined = implode("\n", $paragraphs);
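A self-contained sketch of the first approach, run against the HTML snippet from the question instead of a downloaded page, so it can be tried without network access. Two choices here go beyond the original answer and are assumptions: the class is matched with the common contains(concat(' ', normalize-space(@class), ' '), ' js-fact-explanation ') idiom, which keeps working if the div gains extra classes or the classes appear in a different order, and the paragraphs are trimmed and joined with a single space instead of a newline.

```php
<?php
// Sample markup copied from the question
$html = <<<'HTML'
<div class="b-list-fact__item-explanation js-fact-explanation">
    <p>Text 1 Text 1 Text 1 Text 1 Text 1 Text 1</p>
    <p>Text 2 Text 2 Text 2 Text 2 Text 2 Text 2 </p>
</div>
HTML;

$dom = new DOMDocument();
// Collect libxml warnings instead of printing them (alternative to the @ operator)
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$finder = new DOMXPath($dom);

// Assumption: match the div by one class token rather than the full class string
$query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' js-fact-explanation ')]/p";

$paragraphs = [];
foreach ($finder->query($query) as $p) {
    // trim() drops the trailing space inside the second <p>
    $paragraphs[] = trim($p->textContent);
}

// Join all paragraph texts into one string
$combined = implode(' ', $paragraphs);
echo $combined, "\n";
```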
    

    Alternatively, take the text of the whole div in one go:

    $ptext = $finder->query('//div[contains(@class, "b-list-fact__item-explanation js-fact-explanation")]')
        ->item(0)->textContent;
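The second approach returns the div's entire text content, which also contains the line breaks and indentation between the p elements, because DOMDocument keeps whitespace-only text nodes by default. If that whitespace is unwanted, one option (a sketch, not part of the original answer) is to let XPath collapse it: wrapping the expression in normalize-space() makes DOMXPath::evaluate() return a plain string. Note that normalize-space() applied to a node-set only uses the first matching div.

```php
<?php
// Sample markup copied from the question
$html = <<<'HTML'
<div class="b-list-fact__item-explanation js-fact-explanation">
    <p>Text 1 Text 1 Text 1 Text 1 Text 1 Text 1</p>
    <p>Text 2 Text 2 Text 2 Text 2 Text 2 Text 2 </p>
</div>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$finder = new DOMXPath($dom);

// Raw text of the whole div: includes the newlines/indentation between the <p>s
$raw = $finder->query('//div[contains(@class, "b-list-fact__item-explanation js-fact-explanation")]')
    ->item(0)->textContent;

// normalize-space() takes the string value of the first matching div,
// trims it and collapses every whitespace run into a single space
$collapsed = $finder->evaluate('normalize-space(//div[contains(@class, "b-list-fact__item-explanation js-fact-explanation")])');

var_dump($raw);
var_dump($collapsed);
```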
    
    This answer was accepted by the asker as the best answer.
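The loop in the question uses the same $key to index both $facts and $long_fact, but a query ending in /p returns one flat list of paragraphs across all explanation divs, so as soon as a div holds two paragraphs the indices drift apart. A sketch, assuming one explanation div per fact (the two-div markup below is hypothetical, modelled on the question's snippet): query the divs first, then pass each div as the context node to DOMXPath::query() and select its p children with a relative path.

```php
<?php
// Hypothetical markup with two explanation divs, modelled on the question's snippet
$html = <<<'HTML'
<div class="b-list-fact__item-explanation js-fact-explanation">
    <p>Fact A, part 1</p>
    <p>Fact A, part 2</p>
</div>
<div class="b-list-fact__item-explanation js-fact-explanation">
    <p>Fact B, part 1</p>
</div>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$finder = new DOMXPath($dom);
$divs = $finder->query('//div[contains(@class, "b-list-fact__item-explanation js-fact-explanation")]');

$explanations = [];
foreach ($divs as $div) {
    $parts = [];
    // 'p' is evaluated relative to $div, so only this div's own <p> children match
    foreach ($finder->query('p', $div) as $p) {
        $parts[] = trim($p->textContent);
    }
    // One combined string per explanation div
    $explanations[] = implode(' ', $parts);
}

print_r($explanations);
```

Each explanation ends up as one string, in document order, so under the one-div-per-fact assumption $explanations[$key] lines up with $facts[$key].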