dongshuo8756 2017-04-27 11:44
Accepted

I want to exclude only the contents of JavaScript (script) tags when retrieving the text of the body element with XPath

▼index.html

<body>

  I want to acquire only "text excluding HTML tag" included in this part.

  <script language="JavaScript" type="text/javascript">
      var foo = 42;
  </script>

</body>

I wrote the following code with DomCrawler. However, the result also contains the contents of the JavaScript tag, so I could not get the intended result.

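<?php

use Symfony\Component\DomCrawler\Crawler;

$crawler->filterXPath('//body')->each(function (Crawler $node) use ($url) {
    // text() returns all descendant text, including the contents of <script>
    $result = trim($node->text());
});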

2 answers

  • dongpengyu1363 2017-04-27 11:54

    I suggest using DOMXPath, which lets you filter the content with an XPath query. I am not so sure about DomCrawler.

    <?php
    // To retrieve selected HTML data, try these DOMXPath examples:
    
    $file = $_SERVER['DOCUMENT_ROOT'] . '/test.html';
    $doc = new DOMDocument();
    $doc->loadHTMLFile($file);
    
    $xpath = new DOMXPath($doc);
    
    // Example 1: every element that has an id attribute
    //$elements = $xpath->query("//*[@id]");
    
    // Example 2: script elements that are direct children of body
    //$elements = $xpath->query("/html/body/script");
    
    // Example 3: script elements anywhere in the document
    $elements = $xpath->query("//script");
    
    // query() returns false (not null) when the expression is invalid
    if ($elements !== false) {
      foreach ($elements as $element) {
        echo "<br/>[" . $element->nodeName . "]";
    
        // Print the value of each child node (for script, its source text)
        foreach ($element->childNodes as $node) {
          echo $node->nodeValue . "\n";
        }
      }
    }
    ?>
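
The examples above select the script elements themselves. To get the body text without them, as the question asks, one possible approach is to select only the text nodes under body that have no script ancestor. The following is a minimal sketch, not part of the original answer: the helper name `bodyTextWithoutScripts` and the sample HTML string are made up for illustration, and it uses plain DOMDocument/DOMXPath rather than DomCrawler.

```php
<?php
// Hypothetical helper (illustration only): returns the text of <body>,
// skipping every text node that sits inside a <script> element.
function bodyTextWithoutScripts(string $html): string
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them,
    // since real-world HTML is rarely perfectly well-formed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Text nodes below <body> that have no <script> ancestor.
    $textNodes = $xpath->query('//body//text()[not(ancestor::script)]');

    $parts = [];
    foreach ($textNodes as $textNode) {
        $text = trim($textNode->nodeValue);
        if ($text !== '') {
            $parts[] = $text;
        }
    }
    return implode(' ', $parts);
}

$html = '<html><body>Visible text <script type="text/javascript">var foo = 42;</script> and more.</body></html>';

echo bodyTextWithoutScripts($html), PHP_EOL; // prints "Visible text and more."
```

The same XPath expression should also work with `filterXPath()` in DomCrawler, though I have not verified that here. If style elements appear inside body as well, extending the predicate with `and not(ancestor::style)` would exclude them too.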
    