I want to exclude only the contents of JavaScript tags when retrieving the text of the body element with XPath

▼index.html

<body>

  I want to acquire only "text excluding HTML tag" included in this part.

  <script language="JavaScript" type="text/javascript">
      var foo = 42;
  </script>

</body>

I wrote the following code with DomCrawler, but because the result also contains the contents of the JavaScript tag, I could not get the intended result.

<?php

$crawler->filterXPath('//body')->each(function (DomCrawler $node) use ($url) {
    $result = trim($node->text());
});

2 answers

dongpengyu1363 2017-04-27 11:54

I suggest using DOMXPath, which lets you filter the content with a query. I am not sure about DomCrawler.

<?php
// To retrieve selected HTML data, try these DOMXPath examples:

// $DOCUMENT_ROOT relied on register_globals; read it from $_SERVER instead.
$file = $_SERVER['DOCUMENT_ROOT'] . "/test.html";
$doc = new DOMDocument();
$doc->loadHTMLFile($file);

$xpath = new DOMXPath($doc);

// example 1: every element that has an id attribute
//$elements = $xpath->query("//*[@id]");

// example 2: script elements that are direct children of body
//$elements = $xpath->query("/html/body/script");

// example 3: same as above with wildcards
$elements = $xpath->query("/*/*/script");

// query() returns false (not null) when the expression is invalid
if ($elements !== false) {
  foreach ($elements as $element) {
    echo "<br/>[" . $element->nodeName . "]";

    // print the value of each child node of the matched element
    foreach ($element->childNodes as $node) {
      echo $node->nodeValue . "\n";
    }
  }
}
?>
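
The examples above select the script elements themselves. For the question as asked (the body text without the script contents), the same DOMXPath approach can presumably be turned around to select only the text nodes that do not sit inside a script element. The following is a minimal sketch, not part of the original answer: the helper name `bodyTextWithoutScripts` and the inline sample HTML are assumptions for illustration, and the whitespace collapsing at the end is an optional choice.

```php
<?php
// Hypothetical helper (not from the original answer): returns the text of
// <body> with the contents of <script> elements left out.
function bodyTextWithoutScripts(string $html): string
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them,
    // since real-world HTML is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Every text node under <body> that has no <script> ancestor.
    $nodes = $xpath->query('//body//text()[not(ancestor::script)]');

    $text = '';
    foreach ($nodes as $node) {
        $text .= $node->nodeValue;
    }
    // Collapse the runs of whitespace left behind by the markup.
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Sample document modelled on the index.html from the question.
$html = <<<'HTML'
<html><body>
  I want to acquire only "text excluding HTML tag" included in this part.
  <script language="JavaScript" type="text/javascript">
      var foo = 42;
  </script>
</body></html>
HTML;

echo bodyTextWithoutScripts($html), "\n";
```

The predicate `not(ancestor::script)` also skips script elements nested deeper inside the body. If style elements should be excluded as well, a second condition such as `not(ancestor::style)` could be added to the same predicate.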

This answer was selected by the asker as the best answer.
