Scraping specific text from a web page using XPath

I've searched and tried multiple ways to get this but I'm not sure why it won't find most of the information on the webpage.

Page to scrape: https://m.safeguardproperties.com/

Info needed: Version number for PhotoDirect for Apple (currently 4.4.0)

Xpath to text needed (I think) : /html/body/div[1]/div[2]/div[1]/div[4]/div[3]/a

Attempts:

<?php

$file = "https://m.safeguardproperties.com/";
$doc = new DOMDocument();
$doc->loadHTMLFile($file);

$xpath = new DOMXpath($doc);

$elements = $xpath->query("/html/body/div[1]/div[2]/div[1]/div[4]/div[3]/a");

echo "<PRE>";

if (!is_null($elements)) {
  foreach ($elements as $element) {
      var_dump ($element);
    echo "<br/>[". $element->nodeName. "]";

    $nodes = $element->childNodes;
    foreach ($nodes as $node) {
      echo $node->nodeValue. "\n";
    }
  }
}

echo "</PRE>";

?>

Second Attempt:

<?PHP
$file = "https://m.safeguardproperties.com/";
$doc = new DOMDocument();
$doc->loadHTMLFile($file);

echo '<pre>';

  // trying to find all links in document to see if I can see the correct one
  $links = [];
  $arr = $doc->getElementsByTagName("a");

  foreach($arr as $item) { 
    $href =  $item->getAttribute("href");
    $text = trim(preg_replace("/[\r\n]+/", " ", $item->nodeValue));
    $links[] = [
      'href' => $href,
      'text' => $text
    ];
  }

var_dump($links);
echo '</pre>';
?>

donglin4636 2017-10-04 22:18
On that particular website, the version numbers are loaded client-side from JSON data, so you won't find them in the base document.

http://m.safeguardproperties.com/js/photodirect.json

This was located by comparing the original document source to the finished DOM and inspecting the network activity in the developer console.
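A quick way to confirm that a value is missing from the served markup (which is all DOMDocument ever sees, since it does not run JavaScript) is to search the parsed text of the raw HTML. This is only a sketch: the function name `static_html_contains()` and the sample markup in the usage note are hypothetical, not taken from the site.

```php
<?php
/**
 * Report whether $needle occurs in the text of the HTML exactly as served.
 * Hypothetical helper: no JavaScript runs, so anything a script would
 * inject later is invisible here, just as it is to DOMXPath.
 */
function static_html_contains(string $html, string $needle): bool
{
    if ($html === '' || $needle === '') {
        return false;
    }

    $doc = new DOMDocument();
    // Real pages are rarely valid HTML; buffer parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($doc->documentElement === null) {
        return false;
    }

    // textContent concatenates every text node below the root element,
    // including the contents of inline <script> blocks.
    return strpos($doc->documentElement->textContent, $needle) !== false;
}
```

Passing it the result of `file_get_contents()` for the page together with the string `'4.4.0'` should return `false` if the version really is filled in by a script after load, which would explain why the XPath query comes back empty.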

$url = 'https://m.safeguardproperties.com/js/photodirect.json';
$json = file_get_contents( $url );
$object = json_decode( $json );
echo $object->ios->version; //4.4.0

Please respect other websites and cache your GET request.
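One possible way to cache the request is a small file-based cache with a time-to-live. This is a minimal sketch, not the answerer's code: `cached_get()`, its parameters, the one-hour default, and the cache-file naming are all assumptions made for illustration. The fetcher is injectable so the caching logic can be tried without touching the network.

```php
<?php
/**
 * Return the body for $url, reusing a cached copy younger than $ttl seconds.
 * Hypothetical helper: $fetch defaults to file_get_contents but can be
 * swapped for any callable that takes a URL and returns a string or false.
 */
function cached_get(string $url, string $cacheDir, int $ttl = 3600, ?callable $fetch = null): string
{
    $fetch = $fetch ?? 'file_get_contents';
    $cacheFile = rtrim($cacheDir, '/\\') . DIRECTORY_SEPARATOR . md5($url) . '.cache';

    // Serve the cached copy while it is still fresh.
    clearstatcache(true, $cacheFile);
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
        return (string) file_get_contents($cacheFile);
    }

    // Otherwise fetch once and store the result for later calls.
    $body = $fetch($url);
    if ($body === false) {
        throw new RuntimeException("Could not fetch {$url}");
    }

    file_put_contents($cacheFile, $body, LOCK_EX);
    return $body;
}
```

With this in place, the lookup could become `$json = cached_get($url, sys_get_temp_dir());` followed by the same `json_decode()` call, so the remote server is contacted at most once per TTL window.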
(This answer was accepted by the asker as the best answer.)
