dryift6733 2015-12-22 19:09

XPath query returns false even though the node exists

Scenario

I am getting content from a website using PHP, DOMDocument and XPath. My code makes sure the HTML content is UTF-8 and then tries to remove certain nodes that match a query.

The part of the code where the issue lies

Inside a PHP class:

libxml_use_internal_errors(true);
$this->dom=new DOMDocument("4.01", "utf-8");
$xpath=new DOMXPath($this->dom);
$this->motorConfig['xPath_N']="//div[@class='pdfprnt-bottom-right']/following-sibling::*";
$content_text_dirty='
... aleba</p><div class="pdfprnt-bottom-right">Y entonces...</div><div><p> ...
';

if($this->motorConfig['xPath_N']){
$content_text_dirty=str_replace("\0", '', $content_text_dirty); //Avoid PHP BUG http://stackoverflow.com/questions/30925533/php-dom-loadhtml-method-unusual-warning
$this->dom->loadHTML(mb_convert_encoding($content_text_dirty, 'HTML-ENTITIES', "UTF-8"), LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath=new DOMXPath($this->dom); // here, because it must be created after loading the HTML into the DOM
$nodes_to_remove=$xpath->query($this->motorConfig['xPath_N']);
var_dump($nodes_to_remove); // --> bool(false)
...
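The scenario says the class removes the nodes matched by `xPath_N`, but that part of the code is elided above. A minimal sketch of such a removal step, assuming a hypothetical helper `removeMatchingNodes` (not part of the original class), checks for `false` before iterating, because `DOMXPath::query()` returns `false` rather than an empty list when the expression is malformed or the context node is invalid:

```php
<?php
// Hypothetical helper, not part of the original class: removes every node
// matched by $expr from $dom and returns how many nodes were removed.
function removeMatchingNodes(DOMDocument $dom, string $expr): int
{
    // Created from the already-loaded document, as in the question's code.
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query($expr);

    // false means the expression (or context node) was rejected,
    // which is different from "nothing matched" (an empty DOMNodeList).
    if ($nodes === false) {
        throw new InvalidArgumentException("XPath query failed: $expr");
    }

    $removed = 0;
    foreach ($nodes as $node) {
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
            $removed++;
        }
    }
    return $removed;
}
```

Throwing on `false` keeps a rejected expression from silently looking like a page where nothing matched.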

Question:

What is a good way to find out WHY the XPath query is not finding the results?
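One possible way to see why `query()` returned `false` is to read libxml's error buffer right after the call. With `libxml_use_internal_errors(true)` enabled, as in the code above, the "Invalid expression" warning that PHP would normally print is collected silently instead, so it never shows up on screen. A minimal sketch, assuming a hypothetical helper named `debugXPathQuery` (not from the original class) that returns both the query result and any libxml messages:

```php
<?php
// Hypothetical diagnostic helper: runs an XPath query and returns the result
// together with any libxml error messages raised while evaluating it.
function debugXPathQuery(DOMDocument $dom, string $expr): array
{
    // Collect libxml errors instead of printing them; remember the old setting.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors(); // drop errors left over from earlier parsing

    $xpath = new DOMXPath($dom);
    $result = $xpath->query($expr); // false => expression or context rejected

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('libxml error %d: %s', $error->code, trim($error->message));
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting

    return ['result' => $result, 'errors' => $messages];
}
```

If `result` is a `DOMNodeList` with length 0, the expression was accepted and the problem lies in the parsed DOM (inspecting `$dom->saveHTML()` shows what the parser actually built); if it is `false`, the collected messages should point at the expression itself.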

Extra notes

Curiously, PHP returns no results for any query when I remove this part:

str_replace("\0", '', $content_text_dirty);

I have been using this PHP class for a long while to scrape data from different websites, but this only happens occasionally, on some specific websites. The current case concerns this site. (In fact, the same XPath query returns the match in FirePath.)
