Scenario
I am getting content from a website using PHP, DOMDocument and DOMXPath. My code makes sure the HTML content is UTF-8 and tries to remove the nodes that match an XPath query.
Part of the code where the issue lies
Inside a PHP class:
libxml_use_internal_errors(true);
$this->dom=new DOMDocument("4.01", "utf-8");
$xpath=new DOMXPath($this->dom);
$this->motorConfig['xPath_N']="//div[@class='pdfprnt-bottom-right']/following-sibling::*";
$content_text_dirty='
... aleba</p><div class="pdfprnt-bottom-right">Y entonces...</div><div><p> ...
';
if($this->motorConfig['xPath_N']){
$content_text_dirty=str_replace("\0", '', $content_text_dirty); //Avoid PHP BUG http://stackoverflow.com/questions/30925533/php-dom-loadhtml-method-unusual-warning
$this->dom->loadHTML(mb_convert_encoding($content_text_dirty, 'HTML-ENTITIES', "UTF-8"), LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath=new DOMXPath($this->dom); // created here because it must be set after loading the HTML into the DOM
$nodes_to_remove=$xpath->query($this->motorConfig['xPath_N']);
var_dump($nodes_to_remove); // --> bool(false)
...
Question:
What is a good way to find out WHY the XPath query is not finding any results?
Extra notes
Curiously, no query returns any result at all when I remove this part:
str_replace("\0", '', $content_text_dirty);
I have been using this PHP class for a long time to scrape data from different websites, and this only happens occasionally, on some specific websites. The current case concerns this site. [Trying the same XPath query with FirePath does return the match.]
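To make the kind of diagnosis I am asking about concrete, here is a minimal, self-contained sketch. The helper name debugXPath and the sample HTML are made up for illustration and are not part of my class. It only relies on the documented behaviour that DOMXPath::query() returns false when the expression is malformed or the context node is invalid, and a DOMNodeList (possibly empty) otherwise, and it collects whatever libxml reported while parsing. It assumes PHP 7.4 or later.

```php
<?php
// Hypothetical helper (not part of the original class): loads an HTML string,
// runs one XPath expression and reports what happened at each step.
function debugXPath(string $html, string $expression): array
{
    libxml_use_internal_errors(true); // collect parser errors instead of printing them
    libxml_clear_errors();

    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);

    // Parser messages gathered while loading the HTML.
    $parseErrors = array_map(
        fn (LibXMLError $e) => trim($e->message) . ' (line ' . $e->line . ')',
        libxml_get_errors()
    );
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // false means the query itself failed; an empty DOMNodeList means
    // the query ran but matched nothing. The @ only silences the warning.
    $result = @$xpath->query($expression);

    return [
        'loaded'      => $loaded,
        'parseErrors' => $parseErrors,
        'queryFailed' => ($result === false),
        'matches'     => ($result === false) ? 0 : $result->length,
    ];
}

// Illustrative input only, loosely based on the fragment above.
$html = '<div><p>aleba</p><div class="pdfprnt-bottom-right">Y entonces...</div><div><p>after</p></div></div>';

print_r(debugXPath($html, "//div[@class='pdfprnt-bottom-right']/following-sibling::*"));
print_r(debugXPath($html, "//div[@class='pdfprnt-bottom-right'")); // deliberately malformed expression
```

Comparing 'queryFailed' with 'matches' at least separates "the expression could not be evaluated" from "the expression matched nothing", which var_dump() alone does not make obvious.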