Simple HTML DOM always loads the default first page instead of the specified URL

I use this to load the URL:

$html = new simple_html_dom();
$html->load_file($url);

This loads the correct page. Then I find the next page link, here it will be: https://www.autotrader.co.uk/motorhomes/motorhome-dealers/bc-motorhomes-ayr-dpp-10004733?channel=motorhomes&page=6

Only the page value changes, from 5 to 6. The code that gets the next link is:

function getNextLink($_htmlTemp)
{
    //Getting the next page links
    $aNext = $_htmlTemp->find('a.next', 0);
    $nextLink = $aNext->href;    
    return $nextLink;
}

The above function returns the correct link, with the page value being 6. But when I try to load this next link, it fetches the default first page, as if the page parameter were absent from the URL.

//After loop we will have details of all the listing in this page -- so get next page link
    $nxtLink = getNextLink($originalHtml);  //Returns string url
    if(!empty($nxtLink))
    {
        //Yay, we have the next link -- load the next link        
        print 'Next Url: '.$nxtLink.'<br>'; //$nxtLink has correct value
        $originalHtml->load_file($nxtLink); //This line fetches default page
    }

The whole flow is something like this:

 $html->load_file($url);


//Whole thing in a do-while loop
$originalHtml = $html;
$shouldLoop = true;
//Main Array
$value = array();
do{
    $listings = $originalHtml->find('div.searchResult');    
    foreach($listings as $item)
    {
        //Some logic here
    }


    //After loop we will have details of all the listing in this page -- so get next page link
    $nxtLink = getNextLink($originalHtml);  //Returns string url
    if(!empty($nxtLink))
    {
        //Yay, we have the next link -- load the next link        
        print 'Next Url: '.$nxtLink.'<br>';
        $originalHtml->load_file($nxtLink);
    }
    else
    {
        //No next link -- stop the loop as we have covered all the pages
        $shouldLoop = false;
    }

} while($shouldLoop);

I have tried URL-encoding the whole URL, and also encoding only the query parameters, but the result is the same. I also tried creating a new instance of simple_html_dom before loading each page, with no luck. Please help.

dream989898 2018-05-22 22:15
You need to run html_entity_decode on those links; I can see that they are getting mangled by simple-html-dom.

$url = 'https://www.autotrader.co.uk/motorhomes/motorhome-dealers/bc-motorhomes-ayr-dpp-10004733?channel=motorhomes';
$html = str_get_html(file_get_contents($url));

while ($a = $html->find('a.next', 0)) {
    $url = html_entity_decode($a->href);
    echo $url . " ";
    $html = str_get_html(file_get_contents($url));
}
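To see why the decode step matters, here is a minimal sketch using only built-in PHP functions. It assumes the href in the page source contains an entity-encoded ampersand (&amp;amp;), which would also explain why the printed link looks correct in a browser. The value of $rawHref is a guess at that raw attribute, not something taken from the live page.

```php
<?php
// Assumption: this is how the next-page href looks in the raw HTML,
// with the ampersand written as the entity "&amp;".
$rawHref = 'https://www.autotrader.co.uk/motorhomes/motorhome-dealers/'
         . 'bc-motorhomes-ayr-dpp-10004733?channel=motorhomes&amp;page=6';

// Parse the query string of the raw href. The text after the "&" is
// "amp;page=6", so no parameter named "page" is produced.
parse_str(parse_url($rawHref, PHP_URL_QUERY), $rawParams);

// Decode HTML entities first, then parse the query string again.
$decodedHref = html_entity_decode($rawHref);
parse_str(parse_url($decodedHref, PHP_URL_QUERY), $decodedParams);

var_dump(isset($rawParams['page']));   // is "page" present before decoding?
var_dump($decodedParams['page']);      // value of "page" after decoding
```

If that assumption holds, the server never receives a page parameter from the undecoded link, which would be consistent with it falling back to the first page.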
The asker selected this answer as the best answer.
