简单的html dom总是加载默认的第一页而不是指定的url

I use this load the url.

$html = new simple_html_dom();
$html->load_file($url);

This loads the correct page. Then I find the next page link, here it will be: https://www.autotrader.co.uk/motorhomes/motorhome-dealers/bc-motorhomes-ayr-dpp-10004733?channel=motorhomes&page=6

Just the page value is changed from 5 to 6. The code snippet to get the next link is:

function getNextLink($_htmlTemp)
{
    //Getting the next page links
    $aNext = $_htmlTemp->find('a.next', 0);
    $nextLink = $aNext->href;    
    return $nextLink;
}

The above method returns the correct link with page value being 6. Now when I try to load this next link, it fetches the first default page with page query absent from the url.

//After loop we will have details of all the listing in this page -- so get next page link
    $nxtLink = getNextLink($originalHtml);  //Returns string url
    if(!empty($nxtLink))
    {
        //Yay, we have the next link -- load the next link        
        print 'Next Url: '.$nxtLink.'<br>'; //$nxtLink has correct value
        $originalHtml->load_file($nxtLink); //This line fetches default page
    }

The whole flow is something like this:

 $html->load_file($url);


//Whole thing in a do-while loop
$originalHtml = $html;
$shouldLoop = true;
//Main Array
$value = array();
do{
    $listings = $originalHtml->find('div.searchResult');    
    foreach($listings as $item)
    {
        //Some logic here
    }


    //After loop we will have details of all the listing in this page -- so get next page link
    $nxtLink = getNextLink($originalHtml);  //Returns string url
    if(!empty($nxtLink))
    {
        //Yay, we have the next link -- load the next link        
        print 'Next Url: '.$nxtLink.'<br>';
        $originalHtml->load_file($nxtLink);
    }
    else
    {
        //No next link -- stop the loop as we have covered all the pages
        $shouldLoop = false;
    }

} while($shouldLoop);

I have tried encoding the whole url, only the query parameters but the same result. I also tried creating new instances of simple_html_dom and then loading the file, no luck. Please help.

写回答
好问题 0 提建议
追加酬金
关注问题
分享
邀请回答
编辑收藏删除结题
收藏举报

1条回答默认最新

关注

码龄粉丝数原力等级 --

被采纳

被点赞

采纳率
dream989898 2018-05-23 06:15
关注
You need to html_entity_decode those links, I can see that they are getting mangled by simple-html-dom.

$url = 'https://www.autotrader.co.uk/motorhomes/motorhome-dealers/bc-motorhomes-ayr-dpp-10004733?channel=motorhomes'; $html = str_get_html(file_get_contents($url)); while($a = $html->find('a.next', 0)){ $url = html_entity_decode($a->href); echo $url . " "; $html = str_get_html(file_get_contents($url)); }
本回答被题主选为最佳回答 , 对您是否有帮助呢?

解决无用
评论打赏
分享
举报

评论

按下Enter换行，Ctrl+Enter发表内容

报告相同问题？

关注问题

悬赏问题

¥15 装 pytorch 的时候出了好多问题，遇到这种情况怎么处理？
¥20 IOS游览器某宝手机网页版自动立即购买JavaScript脚本
¥15 手机接入宽带网线，如何释放宽带全部速度
¥30 关于#r语言#的问题：如何对R语言中mfgarch包中构建的garch-midas模型进行样本内长期波动率预测和样本外长期波动率预测
¥15 ETLCloud 处理json多层级问题
¥15 matlab中使用gurobi时报错
¥15 这个主板怎么能扩出一两个sata口
¥15 不是，这到底错哪儿了😭
¥15 2020长安杯与连接网探
¥15 关于#matlab#的问题：在模糊控制器中选出线路信息，在simulink中根据线路信息生成速度时间目标曲线（初速度为20m/s，15秒后减为0的速度时间图像）我想问线路信息是什么

简单的html dom总是加载默认的第一页而不是指定的url

1条回答 默认 最新

悬赏问题

1条回答默认最新