douxiza9868 2015-10-28 09:16
Accepted

Crawling through the Amazon bestsellers page

<?php

    $i = 1;
    while ($i <= 5) {
        $url = 'http://www.amazon.in/gp/bestsellers/electronics/ref=zg_bs_nav_0#'.$i;
        echo $url;
        $html = file_get_contents($url);
        $dom = new DOMDocument();
        @$dom->loadHTML($html);
        $xPath = new DOMXPath($dom);
        $classname = "zg_title";
        $elements = $xPath->query("//*[contains(@class, '$classname')]");
        foreach ($elements as $e) {
            $lnk = $e->getAttribute('href');
            $e->setAttribute("href", "http://www.amazon.in".$lnk);
            $newdoc = new DOMDocument;
            $e = $newdoc->importNode($e, true);
            $newdoc->appendChild($e);
            $html = $newdoc->saveHTML();
            echo $html;
        }
        $i++;
    }
?>

I am trying to crawl the Amazon bestsellers page, which lists the top 100 bestselling items, 20 per page. On each loop iteration the value of $i changes and is appended to the URL, but the output is just the first 20 items repeated 5 times. I think this has something to do with the AJAX pagination, but I cannot figure out what it is.


1 answer

  • dq05304 2015-10-28 09:26

    Try this:

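    <?php

        // Request each of the five bestseller pages (20 items per page).
        for ($i = 1; $i <= 5; $i++) {
            // The page number goes in the query string (pg=), so the server sees it.
            $url = 'http://www.amazon.in/gp/bestsellers/electronics/ref=zg_bs_electronics_pg_'.$i.'?ie=UTF8&pg='.$i;
            echo $url;

            $html = file_get_contents($url);
            if ($html === false) {
                continue; // skip a page that fails to download
            }

            $dom = new DOMDocument();
            @$dom->loadHTML($html); // @ suppresses warnings from malformed markup
            $xPath = new DOMXPath($dom);
            $classname = "zg_title";
            $elements = $xPath->query("//*[contains(@class, '$classname')]");

            foreach ($elements as $e) {
                // Prefix the link with the site root to make it absolute.
                $lnk = $e->getAttribute('href');
                $e->setAttribute("href", "http://www.amazon.in".$lnk);

                // Copy the node into a fresh document so only that element is printed.
                $newdoc = new DOMDocument;
                $newdoc->appendChild($newdoc->importNode($e, true));
                echo $newdoc->saveHTML();
            }
        }
    ?>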
    

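    Change your $url. Everything after # is a URL fragment, which the client never sends to the server, so all five requests in your loop fetch the same first page. Put the page number in the query string instead (the pg parameter in the code above).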
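
    To see the difference between the two URL styles without hitting the network, here is a minimal sketch using PHP's built-in parse_url() and http_build_query(). The helper name bestseller_page_url() is hypothetical and only for illustration; the URL pattern is the one from the code above and is assumed, not guaranteed, to be what the site expects.

    ```php
    <?php
    // Hypothetical helper (not part of any library): builds the URL for one
    // bestseller page, carrying the page number in the query string.
    function bestseller_page_url(int $page): string
    {
        $base = 'http://www.amazon.in/gp/bestsellers/electronics/ref=zg_bs_electronics_pg_' . $page;
        return $base . '?' . http_build_query(['ie' => 'UTF8', 'pg' => $page]);
    }

    // Old style: the page number sits after '#', so it is only a fragment.
    $old = parse_url('http://www.amazon.in/gp/bestsellers/electronics/ref=zg_bs_nav_0#2');

    // New style: the page number is part of the query string.
    $new = parse_url(bestseller_page_url(2));

    echo 'old fragment: ' . $old['fragment'] . PHP_EOL;          // old fragment: 2
    echo 'old query set: ' . var_export(isset($old['query']), true) . PHP_EOL; // old query set: false
    echo 'new query: ' . $new['query'] . PHP_EOL;                // new query: ie=UTF8&pg=2
    ```

    The old URL has no query component at all, which is why the server cannot tell the five requests apart; in the new URL the page number is part of what is actually requested.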

    The asker selected this answer as the best answer.
