duanniu4106 2012-11-19 21:30
浏览 54
已采纳

如何从谷歌/雅虎图像搜索等获取更多链接(html-simple-dom)

I can get the source code of the search result page. So my question is about how to get MORE. For google, it only shows the first 20 image results in the source code I get, for Yahoo it's about 50. Because in both cases real people need to scroll down the page to see more search result.

Question: Is there anyway the script can do the "scroll down" for me so I can get more results?

The code I'm using:

require_once('simple_html_dom.php');
$url = "https://www.google.com/search?tbm=isch&q=cool+image";
$html = file_get_html($url);
foreach($html->find('img') as $element) {


    $image_url = $element->src; 

    echo $image_url, "<br />";}
  • 写回答

1条回答 默认 最新

  • doucong3048 2012-12-09 06:48
    关注

    I'll answer my own question. - -|||

    Google actually keeps the old version. To use that version, first search something, then scroll to the bottom and click "Switch to basic version".

    Now only 20 images are displayed on each page and the url contains page parameters.

    Because it's displaying 20 images each page, the second page's url has the parameter:

    start=20
    

    and the third page will be

    start=40
    

    This parameter: sout=1 is needed in the url to tell google you want the basic version.

    To conclude, the simplest google image search url with page number would be:

    $url = "https://www.google.com/search?tbm=isch&sout=1&start=" . ($pageNum -1)*20. "&q="  . $key_word ;
    
    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥15 装 pytorch 的时候出了好多问题,遇到这种情况怎么处理?
  • ¥20 IOS游览器某宝手机网页版自动立即购买JavaScript脚本
  • ¥15 手机接入宽带网线,如何释放宽带全部速度
  • ¥30 关于#r语言#的问题:如何对R语言中mfgarch包中构建的garch-midas模型进行样本内长期波动率预测和样本外长期波动率预测
  • ¥15 ETLCloud 处理json多层级问题
  • ¥15 matlab中使用gurobi时报错
  • ¥15 这个主板怎么能扩出一两个sata口
  • ¥15 不是,这到底错哪儿了😭
  • ¥15 2020长安杯与连接网探
  • ¥15 关于#matlab#的问题:在模糊控制器中选出线路信息,在simulink中根据线路信息生成速度时间目标曲线(初速度为20m/s,15秒后减为0的速度时间图像)我想问线路信息是什么