drh78568 2015-05-16 09:48

Performing a Google search programmatically and processing the results

I want to run a Google search using PHP or Node.js. I haven't decided which yet; it depends on which answer to this question is easier to implement (the rest of what I want to do is easy in both languages).

After running the query, I want to process the results: get the links and the number of results (even just the number of results would be great).

The search is for an image URL.

Any suggestions?


2 answers

  • duandao3265 2015-05-18 02:16

    Google has implemented lots of safeguards to prevent its search engine from being scraped. However, Google still has to work for ordinary visitors; that's the whole point. So the best way I've found so far to scrape Google is to control a real web browser.

    Selenium is an option if you want to go that route. However, I prefer my programs to be self-contained rather than depend on an installed web browser (I run most of my programs on headless servers). So I prefer PhantomJS, a full WebKit-based browser (like Safari and Konqueror) driven by JavaScript.

    PhantomJS scripts tend to be verbose, however, so most people use it through a wrapper such as casperjs, node-horseman or nightmarejs (there are many more; search npm).

    Here's an example of Google scraping from the node-horseman web page:

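    // This follows the synchronous, chainable API of early node-horseman releases;
    // later releases return promises from each call instead.
    var Horseman = require('node-horseman');
    var horseman = new Horseman();                  // launches a PhantomJS-backed browser
    
    var numLinks = horseman
      .open('http://www.google.com')                // load the Google home page
      .type('input[name="q"]', 'github')            // type "github" into the search box
      .click("button:contains('Google Search')")    // click the "Google Search" button
      .waitForNextPage()                            // wait for the results page to load
      .count("li.g");                               // count result entries (li.g) on the page
    
    console.log("Number of links: " + numLinks);
    
    horseman.close();                               // shut down PhantomJS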
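
Instead of typing into the search box and clicking the button as in the node-horseman example, a scraper can also open the results page directly, because Google carries the query in the `q` parameter of `https://www.google.com/search`. A minimal Node.js sketch, assuming that URL format still holds; `buildSearchUrl` is a hypothetical helper name, and the image URL is only a placeholder:

```javascript
// Sketch: build a Google results URL for a query (here, an image URL used as search text).
// buildSearchUrl is a hypothetical helper; it relies only on Node's built-in URL class.
function buildSearchUrl(query) {
  const url = new URL("https://www.google.com/search");
  // searchParams.set percent-encodes the value, so characters such as
  // ":" "/" "&" in an image URL cannot break the query string.
  url.searchParams.set("q", query);
  return url.toString();
}

console.log(buildSearchUrl("http://example.com/cat.jpg"));
```

The resulting string could be passed to `horseman.open()` in place of `'http://www.google.com'`, which would make the `.type()` and `.click()` steps unnecessary.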
    

    If you know how to inspect a page with your browser's developer tools, you'll know how to write a scraper using PhantomJS.
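
Note that `.count("li.g")` counts the result elements on the current page, not the total number of results the question asks about. To get the total, one would locate the element holding the result-count text with the developer tools and read its text; turning that text into a number could look like the sketch below. It assumes English text of the form "About 1,230,000 results (0.45 seconds)", which is an assumption about Google's wording that may change or differ by locale; `parseResultCount` is a hypothetical name:

```javascript
// Sketch: extract the total result count from Google's result-stats text.
// Assumes (hypothetically) English text such as "About 1,230,000 results (0.45 seconds)".
function parseResultCount(statsText) {
  // A number (digits with optional comma separators) followed by "result" or "results".
  const match = /(\d[\d,]*)\s+results?\b/i.exec(statsText);
  if (match === null) return null; // no count found in the text
  // Drop the thousands separators before converting to a number.
  return Number(match[1].replace(/,/g, ""));
}

console.log(parseResultCount("About 1,230,000 results (0.45 seconds)")); // → 1230000
```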


    One word of warning: don't query Google search too frequently, otherwise Google will probably detect your script as a bot and temporarily ban you. Make sure you wait an appropriate amount of time between searches.
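
One way to follow that advice is to run searches strictly one after another with a pause in between. A minimal Node.js sketch; `searchPolitely` and the `runSearch` callback are hypothetical names, and since the answer does not say how long "appropriate" is, the delays in the usage line are arbitrary placeholders rather than a known safe rate:

```javascript
// Sketch: run searches one at a time with a randomised pause between them,
// so the script does not query Google in rapid bursts.
// runSearch is a hypothetical async function you supply (for example, one that
// drives PhantomJS through a wrapper); the delay bounds are placeholders.

// Resolve after `ms` milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function searchPolitely(queries, runSearch, minDelayMs, maxDelayMs) {
  const results = [];
  for (let i = 0; i < queries.length; i++) {
    if (i > 0) {
      // Wait a random time in [minDelayMs, maxDelayMs) before every search except the first.
      await sleep(minDelayMs + Math.random() * (maxDelayMs - minDelayMs));
    }
    // Searches run strictly in sequence, never in parallel.
    results.push(await runSearch(queries[i]));
  }
  return results;
}

// Usage with a fake search function and short delays, just to show the flow.
searchPolitely(["cats", "dogs"], async (q) => `results for ${q}`, 100, 200)
  .then((results) => console.log(results));
```

Randomising the delay is a design choice of this sketch, not something the answer requires; a fixed pause would also satisfy "wait between searches".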

    This answer was selected by the asker as the best answer.
