drh78568 2015-05-16 09:48

Programmatically run a Google search and process the results

I want to run a Google search using PHP or Node.js. I have not decided which yet; it depends on which one makes this easier to implement (the rest of what I want to do is easy in both languages).

After running the query I want to process the result: get the links and the number of results (even just the number of results would be great).

The search is for an image URL.

Any suggestions?


2 answers

  • duandao3265 2015-05-18 02:16

    Google has implemented lots of safeguards to ensure that its search engine can't be scraped. However, Google must still work for real browsers; that's the whole point. So the best way I've found so far to scrape Google is to control a real web browser.

    There's Selenium if you want to go that route. However, I prefer my programs to be self-contained rather than depending on an installed web browser (I run most of my programs on headless servers). So I prefer PhantomJS, which is a full WebKit-based browser (like Safari and Konqueror) driven by JavaScript.

    PhantomJS scripts tend to be verbose, however, so most people use it through a wrapper such as casperjs, node-horseman or nightmarejs (there are many more; search npm).

    Here's an example of Google scraping from the node-horseman web page:

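    var Horseman = require('node-horseman');
    var horseman = new Horseman();

    // Open Google, type the query, submit it, then count the result items.
    // The selectors reflect Google's markup at the time and may have changed.
    var numLinks = horseman
      .open('http://www.google.com')
      .type('input[name="q"]', 'github')
      .click("button:contains('Google Search')")
      .waitForNextPage()
      .count("li.g");

    console.log("Number of links: " + numLinks);

    // Shut down the underlying PhantomJS process.
    horseman.close();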
    

    If you know how to inspect a page with the developer tools, you'll know how to write a scraper using phantomjs.
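
    The question also asks for the number of results. Here is a minimal sketch of turning the results-count line into a number, assuming you have already read that line's text from the page. The function name `parseResultCount` and the sample wording ("About 1,230,000 results (0.45 seconds)") are assumptions for illustration; the real text depends on Google's current markup and on the page language, so check it with the developer tools first.

```javascript
// Sketch only: parse a results-count line such as
// "About 1,230,000 results (0.45 seconds)" into a number.
// The wording of that line is an assumption, not a documented format.
function parseResultCount(statsText) {
  // Capture the run of digits (with ",", "." or spaces as group
  // separators) that comes right before the word "result(s)".
  var match = /(\d[\d.,\s]*)\s*results?/i.exec(statsText);
  if (!match) {
    return null; // no count found in this text
  }
  // Drop every non-digit character, then parse what is left.
  var digits = match[1].replace(/\D/g, '');
  return parseInt(digits, 10);
}
```

    Returning `null` when nothing matches makes it easy to notice that the page layout changed, instead of silently reporting zero results.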


    One word of warning: don't run Google searches too frequently, otherwise Google will probably detect your script as a bot and temporarily ban you. Make sure you wait an appropriate amount of time between searches.
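
    One way to space searches out is to run them strictly one after another with a pause in between. The sketch below is an illustration, not a recommendation of specific timings: `searchFn` is a hypothetical function you supply that performs one search and returns a promise, and the names `sleep`, `nextDelayMs` and `searchSequentially` are made up for this example. It needs a Node.js version that supports `async`/`await`.

```javascript
// Resolve after `ms` milliseconds.
function sleep(ms) {
  return new Promise(function (resolve) { setTimeout(resolve, ms); });
}

// Pick a delay of baseMs plus up to jitterMs of random extra time, so
// requests are not evenly spaced. `rand` defaults to Math.random and
// can be replaced with a fixed function for testing.
function nextDelayMs(baseMs, jitterMs, rand) {
  var r = (rand || Math.random)();
  return baseMs + Math.floor(r * jitterMs);
}

// Run the queries one at a time, pausing between consecutive searches.
// `searchFn` is hypothetical: it takes a query and returns a promise.
async function searchSequentially(queries, searchFn, baseMs, jitterMs) {
  var results = [];
  for (var i = 0; i < queries.length; i++) {
    if (i > 0) {
      await sleep(nextDelayMs(baseMs, jitterMs)); // wait before the next search
    }
    results.push(await searchFn(queries[i]));
  }
  return results;
}
```

    For example, `searchSequentially(['github', 'npm'], mySearch, 5000, 5000)` would call a hypothetical `mySearch` twice, waiting between 5 and 10 seconds in between. What counts as an appropriate delay is something you have to judge yourself.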

    This answer was selected by the asker as the best answer.
