duan198123 2013-11-15 22:10

How to scrape a SERP with PHP (for a small project)

I thought this would be fairly simple, but it's proving challenging. Google uses https:// now, and Bing redirects to remove http://.

How can I grab the top 5 URLs for a given search term?

I've tried several methods (including loading results into an iframe), but keep hitting brick walls with everything I try.

I wouldn't even need a proxy, as I'm talking about a very small number of results to be harvested, and I will only use it for 20-30 terms once every few months. That's hardly enough to trigger a backlash from the search giants.

Any help would be much appreciated!

Here's one example of what I've tried:

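// Encode the search term exactly once (encoding it twice corrupts spaces and symbols).
$query = urlencode("test");

// Fetch the results page, then capture the href of every titled link.
$html = file_get_contents("http://www.bing.com/search?q=" . $query);
preg_match_all('/<a title=".*?" href=(.*?)>/', $html, $matches);

echo implode("<br>", $matches[1]);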

2 answers

  • duan19780629 2013-11-16 00:52

    There are three main ways to do this. First, use the official API of the search engine you're targeting: Google has one, and most of the others do too. These are often volume-limited, but for the numbers you're talking about, you'll be fine.
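To make the API route concrete, here is a minimal sketch. It assumes Google's Custom Search JSON API, which the answer above does not name specifically; the function names `buildSearchUrl` and `extractResultUrls` are made up for illustration, and the API key and engine ID are placeholders you would have to obtain yourself.

```php
<?php
// Sketch only: assumes the Google Custom Search JSON API.
// buildSearchUrl() and extractResultUrls() are hypothetical helper names.

function buildSearchUrl(string $query, string $apiKey, string $engineId, int $limit = 5): string
{
    // http_build_query() URL-encodes every value exactly once.
    return 'https://www.googleapis.com/customsearch/v1?' . http_build_query([
        'key' => $apiKey,    // placeholder: your own API key
        'cx'  => $engineId,  // placeholder: your own search engine ID
        'q'   => $query,
        'num' => $limit,     // number of results to request
    ]);
}

function extractResultUrls(string $json, int $limit = 5): array
{
    $data = json_decode($json, true);

    // Malformed JSON or a response without results yields an empty list.
    if (!is_array($data) || !isset($data['items']) || !is_array($data['items'])) {
        return [];
    }

    $urls = [];
    foreach ($data['items'] as $item) {
        if (isset($item['link'])) {
            $urls[] = $item['link'];
        }
    }

    return array_slice($urls, 0, $limit);
}

// Usage (performs a real HTTP request, so it is left commented out):
// $json = file_get_contents(buildSearchUrl('test', 'YOUR_API_KEY', 'YOUR_ENGINE_ID'));
// echo implode("\n", extractResultUrls($json));
```

Keeping the URL building and the response parsing in separate functions means the parsing can be checked against a saved response without spending any of the API quota.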

    The second way is to use a scraper program to visit the search page, enter a search term, and submit the associated form. Since you've specified PHP, I'd recommend Goutte. Internally it uses Guzzle and Symfony components, so it must be good! Its README shows how easy it is to use. HTML fragments are selected using either XPath or CSS selectors, so it is flexible too.
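To show what XPath selection looks like in practice, here is a dependency-free sketch that uses PHP's built-in DOM extension instead of Goutte, so it is a stand-in for the idea rather than Goutte's own API. The function name `extractTopLinks` is made up, and the `b_algo` class on result items is an assumption about Bing's markup that may no longer hold.

```php
<?php
// Sketch only: uses PHP's bundled DOMDocument/DOMXPath, not Goutte.
// The "b_algo" class name is an assumption about Bing's result markup.

function extractTopLinks(string $html, int $limit = 5): array
{
    // DOMDocument::loadHTML() rejects empty input, so bail out early.
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();

    // Real-world pages are rarely valid HTML; buffer parser warnings
    // instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Match the heading link of each <li> whose class list contains "b_algo".
    $nodes = $xpath->query(
        '//li[contains(concat(" ", normalize-space(@class), " "), " b_algo ")]//h2/a[@href]'
    );

    $urls = [];
    foreach ($nodes as $node) {
        $urls[] = $node->getAttribute('href');
        if (count($urls) >= $limit) {
            break;
        }
    }

    return $urls;
}

// Usage (performs a real HTTP request, so it is left commented out):
// $html = file_get_contents('https://www.bing.com/search?q=' . urlencode('test'));
// echo implode("\n", extractTopLinks($html));
```

A structural query like this tends to survive small markup changes better than a regular expression over the raw HTML, but it still breaks whenever the search engine renames its classes, so the selector is the part to check first when results come back empty.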

    Lastly, given the low volume of required scrapes, consider downloading a free software package from Import.io. This lets you build a scraper using a point-and-click interface, and it learns how to scrape various areas of the page before storing the data in a local or cloud database.

    This answer was selected by the asker as the best answer.