dongyan5641 2012-12-24 19:47
Views: 72

Why isn't this PHP crawler working?

In my localhost document root:

crawl.html

<html>
<body>
<p>
<form action="welcome.php" method="get">
Site to crawl: <input type="text" name="crawlThis">
<input type="submit">
</form>
</p>

</body>
</html> 

welcome.php

<html>
<body>

<?php
include ("crawler.php");

echo $crawl = new Crawler($_GET["crawlThis"]);

$images = $crawl->get("images");
$links = $crawl->get("links");

echo $links;
echo $images;
?>
<br>

</body>
</html>

and crawler.php

<?php

class Crawler {

    protected $markup = '';

    public function __construct($uri) {
        $this->markup = $this->getMarkup($uri);
    }

    public function getMarkup($uri) {
        return file_get_contents($uri);
    }

    public function get($type) {
        $method = "_get_{$type}";
        if (method_exists($this, $method)){
            return call_user_method($method, $this);
        }
    }

    protected function _get_images() {
        if (!empty($this->markup)){
            preg_match_all('/<img([^>]+)\/>/i', $this->markup, $images);
            return !empty($images[1]) ? $images[1] : FALSE;
        }
    }

    protected function _get_links() {
        if (!empty($this->markup)){
            preg_match_all('/<a([^>]+)\>(.*?)\<\/a\>/i', $this->markup, $links);
            return !empty($links[1]) ? $links[1] : FALSE;
        }
    }

}

/*$crawl = new Crawler($);
$images = $crawl->get('images');
$links = $crawl->get('links');*/

?>

Result page is just empty. Can't figure out if I just can't echo $images, or if my logic is wrong. I'm expecting a list of images, and then a list of links.

Also, do I have to include crawler.php, or will PHP search the containing directory for a class of the same name?

Sorry, coming to PHP from Java is a bit of a mindscrew.


2 answers

  • duan0414 2012-12-24 19:55

    I'm all for writing it yourself, but there are plenty of documented examples that will do this. Here's a good example that you can follow or use:

    crawler example
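
For reference, here is a minimal sketch of how the class from the question might be adjusted so that something actually prints. It is not the linked example, and it swaps the regular expressions for PHP's built-in DOMDocument parser, so it returns the src and href values rather than the raw attribute strings. The names SketchCrawler, fromUri() and attributeValues() are made up for illustration. The likely causes of the empty page are that `echo $crawl = new Crawler(...)` tries to print an object with no __toString(), that call_user_method() is deprecated and was removed in PHP 7, and that echoing an array only prints "Array".

```php
<?php
// Illustrative sketch only: SketchCrawler, fromUri() and attributeValues()
// are hypothetical names, not part of the original question or any library.

class SketchCrawler
{
    protected $markup = '';

    // Takes the markup directly so the class can be tried without a network.
    public function __construct($markup)
    {
        $this->markup = (string) $markup;
    }

    // Convenience constructor: fetch a page, falling back to empty markup.
    // Assumes allow_url_fopen is enabled and $uri includes its scheme.
    public static function fromUri($uri)
    {
        $markup = @file_get_contents($uri);
        return new self($markup === false ? '' : $markup);
    }

    public function get($type)
    {
        $method = "_get_{$type}";
        if (method_exists($this, $method)) {
            // Variable method call replaces the removed call_user_method().
            return $this->$method();
        }
        return array();
    }

    protected function _get_images()
    {
        return $this->attributeValues('img', 'src');
    }

    protected function _get_links()
    {
        return $this->attributeValues('a', 'href');
    }

    // Collect one attribute from every element with the given tag name.
    private function attributeValues($tag, $attribute)
    {
        if ($this->markup === '') {
            return array();
        }
        $previous = libxml_use_internal_errors(true); // silence warnings on sloppy HTML
        $doc = new DOMDocument();
        $doc->loadHTML($this->markup);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        $values = array();
        foreach ($doc->getElementsByTagName($tag) as $node) {
            if ($node->hasAttribute($attribute)) {
                $values[] = $node->getAttribute($attribute);
            }
        }
        return $values;
    }
}

// Usage: assign the object without echoing it, then loop over the arrays.
$crawl = new SketchCrawler('<p><a href="/about">About</a><img src="logo.png"></p>');

foreach ($crawl->get('links') as $link) {
    echo htmlspecialchars($link), "<br>\n";
}
foreach ($crawl->get('images') as $image) {
    echo htmlspecialchars($image), "<br>\n";
}
```

In welcome.php the equivalent call would be something like SketchCrawler::fromUri($_GET["crawlThis"]). On the include question: PHP does not search the directory for a class file by itself, so the include (or require) is needed unless an autoloader is registered with spl_autoload_register(). Turning on display_errors while developing should also make failures like this one visible instead of producing a blank page.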
