douna4762 2014-10-17 12:10
Views: 89

How to use a PHP crawler to fetch all the data contained in a div tag from a website

I have code for a simple PHP crawler that fetches all the HTML pages from a website up to depth 5. But when I run it to get all the data contained in a div tag identified by its id (such as container, main, wrapper, etc.), it shows unexpected results. Here is the PHP code:

<?php
    $a=$_POST['t1'];
function crawl_page($url, $depth = 5)
{
  static $seen = array();
  if (isset($seen[$url]) || $depth === 0) {
    return;
  }

  $seen[$url] = true;

  $dom = new DOMDocument('1.0');
  @$dom->loadHTMLFile($url);

  $anchors = $dom->getElementsByTagName('div');
  foreach ($anchors as $element) {
        $href = $element->getAttribute('id');
    //$href = $element->find('div[id=main]', 0)->plaintext;
    if (0 !== strpos($href, 'main')) {

        $host = "http://".parse_url($url,PHP_URL_USER);
        $href = $host. '/' . ltrim($href, '/');
    }
    crawl_page($href, $depth - 1);
  }

  echo "New Page:<br /> ";
  echo "URL:",$url,PHP_EOL,"<br />","CONTENT:",PHP_EOL,$dom->saveHTML(),PHP_EOL,PHP_EOL,"  <br />        <br />";
}

crawl_page($a, 5);
?>

This code works well for anchor tags, but I want it to work for div tags only, fetching all the data contained in the div and nothing else. I need this for my project; if anybody has done this, please help me out. The HTML code is below:

<HTML>
<head>
<title></title>
</head>
<body>
<form method="POST" action="crawler1edit[2].php">
Enter Url:-<input type="text" name="t1">
<input type="submit" value="send" name="s1">
</form>
</body>
</HTML>

In the action attribute, crawler1edit[2].php is the PHP file containing the PHP code shown at the top.

1 answer

  • dtr87341 2014-10-17 12:36

    Is there a reason why you aren't just targeting the divs by ID?

    $dom->getElementById("main");
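
To make the suggestion above concrete, here is a minimal sketch of pulling the text out of one div by its id. It is not the asker's code: the helper name extract_div_text and the sample markup are made up for illustration, and it uses a DOMXPath query rather than getElementById, on the assumption that an XPath match on the id attribute is the more predictable choice for HTML that does not declare id as an ID attribute. It works on an HTML string so it can be tried without a network request.

```php
<?php
// Hypothetical helper (not from the original post): returns the trimmed text
// of the first <div> whose id attribute equals $id, or null if none exists.
function extract_div_text(string $html, string $id): ?string
{
    $dom = new DOMDocument();

    // Real-world markup is rarely valid, so collect libxml warnings
    // internally instead of silencing them with the @ operator.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Assumption: $id contains no double quote characters.
    $nodes = $xpath->query('//div[@id="' . $id . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    // textContent concatenates the text of every descendant node.
    return trim($nodes->item(0)->textContent);
}

// Sample markup, invented for this example.
$html = '<html><body><div id="header">Menu</div>'
      . '<div id="main"><p>Hello</p><p>World</p></div></body></html>';

echo extract_div_text($html, 'main'), PHP_EOL;
```

In the crawler from the question, the same query could be run on the $dom that loadHTMLFile already populated, and the result echoed in place of $dom->saveHTML(). The recursive crawl_page call would still need to be driven by the href values of anchor tags, since a div id is not a URL.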
    