dongzhankou2090
2015-07-21 05:24

How to get the full HTML content of a specific URL?


I have tried several methods to get the HTML content of aptoide.com in PHP:

1) file_get_contents();

2) readfile();

3) cURL, wrapped in a PHP function:

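function get_dataa($url) {
    $ch = curl_init($url);
    // Return the response as a string instead of printing it
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // No effect since PHP 5.1.3; kept from the original attempt
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
    // Disables TLS certificate verification (insecure)
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
    // Send a browser-like User-Agent header
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (compatible; Konqueror/4.0; Microsoft Windows) KHTML/4.0.80 (like Gecko)");
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}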

4) PHP Simple HTML DOM Parser:

include_once('simple_html_dom.php');
$url="http://aptoide.com";
$html = file_get_html($url);

But all of them return empty output for aptoide.com. Is there a way to get the full HTML content of that URL?
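Since all four approaches fail silently, it can help to ask cURL why the body is empty. The sketch below is a hypothetical helper (`fetch_with_diagnostics` is not from the question); it assumes PHP 7+ with the cURL extension, and the timeout, redirect limit, and User-Agent values are arbitrary assumptions. It returns the body together with `curl_error()`, the HTTP status code, and the final URL after redirects.

```php
<?php
// Hypothetical diagnostic helper (not from the question). Assumes PHP 7+
// with the cURL extension; the option values below are illustrative assumptions.
function fetch_with_diagnostics(string $url): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow 3xx redirects
        CURLOPT_MAXREDIRS      => 5,    // assumed cap to avoid redirect loops
        CURLOPT_CONNECTTIMEOUT => 10,   // assumed connect timeout (seconds)
        CURLOPT_TIMEOUT        => 30,   // assumed total timeout (seconds)
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)', // assumed UA string
    ]);
    $body = curl_exec($ch); // string on success, false on failure

    $result = [
        'body'      => $body === false ? '' : $body,
        'error'     => curl_error($ch),                           // '' if the transfer succeeded
        'status'    => curl_getinfo($ch, CURLINFO_HTTP_CODE),      // 0 if no HTTP response arrived
        'final_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),  // URL after any redirects
    ];
    // The handle is released when $ch goes out of scope.
    return $result;
}
```

For example, calling `fetch_with_diagnostics('http://aptoide.com')` and getting a status of 0 with a non-empty `error` points to a network, DNS, or TLS problem, while a 4xx status such as 403 means the server answered but refused the request.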


2 answers

  • douaonong7807, 6 years ago

    echo file_get_contents('http://www.aptoide.com/'); works fine for me.

    So it's possible that aptoide.com has blocked you. If you want to change your IP (as you said in a comment), you can route the request through a proxy:

    $url = 'http://aptoide.com/';
    $proxy = '127.0.0.1:9095'; // Your proxy (host:port)
    // $proxyauth = 'user:password'; // Proxy authentication, if required
    
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_PROXY, $proxy);          // route the request through the proxy
    //curl_setopt($ch, CURLOPT_PROXYUSERPWD, $proxyauth);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);      // follow redirects
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);      // return the page instead of printing it
    curl_setopt($ch, CURLOPT_HEADER, 1);              // include the response headers in the output
    $curl_scraped_page = curl_exec($ch);
    curl_close($ch);
    
    echo $curl_scraped_page;
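Because this answer sets `CURLOPT_HEADER`, `$curl_scraped_page` starts with the raw response headers (one block per redirect hop when `CURLOPT_FOLLOWLOCATION` is on). A minimal sketch for separating them from the HTML, assuming PHP 7+; `split_headers_and_body` is a hypothetical name, not part of the answer. It relies on `curl_getinfo($ch, CURLINFO_HEADER_SIZE)`, which has to be read before `curl_close($ch)`.

```php
<?php
// Hypothetical helper (not from the answer). Assumes PHP 7+.
// With CURLOPT_HEADER enabled, curl_exec() returns all response headers
// (one block per redirect hop) followed by the body. $headerSize is the value
// of curl_getinfo($ch, CURLINFO_HEADER_SIZE): the total size of those headers.
function split_headers_and_body(string $raw, int $headerSize): array
{
    return [
        'headers' => (string) substr($raw, 0, $headerSize),
        // The cast matters on PHP 7, where substr() can return false instead of ''.
        'body'    => (string) substr($raw, $headerSize),
    ];
}
```

In the answer's code this would be used roughly as `$parts = split_headers_and_body($curl_scraped_page, curl_getinfo($ch, CURLINFO_HEADER_SIZE));` placed before `curl_close($ch)`, followed by `echo $parts['body'];`.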
    
  • dpj775835868, 6 years ago

    Use your cURL get_dataa function with this line added:

    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    

    because that page redirects to www.aptoide.com. The full function:

    function get_dataa($url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // added: follow the redirect
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
        curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (compatible; Konqueror/4.0; Microsoft Windows) KHTML/4.0.80 (like Gecko)");
        $data = curl_exec($ch);
        curl_close($ch);
        return $data;
    }
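To confirm the redirect this answer describes, one option is to request the original URL with `CURLOPT_HEADER` enabled and list the `Location` headers in the header part of the response. The function below is a hypothetical sketch (PHP 7+), not part of the answer; it only parses a header string, so it can be tried without network access.

```php
<?php
// Hypothetical helper (not from the answer). Assumes PHP 7+.
// Returns every Location header value found in a raw header dump, in order.
// With CURLOPT_HEADER and CURLOPT_FOLLOWLOCATION both enabled, cURL's output
// contains one header block per hop, so this lists the redirect chain.
function redirect_locations(string $rawHeaders): array
{
    $locations = [];
    // Header lines are separated by CRLF; tolerate bare LF as well.
    foreach (preg_split('/\r?\n/', $rawHeaders) as $line) {
        // Header names are case-insensitive; ^ keeps "Content-Location" from matching.
        if (preg_match('/^location:\s*(.+)$/i', $line, $m)) {
            $locations[] = trim($m[1]);
        }
    }
    return $locations;
}
```

With `CURLOPT_HEADER` enabled, the header part is `substr($raw, 0, curl_getinfo($ch, CURLINFO_HEADER_SIZE))`; an empty array means no redirect was seen.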
    
