dongxuan1314 2014-01-29 18:15
Views: 825
Accepted

file_get_contents returns 404 even though the URL is valid and opens in a browser

I get the following error:

Warning: file_get_contents(https://www.readability.com/api/content/v1/parser?url=http://www.redmondpie.com/ps1-and-ps2-games-will-be-playable-on-playstation-4-very-soon/?utm_source=dlvr.it&utm_medium=twitter&token=MYAPIKEY) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 404 NOT FOUND in /home/DIR/htdocs/readability.php on line 23

I echoed the URL that the function builds and it is fine and valid; the same request works when I make it from my browser.

Yet file_get_contents fails with the error above, and I really don't understand why.

The URL is valid, and the function is not blocked by the free hosting service (so I shouldn't need cURL).

If someone could spot the error in my Code, I would appreciate it! Thanks...

Here is my code:

<?php

class jsonRes{
    public $url;      // declared once; redeclaring a property is a fatal error
    public $author;
    public $image;
    public $excerpt;
}

function getReadable($url){
 $api_key='MYAPIKEY';
 if(isset($url) && !empty($url)){

    // I tried changing to http, no 'www' etc... -THE URL IS VALID/The browser opens it normally-

    $requesturl='https://www.readability.com/api/content/v1/parser?url=' . urlencode($url) . '&token=' . $api_key;
    $response = file_get_contents($requesturl);   // * here the code FAILS! *

    $g = json_decode($response);

    $article_link=$g->url;
    $article_author='';
    if($g->author != null){
       $article_author=$g->author;
    }

    $article_url=$g->url;
    $article_image=''; 
    if($g->lead_image_url != null){
        $article_image=$g->lead_image_url;
    }
    $article_excerpt=$g->excerpt;

    $toJSON=new jsonRes();
    $toJSON->url=$article_link;
    $toJSON->author=$article_author;
    $toJSON->url=$article_url;
    $toJSON->image=$article_image;
    $toJSON->excerpt=$article_excerpt;   // assign with "=", not "->"

    $retJSONf=json_encode($toJSON);
    return $retJSONf;
 }
}
?>

1 answer

  • drtpbx3606 2014-01-29 18:29

    Sometimes a website blocks crawlers (requests coming from remote servers) from reaching its pages.

    The usual workaround is to spoof a browser's headers: pretend to be Mozilla Firefox instead of the sneaky PHP web scraper you really are.

    This function uses the cURL library to do just that.

    function get_data($url) {
        // Browser-like User-Agent, so the request does not look like a script
        $userAgent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13';

        $ch = curl_init();
        curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_FAILONERROR, true);     // treat HTTP status >= 400 as a failure
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
        curl_setopt($ch, CURLOPT_AUTOREFERER, true);     // set the Referer header when following redirects
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);           // give up after 10 seconds
        $html = curl_exec($ch);

        // curl_exec() returns false on failure; an empty body is not an error
        if ($html === false) {
            echo "<br />cURL error number: " . curl_errno($ch);
            echo "<br />cURL error: " . curl_error($ch);
            exit;
        }

        return $html;
    }
    

    You would then call it like this:

    $response = get_data($requesturl);
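
    For completeness, here is a rough sketch of how $requesturl could be built and how the response could be decoded once it arrives. The helper names build_request_url and extract_article are made up for illustration and appear in neither the question nor any library; the endpoint and field names are taken from the question's code. Using http_build_query percent-encodes the nested article URL, so its own "?" and "&" cannot be mistaken for parameters of the API request.

    ```php
    <?php
    // Hypothetical helper (illustration only): build the parser request URL.
    // http_build_query() percent-encodes each value, including the nested URL.
    function build_request_url($articleUrl, $apiKey) {
        $query = http_build_query(
            array('url' => $articleUrl, 'token' => $apiKey),
            '',   // no numeric prefix
            '&'   // force "&" regardless of the arg_separator.output ini setting
        );
        return 'https://www.readability.com/api/content/v1/parser?' . $query;
    }

    // Hypothetical helper (illustration only): decode the JSON body and pick the
    // fields used in the question, falling back to '' when a field is missing or null.
    function extract_article($json) {
        $g = json_decode($json, true);   // true => associative array
        if (!is_array($g)) {
            return null;                 // not valid JSON (or not an object)
        }
        return array(
            'url'     => isset($g['url']) ? $g['url'] : '',
            'author'  => isset($g['author']) ? $g['author'] : '',
            'image'   => isset($g['lead_image_url']) ? $g['lead_image_url'] : '',
            'excerpt' => isset($g['excerpt']) ? $g['excerpt'] : '',
        );
    }
    ```

    With these in place, the call would read get_data(build_request_url($url, $api_key)), and the result of extract_article can be passed straight to json_encode.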
    

    cURL offers many more options for fetching remote content and checking errors than file_get_contents does. If you want to customize it further, check out the list of cURL options here: Abridged list of cURL options
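
    If cURL is not available, the same User-Agent trick can be tried with file_get_contents by passing a stream context. The following is only a sketch under that assumption: the function names build_context_options, last_status_code and get_data_stream are invented for this example, and whether a browser-like User-Agent is enough depends on the remote site. The ignore_errors option makes PHP return the response body even on a 4xx/5xx status, so the status code can be inspected instead of getting only a warning.

    ```php
    <?php
    // Hypothetical helper (illustration only): HTTP context options that send
    // a browser-like User-Agent header with file_get_contents().
    function build_context_options($userAgent, $timeout = 10) {
        return array(
            'http' => array(
                'method'        => 'GET',
                'user_agent'    => $userAgent,
                'timeout'       => $timeout,
                'ignore_errors' => true,   // return the body even on 4xx/5xx
            ),
        );
    }

    // Hypothetical helper: find the last status line in a header list, e.g.
    // "HTTP/1.1 404 NOT FOUND" => 404. With redirects there are several status
    // lines; the last one belongs to the final response.
    function last_status_code($headers) {
        $status = null;
        foreach ($headers as $line) {
            if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
                $status = (int) $m[1];
            }
        }
        return $status;
    }

    // Hypothetical helper: fetch $url and return the status code and body.
    function get_data_stream($url, $userAgent) {
        $context = stream_context_create(build_context_options($userAgent));
        $body = @file_get_contents($url, false, $context);
        if ($body === false) {
            return array('status' => null, 'body' => null);   // connection-level failure
        }
        // The HTTP wrapper fills $http_response_header in the local scope.
        $headers = isset($http_response_header) ? $http_response_header : array();
        return array('status' => last_status_code($headers), 'body' => $body);
    }
    ```

    A caller would check that the returned 'status' is 200 before passing 'body' to json_decode.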

    The asker selected this answer as the best answer.
