dounai6626
2017-03-20 23:05
Viewed 282 times
Accepted

PHP cURL gets a 403 error, but a browser on the same machine can request the page?

I've got this script working with generally no problems. I say "generally" because, while it retrieves pages from CNN.com, allrecipes.com, reddit.com, etc., when I point it at at least one URL (foxnews.com) I get a 403 error instead.

As you can see, I've set the user agent to the same as my machine's browser (that was necessitated by sending a request to Facebook's homepage, which returned a message that the browser wasn't supported).

So, basically, I'm wondering what step(s) I need to take to have as many sites as possible recognize the cURL request as coming from a real, actual browser, rather than 403'ing it.

    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $this->url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_HEADER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/602.4.8 (KHTML, like Gecko) Version/10.0.3 Safari/602.4.8');
    curl_setopt($ch, CURLOPT_FRESH_CONNECT, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    $data = curl_exec($ch);
    curl_close($ch);


1 answer

  • doushi9856 2017-03-20 23:18
    Accepted

    Fox News appears to be blocking access to their website from any request passing a USERAGENT. Simply removing the USERAGENT string works fine for me:

    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $this->url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_HEADER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_FRESH_CONNECT, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    

    Hope this helps! :)
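    More generally, when a site 403s a cURL request, sending a fuller set of browser-like request headers (not just the User-Agent) can help. Here's a minimal sketch of that idea — the URL and header values below are illustrative assumptions, not something from this thread, and they're no guarantee against a site that actively blocks automated clients:

    <?php
    // Sketch: mimic a browser more closely by sending typical request
    // headers alongside the User-Agent. Values are illustrative only.
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, 'https://example.com/'); // placeholder URL
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array(
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language: en-US,en;q=0.5',
        'Connection: keep-alive',
    ));
    curl_setopt($ch, CURLOPT_USERAGENT,
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/602.4.8 (KHTML, like Gecko) Version/10.0.3 Safari/602.4.8');
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE); // check for 403 here
    curl_close($ch);

    If that still returns 403, try varying or removing the User-Agent, as suggested above — some sites appear to block particular strings rather than all automated requests.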

