PHP cURL gets a 403 error, but a browser on the same machine can request the page?


I've got this script working with generally no problems. I say generally, because while it retrieves pages from CNN.com, allrecipes.com, reddit.com, etc - when I point it towards at least one URL (foxnews.com), I get a 403 error instead.

As you can see, I've set the user agent to the same as my machine's browser (that was necessitated by sending a request to Facebook's homepage, which returned a message that the browser wasn't supported).

So, basically wondering what step(s) I need to take to have as many sites as possible recognize the CURL request as coming from a real, actual browser, rather than 403'ing it.

    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $this->url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_HEADER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/602.4.8 (KHTML, like Gecko) Version/10.0.3 Safari/602.4.8');
    curl_setopt($ch, CURLOPT_FRESH_CONNECT, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    // Execute the request and clean up
    $data = curl_exec($ch);
    curl_close($ch);

1 Answer




Fox News appears to be blocking access to their website from any request passing a USERAGENT. Simply removing the USERAGENT string works fine for me:

$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $this->url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
curl_setopt($ch, CURLOPT_FRESH_CONNECT, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
// Execute the request and clean up
$data = curl_exec($ch);
curl_close($ch);

Hope this helps! :)
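If removing the User-Agent isn't an option, a common alternative (a sketch, not part of the original answer; `example.com` is a placeholder URL) is to send the other headers a real browser would, since some sites check for more than the User-Agent string:

```php
<?php
// Sketch: mimic a browser more fully by sending typical browser headers.
// The URL is a placeholder, not from the original post.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://example.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_ENCODING, '');  // accept gzip/deflate, as a browser would
curl_setopt($ch, CURLOPT_HTTPHEADER, [
    'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language: en-US,en;q=0.5',
]);
$html = curl_exec($ch);  // string on success, false on failure
curl_close($ch);
```

Whether this helps depends entirely on what the target site's filtering looks at, so it's worth testing header combinations one at a time.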

doucan4815: Still running into more and more issues... Fox works now, but NYTimes requires cookies, so now I'm setting up cookies. I figured I'd go back to Fox with a user agent (like a browser's) and cookie acceptance, but that didn't fix it. I'm really curious how to make cURL look like a real, live browser. Without a user agent, meta tags also seem less consistent, but I can't say that for certain, just guessing.
over 3 years ago
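For sites that require cookies, as mentioned in the comment above, cURL can persist them between requests via `CURLOPT_COOKIEJAR` and `CURLOPT_COOKIEFILE`. A minimal sketch, assuming a writable temp directory and using `example.com` as a placeholder URL:

```php
<?php
// Sketch: persist cookies across cURL requests, so a site that sets a
// session cookie on the first hit will accept the follow-up requests.
// The cookie-file path and URL are placeholders, not from the original post.
$cookieFile = sys_get_temp_dir() . '/curl_cookies.txt';

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://example.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);   // write cookies here on curl_close()
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);  // read them back on later requests
$html = curl_exec($ch);  // string on success, false on failure
curl_close($ch);
```

Reusing the same file for both options means cookies set during one run carry over to the next, which is usually what session-based sites expect.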
dpx49470: Thanks! I didn't think setting USERAGENT to a real browser string would cause a problem like this. Live and learn, I guess :)
over 3 years ago