dongshilve4392
2017-12-02 19:36

Is there a way to get around a 403 error with PHP file_get_contents?

Accepted

I'm trying to fetch a specific webpage using PHP's file_get_contents. When I view the page directly in a browser there is no problem, but when I try to grab it from PHP I get "failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden". There's a piece of data on the page that I'm trying to extract.

$ft = file_get_contents('https://www.vesselfinder.com/vessels/CELEBRITY-MILLENNIUM-IMO-9189419-MMSI-249055000');

echo $ft;

I've read various answers here about using stream_context_create, mainly the user agent part:

$context = stream_context_create(
    array(
        "http" => array(
            "header" => "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"
        )
    )
);

But nothing works, and I now get a 400 error instead. Unfortunately my server doesn't appear to be configured to use cURL, so file_get_contents seems to be the only way for me to do this.


2 answers

  • duanbo6482, 4 years ago

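    The header option has to contain a complete header line, including the "User-Agent:" name and not just its value. Sending the bare value produces a malformed request, which is most likely why you now get a 400: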

    $context  = stream_context_create(
      array(
        'http' => array(
          'header' => 'User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36',
        ),
    ));
    

    You could also use the user_agent option:

    $context = stream_context_create(
      array(
        'http' => array(
          'user_agent' => 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36',
        ),
    ));
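
If more than one header is needed, the two options can be combined. The following is only a sketch: `build_browser_context` is a hypothetical helper name (not a PHP built-in), the Accept-Language value is purely illustrative, and it assumes the `header` option is passed as an array of complete "Name: value" lines, which the http wrapper accepts.

```php
<?php
// Hypothetical helper, not part of PHP: builds a stream context that sends
// a chosen User-Agent plus any extra header lines.
function build_browser_context(string $userAgent, array $extraHeaders = [])
{
    $http = [
        'method'     => 'GET',
        // The value only; PHP adds the "User-Agent: " name itself.
        'user_agent' => $userAgent,
    ];
    if ($extraHeaders) {
        // Each entry must be a complete "Name: value" line.
        $http['header'] = array_values($extraHeaders);
    }
    return stream_context_create(['http' => $http]);
}

$context = build_browser_context(
    'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36',
    ['Accept-Language: en-US,en;q=0.8'] // illustrative extra header
);
```

The resulting `$context` is passed as the third argument of file_get_contents, exactly like the contexts built by hand.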
    

    Both of the examples above should work, and you should now be able to get the contents by passing the context as the third argument:

    $content = file_get_contents('https://www.vesselfinder.com/vessels/CELEBRITY-MILLENNIUM-IMO-9189419-MMSI-249055000', false, $context);
    
    echo $content;
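
If the request still fails, it can help to look at the actual status code rather than only the warning. A minimal sketch, assuming the `ignore_errors` context option (so the body is returned even for 4xx/5xx responses) and the `$http_response_header` variable that the http wrapper fills in the calling scope; `parse_status_code` and `fetch_with_status` are hypothetical helper names.

```php
<?php
// Hypothetical helper: extracts the final HTTP status code from the header
// lines collected by the http wrapper. After redirects there are several
// status lines, so the last one wins.
function parse_status_code(array $headers): ?int
{
    $status = null;
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Hypothetical wrapper around file_get_contents that returns both the
// status code and the body instead of failing on 403/400.
function fetch_with_status(string $url, string $userAgent): array
{
    $context = stream_context_create([
        'http' => [
            'user_agent'    => $userAgent,
            'ignore_errors' => true, // keep the body for error responses too
        ],
    ]);
    $body = @file_get_contents($url, false, $context);
    // $http_response_header is set in this scope by the http wrapper.
    $headers = isset($http_response_header) ? $http_response_header : [];
    return ['status' => parse_status_code($headers), 'body' => $body];
}
```

Calling `fetch_with_status($url, 'YourApplication/1.0')` and checking the returned `'status'` entry shows whether the server answered 200, 403 or 400 for a given User-Agent.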
    

    This could of course also be tested using curl from the command line. Notice that we are setting our own User-Agent header:

    curl --verbose -H 'User-Agent: YourApplication/1.0' 'https://www.vesselfinder.com/vessels/CELEBRITY-MILLENNIUM-IMO-9189419-MMSI-249055000'
    

    It might also be worth knowing that the default User-Agent used by curl seems to be blocked, so if using curl you need to add your own using the -H flag.

  • duan00529, 4 years ago

    Vesselfinder, the service you are making the request to, seems to deny automatic parsing of their data, as @ADyson said. Read the docs: https://www.vesselfinder.com/de/realtime-ais-data#rt-web-services. You may be able to ask them for an API token, though it may be part of a paid plan.

    They have an official API, and you need an API key to use it.
