dousi0144 2017-02-22 15:37

PHP cURL: 405 Not Allowed

Final update: It appears that the target website blocks DO IPs, which caused the problems I had been trying to resolve for days. I spun up an EC2 instance and managed to get the code working there, together with caching, so as to reduce the hits on the website and allow my users to share it.
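The post does not show its caching code, so the following is only a minimal sketch of one way it could look: a file cache with a time-to-live placed in front of the fetch function. The name cached_fetch, the cache directory, and the one-hour default TTL are all assumptions made for illustration.

```php
<?php
/**
 * Hypothetical helper (not from the original post): returns the cached copy
 * of $url while it is fresh, otherwise calls $fetcher and stores the result.
 */
function cached_fetch(string $url, callable $fetcher, string $cacheDir, int $ttl = 3600)
{
    // One cache file per URL; hashing avoids unsafe characters in file names.
    $file = rtrim($cacheDir, '/') . '/' . sha1($url) . '.cache';

    // Make sure the freshness check below sees current file metadata.
    clearstatcache(true, $file);

    // Serve the cached copy while it is younger than $ttl seconds.
    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        return file_get_contents($file);
    }

    $data = $fetcher($url);

    // Cache only non-empty strings so that failed fetches are retried later.
    if (is_string($data) && $data !== '') {
        file_put_contents($file, $data, LOCK_EX);
    }

    return $data;
}
```

With the function from the question, a call might look like cached_fetch($url, 'file_get_contents_curl', sys_get_temp_dir()). Repeated requests for the same URL within the TTL then never reach the remote site.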

-

UPDATE: I managed to get the HTML by turning off CURLOPT_FAILONERROR. However, besides returning a 405 error, the website is also not setting some cookies that are required for its content to load.

curl_setopt($ch, CURLOPT_FAILONERROR, FALSE);
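As a rough sketch of what this setting changes: with CURLOPT_FAILONERROR off, curl_exec() hands back the response body even when the status is 400 or above, so the caller has to check the status code itself. The function name fetch_with_status and its return shape are assumptions for illustration, not part of the original code.

```php
<?php
/**
 * Hypothetical wrapper: fetches $url and returns the HTTP status together
 * with the body, including for 4xx/5xx responses.
 */
function fetch_with_status(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    // With FAILONERROR off, curl_exec() returns the body even for HTTP >= 400.
    curl_setopt($ch, CURLOPT_FAILONERROR, false);

    $body   = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $error  = curl_error($ch);

    return [
        'status' => $status,                          // 0 if no HTTP response arrived
        'body'   => ($body === false) ? '' : $body,   // '' on transport failure
        'error'  => $error,                           // '' when cURL reported no error
    ];
}
```

The caller can then decide whether a 405 response that still carries usable HTML is acceptable, rather than having cURL discard the body.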

I'm using the following code (AJAX to PHP) to retrieve og: meta tags for websites. It works seamlessly for the majority of websites, but one or two specific sites return an error and no info is retrieved. The errors are as follows.

Warning: DOMDocument::loadHTML(): Empty string supplied as input in /my/home/path/getUrlMeta.php on line 58

From curl_error in my error_log:

The requested URL returned error: 405 Not Allowed

And

Failed to connect to www.something.com port 443: Connection refused

I have no problem getting the HTML of the website when I use curl from my server console, and no problem retrieving the information needed for the majority of websites using the code below:

function file_get_contents_curl($url)
{
    $ch = curl_init();
    $header[0] = "Accept: text/html, text/xml,application/xml,application/xhtml+xml,";
    $header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: no-cache";
    curl_setopt($ch, CURLOPT_HTTPHEADER, $header);

    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    //curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'GET');

    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    curl_setopt($ch, CURLOPT_USERAGENT,"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 " );
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    //The following 2 set up lines work with sites like www.nytimes.com

    //Update: Added option for cookie jar since some websites recommended it. cookies.txt is set to permission 777. Still doesn't work.
    $cookiefile = '/home/my/folder/cookies.txt';
    curl_setopt( $ch, CURLOPT_COOKIESESSION, true );
    curl_setopt( $ch, CURLOPT_COOKIEJAR,  $cookiefile );
    curl_setopt( $ch, CURLOPT_COOKIEFILE, $cookiefile );

    $data = curl_exec($ch);

    if (curl_error($ch))
    {
        error_log(curl_error($ch));
    }
    curl_close($ch);

    return $data;
}

$html = file_get_contents_curl($url);

libxml_use_internal_errors(true); // Yeah if you are so worried about using @ with warnings
$doc = new DomDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$query = '//*/meta[starts-with(@property, \'og:\')]';
$metas = $xpath->query($query);
$rmetas = array();
foreach ($metas as $meta) {
    $property = substr($meta->getAttribute('property'),3);
    $content = $meta->getAttribute('content');
    $rmetas[$property] = $content;
}

/* The code below falls back to an image wider than 600px if og:image is empty. */
if (empty($rmetas['image'])) {
    //$src = $xpath->evaluate("string(//img/@src)");
    //echo "src=" . $src . "\n";
    $query = '//*/img';
    $srcs = $xpath->query($query);
    foreach ($srcs as $src) {

        $property = $src->getAttribute('src');


        if (substr($property,0,4) == 'http' && in_array(substr($property,-3), array('jpg','png','peg'), true)) {
            if (list($width, $height) = getimagesize($property)) {
            do if ($width > 600) {
                $rmetas['image'] = $property;
                break;
            } while (0);
            }
        }

    }
}

echo json_encode($rmetas);


die();
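The "Empty string supplied as input" warning appears because loadHTML() is called even when the fetch returned nothing. A minimal sketch of a guard, assuming a hypothetical helper named extract_og_meta that wraps the parsing step shown above:

```php
<?php
/**
 * Hypothetical helper: extracts og:* meta tags from an HTML string.
 * Returns an empty array when $html is empty or false (e.g. the fetch failed).
 */
function extract_og_meta($html): array
{
    // Skip parsing entirely when there is nothing to parse.
    if (!is_string($html) || trim($html) === '') {
        return [];
    }

    // Collect malformed-HTML warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $metas = [];
    foreach ($xpath->query("//meta[starts-with(@property, 'og:')]") as $meta) {
        // Strip the leading "og:" so that og:title is stored under "title".
        $metas[substr($meta->getAttribute('property'), 3)] = $meta->getAttribute('content');
    }

    return $metas;
}
```

Calling extract_og_meta(file_get_contents_curl($url)) would then yield an empty result for a failed fetch instead of a warning.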

UPDATE: The connection-refused error was my mistake, since that website is not HTTPS-enabled. I still get the 405 Not Allowed error.

curl info

{
    "url": "http://www.example.com/",
    "content_type": null,
    "http_code": 405,
    "header_size": 0,
    "request_size": 458,
    "filetime": -1,
    "ssl_verify_result": 0,
    "redirect_count": 0,
    "total_time": 0.326782,
    "namelookup_time": 0.004364,
    "connect_time": 0.007725,
    "pretransfer_time": 0.007867,
    "size_upload": 0,
    "size_download": 0,
    "speed_download": 0,
    "speed_upload": 0,
    "download_content_length": -1,
    "upload_content_length": -1,
    "starttransfer_time": 0.326634,
    "redirect_time": 0,
    "redirect_url": "",
    "primary_ip": "SOME IP",
    "certinfo": [],
    "primary_port": 80,
    "local_ip": "SOME IP",
    "local_port": 52966
}

Update: If I run curl -i from the console, I get the following response: a 405 error, but followed by all the HTML that I need.

Home> curl -i http://www.domain.com
HTTP/1.1 405 Not Allowed
Server: nginx
Date: Wed, 22 Feb 2017 17:57:03 GMT
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: chunked
Vary: Accept-Encoding
Vary: Accept-Encoding
Set-Cookie: PHPSESSID2=ko67tfga36gpvrkk0rtqga4g94; path=/; domain=.domain.com
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Set-Cookie: __PAGE_REFERRER=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; Max-Age=0; path=/; domain=www.domain.com
Set-Cookie: __PAGE_SITE_REFERRER=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; Max-Age=0; path=/; domain=www.domain.com
X-Repository: legacy
X-App-Server: production-web23:8018
X-App-Server: distil2-kvm:80

2 answers

  • doulv8162 2017-02-22 15:46

    Add the following to your code to help debug the issue:

    $info = curl_getinfo($ch);
    print_r( $info );
    

    More than likely, the issues are as follows:

    • 405 Not Allowed - the cURL call you are trying to make is not allowed, e.g. making a GET call when only POST is permitted.
    • 443: Connection refused - the site you are trying to access does not support HTTPS. Or the site uses cryptographic protocols not supported by your code, e.g. it accepts only TLSv1.2 while your code may be using TLSv1.1.
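    To make the curl_getinfo() output easier to act on, the two cases above could be mapped to short hints. This is only a sketch: describe_curl_result is a hypothetical helper, and its messages are rough guesses rather than a definitive diagnosis.

```php
<?php
/**
 * Hypothetical helper: turns a curl_getinfo() array plus the curl_error()
 * text into a short, human-readable hint.
 */
function describe_curl_result(array $info, string $error = ''): string
{
    $code = $info['http_code'] ?? 0;

    // http_code stays 0 when no HTTP response arrived (e.g. connection refused).
    if ($code === 0) {
        return 'No HTTP response (connection problem): ' . $error;
    }
    if ($code === 405) {
        return 'HTTP 405: the server rejected the request (method or client not allowed).';
    }
    if ($code >= 400) {
        return 'HTTP ' . $code . ': the server returned an error status.';
    }

    return 'HTTP ' . $code . ': request succeeded.';
}
```

    In the question's code this would be called as describe_curl_result(curl_getinfo($ch), curl_error($ch)) before curl_close($ch).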
    This answer was selected by the asker as the best answer.