dousi0144 2017-02-22 15:37
Views: 313
Accepted

PHP cURL: 405 Not Allowed

Final update: It appears that the target website blocks DO IPs, which caused the problems I had been trying to resolve for days. I spun up an EC2 instance and managed to get the code working, together with caching and so on, to reduce the load on the website and allow my users to share the website.
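The caching mentioned above is not shown in the post. A minimal sketch of one way to do it, assuming a simple file cache keyed by URL; the helper name cached_fetch, the cache directory and the one-hour lifetime are all hypothetical:

```php
<?php
// Hypothetical helper: return cached data for $url if it is fresh,
// otherwise call $fetch($url) and store the result as JSON.
function cached_fetch(string $url, callable $fetch, string $dir, int $ttl = 3600)
{
    $file = rtrim($dir, '/') . '/' . sha1($url) . '.json';

    // Serve from the cache while the file is younger than $ttl seconds.
    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        return json_decode(file_get_contents($file), true);
    }

    // Cache miss: hit the remote site once and remember the answer.
    $data = $fetch($url);
    file_put_contents($file, json_encode($data), LOCK_EX);

    return $data;
}
```

The fetcher is passed in as a callable, so the same wrapper could sit in front of whatever function retrieves the og: meta array.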

-

UPDATE: I managed to get the HTML by turning off cURL's fail-on-error option. However, besides returning the 405 error, the website is also not setting some cookies that are required for its content to load.

curl_setopt($ch, CURLOPT_FAILONERROR, FALSE);
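A minimal sketch of the change described above, assuming the goal is to keep the response body even when the server answers with a 4xx or 5xx status. The helper names build_fetch_options and fetch_with_status are hypothetical and not part of the original code; only standard cURL options are used.

```php
<?php
// Hypothetical helper: option set that keeps the body on HTTP errors.
function build_fetch_options(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_FAILONERROR    => false, // do not discard the body on 4xx/5xx
        CURLOPT_TIMEOUT        => 30,
    ];
}

// Hypothetical helper: return status, body and error text so the
// caller can decide what to do with a 405 that still carries HTML.
function fetch_with_status(string $url): array
{
    $ch = curl_init();
    curl_setopt_array($ch, build_fetch_options($url));

    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $error  = curl_error($ch);

    return [
        'status' => $status,
        'body'   => $body === false ? '' : $body,
        'error'  => $error,
    ];
}
```

With this shape the caller can log the status and still pass a non-empty body on to the parser.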

I'm using the following code (called via Ajax into PHP) to retrieve og: meta tags for websites. It works seamlessly for the majority of websites, but one or two specific sites return errors and no information is retrieved. The errors are as follows.

Warning: DOMDocument::loadHTML(): Empty string supplied as input in /my/home/path/getUrlMeta.php on line 58

From curl_error in my error_log

The requested URL returned error: 405 Not Allowed

And

Failed to connect to www.something.com port 443: Connection refused

I have no problem getting the HTML of the website when I use curl on my server console, and no problem retrieving the information needed for the majority of websites using the code below:

function file_get_contents_curl($url)
{
    $ch = curl_init();
    $header[0] = "Accept: text/html, text/xml,application/xml,application/xhtml+xml,";
    $header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: no-cache";
    curl_setopt($ch, CURLOPT_HTTPHEADER, $header);

    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_URL, $url);
    //curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'GET');

    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    curl_setopt($ch, CURLOPT_USERAGENT,"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 " );
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    //The following 2 set up lines work with sites like www.nytimes.com

    //Update: Added option for cookie jar since some websites recommended it. cookies.txt is set to permission 777. Still doesn't work.
    $cookiefile = '/home/my/folder/cookies.txt';
    curl_setopt( $ch, CURLOPT_COOKIESESSION, true );
    curl_setopt( $ch, CURLOPT_COOKIEJAR,  $cookiefile );
    curl_setopt( $ch, CURLOPT_COOKIEFILE, $cookiefile );

    $data = curl_exec($ch);

    if (curl_error($ch))
    {
        error_log(curl_error($ch));
    }
    curl_close($ch);

    return $data;
}

$html = file_get_contents_curl($url);

libxml_use_internal_errors(true); // Collect libxml parse warnings internally instead of suppressing them with @
$doc = new DomDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$query = '//*/meta[starts-with(@property, \'og:\')]';
$metas = $xpath->query($query);
$rmetas = array();
foreach ($metas as $meta) {
    $property = substr($meta->getAttribute('property'),3);
    $content = $meta->getAttribute('content');
    $rmetas[$property] = $content;
}

/* The code below looks for an image wider than 600px in case og:image is empty. */
if (empty($rmetas['image'])) {
    //$src = $xpath->evaluate("string(//img/@src)");
    //echo "src=" . $src . "\n";
    $query = '//*/img';
    $srcs = $xpath->query($query);
    foreach ($srcs as $src) {

        $property = $src->getAttribute('src');


        if (substr($property,0,4) == 'http' && in_array(substr($property,-3), array('jpg','png','peg'), true)) {
            $size = getimagesize($property);
            // Stop at the first image wider than 600px.
            if ($size !== false && $size[0] > 600) {
                $rmetas['image'] = $property;
                break;
            }
        }

    }
}

echo json_encode($rmetas);


die();
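The empty-string warning quoted earlier appears when curl_exec() hands back false or an empty body and that value goes straight into loadHTML(). A minimal sketch of a guard, assuming the same og: extraction as the code above; the function name extract_og_meta is hypothetical:

```php
<?php
// Hypothetical helper: return the og:* properties found in $html,
// or an empty array when there is nothing to parse.
function extract_og_meta($html): array
{
    // curl_exec() returns false on failure, so check before parsing.
    if (!is_string($html) || trim($html) === '') {
        return [];
    }

    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $metas = [];
    foreach ($xpath->query("//meta[starts-with(@property, 'og:')]") as $meta) {
        // Strip the "og:" prefix, as the original loop does.
        $metas[substr($meta->getAttribute('property'), 3)] = $meta->getAttribute('content');
    }

    return $metas;
}
```

An empty result can then be reported to the Ajax caller as "no metadata" rather than surfacing a PHP warning.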

UPDATE: It was an error on my part: the website is not HTTPS-enabled (which explains the refused connection on port 443), so I am still left with the 405 Not Allowed error.

curl info

{
    "url": "http://www.example.com/",
    "content_type": null,
    "http_code": 405,
    "header_size": 0,
    "request_size": 458,
    "filetime": -1,
    "ssl_verify_result": 0,
    "redirect_count": 0,
    "total_time": 0.326782,
    "namelookup_time": 0.004364,
    "connect_time": 0.007725,
    "pretransfer_time": 0.007867,
    "size_upload": 0,
    "size_download": 0,
    "speed_download": 0,
    "speed_upload": 0,
    "download_content_length": -1,
    "upload_content_length": -1,
    "starttransfer_time": 0.326634,
    "redirect_time": 0,
    "redirect_url": "",
    "primary_ip": "SOME IP",
    "certinfo": [],
    "primary_port": 80,
    "local_ip": "SOME IP",
    "local_port": 52966
}

Update: If I run curl -i from the console, I get the following response: a 405 error, but it is followed by all the HTML that I need.

Home> curl -i http://www.domain.com
HTTP/1.1 405 Not Allowed
Server: nginx
Date: Wed, 22 Feb 2017 17:57:03 GMT
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: chunked
Vary: Accept-Encoding
Vary: Accept-Encoding
Set-Cookie: PHPSESSID2=ko67tfga36gpvrkk0rtqga4g94; path=/; domain=.domain.com
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Set-Cookie: __PAGE_REFERRER=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; Max-Age=0; path=/; domain=www.domain.com
Set-Cookie: __PAGE_SITE_REFERRER=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; Max-Age=0; path=/; domain=www.domain.com
X-Repository: legacy
X-App-Server: production-web23:8018
X-App-Server: distil2-kvm:80

2 answers

  • doulv8162 2017-02-22 15:46

    Add the following to your code to help debug the issue:

    $info = curl_getinfo($ch);
    print_r( $info );
    

    More than likely, the issues are as follows:

    • 405 Not Allowed - the cURL call you are trying to make is not allowed, e.g. making a GET call when only POST is permitted.
    • 443: Connection refused - the site you are trying to access does not support HTTPS, or the site uses cryptographic protocols not supported by your code, e.g. it accepts only TLSv1.2 while your code may be using TLSv1.1.
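The two cases above can be told apart in code. A minimal sketch, assuming the caller passes in curl_errno($ch) and the HTTP status from curl_getinfo(); the constant names and the function describe_curl_failure are hypothetical, while the numeric values are libcurl's documented error codes:

```php
<?php
// libcurl error numbers, given hypothetical local names.
const ERR_COULDNT_CONNECT     = 7;  // e.g. "Connection refused"
const ERR_HTTP_RETURNED_ERROR = 22; // raised when CURLOPT_FAILONERROR is on
const ERR_SSL_CONNECT         = 35; // TLS handshake problem

// Hypothetical helper: turn the cURL error number and HTTP status
// into a short hint about what went wrong.
function describe_curl_failure(int $errno, int $httpCode): string
{
    if ($errno === ERR_COULDNT_CONNECT) {
        return 'connection refused: the port may be closed (e.g. no HTTPS on 443)';
    }
    if ($errno === ERR_SSL_CONNECT) {
        return 'TLS handshake failed: check supported protocol versions';
    }
    if ($httpCode === 405) {
        return 'HTTP 405: the server rejected the request method (or the client)';
    }
    if ($errno === ERR_HTTP_RETURNED_ERROR || $httpCode >= 400) {
        return "HTTP error $httpCode";
    }
    return 'ok';
}
```

A call such as describe_curl_failure(curl_errno($ch), curl_getinfo($ch, CURLINFO_HTTP_CODE)) could be written to error_log next to the curl_error() text.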
    This answer was selected by the asker as the best answer.