dousi0144 2017-02-22 15:37
浏览 314
已采纳

php Curl 405不允许

Final Update It appears that the targeted website blocked DO IPs and are giving the problems which I've been resolving for days. I spinned a EC2 instance and manage to work the code working, together with caching etc so as to reduce the hit on the website and allow my user to share the website.

-

UPDATE: I manage to get the Html by setting curl error to off, however the website other than returning 405 error is also not setting some cookies which are required for the website content to be loaded.

curl_setopt($ch, CURLOPT_FAILONERROR, FALSE);

I'm using the following codes for ajax->PHP to retrieve og: meta for websites. However, there's 1 or 2 specific sites that returns error and would not retrieve the info. With the following errors. The code works seamlessly for majority of the websites.

Warning: DOMDocument::loadHTML(): Empty string supplied as input in /my/home/path/getUrlMeta.php on line 58

From curl_error in my error_log

The requested URL returned error: 405 Not Allowed

And

Failed to connect to www.something.com port 443: Connection refused

I have no problems getting the html of the website when I use curl on my server console and no problem retrieving information needed for majority of the websites using codes below

function file_get_contents_curl($url)
{
    $ch = curl_init();
    $header[0] = "Accept: text/html, text/xml,application/xml,application/xhtml+xml,";
    $header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: no-cache";
    curl_setopt($ch, CURLOPT_HTTPHEADER, $header);

    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    //curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'GET');

    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    curl_setopt($ch, CURLOPT_USERAGENT,"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 " );
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    //The following 2 set up lines work with sites like www.nytimes.com

    //Update: Added option for cookie jar since some websites recommended it. cookies.txt is set to permission 777. Still doesn't work.
    $cookiefile = '/home/my/folder/cookies.txt';
    curl_setopt( $ch, CURLOPT_COOKIESESSION, true );
    curl_setopt( $ch, CURLOPT_COOKIEJAR,  $cookiefile );
    curl_setopt( $ch, CURLOPT_COOKIEFILE, $cookiefile );

    $data = curl_exec($ch);

  if(curl_error($ch))
    {
        error_log(curl_error($ch));
    }
    curl_close($ch);

    return $data;
}

$html = file_get_contents_curl($url);

libxml_use_internal_errors(true); // Yeah if you are so worried about using @ with warnings
$doc = new DomDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$query = '//*/meta[starts-with(@property, \'og:\')]';
$metas = $xpath->query($query);
$rmetas = array();
foreach ($metas as $meta) {
    $property = substr($meta->getAttribute('property'),3);
    $content = $meta->getAttribute('content');
    $rmetas[$property] = $content;
}

/*below code retrieves the next bigger than 600px image should og:image be empty.*/
if (empty($rmetas['image'])) {
    //$src = $xpath->evaluate("string(//img/@src)");
    //echo "src=" . $src . "
";
    $query = '//*/img';
    $srcs = $xpath->query($query);
    foreach ($srcs as $src) {

        $property = $src->getAttribute('src');


        if (substr($property,0,4) == 'http' && in_array(substr($property,-3), array('jpg','png','peg'), true)) {
            if (list($width, $height) = getimagesize($property)) {
            do if ($width > 600) {
                $rmetas['image'] = $property;
                break;
            } while (0);
            }
        }

    }
}

echo json_encode($rmetas);


die();

UPDATE: Error on my part that website is not https enabled so I still have the 405 not allowed error.

curl info

{
    "url": "http://www.example.com/",
    "content_type": null,
    "http_code": 405,
    "header_size": 0,
    "request_size": 458,
    "filetime": -1,
    "ssl_verify_result": 0,
    "redirect_count": 0,
    "total_time": 0.326782,
    "namelookup_time": 0.004364,
    "connect_time": 0.007725,
    "pretransfer_time": 0.007867,
    "size_upload": 0,
    "size_download": 0,
    "speed_download": 0,
    "speed_upload": 0,
    "download_content_length": -1,
    "upload_content_length": -1,
    "starttransfer_time": 0.326634,
    "redirect_time": 0,
    "redirect_url": "",
    "primary_ip": "SOME IP",
    "certinfo": [],
    "primary_port": 80,
    "local_ip": "SOME IP",
    "local_port": 52966
}

Update: If I do a curl -i from console I get the following response. A error 405 but it follows by all the HTML that I need.

Home> curl -i http://www.domain.com
HTTP/1.1 405 Not Allowed
Server: nginx
Date: Wed, 22 Feb 2017 17:57:03 GMT
Content-Type: text/html; charset=UTF-8
Transfer-Encoding: chunked
Vary: Accept-Encoding
Vary: Accept-Encoding
Set-Cookie: PHPSESSID2=ko67tfga36gpvrkk0rtqga4g94; path=/; domain=.domain.com
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Set-Cookie: __PAGE_REFERRER=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; Max-Age=0; path=/; domain=www.domain.com
Set-Cookie: __PAGE_SITE_REFERRER=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; Max-Age=0; path=/; domain=www.domain.com
X-Repository: legacy
X-App-Server: production-web23:8018
X-App-Server: distil2-kvm:80
  • 写回答

2条回答 默认 最新

  • doulv8162 2017-02-22 15:46
    关注

    Add the following to your code to help debug the issue:

    $info = curl_getinfo($ch);
    print_r( $info );
    

    More than likely, the issues are as follows:

    • 405 Not Allowed - the cURL call you are trying to make it not allowed. e.g. Making a GET call, when only POST is permitted.
    • 443: Connection refused - the site you are trying to access does not support HTTPS. Or, the site is using cryptographic protocols not supported by your code, e.g. using only TLSv1.2, while you code may be using TLSv1.1.
    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
查看更多回答(1条)

报告相同问题?

悬赏问题

  • ¥15 unity从3D升级到urp管线,打包ab包后,材质全部变紫色
  • ¥50 comsol温度场仿真无法模拟微米级激光光斑
  • ¥15 上传图片时提交的存储类型
  • ¥15 Ubuntu开机显示器只显示kernel,是没操作系统(相关搜索:显卡驱动)
  • ¥15 VB.NET如何绘制倾斜的椭圆
  • ¥15 arbotix没有/cmd_vel话题
  • ¥20 找能定制Python脚本的
  • ¥15 odoo17的分包重新供应路线如何设置?可从销售订单中实时直接触发采购订单或相关单据
  • ¥15 用C语言怎么判断字符串的输入是否符合设定?
  • ¥15 通信专业本科生论文选这两个哪个方向好研究呀