douping3891 2012-04-12 05:49

Getting URL data with cURL produces unexpected symbols in the output

I sometimes have a problem getting URL data with cURL, especially when the website's content is in another language such as Arabic. My cURL function is:

function file_get_contents_curl($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);

    $data = curl_exec($ch);
    $info = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);

    //checking mime types
    if(strstr($info,'text/html')) {
        curl_close($ch);
        return $data;
    } else {
        return false;
    }
}

And this is how I get the data:

$html = file_get_contents_curl($checkurl);
$grid = '';
if ($html)
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    $nodes = $doc->getElementsByTagName('title');
    @$title = $nodes->item(0)->nodeValue;
    @$metas = $doc->getElementsByTagName('meta');
    for ($i = 0; $i < $metas->length; $i++)
    {
        $meta = $metas->item($i);
        if ($meta->getAttribute('name') == 'description')
            $description = $meta->getAttribute('content');
    }
}

I get all the data correctly from some Arabic websites, such as http://www.emaratalyoum.com/multimedia/videos/2012-04-08-1.474873, but when I pass this YouTube URL, http://www.youtube.com/watch?v=Eyxljw31TtU&feature=g-logo&context=G2c4f841FOAAAAAAAFAA, it shows symbols instead. What setting do I need to change so that the title and description are displayed exactly as they appear on the page?


3 answers

  • doumei2023 2012-11-06 11:18

    Introduction

    Handling Arabic text can be tricky, but there are some basic points you need to ensure:

    • Your document must output UTF-8
    • Your DOMDocument must read the markup as UTF-8
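
    One common way to satisfy the second point is sketched here, assuming the fetched markup is already UTF-8. Prefixing the markup with an XML encoding declaration stops libxml from falling back to ISO-8859-1. The helper name load_html_utf8 is hypothetical and not part of the original code.

```php
<?php
// Hypothetical helper: parse UTF-8 HTML without letting libxml guess the charset.
function load_html_utf8(string $html): DOMDocument
{
    $doc = new DOMDocument();
    // Collect parse warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    // The leading XML declaration hints that the bytes that follow are UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $doc;
}

// Usage: the title should come back as the same Arabic string that went in.
$doc = load_html_utf8('<html><head><title>أوروبا</title></head><body></body></html>');
$title = $doc->getElementsByTagName('title')->item(0)->nodeValue;
echo $title, "\n";
```

    The script that prints the result still has to send a UTF-8 Content-Type header, which covers the first point.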

    Problem

    YouTube already serves its page as UTF-8, but the retrieval process adds an additional layer of UTF-8 encoding. I am not sure exactly why this occurs; a likely cause is that loadHTML() falls back to ISO-8859-1 when the markup does not declare its charset early enough. Either way, a simple utf8_decode fixes the issue.
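
    The double encoding can be reproduced in isolation. This is only an illustration, assuming the mbstring extension is loaded and the script file itself is saved as UTF-8. It uses mb_convert_encoding in place of utf8_decode, which is deprecated as of PHP 8.2; the variable names are made up for the example.

```php
<?php
// Requires the mbstring extension; this file must be saved as UTF-8.
$original = 'أوروبا'; // UTF-8 text as served by the site

// Misreading the UTF-8 bytes as ISO-8859-1 and re-encoding them as UTF-8
// produces the "symbols" that show up in the browser.
$doubleEncoded = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

// Converting back to ISO-8859-1 strips the extra layer again,
// which is the effect utf8_decode() has on this input.
$restored = mb_convert_encoding($doubleEncoded, 'ISO-8859-1', 'UTF-8');

var_dump($restored === $original);
```

    If the text were not double-encoded in the first place, the same conversion would destroy it, so it should only be applied where the extra layer is known to exist.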

    Example

    header('Content-Type: text/html; charset=UTF-8');
    echo displayMeta("http://www.emaratalyoum.com/multimedia/videos/2012-04-08-1.474873");
    echo displayMeta("http://www.youtube.com/watch?v=Eyxljw31TtU&feature=g-logo&context=G2c4f841FOAAAAAAAFAA"); 
    

    Output

    emaratalyoum.com

    التقطت عدسات الكاميرا حارس مرمى ريال مدريد إيكر كاسياس في موقف محرج قبل لحظات من بداية مباراة النادي الملكي مع أبويل القبرصي في ذهاب دور الثمانية لدوري أبطال أوروبا.ففي النفق المؤدي إلى الملعب، قام كاسياس بوضع إصبعه في أنفه، وبعدها قام بمسح يده في وجه أحد

    youtube.com

    بنات سعوديات: أريد "شايب يدللني ولا شاب يعللني"
    

    Function Used

    displayMeta

    function displayMeta($checkurl) {
        $html = file_get_contents_curl($checkurl);
        if (!$html) {
            return null; // fetch failed or the response was not HTML
        }
        $doc = new DOMDocument("1.0", "UTF-8");
        @$doc->loadHTML($html); // @ suppresses warnings from malformed markup
        $isYoutube = stripos(parse_url($checkurl, PHP_URL_HOST), "youtube") !== false;
        foreach ($doc->getElementsByTagName('meta') as $meta) {
            if ($meta->getAttribute('name') == 'description') {
                $description = $meta->getAttribute('content');
                // YouTube's description comes back double-encoded, so undo one layer
                return $isYoutube ? utf8_decode($description) : $description;
            }
        }
        return null; // no description meta tag found
    }
    

    file_get_contents_curl

    function file_get_contents_curl($url) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        // note: this disables TLS certificate verification
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);

        $data = curl_exec($ch);
        $info = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
        curl_close($ch); // release the handle on every path

        // checking mime types: only return the body for HTML responses
        if ($data !== false && $info && strstr($info, 'text/html')) {
            return $data;
        }
        return false;
    }
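
    Checking the host name for "youtube" only covers one site. A more general variation, sketched here as an assumption rather than as part of the original answer, is to read the charset from the Content-Type value that curl_getinfo already returns and convert the body to UTF-8 before parsing. Both helper names are hypothetical, and mb_convert_encoding requires the mbstring extension.

```php
<?php
// Hypothetical helper: pull the charset out of a Content-Type header value.
function charset_from_content_type(?string $contentType): ?string
{
    if ($contentType !== null
        && preg_match('/charset\s*=\s*"?([\w.:-]+)"?/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    return null; // header missing or without a charset parameter
}

// Hypothetical helper: normalise a response body to UTF-8.
function to_utf8(string $body, ?string $contentType): string
{
    $charset = charset_from_content_type($contentType);
    if ($charset === null || $charset === 'UTF-8') {
        return $body; // assumption: treat undeclared bodies as UTF-8
    }
    // Throws a ValueError on PHP 8 if mbstring does not know the charset.
    return mb_convert_encoding($body, 'UTF-8', $charset);
}

// Usage: a single ISO-8859-1 byte (é) becomes a two-byte UTF-8 sequence.
$converted = to_utf8("\xE9", 'text/html; charset=ISO-8859-1');
echo bin2hex($converted), "\n";
```

    This only helps when the server declares its charset in the header; a charset given solely in a meta tag would need separate handling.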
    