dongtaigan1594 2013-08-12 09:07

Simple HTML DOM generates errors when parsing

I am using the Simple HTML DOM class for web-page scraping. The issue is that it produces garbled characters in place of Unicode characters. For example, I get

हंगामा है कà¥à¤¯à¥‚ठबरपा / अकबर इलाहाबादी 

instead of the Hindi Unicode characters.

लेकिन इतना तो हुआ कुछ लोग

That is my Hindi text.

When I print the output to the screen, it shows the same garbled characters.

function getDomContent($data) {
    $html = new simple_html_dom();
    $html->load($data);

    // Initialise the result so an empty array is returned when nothing matches.
    $content = array();
    foreach ($html->find('table[id=content] li') as $element) {
        $content[] = $element->plaintext;
    }

    return $content;
}

My cURL function:

function getContent($url) {
    $ch = curl_init();
    $user_agent = 'Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0';
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
    // Return only the body, not the response headers.
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    // Return the response as a string instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    // Timeouts are in seconds.
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 120);
    curl_setopt($ch, CURLOPT_TIMEOUT, 120);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
    curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
    curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

$data = getContent($url);
$content = getDomContent($data);
echo '<pre>Array Content: ' . '<br/>';
print_r($content);
echo '</pre>';
exit;

2 answers

  • dongyou2279 2013-09-03 05:18

    I solved it by adding a charset declaration to the head of my page:

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    

    That solved all the issues.
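
A rough sketch of why the charset declaration helps, assuming the scraped bytes are already valid UTF-8 and the mbstring extension is loaded. The names `misreadAsLatin1()` and `$sample` are made up for illustration, and `header()` is used here as the HTTP-level counterpart of the meta tag rather than the tag itself. Garbling of this kind typically appears when UTF-8 bytes are displayed as a single-byte charset, so the data is intact and only the declared encoding is wrong.

```php
<?php
// Sketch only: misreadAsLatin1() and $sample are hypothetical names.

/**
 * Treat every byte of a UTF-8 string as one ISO-8859-1 character and
 * re-encode the result as UTF-8. This is roughly what a browser shows
 * when it guesses a single-byte charset for a UTF-8 page.
 */
function misreadAsLatin1(string $utf8): string
{
    return mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
}

$sample = 'लेकिन इतना तो हुआ कुछ लोग'; // Hindi line from the question

// Declare the encoding before any output is sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

echo '<pre>', htmlspecialchars($sample, ENT_QUOTES, 'UTF-8'), '</pre>', PHP_EOL;

// The misreading is reversible, which shows the bytes were never damaged.
$garbled  = misreadAsLatin1($sample);
$restored = mb_convert_encoding($garbled, 'ISO-8859-1', 'UTF-8');
var_dump($restored === $sample);
```

If the page is served through PHP, sending the header and keeping the meta tag together covers both the HTTP response and a saved copy of the HTML.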

    Accepted by the asker as the best answer.