Simple HTML DOM parsing generates garbled characters

I am using the Simple HTML DOM class for web-page scraping. The issue is that it produces weird characters in place of Unicode characters. For example, I get

à¤¹à¤‚à¤—à¤¾à¤®à¤¾ à¤¹à¥ˆ à¤•à¥à¤¯à¥‚à¤ à¤¬à¤°à¤ªà¤¾ / à¤…à¤•à¤¬à¤° à¤‡à¤²à¤¾à¤¹à¤¾à¤¬à¤¾à¤¦à¥€

in place of Hindi Unicode characters such as

लेकिन इतना तो हुआ कुछ लोग

That is my Hindi text.

When I print the output to the screen, it shows the same weird characters.

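function getDomContent($data) {
    $html = new simple_html_dom();
    $html->load($data);

    // Start with an empty array so the function still returns an array when nothing matches.
    $content = array();
    foreach ($html->find('table[id=content] li') as $element) {
        $content[] = $element->plaintext;
    }

    return $content;
}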

My cURL function:

function getContent($url) {
    $timeout = 5;
    $ch = curl_init();
    $user_agent = 'Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0';
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 120);
    curl_setopt($ch, CURLOPT_TIMEOUT, 120);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
    curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
    curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

$data = getContent($url);
$content = getDomContent($data);
echo '<pre>Array Content: ' . '<br/>';
print_r($content);
die($query);

dongyou2279 2013-09-03 05:18
I solved it by adding this charset meta tag to the head of the page that prints the output:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

It solved all issues.
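For anyone wondering why this works, here is a minimal sketch, not the exact code from my page. The garbled text looks like UTF-8 bytes being read one byte at a time as a single-byte charset (ISO-8859-1 or similar), which is what a browser may do when the page declares no charset. The helper names `misreadAsLatin1` and `renderUtf8Page` are hypothetical and are not part of Simple HTML DOM; the first one assumes the `mbstring` extension is available.

```php
<?php
// Sketch only: both helper names below are made up for illustration.

/**
 * Reproduce the symptom: treat each byte of a UTF-8 string as one
 * ISO-8859-1 character and re-encode the result as UTF-8.
 * Requires the mbstring extension.
 */
function misreadAsLatin1(string $utf8): string
{
    return mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
}

/**
 * Wrap the scraped strings in a page that declares UTF-8,
 * so the browser decodes the bytes with the right charset.
 */
function renderUtf8Page(array $items): string
{
    $rows = '';
    foreach ($items as $item) {
        // Escape markup characters; multi-byte UTF-8 text is left untouched.
        $rows .= '<li>' . htmlspecialchars($item, ENT_QUOTES, 'UTF-8') . "</li>\n";
    }

    return "<!DOCTYPE html>\n<html>\n<head>\n"
        . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . "\n"
        . "</head>\n<body>\n<ul>\n" . $rows . "</ul>\n</body>\n</html>\n";
}
```

With the functions from the question, usage would be roughly `echo renderUtf8Page(getDomContent(getContent($url)));`. Calling `header('Content-Type: text/html; charset=utf-8');` before any output should have the same effect as the meta tag, assuming the scraped page itself is served as UTF-8.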