How do I encode content fetched via CURL in PHP?

I have a PHP script that uses CURL to fetch the title and description of a user-entered URL and displays them on the page (which includes a utf-8 charset meta tag), and I'm having problems with characters not displaying correctly.

I read in this answer that the PHP CURL function encodes strings to utf-8 and that I need to decode strings with utf8_decode. But I'm finding that using utf8_decode is hit or miss: sometimes it helps, and sometimes it creates unknown characters where there were none in the string before it was decoded.

I've included some examples below.

What's the proper way to handle encoding in this case?

Examples:

Here's the content fetched from a NY Times article with an emdash in the description. In this case, the decoded version displays the character properly:

Here's content from another NY Times article with an emdash in the description, and here, decoding made the character display improperly:

I'm finding that decoding causes problems with foreign language sites like this one in Spanish:

I know I can detect the language of the URL and decode or not based on that, but I'm finding plenty of English language sites where encoding causes problems, like this one:


2 answers

dtml3340 2018-01-26 19:12
After doing a lot more experimenting I stumbled on this solution, which fixed everything.

My script fetched the URL contents and loaded them into a DOM document like this:

    $html = file_get_contents_curl($link_url);
    $doc = new DOMDocument();
    @$doc->loadHTML($html);

Per the linked article, I changed it to this:

    $html = file_get_contents_curl($link_url);
    $doc = new DOMDocument();
    @$doc->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));
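For anyone on a newer PHP: the 'HTML-ENTITIES' pseudo-encoding is deprecated as of PHP 8.2, so here is a rough sketch of the same idea using mb_encode_numericentity instead. This is a swapped-in technique, not what the answer above used. The helper name load_utf8_html and the sample markup are made up for illustration, and it assumes the fetched HTML really is UTF-8 and that the dom and mbstring extensions are loaded.

```php
<?php
// Hypothetical helper (not from the original answer): parse UTF-8 HTML
// so that DOMDocument does not misread multi-byte characters.
function load_utf8_html(string $html): DOMDocument
{
    // Replace every non-ASCII character with a numeric entity such as &#8212;
    // so the parser only ever sees plain ASCII.
    $encoded = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $doc = new DOMDocument();
    // Collect markup warnings quietly instead of using the @ operator.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($encoded);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc;
}

// Sample markup with an accented letter (U+00E9) and an em dash (U+2014).
$html = "<html><head><title>Caf\u{00E9} \u{2014} news</title></head><body></body></html>";
$doc = load_utf8_html($html);
$title = $doc->getElementsByTagName('title')->item(0)->textContent;
echo $title, "\n";
```

DOMDocument hands strings back as UTF-8, so the title can be echoed directly on a utf-8 page without any further decoding.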

I also eliminated the use of utf8_decode.
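A possible explanation for why utf8_decode was hit or miss, sketched below: it converts UTF-8 to ISO-8859-1, so any character that Latin-1 lacks (an em dash, for instance) is replaced with "?", and the bytes that remain are no longer valid UTF-8. It only "helps" when the string was double-encoded beforehand. The sample string is made up, and the sketch uses mb_convert_encoding to perform the same UTF-8 to ISO-8859-1 conversion because utf8_decode itself is deprecated as of PHP 8.2.

```php
<?php
// UTF-8 sample: U+00E9 exists in ISO-8859-1, the em dash U+2014 does not.
$utf8 = "Caf\u{00E9} \u{2014} news";

// The same UTF-8 to ISO-8859-1 conversion that utf8_decode() performs.
// Unrepresentable characters become mbstring's default substitute, "?".
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
echo bin2hex($latin1), "\n";

// The result is single-byte Latin-1, so a utf-8 page cannot render it.
var_dump(mb_check_encoding($latin1, 'UTF-8'));

// The case where decoding appears to help: UTF-8 bytes that were misread
// as ISO-8859-1 and encoded to UTF-8 a second time (double encoding).
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
$repaired = mb_convert_encoding($double, 'ISO-8859-1', 'UTF-8');
var_dump($repaired === $utf8);
```

If that reading is right, the fix is to stop the double encoding at the source (as in the loadHTML change above) rather than to decode afterwards.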

And everything displayed properly.
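One caveat: the hard-coded 'UTF-8' source encoding only fits pages that are actually UTF-8. For pages served in another charset, something along these lines might work as a first normalisation step. It is only a sketch: to_utf8 is a hypothetical helper, the regular expressions are simplistic, the Windows-1252 fallback is an assumption, and catching ValueError assumes PHP 8.

```php
<?php
// Hypothetical helper: convert a fetched body to UTF-8 using whatever
// charset the response declares. Not part of the original answer.
function to_utf8(string $body, ?string $contentType): string
{
    $charset = null;

    if ($contentType !== null
        && preg_match('/charset\s*=\s*"?([\w.:-]+)/i', $contentType, $m)) {
        // 1. Prefer the charset parameter of the HTTP Content-Type header.
        $charset = $m[1];
    } elseif (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)/i', $body, $m)) {
        // 2. Otherwise use a charset declared in a <meta> tag.
        $charset = $m[1];
    }

    if ($charset === null) {
        // 3. Nothing declared: keep valid UTF-8, else assume Windows-1252.
        $charset = mb_check_encoding($body, 'UTF-8') ? 'UTF-8' : 'Windows-1252';
    }

    try {
        return mb_convert_encoding($body, 'UTF-8', $charset);
    } catch (ValueError $e) {
        // Unknown charset label: return the body untouched rather than guess.
        return $body;
    }
}

// A Latin-1 body (0xE9) declared through the Content-Type header.
$fromHeader = to_utf8("Caf\xE9", 'text/html; charset=ISO-8859-1');
// A body that is already UTF-8 passes through unchanged.
$alreadyUtf8 = to_utf8("Caf\u{00E9}", 'text/html; charset=utf-8');
// No header, but the markup itself declares its charset.
$fromMeta = to_utf8('<meta charset="iso-8859-1">' . "\xE9", null);
```

With cURL, the header value could come from curl_getinfo($ch, CURLINFO_CONTENT_TYPE) after curl_exec, and the converted string would then go into loadHTML as shown earlier.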
This answer was accepted by the asker as the best answer.

Question asked 2018-01-25 16:09 (tag: php); 2 answers, one accepted.