dongsu3138 2018-01-25 16:09
Accepted

How do I handle the encoding of content fetched with CURL in PHP?

I have a PHP script that uses CURL to fetch the title and description of a user-entered URL and displays them on the page (which includes a utf-8 charset meta tag), and I'm having problems with characters not displaying correctly.

I read in this answer that the PHP CURL function encodes strings to utf-8 and that I need to decode strings with utf8_decode. But I'm finding that utf8_decode is hit or miss: sometimes it helps, and sometimes it creates unknown characters where there were none in the string before it was decoded.

I've included some examples below.

What's the proper way to handle encoding in this case?


Examples:

Here's the content fetched from a NY Times article with an em dash in the description. In this case, the decoded version displays the character properly:

[screenshot]

Here's content from another NY Times article with an em dash in the description, and here, decoding made the character display improperly:

[screenshot]

I'm finding that decoding causes problems with foreign language sites like this one in Spanish:

[screenshot]

I know I can detect the language of the URL and decode or not based on that, but I'm finding plenty of English-language sites where decoding causes problems, like this one:

[screenshot]

2 answers

  • dtml3340 2018-01-26 19:12

    After a lot more experimenting, I stumbled on this solution, which fixed everything.

    My script fetched the URL contents and loaded them into a DOM document like this:

    $html = file_get_contents_curl($link_url);
    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    

    Per the linked article, I changed it to this:

    $html = file_get_contents_curl($link_url);
    $doc = new DOMDocument();
    @$doc->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));
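
    A minimal sketch of why this change helps, assuming the fetched page carries no charset declaration that libxml picks up: DOMDocument::loadHTML then typically treats the bytes as a single-byte encoding such as ISO-8859-1, so multi-byte UTF-8 characters come back garbled. Converting every non-ASCII character to a numeric entity first leaves pure-ASCII markup that parses the same way under any of those encodings. The sample markup and the extract_title helper are hypothetical, and mb_encode_numericentity is swapped in for the 'HTML-ENTITIES' target used above because that target is deprecated as of PHP 8.2. The mbstring and dom extensions are assumed to be loaded.

    ```php
    <?php
    // Hypothetical stand-in for a fetched page: UTF-8 HTML with no charset declaration.
    $expected = "Caf\u{00E9} \u{2014} men\u{00FA}";
    $html = "<html><head><title>" . $expected . "</title></head><body></body></html>";

    // Hypothetical helper: parse markup and return the text of the first <title>.
    function extract_title(string $markup): string
    {
        $doc = new DOMDocument();
        // Suppress warnings about imperfect real-world markup, as the snippets above do.
        @$doc->loadHTML($markup);
        $nodes = $doc->getElementsByTagName('title');
        return $nodes->length > 0 ? $nodes->item(0)->textContent : '';
    }

    // Raw UTF-8 bytes, no encoding hint: the parser has to guess the encoding.
    $naive = extract_title($html);

    // Replace every code point from U+0080 up with a decimal numeric entity,
    // leaving ASCII (including the tags themselves) untouched.
    $entities = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');
    $fixed = extract_title($entities);

    var_dump($naive === $expected); // typically bool(false): the title is garbled
    var_dump($fixed === $expected); // bool(true)
    ```

    The textContent that comes back from the DOM is already UTF-8, so it can be echoed into a UTF-8 page without any further decoding.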
    

    I also removed the calls to utf8_decode, and after that everything displayed properly.
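
    One plausible reading of the earlier hit-or-miss results, sketched under assumptions: utf8_decode converts UTF-8 to ISO-8859-1, so it can only help when a string has been double-encoded (UTF-8 bytes misread as ISO-8859-1 and then encoded to UTF-8 again), and it damages text that was correct to begin with. The to_latin1 helper below is a hypothetical stand-in that uses mb_convert_encoding for the same conversion on valid input, since utf8_decode itself is deprecated as of PHP 8.2. The sample strings are made up for illustration.

    ```php
    <?php
    // Hypothetical stand-in for utf8_decode(): convert UTF-8 to ISO-8859-1.
    function to_latin1(string $s): string
    {
        return mb_convert_encoding($s, 'ISO-8859-1', 'UTF-8');
    }

    // Case 1: text that is already correct UTF-8.
    // U+00FA becomes the lone byte 0xFA, which is not valid UTF-8 on its own,
    // and the em dash (U+2014) has no ISO-8859-1 equivalent, so it becomes "?".
    $clean = "men\u{00FA} \u{2014}";
    $decodedClean = to_latin1($clean);
    var_dump($decodedClean === "men\xFA ?");             // bool(true)
    var_dump(mb_check_encoding($decodedClean, 'UTF-8')); // bool(false)

    // Case 2: double-encoded text, produced here by treating the UTF-8 bytes
    // as ISO-8859-1 and encoding them to UTF-8 a second time.
    $doubleEncoded = mb_convert_encoding($clean, 'UTF-8', 'ISO-8859-1');
    $repaired = to_latin1($doubleEncoded);
    var_dump($repaired === $clean);                      // bool(true)
    ```

    Under this reading, decoding only ever undoes a parsing mistake made earlier, which is why fixing the input to loadHTML removes the need for it.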

    Selected as the best answer by the asker.