dongrong6235 2014-08-05 20:14

Broken UTF-8 encoding when reading a Google feed/alert

Whenever I try to read a Google alert via PHP using something like:

$feed = file_get_contents("http://www.google.com/alerts/feeds/01445174399729103044/950192755411504138");

Whether I save $feed to a file or echo it to the output, every non-ASCII UTF-8 character (i.e. those with diacritics) is replaced by white space. I have tried various combinations of the following, without success:

  • utf8_encode
  • utf8_decode
  • iconv
  • mb_convert_encoding

I suspect the characters are already wrong when they arrive from the stream, but I'm lost, because the same URI displays correctly in a browser. Can anyone shed some light on the issue?

1 answer

  • douyang5943 2014-08-06 10:48

    Sorry, you are absolutely correct - there is something untoward happening! Though it is not what you would first suspect... For reference, given that:

    echo mb_detect_encoding($feed); // prints: ASCII
    

    The Unicode data is lost before the remote server even sends it. Google appears to look at the User-Agent string in the request headers, and file_get_contents does not send one by default when called without a stream context.

    Because it cannot identify the client making the request it defaults to and forces ASCII encoding. This is presumably a necessary fallback in the event of some kind of cataclysmic cock-up. [citation needed...]
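
One way to confirm this diagnosis, as a minimal sketch, is to check whether the fetched body contains any bytes outside the ASCII range at all. The helper name `describe_bytes` and the sample strings are hypothetical and only for illustration; `preg_match` and `mb_check_encoding` are standard PHP functions (the latter needs the mbstring extension).

```php
<?php
// Hypothetical helper: classify a string as pure ASCII, valid UTF-8, or neither.
function describe_bytes(string $data): string
{
    // No byte above 0x7F means there is nothing left to "convert":
    // the non-ASCII characters were already gone before the data arrived.
    if (preg_match('/[^\x00-\x7F]/', $data) !== 1) {
        return 'ascii-only';
    }
    return mb_check_encoding($data, 'UTF-8') ? 'utf-8' : 'not-utf-8';
}

// Illustrative samples, not real feed data.
echo describe_bytes('Caf  au lait'), "\n";        // accent already replaced by a space
echo describe_bytes("Caf\u{00E9} au lait"), "\n"; // contains a real UTF-8 "é"
```

If the real `$feed` comes back as `ascii-only`, no amount of `utf8_encode`, `iconv` or `mb_convert_encoding` on the client side can restore the missing characters, which would be consistent with the failed attempts in the question.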

    Simply naming your application is not enough, however; you need to include a known vendor. I'm unsure of the full extent of this, but I believe most people include "Mozilla [version]" to work around the issue, for example:

    $url = 'http://www.google.com/...';
    
    $feed = file_get_contents($url, false, stream_context_create([
        'http' => [
            'method' => 'GET',
            // Header lines must be separated by CRLF
            'header' => 'Accept-Charset: UTF-8' . "\r\n"
                      . 'User-Agent: (Mozilla/5.0 compatible) MyFeedReader/1.0'
        ]
    ]));
    
    file_put_contents('test.txt', $feed); // should now work as expected
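
To check that the workaround took effect, it may help to look at the charset the server declares in its response. The following is a minimal sketch, assuming the `$http_response_header` array that PHP's HTTP stream wrapper populates after `file_get_contents`. The helper name `charset_from_headers` and the sample header lines are hypothetical, and the real feed may declare its encoding elsewhere (for example only in the XML prolog).

```php
<?php
// Hypothetical helper: pull the charset out of raw response header lines,
// such as the $http_response_header array set by PHP's HTTP stream wrapper.
function charset_from_headers(array $headers): ?string
{
    $charset = null;
    foreach ($headers as $line) {
        // Matches e.g. "Content-Type: application/atom+xml; charset=UTF-8"
        if (preg_match('/^Content-Type:.*;\s*charset="?([^";\s]+)/i', $line, $m) === 1) {
            $charset = strtoupper($m[1]); // keep the last match, e.g. after redirects
        }
    }
    return $charset; // null when no charset was declared
}

// Illustrative header lines, not captured from a real response.
$sample = [
    'HTTP/1.1 200 OK',
    'Content-Type: application/atom+xml; charset=UTF-8',
];
var_dump(charset_from_headers($sample));
```

In real use the call would be `charset_from_headers($http_response_header)` directly after the `file_get_contents` request.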
    
    This answer was selected as the best answer by the asker.
