doupu1957 2010-10-21 10:29
Views: 41
Accepted

ISO-8859-1 to XML-acceptable UTF-8

I am parsing text/html from web pages into an XML feed. The text/html is encoded as ISO-8859-1, while the XML feed must be UTF-8. I have used HTML entities, but I am having to replace loads of characters manually. Here is what I have so far (it still does not handle all of the text):

$desc = str_replace(array("\r\n", "\r", "\n"), "", $desc);
$desc = str_replace(array("’","‘","”","“"),"'",$desc);
$desc = str_replace("£","£",$desc);
$desc = str_replace("é","é",$desc);
$desc = str_replace("²","2",$desc);
$desc = str_replace(array("-","•"),"‐",$desc);
$desc = htmlentities($desc, ENT_QUOTES, "UTF-8");

2 answers

  • dpka7974 2010-10-21 10:31

    Use iconv(). It will allow you to use native characters in UTF-8 as well - no need for HTML entities.

    $data = iconv("ISO-8859-1", "UTF-8", $text);
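
To make the conversion concrete for the XML feed case, here is a minimal sketch. It assumes the input really is ISO-8859-1 and that PHP 7 or later is available; the helper name `latin1_to_xml_text` is hypothetical and not part of the original answer. It converts with iconv() and then escapes only the XML-special characters with htmlspecialchars(), so accented letters and currency signs stay as native UTF-8 rather than becoming HTML entities.

```php
<?php
// Hypothetical helper: convert an ISO-8859-1 string to UTF-8 and escape it for XML.
function latin1_to_xml_text(string $latin1): string
{
    // iconv() returns false on failure, so check before using the result.
    $utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);
    if ($utf8 === false) {
        throw new RuntimeException('iconv() could not convert the input');
    }
    // Escape only &, <, > and quotes; everything else stays as native UTF-8.
    return htmlspecialchars($utf8, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

// "\xE9" and "\xA3" are the ISO-8859-1 bytes for "é" and "£".
$desc = "Caf\xE9 & bar: \xA35 <b>only</b>";
echo latin1_to_xml_text($desc), PHP_EOL;
// Prints: Café &amp; bar: £5 &lt;b&gt;only&lt;/b&gt;
```

htmlspecialchars() is used here instead of htmlentities() because named HTML entities other than the five predefined ones are not defined in plain XML.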
    

    When converting from UTF-8 to another character set, append //IGNORE or //TRANSLIT to the target charset name to drop or transliterate characters that cannot be represented.
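
The reverse direction might look like the sketch below. The helper name `utf8_to_latin1` and the sample string are assumptions made for illustration. What a transliterated or ignored character turns into depends on the iconv implementation PHP was built against, so no expected output is shown.

```php
<?php
// Hypothetical helper: convert UTF-8 text to ISO-8859-1 with a fallback mode.
// $mode is 'TRANSLIT' (substitute a close equivalent) or 'IGNORE' (drop the character).
function utf8_to_latin1(string $utf8, string $mode = 'TRANSLIT')
{
    // The mode is appended to the target charset name after a double slash.
    return iconv('UTF-8', 'ISO-8859-1//' . $mode, $utf8);
}

// "\xE2\x82\xAC" is the UTF-8 encoding of the euro sign, which ISO-8859-1 lacks.
$text = "5 \xE2\x82\xAC for a caf\xC3\xA9";

// Results vary by iconv implementation, and iconv() returns false
// if the conversion fails outright, so check the result in real code.
var_dump(utf8_to_latin1($text, 'TRANSLIT'));
var_dump(utf8_to_latin1($text, 'IGNORE'));
```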

    Alternatively, the mb_* functions shown by @Gumbo will work as well.
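
The other answer is not reproduced here, so the following is only a sketch of what the mbstring route could look like, assuming mb_convert_encoding() is the function meant and that the mbstring extension is loaded. Note that its argument order (string, target, source) is the reverse of iconv()'s.

```php
<?php
// "\xE9" is the ISO-8859-1 byte for "é".
$latin1 = "caf\xE9";

// Argument order: string, target encoding, source encoding.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

// Confirm the result is well-formed UTF-8 before it goes into the feed.
var_dump(mb_check_encoding($utf8, 'UTF-8')); // bool(true)
echo bin2hex($utf8), PHP_EOL;                // 636166c3a9
```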
