doupu1957 2010-10-21 18:29

ISO-8859-1 to XML-acceptable UTF-8

I am parsing text/html from web pages into an XML feed. The text/html is encoded as ISO-8859-1, while the XML feed must be UTF-8. I have used HTML entities, but I am still having to replace loads of characters manually. Here is what I have so far (it still does not parse all of the text):

$desc = str_replace(array("\r\n", "\r", "\n"), "", $desc);
$desc = str_replace(array("’","‘","”","“"),"'",$desc);
$desc = str_replace("£","£",$desc);
$desc = str_replace("é","é",$desc);
$desc = str_replace("²","2",$desc);
$desc = str_replace(array("-","•"),"‐",$desc);
$desc = htmlentities($desc, ENT_QUOTES, "UTF-8");

  • dpka7974 2010-10-21 18:31

    Use iconv(). It will allow you to use native characters in UTF-8 as well - no need for HTML entities.

    $data = iconv("ISO-8859-1", "UTF-8", $text);
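
A minimal sketch of how the conversion might slot into building the feed. The sample string, the variable names, and the `<description>` element are assumptions for illustration, not taken from the question. The input bytes are written with `\x` escapes so the example does not depend on the encoding of the script file itself. After converting, `htmlspecialchars()` escapes only the characters XML treats specially, whereas `htmlentities()` would emit named entities such as `&eacute;` that XML does not predefine.

```php
<?php
// Hypothetical input: "Café £5 & more" as ISO-8859-1 bytes (0xE9 = é, 0xA3 = £).
$latin1 = "Caf\xE9 \xA35 & more";

// Convert the bytes to UTF-8; é and £ each become a two-byte sequence.
$utf8 = iconv("ISO-8859-1", "UTF-8", $latin1);

// Escape only the XML-significant characters (& < > and quotes).
$xmlSafe = htmlspecialchars($utf8, ENT_QUOTES, "UTF-8");

// Hypothetical element name; use whatever the feed format requires.
echo "<description>" . $xmlSafe . "</description>\n";
```

With this approach the accented and currency characters travel through the feed as native UTF-8, so the manual `str_replace()` calls for them should no longer be needed.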
    

    When converting from UTF-8 to another character set, append //IGNORE or //TRANSLIT to the target charset name to drop or transliterate characters that cannot be represented in it.
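
The two suffixes can be compared with a short sketch. The sample string and variable names are assumptions, and the exact output depends on the iconv implementation PHP was built against: some builds transliterate the euro sign to "EUR", and some return false for //IGNORE, so the result is checked rather than assumed.

```php
<?php
// Hypothetical UTF-8 input: "Café 5€". The euro sign (U+20AC) has no ISO-8859-1 code point.
$utf8 = "Caf\xC3\xA9 5\xE2\x82\xAC";

// //TRANSLIT asks iconv to substitute an approximation for unrepresentable characters.
$translit = @iconv("UTF-8", "ISO-8859-1//TRANSLIT", $utf8);

// //IGNORE asks iconv to drop unrepresentable characters instead.
$ignored = @iconv("UTF-8", "ISO-8859-1//IGNORE", $utf8);

// iconv() returns false on failure, so check each result before using it.
foreach (array("TRANSLIT" => $translit, "IGNORE" => $ignored) as $mode => $result) {
    echo $mode . ": " . ($result === false ? "conversion failed" : bin2hex($result)) . "\n";
}
```

Printing the result through `bin2hex()` makes the byte-level difference between the two modes visible regardless of the terminal's encoding.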

    Alternatively, the mb_* functions shown by @Gumbo will work as well.
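
The other answer is not reproduced here, so the following is only a guess at the kind of call it refers to: a sketch using `mb_convert_encoding()`, assuming the mbstring extension is enabled. The sample string and variable names are hypothetical.

```php
<?php
// Hypothetical input: "£5 café" as ISO-8859-1 bytes (0xA3 = £, 0xE9 = é).
$latin1 = "\xA35 caf\xE9";

// Note the argument order: target encoding first, then source (the reverse of iconv()).
$utf8 = mb_convert_encoding($latin1, "UTF-8", "ISO-8859-1");

// mb_check_encoding() reports whether the result is well-formed UTF-8.
var_dump(mb_check_encoding($utf8, "UTF-8"));

// Show the converted bytes in hex so the output is independent of the terminal encoding.
echo bin2hex($utf8) . "\n";
```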

    The asker selected this as the best answer.