dongyanggan3025 2010-04-20 13:27
162 views
Accepted

Encoding MySQL text fields into a UTF-8 text file - problems with special characters

I'm writing a PHP script to export MySQL database rows into a .txt file formatted for Adobe InDesign's internal markup.

Exports work, but when I encounter special characters like é or umlauts, I get weird symbols (e.g. ChloÃ« Hanslip instead of Chloë Hanslip). Rather than run a search-and-replace for every possible weird character, I need a better method.
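Symbols like these are what you typically get when UTF-8 bytes are decoded with a single-byte Windows code page. A minimal sketch that reproduces the symptom, assuming the program opening the file falls back to Windows-1252 (that fallback is an assumption, not something the question states) and that PHP's iconv extension is available:

```php
<?php
// Assumption: the program opening the file decodes it as Windows-1252 (CP1252).
$original = "Chlo\u{00EB} Hanslip";   // "Chloë Hanslip"; ë is the UTF-8 byte pair C3 AB

// Treat every byte as one CP1252 character, then re-encode as UTF-8 so the
// result can be printed: C3 becomes "Ã" (U+00C3) and AB becomes "«" (U+00AB).
$misread = iconv('CP1252', 'UTF-8', $original);

echo $original, "\n";   // Chloë Hanslip
echo $misread, "\n";    // ChloÃ« Hanslip
```

Calling utf8_encode() (or a Latin-1-to-UTF-8 iconv()) on text that is already UTF-8 performs essentially this same reinterpretation, which is why such attempts double-encode the text instead of fixing it.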

I've checked that when the text hits the database, it's saved properly - in the database I see the special characters. My export code basically runs some regular expressions to put in the InDesign code tags, and I'm left with the weird symbols. If I just output the text to the browser (rather than prompt for a text file download), it displays properly. When I save the file I use this code:

header("Content-disposition: attachment; filename=test.txt");

header("Content-Type: text/plain; charset=utf-8");

I've tried various combinations of utf8_encode() and iconv() to no avail. Can anybody point me in the right direction here?


5 answers

  • doulan1866 2010-04-20 13:45

    InDesign wouldn't be able to use any encoding specified in the HTTP header. (It wouldn't even see it, as the header isn't kept when you save the file to disk in Windows.) Instead you have to tell it the encoding explicitly, in a special tag of its own at the start of the file, such as:

    <ANSI-WIN>
    

    Unfortunately, it does not use standard encoding names and there is no tag that InDesign understands that corresponds to UTF-8 encoding at all. The only encoding tag you can use that will allow you to include any character you like is:

    <UNICODE-WIN>
    

    which corresponds to UTF-16 (little-endian with BOM) with Windows CRLF line endings. (The only other line-ending option is MAC, which you don't want at all: it is meant for old pre-OS X Macs, where the line ending was a bare CR.)
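To make that layout concrete, here is a small sketch of the bytes involved. It names 'UTF-16LE' explicitly and writes the BOM (FF FE) by hand, on the assumption that iconv's plain 'UTF-16' target may choose a different byte order depending on which iconv library PHP was built against:

```php
<?php
// Sketch: byte layout of "UTF-16 little-endian with BOM, CRLF line endings".
// Assumption: the iconv extension is available; 'UTF-16LE' emits no BOM,
// so the BOM is prepended manually.
$text  = "A\u{00EB}\r\n";                        // "A", "ë", CR, LF
$bytes = "\xFF\xFE" . iconv('UTF-8', 'UTF-16LE', $text);

// After the BOM, each character becomes two bytes, low byte first.
echo bin2hex($bytes), "\n";   // fffe4100eb000d000a00
```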

    So, given a UTF-8 string $s including UTF-8 byte sequences you've pulled out of the database and plain (Unix-Linux-OSX-web-style) LF newlines, you'd write it like this:

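    // Add InDesign's encoding tag and turn LF newlines into CRLF
    $s = "<UNICODE-WIN>\r\n" . str_replace("\n", "\r\n", $s);
    // Write a little-endian BOM, then the text as UTF-16LE
    // (plain 'UTF-16' leaves the byte order up to the iconv library)
    echo "\xFF\xFE" . iconv('UTF-8', 'UTF-16LE', $s);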
    

    (Make sure not to output any whitespace before or after this, because any extra bytes will corrupt the UTF-16 output.)
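Putting those steps together, a minimal sketch of the whole export might look like the following. The function names toInDesignTaggedText() and sendInDesignDownload() are hypothetical, the CR/LF normalisation step is an addition (so that text already containing CRLF does not end up with "\r\r\n"), and the iconv extension is assumed to be available:

```php
<?php
// Hypothetical helper (not from the original answer): convert UTF-8 text that
// already contains InDesign tags into UTF-16LE with a BOM and CRLF line endings.
function toInDesignTaggedText(string $utf8): string
{
    // Added step: normalise CRLF and lone CR to LF so the LF -> CRLF
    // conversion below can never produce "\r\r\n".
    $lf = str_replace(["\r\n", "\r"], "\n", $utf8);

    // InDesign's encoding tag first, then Windows (CRLF) line endings.
    $text = "<UNICODE-WIN>\r\n" . str_replace("\n", "\r\n", $lf);

    // Explicit little-endian conversion; iconv() returns false on invalid UTF-8.
    $utf16 = iconv('UTF-8', 'UTF-16LE', $text);
    if ($utf16 === false) {
        throw new RuntimeException('Input is not valid UTF-8');
    }

    // Prepend the little-endian byte order mark (FF FE).
    return "\xFF\xFE" . $utf16;
}

// Hypothetical helper: send the converted text as a file download.
// Call it before any other output, otherwise PHP warns that headers were already sent.
function sendInDesignDownload(string $utf8, string $filename = 'test.txt'): void
{
    $body = toInDesignTaggedText($utf8);
    header('Content-Disposition: attachment; filename=' . $filename);
    // Informational only: InDesign relies on the tag and BOM, not on this header.
    header('Content-Type: text/plain; charset=UTF-16');
    header('Content-Length: ' . strlen($body));
    echo $body;
}
```

In the export script this would be called as sendInDesignDownload($s); followed by exit;. Leaving out the closing ?> tag at the end of the file is a common way to avoid the stray trailing whitespace warned about above.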

    This answer was accepted by the asker as the best answer.