duandun2218 2011-12-29 22:14

Arabic character encoding problem: UTF-8 vs. Windows-1256

Quick background: I inherited a large SQL dump file containing a mix of English and Arabic text, and (I think) it was originally exported using 'latin1'. I changed all occurrences of 'latin1' to 'utf8' before importing the file. The Arabic text didn't appear correctly in phpMyAdmin (which I guess is normal), but when I loaded the text into a web page with the following...

<meta http-equiv='Content-Type' content='text/html; charset=windows-1256'/> 

...everything looked good and the Arabic text displayed perfectly.


Problem: My client is really, really picky and doesn't want to change his...

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>

...to the 'Windows-1256' equivalent. I didn't think this would be a problem, but when I changed the charset value to 'UTF-8', all of the Arabic characters appeared as diamonds with question marks. Shouldn't UTF-8 display Arabic text correctly?


Here are a few notes about my database configuration:

  • Database charset is 'utf8'
  • Database connection collation is 'utf8_general_ci'
  • All databases, tables, and applicable fields have been collated as 'utf8_general_ci'

I've been scouring Stack Overflow and other forums for anything that relates to my issue. I've found similar problems, but none of the solutions seem to work for my specific situation. Hope someone can help!

4 answers

  • duangou1953 2011-12-29 22:18

    We can't find the error in your code if you don't show us your code, so we're very limited in how we can help you.

    You told the browser to interpret the document as being UTF-8 rather than Windows-1256, but did you actually change the encoding used from Windows-1256 to UTF-8?

    For example,

    $ cat a.pl
    use strict;
    use warnings;
    use feature qw( say );
    use charnames ':full';
    
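    # Encoding name is taken from the command line, e.g. UTF-8 or Windows-1256.
    my $enc = $ARGV[0] or die "usage: perl a.pl ENCODING\n";
    # Encode everything printed to STDOUT with that encoding, so the bytes
    # written match the charset declared in the <meta> tag below.
    binmode STDOUT, ":encoding($enc)";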
    
    print <<"__EOI__";
    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=$enc">
    <title>Foo!</title>
    </head>
    <body dir="rtl">
    \N{ARABIC LETTER ALEF}\N{ARABIC LETTER LAM}\N{ARABIC LETTER AIN}\N{ARABIC LETTER REH}\N{ARABIC LETTER BEH}\N{ARABIC LETTER YEH}\N{ARABIC LETTER TEH MARBUTA}
    </body>
    </html>
    __EOI__
    
    $ perl a.pl UTF-8 > utf8.html
    
    $ perl a.pl Windows-1256 > cp1256.html
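
The same point can be seen at the byte level. The following is a minimal Python sketch, not a diagnosis of the asker's actual data: it assumes the stored text really is Windows-1256 bytes, which is only a guess based on the symptoms described. The helper name `transcode` is hypothetical and introduced just for illustration.

```python
# The same Arabic word that the Perl script above prints.
text = (
    "\N{ARABIC LETTER ALEF}\N{ARABIC LETTER LAM}\N{ARABIC LETTER AIN}"
    "\N{ARABIC LETTER REH}\N{ARABIC LETTER BEH}\N{ARABIC LETTER YEH}"
    "\N{ARABIC LETTER TEH MARBUTA}"
)

# One byte per letter in Windows-1256, two bytes per letter in UTF-8.
cp1256_bytes = text.encode("cp1256")
utf8_bytes = text.encode("utf-8")

# What a browser effectively does when Windows-1256 bytes are labelled
# as UTF-8: invalid sequences become U+FFFD, the "diamond with a
# question mark".
mislabelled = cp1256_bytes.decode("utf-8", errors="replace")


def transcode(data: bytes, source_encoding: str = "cp1256") -> bytes:
    """Hypothetical helper: re-encode bytes from source_encoding to UTF-8.

    Assumes the caller knows the real encoding of the data; guessing
    wrong produces mojibake or raises UnicodeDecodeError.
    """
    return data.decode(source_encoding).encode("utf-8")


# Changing the bytes, not just the label, is what makes UTF-8 work.
repaired = transcode(cp1256_bytes)

if __name__ == "__main__":
    print(len(cp1256_bytes), len(utf8_bytes))
    print(mislabelled)
    print(repaired.decode("utf-8"))
```

Changing only the declared charset (in the `<meta>` tag or in the dump file's header) relabels the bytes without converting them. If the assumption above holds, the data itself would need to be re-encoded, for example by converting the dump file before importing it.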
    