duanqian6982 2016-05-13 04:39
Accepted

PHP uploaded file names: Japanese character encoding

When uploading a file with a Japanese name, some characters cause problems. On a Windows system, I want to save the file under its name as uploaded, so I have to use mb_convert_encoding($name, "SJIS", "AUTO");, which works fine in most cases.

However, some characters, such as those in 0423図表①, disappear completely in the result. It seems that the file name is already "wrong" when it is uploaded: it looks like "0423å³è¡¨â .pptx" in UTF-8, and if I change the header charset with

header('Content-Type: text/html; charset=SJIS');

it looks like

 "0423テ・ツ崢ウティツ。ツィテ「ツ堕.pptx"

I am not sure what I can do in this case. I tried to replace the character but I cannot even find it with strpos() before or after the encoding conversion.


  • duan1930 2016-05-13 05:43

    To qualify my answer (to the downvoter):

    Q: I have heard that UTF-8 does not support some Japanese characters. Is this correct?

    A: There is a lot of misinformation floating around about the support of Chinese, Japanese and Korean (CJK) characters. The Unicode Standard supports all of the CJK characters from JIS X 0208, JIS X 0212, JIS X 0221, or JIS X 0213, for example, and many more. This is true no matter which encoding form of Unicode is used: UTF-8, UTF-16, or UTF-32.

    Unicode supports over 80,000 CJK characters right now, and work is underway to encode further additions. The International Standard ISO/IEC 10646 and the Unicode Standard are completely synchronized in repertoire and content. And that means that Unicode has the same repertoire as GB 18030, since that also is synchronized with ISO 10646 — although with a different ordering and byte format.

    From: The Unicode Consortium.

    My Answer:

    Rather than strpos, use mb_strpos (or the case-insensitive mb_stripos) from the PHP multibyte string functions to locate the characters you want to replace. This should help your script detect and translate the non-Latin characters.
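
    As a rough illustration of the difference, here is a minimal sketch. It assumes the script file itself is saved as UTF-8 and uses a literal in place of the uploaded name; the variable names are only for illustration. Note that a search can only succeed when the needle and the haystack are in the same encoding, which may be why the character could not be found after conversion.

    ```php
    <?php
    // Assumption: this source file is saved as UTF-8, so the literals are UTF-8 bytes.
    $name = '0423図表①.pptx';

    // Byte-oriented search: the offset is counted in bytes
    // (図 and 表 each take 3 bytes in UTF-8).
    $bytePos = strpos($name, '①');                 // 10

    // Character-oriented search: the offset is counted in characters.
    $charPos = mb_strpos($name, '①', 0, 'UTF-8');  // 6

    // One possible workaround: swap the circled digit for plain ASCII
    // before converting the name to another encoding.
    $safeName = str_replace('①', '(1)', $name);    // 0423図表(1).pptx
    ```

    str_replace is safe here because both strings are UTF-8; it is the offsets that differ between the byte-based and the mb_ functions.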

    If the uploaded file name ($_FILES['var']['name']) is already incorrect in the PHP script (from output such as print_r($_FILES)) then you need to ensure you are correctly encoding the HTML form with accept-charset='UTF-8' (or SJIS, etc.). I would hope you're already well ahead of me on this.
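
    To tell whether the name is really broken or only displayed with the wrong charset, it can help to look at the raw bytes. The sketch below is only an illustration: describe_name is a hypothetical helper (not a PHP function), and a literal stands in for $_FILES['var']['name']. If the name turns out to be valid UTF-8, output such as "å³è¡¨â" is probably just UTF-8 bytes being shown as a single-byte charset.

    ```php
    <?php
    // Hypothetical helper: reports the raw bytes of a name and how it would
    // look if its UTF-8 bytes were misread as ISO-8859-1.
    function describe_name(string $name): array
    {
        return [
            'hex'        => bin2hex($name),
            'valid_utf8' => mb_check_encoding($name, 'UTF-8'),
            // Treat every byte as one ISO-8859-1 character (classic mojibake).
            'as_latin1'  => mb_convert_encoding($name, 'UTF-8', 'ISO-8859-1'),
        ];
    }

    // Assumption: this file is saved as UTF-8; in an upload handler you would
    // pass $_FILES['var']['name'] instead of the literal.
    $info = describe_name('図表①');
    // $info['hex'] is 'e59bb3e8a1a8e291a0', $info['valid_utf8'] is true
    ```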

    It may also be advisable to set a few defaults at the top of your PHP page, again using the PHP mb_ functions:

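    mb_internal_encoding('UTF-8'); // or whatever character set works for you
    mb_http_output('SJIS');        // encoding used for HTTP output
    mb_regex_encoding('UTF-8');    // encoding used by the mb_ereg* functions
    // Note: mb_http_input() only reports the detected input encoding
    // (its argument is a type such as "G" or "P"); it cannot set one.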
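
    For the conversion itself, a minimal sketch under two assumptions: the browser sends the name as UTF-8, and the target is a Windows file system. Naming the source encoding explicitly avoids the guesswork of "AUTO". The character ① is not part of JIS X 0208; it is a vendor extension found in Windows code page 932, which mbstring exposes as 'SJIS-win' (or 'CP932'), so that target may be a better fit than plain 'SJIS'.

    ```php
    <?php
    // Assumption: this file is saved as UTF-8; the literal stands in for
    // $_FILES['var']['name'] as sent by a UTF-8 form.
    $name = '0423図表①.pptx';

    // Explicit source encoding, Windows variant of Shift_JIS as the target.
    $winName = mb_convert_encoding($name, 'SJIS-win', 'UTF-8');

    // Round-trip check: converting back should reproduce the original name
    // if no character was lost on the way.
    $roundTrip = mb_convert_encoding($winName, 'UTF-8', 'SJIS-win');
    $lossless  = ($roundTrip === $name);

    // For reference, ① becomes the two bytes 0x87 0x40 in code page 932.
    $circledOneHex = bin2hex(mb_convert_encoding('①', 'SJIS-win', 'UTF-8'));
    ```

    If $lossless comes out false for a given name, at least one character has no mapping in the target encoding and needs to be replaced first.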
    

    Out of interest:

    http://www.unicode.org/reports/tr37/

    and

    http://david.latapie.name/blog/shift-jis-utf-8/
