dongqing6755 2012-12-08 22:14

JPEG IPTC data read in PHP does not display UTF-8 characters correctly

When I read the IPTC data from an image via PHP, UTF-8 accented characters do not display properly.

For example: é, ø and ü

With the Content-Type header set to UTF-8, I get the replacement character (a question mark in a black diamond, �) instead of the accented character. If no content type is set, I get a dash character: —

This is the code I use to read the IPTC block:

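<?php
$file = '/path/to/image.jpg';
getimagesize($file, $info);   // fills $info with the raw APPn segments of the JPEG
$iptc = isset($info['APP13']) ? iptcparse($info['APP13']) : false;   // IPTC lives in APP13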

I have also tried uploading the exact same image to a WordPress installation on the same server, and it properly strips the accented characters and replaces them with their basic Latin equivalents. I don't mind if this is the end result; I would just like to read the characters properly.

Any ideas on how to get the complete and correct data from the image?
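
One PHP-side workaround, sketched here under a stated assumption: if the IPTC data does not declare UTF-8 in dataset 1:90 (CodedCharacterSet, which holds the escape sequence ESC % G for UTF-8), its text is assumed to be Windows-1252/Latin-1 and is converted to UTF-8 with mbstring. The function name `iptcToUtf8` is hypothetical, the mbstring extension must be enabled, and the Windows-1252 guess will be wrong for images written in some other legacy encoding.

```php
<?php
// Hypothetical helper: normalise iptcparse() output to UTF-8.
// ASSUMPTION: text without a UTF-8 declaration is Windows-1252 / Latin-1.
// Requires the mbstring extension.

function iptcToUtf8(array $iptc): array
{
    // Dataset 1#090 (CodedCharacterSet) contains ESC % G when the text is UTF-8.
    if (isset($iptc['1#090']) && in_array("\x1B%G", $iptc['1#090'], true)) {
        return $iptc; // already declared as UTF-8, nothing to convert
    }

    foreach ($iptc as $tag => $values) {
        foreach ($values as $i => $value) {
            // Skip values whose bytes are already valid UTF-8 (includes plain ASCII).
            if (mb_check_encoding($value, 'UTF-8')) {
                continue;
            }
            $iptc[$tag][$i] = mb_convert_encoding($value, 'UTF-8', 'Windows-1252');
        }
    }
    return $iptc;
}
```

It would be called on the parsed array, for example `iptcToUtf8($iptc)` once `$iptc` is known not to be `false`. Values that are already valid UTF-8 are left untouched, so running it on correctly encoded images is harmless.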


2 answers

  • doubao6936 2013-02-15 07:56

    Answering a bit late, but since I had the same problem displaying special characters such as č, š and ž (which appear in the Slovenian alphabet), I may as well answer for future reference.

    The solution to this problem is actually not related to PHP but to the encoding of the IPTC data. By default, most software that writes IPTC data stores text in a legacy 8-bit character set (typically Latin-1/Windows-1252) rather than UTF-8. I first used Adobe Bridge, which displays all special characters correctly while you tag your images, but once you parse that data in PHP you do not see the special characters. (I would have to double-check this part, but the main catch is that two different encodings are involved: the one used to store the IPTC data in the image, and the one a program assumes when it displays that data.)
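
To see this mismatch from the PHP side, a small sketch can list which IPTC datasets contain bytes that are not valid UTF-8; under a UTF-8 Content-Type header those are the values that render as �. The function name `nonUtf8Tags` is hypothetical, and it assumes the mbstring extension is available.

```php
<?php
// Hypothetical diagnostic: return the iptcparse() keys (e.g. "2#120")
// whose values contain bytes that are not valid UTF-8.
// Requires the mbstring extension.

function nonUtf8Tags(array $iptc): array
{
    $bad = [];
    foreach ($iptc as $tag => $values) {
        foreach ($values as $value) {
            if (!mb_check_encoding($value, 'UTF-8')) {
                $bad[] = $tag;
                break; // one invalid value is enough to flag this dataset
            }
        }
    }
    return $bad;
}
```

An empty result means every value can be sent as UTF-8 as-is; a non-empty one points at the datasets that were written in a legacy encoding.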

    To solve the problem I used a program called ExifTool, which is an amazing piece of software and lets you manage almost any metadata in your images.

    Then I used it to convert all IPTC data to UTF-8. From then on I only had to retag the images that had corrupt characters (Adobe Bridge displays them correctly but evidently does not save them in the correct encoding).

    The command to accomplish this on all images in a folder is:

    exiftool -tagsfromfile @ -iptc:all -codedcharacterset=utf8
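
To check the result from PHP afterwards, a sketch like the following can list the JPEGs in a folder whose IPTC block still lacks the UTF-8 declaration in dataset 1:90 (ESC % G, which is what `-codedcharacterset=utf8` writes). The function names, the `*.jpg` pattern and the directory argument are illustrative assumptions; images with no IPTC block at all are skipped.

```php
<?php
// Hypothetical check: does the parsed IPTC declare UTF-8?
// ISO 2022 escape sequence ESC % G in dataset 1#090 means UTF-8.
function declaresUtf8(array $iptc): bool
{
    return isset($iptc['1#090']) && in_array("\x1B%G", $iptc['1#090'], true);
}

// Hypothetical scan: JPEGs (matched by *.jpg, an assumption) that carry
// IPTC data but no UTF-8 declaration, i.e. candidates for re-conversion.
function jpegsWithoutUtf8Marker(string $dir): array
{
    $missing = [];
    foreach (glob(rtrim($dir, '/') . '/*.jpg') ?: [] as $file) {
        $info = [];
        if (@getimagesize($file, $info) === false) {
            continue; // unreadable or not an image
        }
        $iptc = isset($info['APP13']) ? iptcparse($info['APP13']) : false;
        if ($iptc !== false && !declaresUtf8($iptc)) {
            $missing[] = $file;
        }
    }
    return $missing;
}
```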
    

    You may also want to download ExifTool GUI if you are not comfortable working from the command line.

    I haven't found any other program that does this job better or faster.

    This answer was selected by the asker as the best answer.