dselp3944 2018-02-02 10:21
Accepted

How to "remove diacritics" from UTF-8 characters in PHP?

I need to replicate the behavior of the MySQL utf8_general_ci collation in PHP. Strictly speaking, I need to detect which strings would be considered different and which would be considered the same. The case-insensitive part is easy. The problem is that utf8_general_ci treats characters with diacritics and characters without them as equal: e = è = é, and so on. To replicate that comparison, I need a way to replace è -> e, é -> e.

The method that comes to my mind is:

echo iconv("utf-8", "ascii//TRANSLIT", "é");

One problem is that iconv behaves differently depending on the current locale, which is asking for trouble.

The other problem is that the input may also contain Cyrillic letters, which should neither be stripped nor trigger a PHP notice.

echo iconv("utf-8", "ascii//TRANSLIT", "дом");

Is there a solution, or do I have to manually create a mapping from each character with a diacritic to its counterpart without one?


2 answers

  • doulu4413 2018-02-02 11:07

    The intl extension's Transliterator lets you define far more in-depth transliteration rules. The full documentation on transliteration rules can be found on icu-project.org.

    $tests = [ "é", "дом" ];
    // Latin-ASCII maps accented Latin characters to ASCII; Cyrillic is left untouched.
    $tl = Transliterator::create('Latin-ASCII;');
    foreach ($tests as $str) {
        var_dump(
            $tl->transliterate($str)
        );
    }

    Output:

    string(1) "e"
    string(6) "дом"
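
For the comparison the question asks about, the transliterator above could be combined with lowercasing. What follows is a minimal sketch and not part of the original answer: the helper names foldForComparison and looselyEqual are hypothetical, it assumes the intl extension is loaded, and it only approximates utf8_general_ci (the two may disagree on characters such as ß), so the result should be checked against MySQL for the characters that matter.

```php
<?php
// Hypothetical helper (not from the original answer): folds case and Latin
// diacritics so that two strings can be compared loosely.
// Assumes the intl extension is available.
function foldForComparison(string $s): string
{
    static $tl = null;
    if ($tl === null) {
        // Latin-ASCII strips diacritics from Latin letters; Lower() folds case.
        $tl = Transliterator::create('Latin-ASCII; Lower()');
        if ($tl === null) {
            throw new RuntimeException('Could not create transliterator');
        }
    }
    $folded = $tl->transliterate($s);
    if ($folded === false) {
        throw new RuntimeException('Transliteration failed');
    }
    return $folded;
}

// Hypothetical helper: true when both strings fold to the same form.
function looselyEqual(string $a, string $b): bool
{
    return foldForComparison($a) === foldForComparison($b);
}

var_dump(looselyEqual('É', 'e'));
var_dump(looselyEqual('ДОМ', 'дом'));
var_dump(looselyEqual('é', 'a'));
```

Caching the transliterator in a static variable avoids rebuilding it on every call, and because the same fold is applied to both sides, Cyrillic input is compared case-insensitively without being stripped.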
    This answer was selected by the asker as the best answer.