如何从PHP中的UTF8字符“删除变音符号”？

I need to replicate the behavior of MySQL utf8_general_ci collation in PHP. Strictly speaking I need to detect what whould be considered different and what would be considered the same. The case independent part is easy. The problem is utf_general_ci considers characters with diacritics and characters without diacritics to be equal: e = è = é etc.. To replicate that comparison, I'd need to have a way to replace è -> e, é -> e.

The method that comes to my mind is:

echo iconv("utf-8", "ascii//TRANSLIT", "é");

One problem is iconv behaves differently depending on current locale and that's asking for a problem.

The other problem is the input may also contain Cirillic letters that shouldn't be stripped or result in a PHP Notice.

echo iconv("utf-8", "ascii//TRANSLIT", "дом");

Is there a solution or do I have to create manually mapping of each character with diacritic to a one without it?

写回答
好问题 0 提建议
关注问题
分享
邀请回答
编辑收藏删除结题
收藏举报

2条回答默认最新

关注

码龄粉丝数原力等级 --

被采纳

被点赞

采纳率
doulu4413 2018-02-02 11:07
关注
intl's Transliterator will let you define far more in-depth transliteration rules. The full documentation on transliteration rules can be found on icu-project.org.

$tests = [ "é", "дом" ]; $tl = Transliterator::create('Latin-ASCII;'); foreach($tests as $str) { var_dump( $tl->transliterate($str) ); }

Output:

string(1) "e" string(6) "дом"
本回答被题主选为最佳回答 , 对您是否有帮助呢?

解决无用
评论打赏
分享
举报
编辑

预览
轻敲空格完成输入
显示为

卡片

标题

链接
评论

按下Enter换行，Ctrl+Enter发表内容

查看更多回答(1条)

编辑

预览

报告相同问题？

关注问题

如何在PHP中使用COM对象获取UTF-8字符串？ php
2017-11-06 04:40

回答 1 已采纳 Well, answer was right in front of my eyes, I just overlooked: COM::__construct ( string $module_
PHP + PostgreSQL + ODBC - UTF8 - 变音符问题 php postgresql
2019-08-12 00:52

回答 1 已采纳 Its not logic in my case(DB is not in Win1250), but its function. Diacritic is OK. $invoice_item[
如何删除＆nbsp; 从UTF-8字符串？ php
2015-02-24 11:29

回答 2 已采纳 This gets tricky, its not as straight forward as replacing normal string. Try this. str_repla
从JavaScript看字符编码的前世今生！
2022-05-22 03:30

架构师小秘圈的博客导语|每个程序员都应该了解一下字符编码，有了基础概念之后我们对编程语言、字符处理能有更深入的理解。本文我花了大量时间进行资料查阅和考证，希望能够给大家带来一些帮助，多多交流！一、起因最近在研究Babel的...
PHP输出编码与MySQL数据库中的UTF-8字符串有关 mysql php
2015-06-22 16:47

回答 3 已采纳 You should not change all the character_set% fields, just the three that are affected by SET NAMES
python3 \x开头的utf8字符串怎么转成中文字符串？？ python 有问必答
2022-01-15 10:24

回答 2 已采纳 from urllib import parse utf8_str = '\xe4\xb8\xad\xe6\x96\x87' # 转成 ’中文‘ print(utf8_str)
PHP输出utf-8字符的问题 apache php
2017-01-27 02:35

回答 2 已采纳 You should check the character encoding of the xlsx file. If the file was created on windows then
M5StickC语音助理玩转语音识别
2023-06-15 13:47

peien268的博客中文显示我们希望把语音识别到的文字显示到屏幕上，M5官方提供了显示GBK编码字符串的方法loadHzk16，但Arduino IDE采用的是UTF8编码，故我们需要把UTF8编码的字符串转换为GBK编码，这里我们用到了齐护机器人的QDP_...
如何删除所有垃圾字符，但在PHP中通过正则表达式保留货币符号？ php
2014-03-12 01:00

回答 3 已采纳 You need to add the u modifier to the RegEx and it works nicely: $output = 'Clean this copy of in
Go vs Swift中的UTF8字符串长度和索引 swift
2018-05-02 06:24

回答 2 已采纳 In Swift a Character is an “extended grapheme cluster,” and each of "
PHP - 字节到UTF8字符？ php
2013-01-26 13:58

回答 1 已采纳 As Gumbo pointed out, the characters are encoded as a quoted-printable string. To decode, use this
从零开始学习Linux笔记
2020-05-15 11:12

祢听的到丶的博客从零开始学习Linux，记录笔记，担心自己以后会忘，也供大家茶余饭后，闲来无事看看，自己的理解只能到这，也希望大家可以指出我的错误让我可以有一点点进步，以后会一直更新
大数据学习总结（2021版）---Mysql基础
2021-03-04 09:10

亿钱君的博客这里写目录标题第一章：数据库1.1 数据库概述1.2 ...database（创建、查看、删除、切换）3.4 表结构相关语句3.4.1 创建表3.4.2 查看表3.4.3 删除表3.4.4 修改表结构格式第四章：字段属性4.1 主键4.1.1增加主键4.1.2 主
springcloud中文手册API
2018-10-12 15:28

浮生梦浮生的博客开箱即用，负责从外部源加载配置属性，还解密本地外部配置文件中的属性。这两个上下文共享一个 Environment ，这是任何Spring应用程序的外部属性的来源。Bootstrap属性的优先级高，因此默认情况下不能被本地配置覆盖...
PHP面试题(一)
2018-03-24 03:56

钟长森的博客双端队列中的元素可以从两端弹出，其限定插入和删除操作在表的两端进行。双向队列（双端队列）就像是一个队列，但是你可以在任何一端添加或移除元素。而双端队列是一种数据结构，定义如下： A deque is a data ...
没有解决我的问题, 去提问

如何从PHP中的UTF8字符“删除变音符号”？

2条回答 默认 最新

2条回答默认最新