I need to replicate the behavior of the MySQL utf8_general_ci collation in PHP. Strictly speaking, I need to detect which strings would be considered different and which would be considered the same. The case-insensitive part is easy. The problem is that utf8_general_ci considers characters with diacritics and characters without diacritics to be equal: e = è = é, etc. To replicate that comparison, I'd need a way to replace è -> e, é -> e, and so on.
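Done by hand, that replacement is essentially `strtr` with an explicit table, roughly as in the sketch below. The table is a small hypothetical subset chosen for illustration, not MySQL's actual collation weights, and `fold_accents` is a made-up name.

```php
<?php
// Hypothetical, deliberately incomplete table: a real one would have to
// list every accented letter that utf8_general_ci folds.
const FOLD_MAP = [
    'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e',
    'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E',
    'à' => 'a', 'á' => 'a', 'â' => 'a', 'ä' => 'a',
];

// strtr() with an array replaces whole keys, so multi-byte UTF-8
// sequences such as "é" are matched as units.
function fold_accents(string $s): string
{
    return strtr($s, FOLD_MAP);
}

echo fold_accents('café'), "\n"; // cafe
echo fold_accents('Été'), "\n";  // Ete
echo fold_accents('дом'), "\n";  // дом (not in the table, left untouched)
```

Anything missing from the table passes through unchanged, which keeps Cyrillic safe but also means every accented letter has to be listed explicitly.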
The method that comes to mind is:
echo iconv("utf-8", "ascii//TRANSLIT", "é");
One problem is that iconv behaves differently depending on the current locale, and that's asking for trouble.
The other problem is that the input may also contain Cyrillic letters, which shouldn't be stripped or trigger a PHP Notice:
echo iconv("utf-8", "ascii//TRANSLIT", "дом");
Is there a solution, or do I have to manually create a mapping from each character with a diacritic to its counterpart without one?
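One locale-independent approach, offered as a sketch rather than an exact reproduction of MySQL's collation tables: decompose the string to Unicode NFD with `Normalizer::normalize()` (this assumes PHP's intl extension is installed), delete the combining marks with the PCRE `\p{Mn}` class, and let `mb_strtolower()` (mbstring extension) handle case. None of these steps consults `setlocale()`, and Cyrillic base letters pass through unchanged. The function names `strip_diacritics` and `equals_general_ci` are hypothetical.

```php
<?php
// Assumes the intl extension (Normalizer) and mbstring are available.
// Unlike iconv's //TRANSLIT, nothing here depends on the current locale.
function strip_diacritics(string $s): string
{
    // NFD splits "é" into "e" followed by U+0301 COMBINING ACUTE ACCENT.
    $decomposed = Normalizer::normalize($s, Normalizer::FORM_D);
    if ($decomposed === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    // Remove every nonspacing combining mark (Unicode category Mn).
    return preg_replace('/\p{Mn}+/u', '', $decomposed);
}

// Approximation of utf8_general_ci equality: fold accents, then case.
function equals_general_ci(string $a, string $b): bool
{
    return mb_strtolower(strip_diacritics($a), 'UTF-8')
        === mb_strtolower(strip_diacritics($b), 'UTF-8');
}

echo strip_diacritics('é'), "\n";   // e
echo strip_diacritics('дом'), "\n"; // дом
var_dump(equals_general_ci('Résumé', 'RESUME')); // bool(true)
```

Caveats: NFD also splits letters such as й and ё into и and е plus a combining mark, so those get folded as well. Letters with no canonical decomposition (ß, ø, ł) are left alone, whereas MySQL's documentation states that ß = s under utf8_general_ci. Results should be spot-checked against the server for the characters that actually occur in the data.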