dongtu4559 2015-11-03 01:02
Views: 35
Accepted

Searching for UTF-8 encoded strings in MySQL: text displays the same but the UTF-8 codes differ

I am having a problem searching a MySQL database for UTF-8 encoded strings. I run a kind of social website where users can add a description to their profile, and because my country uses the Cyrillic alphabet, UTF-8 is the obvious choice. I have a search field that searches the profile descriptions with a query like this:

SELECT usr.* FROM user AS usr WHERE usr.city = '{$city}' AND usr.desc LIKE '%{$srch}%'

I am using this from PHP, by the way, and in most cases it works. However, some descriptions can't be found, and I found out that the problem is that some users have, for some reason, entered letters that look exactly the same on screen but are encoded differently. For example, the text:

'Оптички стакла' = ÐпÑиÑки ÑÑакла

is the result when the text is typed in the usual way, with the keyboard language support that most OSes provide, and then encoded. But this string from one user:

'Oптички ​​​стaклa' = OпÑиÑки âÑÑaклa

produces different bytes when encoded as UTF-8. Because of this, the search doesn't work in all cases, and I don't know how to solve it. I think my database is set up properly; I have tried many combinations and am now out of ideas. Any help would be appreciated.

Thanks in advance.
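
One way to confirm this kind of mismatch is to look at the bytes behind each character. The following is a minimal sketch, assuming PHP 7 or later (for the "\u{...}" escapes); `utf8_bytes_per_char` is a hypothetical helper name, not something from the original post.

```php
<?php
// Latin "O" (U+004F) and Cyrillic "О" (U+041E) look alike on screen
// but are different bytes once encoded as UTF-8.
$latin_o    = "O";         // Latin capital O
$cyrillic_o = "\u{041E}";  // Cyrillic capital O

// Hypothetical helper: hex-encoded UTF-8 bytes of each character.
function utf8_bytes_per_char(string $str): array
{
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return [];  // input was not valid UTF-8
    }
    return array_map('bin2hex', $chars);
}

echo bin2hex($latin_o), "\n";     // 4f
echo bin2hex($cyrillic_o), "\n";  // d09e

// Latin "O" followed by Cyrillic "п": one single-byte and one two-byte character.
print_r(utf8_bytes_per_char($latin_o . "\u{043F}"));  // 4f, d0bf
```

On the database side, MySQL's `HEX()` function applied to the stored column should give a comparable view of the bytes.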

3 answers

  • dougou6727 2016-02-09 11:15

    I also found that the cause is what @duskwuff described. The data came from more than one user, but at least it was rare. I managed to find a solution myself. Because it always happened with the letters 'A', 'a', 'O', and 'o', I check whether the string mixes ASCII with multibyte UTF-8 characters, and if it does, I convert those Latin letters to their Cyrillic counterparts like this:

    function convert_ascii_to_utf($str)
    {
        // Mixed encoding: the string holds both single-byte ASCII characters
        // and multibyte UTF-8 sequences (any byte >= 0x80). The patterns have
        // no /u flag, so they match raw bytes. This avoids calling
        // mb_detect_encoding() on single bytes, which is unreliable because
        // one byte of a multibyte character is not valid UTF-8 on its own.
        $has_ascii = preg_match('/[\x00-\x7F]/', $str) === 1;
        $has_multibyte = preg_match('/[\x80-\xFF]/', $str) === 1;

        if (!$has_ascii || !$has_multibyte)
        {
            // Pure ASCII or pure multibyte text: nothing to convert.
            return $str;
        }

        // Replace the Latin look-alikes with their Cyrillic counterparts.
        // Keys are Latin letters, values are Cyrillic letters.
        return strtr($str, array(
            'a' => 'а',
            'A' => 'А',
            'o' => 'о',
            'O' => 'О',
        ));
    }
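
For the search to match, the same cleanup probably has to run on both sides: on the description when it is saved and on the search term before the query is built. The sample string in the question also appears to contain zero-width spaces, which are invisible but would still break a LIKE match. The sketch below is a variation on the function above, not the original method: `normalize_for_search` is a hypothetical name, it assumes PHP 7 or later, and it keys on the presence of Cyrillic letters instead of a byte-level check.

```php
<?php
// Hypothetical helper (an assumption, not part of the original answer):
// normalize text before storing it and before searching for it.
function normalize_for_search(string $str): string
{
    // Drop zero-width characters (U+200B..U+200D and U+FEFF).
    $clean = preg_replace('/[\x{200B}-\x{200D}\x{FEFF}]/u', '', $str);
    if ($clean === null) {
        return $str;  // invalid UTF-8: leave the input untouched
    }

    // Leave plain Latin text alone; only fix strings that contain Cyrillic.
    if (preg_match('/\p{Cyrillic}/u', $clean) !== 1) {
        return $clean;
    }

    // Latin look-alikes => Cyrillic letters (the same four letters as above).
    return strtr($clean, [
        'a' => "\u{0430}",
        'A' => "\u{0410}",
        'o' => "\u{043E}",
        'O' => "\u{041E}",
    ]);
}

// All-Cyrillic text, as typed with a Cyrillic keyboard layout.
$typed = "\u{041E}птички стакла";

// Same words with Latin "O" and "a" plus zero-width spaces mixed in.
$stored = 'O' . "птички \u{200B}\u{200B}\u{200B}ст" . 'a' . "кл" . 'a';

var_dump($stored === $typed);                        // bool(false)
var_dump(normalize_for_search($stored) === $typed);  // bool(true)
```

Rows saved before such a normalization was in place would need a one-time cleanup pass, otherwise they would still fail to match.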
    
    This answer was selected as the best answer by the asker.