dongtu4559 2015-11-03 01:02
浏览 35
已采纳

在mysql中搜索utf8 enocded字符串,显示相同但utf代码不同

I am having a problem when searching the database for utf8 enocded strings in MySQL. I have a kind of a social website with users and they are allowed to add descriptions for their profile and because in my country we use cyrillic alphabet the obvious thing is to use UTF8. I have a search field that searches for the descriptions of the profiles and it is something like this:

SELECT usr.* FROM user AS usr WHERE usr.city = '{$city}' AND usr.desc LIKE '%{$srch}%'

I am using this in PHP by the way and in most of the cases it works. The thing is that some search results can't be searched and I found out that the problem is that some of the users for some reason have the same representations of some letters (so the letter displays exactly the same) but the encoding behind it is not the same. For example the text:

'Оптички стакла' = ÐпÑиÑки ÑÑакла

when encoded and then written in the most common way while using the keyboard language support the most OSes have. But this string of some user:

'Oптички ​​​стaклa' = OпÑиÑки âÑÑaклa

outputs a different code when enocded with UTF8. So because of this the search doesn't work in all the cases and I don't know how to solve it. I think that my database is set properly I tried many combinations and now I am out of ideas. Any help would be appreciated.

Thanks in advance.

  • 写回答

3条回答 默认 最新

  • dougou6727 2016-02-09 11:15
    关注

    I too found out that the case is like @duskwuff said, the problem was that not only one user input this kind of data, but at least it was rare. I managed to find a solution myself. Because in every case this happened on the letters 'A', 'a', 'O', 'o' I just check every letter in the word and if the word is mostly ASCII but an UTF8 is found just convert it like this:

    function convert_ascii_to_utf($str)
    {
            $length = strlen($str);
            $ascii = false;
            $utf8 = false;
            $mixed_encode = false;
    
            //the new string
            $new_str = '';
    
            //check for mixed encoding in the same string
            for($i = 0; $i < $length; $i++)
            {
                if(mb_detect_encoding($str[$i]) == 'ASCII')
                {
                    $ascii = true;
                }
                if(mb_detect_encoding($str[$i]) == 'UTF-8')
                {
                    $utf8 = true;
                }
    
                if($ascii == true && $utf8 == true)
                {
                    $mixed_encode = true;
                    break;
                }
            }
    
            if($mixed_encode)
            {
                for($i = 0; $i < $length; $i++)
                {
                    if($str[$i] == 'a') { $new_str .= 'а'; }
                    else if($str[$i] == 'A') { $new_str .= 'А'; }
                    else if($str[$i] == 'o') { $new_str .= 'о'; }
                    else if($str[$i] == 'O') { $new_str .= 'О'; }
                    else { $new_str .= $str[$i]; }
                }
    
                return $new_str;
            }
            else
            {
                return $str;
            }
        }
    
    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
查看更多回答(2条)

报告相同问题?

悬赏问题

  • ¥20 基于MSP430f5529的MPU6050驱动,求出欧拉角
  • ¥20 Java-Oj-桌布的计算
  • ¥15 powerbuilder中的datawindow数据整合到新的DataWindow
  • ¥20 有人知道这种图怎么画吗?
  • ¥15 pyqt6如何引用qrc文件加载里面的的资源
  • ¥15 安卓JNI项目使用lua上的问题
  • ¥20 RL+GNN解决人员排班问题时梯度消失
  • ¥60 要数控稳压电源测试数据
  • ¥15 能帮我写下这个编程吗
  • ¥15 ikuai客户端l2tp协议链接报终止15信号和无法将p.p.p6转换为我的l2tp线路