duan0821 2014-05-21 11:11

Comparing strings with accents in PHP

I'm having trouble comparing two strings that contain accents. Here is my case:

The first string is: Master. The second string is: Máster Diseño Producción

I need to remove the word Máster from the second string, because it is the same word as the first string apart from the accent.

I have created a function to clean each string:

function sanear_string($cadena)
{
    $cadena = trim($cadena);

    $cadena = str_replace(
        array('á', 'à', 'ä', 'â', 'ª', 'Á', 'À', 'Â', 'Ä'),
        array('a', 'a', 'a', 'a', 'a', 'A', 'A', 'A', 'A'),
        $cadena
    );

    $cadena = str_replace(
        array('é', 'è', 'ë', 'ê', 'É', 'È', 'Ê', 'Ë'),
        array('e', 'e', 'e', 'e', 'E', 'E', 'E', 'E'),
        $cadena
    );

    $cadena = str_replace(
        array('í', 'ì', 'ï', 'î', 'Í', 'Ì', 'Ï', 'Î'),
        array('i', 'i', 'i', 'i', 'I', 'I', 'I', 'I'),
        $cadena
    );

    $cadena = str_replace(
        array('ó', 'ò', 'ö', 'ô', 'Ó', 'Ò', 'Ö', 'Ô'),
        array('o', 'o', 'o', 'o', 'O', 'O', 'O', 'O'),
        $cadena
    );

    $cadena = str_replace(
        array('ú', 'ù', 'ü', 'û', 'Ú', 'Ù', 'Û', 'Ü'),
        array('u', 'u', 'u', 'u', 'U', 'U', 'U', 'U'),
        $cadena
    );

    $cadena = str_replace(
        array('ñ', 'Ñ', 'ç', 'Ç'),
        array('n', 'N', 'c', 'C',),
        $cadena
    );

    // This part removes any unusual characters
    $cadena = str_replace(
        array("\\", "¨", "º", "-", "~",
            "#", "@", "|", "!", "\"",
            "·", "$", "%", "&", "/",
            "(", ")", "?", "'", "¡",
            "¿", "[", "^", "`", "]",
            "+", "}", "{", "¨", "´",
            ">", "<", ";", ",", ":",
            ".", " "),
        '',
        $cadena
    );


    return $cadena;
}

It takes care of the accents. Now I can use strpos to compare both strings: if the result is > 0, I know that the word is contained. But I still need some help with the removal itself. Thanks in advance.
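
A caveat about the `> 0` check: `strpos` returns the offset of the match, and that offset is 0 when the word sits at the very start of the string, which is exactly the Master / Máster case. Below is a minimal sketch, assuming both strings have already been cleaned; the helper name `contains_word` is hypothetical and only there for illustration.

```php
<?php
// Hypothetical helper: true when $needle occurs anywhere in $haystack.
function contains_word($haystack, $needle)
{
    // strpos() returns an offset (possibly 0) or false, so compare strictly.
    return strpos($haystack, $needle) !== false;
}

var_dump(strpos('MasterDisenoProduccion', 'Master'));        // int(0)
var_dump(contains_word('MasterDisenoProduccion', 'Master')); // bool(true)
var_dump(contains_word('MasterDisenoProduccion', 'Grado'));  // bool(false)
```

A plain `> 0` test would treat the first case as "not found", so the strict comparison against `false` is the safer choice.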


  • dtng5978 2014-05-21 12:00

    As usual when dealing with charset problems, you need to be extra careful about the character counts between multibyte strings and plain ASCII strings.

    Your biggest problem here is that your cleaning function removes some predefined characters (spaces, punctuation) from the string. That breaks the character-count correspondence between the sanitized string and the original, which makes the removal much harder.
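
    To make the counting issue concrete, here is a small sketch, assuming the source file is saved as UTF-8 and the mbstring extension is loaded. The variable names are made up for the example.

    ```php
    <?php
    $original = 'Máster Diseño';
    // Accents replaced AND the space stripped, as the question's function does.
    $stripped = str_replace(' ', '', 'Master Diseno');

    var_dump(strlen($original));             // int(15): bytes, "á" and "ñ" take 2 each
    var_dump(mb_strlen($original, 'UTF-8')); // int(13): characters
    var_dump(mb_strlen($stripped, 'UTF-8')); // int(12): offsets no longer line up
    ```

    As long as every replacement is one character for one character, an offset found in the cleaned string can be reused on the original with the mb_* functions.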

    I'll use a modified version of your sanitizing function that only replaces characters one-for-one and lowercases the result:

    function sanitize($cadena) {
        $cadena = str_replace(
            array('á', 'à', 'ä', 'â', 'ª', 'Á', 'À', 'Â', 'Ä'),
            array('a', 'a', 'a', 'a', 'a', 'A', 'A', 'A', 'A'),
            $cadena
        );
    
        $cadena = str_replace(
            array('é', 'è', 'ë', 'ê', 'É', 'È', 'Ê', 'Ë'),
            array('e', 'e', 'e', 'e', 'E', 'E', 'E', 'E'),
            $cadena
        );
    
        $cadena = str_replace(
            array('í', 'ì', 'ï', 'î', 'Í', 'Ì', 'Ï', 'Î'),
            array('i', 'i', 'i', 'i', 'I', 'I', 'I', 'I'),
            $cadena
        );
    
        $cadena = str_replace(
            array('ó', 'ò', 'ö', 'ô', 'Ó', 'Ò', 'Ö', 'Ô'),
            array('o', 'o', 'o', 'o', 'O', 'O', 'O', 'O'),
            $cadena
        );
    
        $cadena = str_replace(
            array('ú', 'ù', 'ü', 'û', 'Ú', 'Ù', 'Û', 'Ü'),
            array('u', 'u', 'u', 'u', 'U', 'U', 'U', 'U'),
            $cadena
        );
    
        $cadena = str_replace(
            array('ñ', 'Ñ', 'ç', 'Ç'),
            array('n', 'N', 'c', 'C',),
            $cadena
        );
    
    
        return strtolower($cadena);
    }
    

    The remove_word function follows:

    function remove_word($haystack , $needle) {
        // sanitize input strings
        $haystack_san = sanitize($haystack);
        $needle_san = sanitize($needle);
    
        // Check for character loss
        if (mb_strlen($haystack_san, 'UTF-8') != mb_strlen($haystack, 'UTF-8') || mb_strlen($needle_san, 'UTF-8') != mb_strlen($needle, 'UTF-8')) {
            // Here for debugging purposes. You may want to drop it in production.
            echo "Lost some chars on the way. Aborting.\n";
            echo "     haystack: $haystack (".mb_strlen($haystack, "UTF-8").")\n";
            echo " haystack_san: $haystack_san (".mb_strlen($haystack_san, "UTF-8").")\n";
            echo "       needle: $needle (".mb_strlen($needle, "UTF-8").")\n";
            echo "   needle_san: $needle_san (".mb_strlen($needle_san, "UTF-8").")\n";
            return;
        }
    
        // Check if $needle is found in $haystack
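        // mb_strpos returns a character offset, which is what mb_substr expects
        // (strpos would return a byte offset if any multi-byte character remains)
        if (($pos = mb_strpos($haystack_san, $needle_san, 0, 'UTF-8')) !== false) {
            // Get the string before the word
            $new = mb_substr($haystack, 0, $pos, 'UTF-8');
            // If applicable, get the string after
            if (mb_strlen($haystack, 'UTF-8') - $pos - mb_strlen($needle, 'UTF-8') > 0)
                $new .= mb_substr($haystack, $pos + mb_strlen($needle, 'UTF-8'), NULL, 'UTF-8');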
            // Return it
            return $new;
        }
    
        // If the word wasn't found, return $haystack as-is
        return $haystack;
    }
    
    echo remove_word("Hola, Máster Diseño Producción", "Master");
    // "Hola,  Diseño Producción"
    

    Note that:

    • This assumes your strings are UTF-8
    • The code relies on the mb_* functions to handle multi-byte characters
    • This only removes the first occurrence of the word (you may call remove_word until the string no longer changes if you want to remove all occurrences)
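
    The "call it until the string no longer changes" idea from the last note could be sketched as follows. This is a self-contained illustration, not the code above: `fold_accents` is a hypothetical, much shorter stand-in for sanitize (it only covers a few Spanish characters), and `remove_all_words` is a hypothetical name. It assumes UTF-8 input and the mbstring extension.

    ```php
    <?php
    // Hypothetical compact stand-in for sanitize(): one-for-one replacements only.
    function fold_accents($text)
    {
        $map = array(
            'á' => 'a', 'é' => 'e', 'í' => 'i', 'ó' => 'o', 'ú' => 'u', 'ñ' => 'n',
            'Á' => 'A', 'É' => 'E', 'Í' => 'I', 'Ó' => 'O', 'Ú' => 'U', 'Ñ' => 'N',
        );
        return strtolower(strtr($text, $map));
    }

    // Hypothetical helper: removes every accent-insensitive match of $needle.
    function remove_all_words($haystack, $needle)
    {
        $needle_san = fold_accents($needle);
        $needle_len = mb_strlen($needle, 'UTF-8');
        if ($needle_len === 0) {
            return $haystack; // nothing to remove
        }
        while (true) {
            $haystack_san = fold_accents($haystack);
            // Find the next match as a character offset; stop when none is left.
            $pos = mb_strpos($haystack_san, $needle_san, 0, 'UTF-8');
            if ($pos === false) {
                break;
            }
            // Cut the matched characters out of the original (accented) string.
            $haystack = mb_substr($haystack, 0, $pos, 'UTF-8')
                      . mb_substr($haystack, $pos + $needle_len, null, 'UTF-8');
        }
        return $haystack;
    }

    echo remove_all_words('Máster Diseño, master Producción', 'Master'), "\n";
    // " Diseño,  Producción"
    ```

    Each pass removes at least one character, so the loop always terminates. Like the single-pass version, it leaves the surrounding spaces in place.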