Comparing strings with accents in PHP

I'm having problems comparing two strings that contain accents. This is my case:

The first string is: Master
The second string is: Máster Diseño Producción

I need to remove the word Máster from the second string, because it matches the first string once accents are ignored.

I have created a function to clean each string:

function sanear_string($cadena)
{
    $cadena = trim($cadena);

    $cadena = str_replace(
        array('á', 'à', 'ä', 'â', 'ª', 'Á', 'À', 'Â', 'Ä'),
        array('a', 'a', 'a', 'a', 'a', 'A', 'A', 'A', 'A'),
        $cadena
    );

    $cadena = str_replace(
        array('é', 'è', 'ë', 'ê', 'É', 'È', 'Ê', 'Ë'),
        array('e', 'e', 'e', 'e', 'E', 'E', 'E', 'E'),
        $cadena
    );

    $cadena = str_replace(
        array('í', 'ì', 'ï', 'î', 'Í', 'Ì', 'Ï', 'Î'),
        array('i', 'i', 'i', 'i', 'I', 'I', 'I', 'I'),
        $cadena
    );

    $cadena = str_replace(
        array('ó', 'ò', 'ö', 'ô', 'Ó', 'Ò', 'Ö', 'Ô'),
        array('o', 'o', 'o', 'o', 'O', 'O', 'O', 'O'),
        $cadena
    );

    $cadena = str_replace(
        array('ú', 'ù', 'ü', 'û', 'Ú', 'Ù', 'Û', 'Ü'),
        array('u', 'u', 'u', 'u', 'U', 'U', 'U', 'U'),
        $cadena
    );

    $cadena = str_replace(
        array('ñ', 'Ñ', 'ç', 'Ç'),
        array('n', 'N', 'c', 'C',),
        $cadena
    );

    // This part removes any odd characters
    $cadena = str_replace(
        array("\\", "¨", "º", "-", "~",
            "#", "@", "|", "!", "\"",
            "·", "$", "%", "&", "/",
            "(", ")", "?", "'", "¡",
            "¿", "[", "^", "`", "]",
            "+", "}", "{", "¨", "´",
            ">", "<", ";", ",", ":",
            ".", " "),
        '',
        $cadena
    );


    return $cadena;
}

That solves the accent problem for me. Now I can use strpos to compare both strings: if the result is > 0, I know that the word is contained. But I need some more help. Thanks in advance.

dtng5978 2014-05-21 12:00

As usual when dealing with charset problems, you need to be extra careful about the character counts between multibyte strings and plain ASCII strings.
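
To make the byte-versus-character distinction concrete, here is a minimal sketch. It assumes the source file and strings are UTF-8 and that the mbstring extension is loaded; the variable names are illustrative only.

```php
<?php
// Assumes UTF-8 strings and the mbstring extension.
$accented = "Máster";
$plain    = "Master";

// strlen() counts bytes: "á" occupies two bytes in UTF-8.
$accented_bytes = strlen($accented);
// mb_strlen() counts characters when told the encoding.
$accented_chars = mb_strlen($accented, 'UTF-8');

$plain_bytes = strlen($plain);
$plain_chars = mb_strlen($plain, 'UTF-8');

// strpos() reports a byte offset, mb_strpos() a character offset.
// The "ñ" before the match makes the two differ by one.
$byte_offset = strpos("Diseño Máster", "Máster");
$char_offset = mb_strpos("Diseño Máster", "Máster", 0, 'UTF-8');

echo "$accented_bytes bytes, $accented_chars chars\n";
echo "byte offset $byte_offset, char offset $char_offset\n";
```

Mixing the two families (a byte offset from strpos fed into mb_substr, for example) only works while every character before the match is plain ASCII.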

Your biggest problem here is that you remove some pre-defined characters from the cleaned string. That breaks the character-count coherence between the sanitized string and the original, which makes the removal much harder.
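
A short sketch of why dropping characters breaks the mapping between the two strings. Here strip_chars is a hypothetical stand-in for the last step of sanear_string (it is not the original function), and the sample string is plain ASCII so that only the stripping effect is visible.

```php
<?php
// Hypothetical stand-in for the stripping step of sanear_string():
// it deletes spaces and commas instead of replacing them one-for-one.
function strip_chars($text) {
    return str_replace(array(' ', ','), '', $text);
}

$original = "Hola, Master Diseno";
$stripped = strip_chars($original); // "HolaMasterDiseno"

// The word sits at different offsets in the two strings.
$pos_in_stripped = strpos($stripped, "Master");
$pos_in_original = strpos($original, "Master");

// Cutting the original at the offset found in the stripped copy
// removes the wrong span.
$wrong = substr($original, 0, $pos_in_stripped)
       . substr($original, $pos_in_stripped + strlen("Master"));

echo "$pos_in_stripped vs $pos_in_original\n";
echo $wrong, "\n";
```

If every replacement maps one character to exactly one character, the offsets stay aligned and this problem disappears.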

I'll use a modified version of your sanitizing function:

function sanitize($cadena) {
    $cadena = str_replace(
        array('á', 'à', 'ä', 'â', 'ª', 'Á', 'À', 'Â', 'Ä'),
        array('a', 'a', 'a', 'a', 'a', 'A', 'A', 'A', 'A'),
        $cadena
    );

    $cadena = str_replace(
        array('é', 'è', 'ë', 'ê', 'É', 'È', 'Ê', 'Ë'),
        array('e', 'e', 'e', 'e', 'E', 'E', 'E', 'E'),
        $cadena
    );

    $cadena = str_replace(
        array('í', 'ì', 'ï', 'î', 'Í', 'Ì', 'Ï', 'Î'),
        array('i', 'i', 'i', 'i', 'I', 'I', 'I', 'I'),
        $cadena
    );

    $cadena = str_replace(
        array('ó', 'ò', 'ö', 'ô', 'Ó', 'Ò', 'Ö', 'Ô'),
        array('o', 'o', 'o', 'o', 'O', 'O', 'O', 'O'),
        $cadena
    );

    $cadena = str_replace(
        array('ú', 'ù', 'ü', 'û', 'Ú', 'Ù', 'Û', 'Ü'),
        array('u', 'u', 'u', 'u', 'U', 'U', 'U', 'U'),
        $cadena
    );

    $cadena = str_replace(
        array('ñ', 'Ñ', 'ç', 'Ç'),
        array('n', 'N', 'c', 'C',),
        $cadena
    );


    return strtolower($cadena);
}

The remove_word function follows:

function remove_word($haystack, $needle) {
    // Sanitize input strings (accents folded, lowercased)
    $haystack_san = sanitize($haystack);
    $needle_san = sanitize($needle);

    // Check for character loss: sanitizing must keep the character count unchanged
    if (mb_strlen($haystack_san, 'UTF-8') != mb_strlen($haystack, 'UTF-8') || mb_strlen($needle_san, 'UTF-8') != mb_strlen($needle, 'UTF-8')) {
        // Here for debugging purposes. You may want to drop it in production.
        echo "Lost some chars on the way. Aborting.\n";
        echo "     haystack: $haystack (" . mb_strlen($haystack, 'UTF-8') . ")\n";
        echo " haystack_san: $haystack_san (" . mb_strlen($haystack_san, 'UTF-8') . ")\n";
        echo "       needle: $needle (" . mb_strlen($needle, 'UTF-8') . ")\n";
        echo "   needle_san: $needle_san (" . mb_strlen($needle_san, 'UTF-8') . ")\n";
        return;
    }

    // Check if $needle is found in $haystack.
    // mb_strpos returns a character offset, which is what mb_substr expects;
    // strpos would return a byte offset and misalign on leftover multi-byte characters.
    if (($pos = mb_strpos($haystack_san, $needle_san, 0, 'UTF-8')) !== false) {
        // Get the string before the word
        $new = mb_substr($haystack, 0, $pos, 'UTF-8');
        // If applicable, get the string after
        if (mb_strlen($haystack, 'UTF-8') - $pos - mb_strlen($needle, 'UTF-8') > 0)
            $new .= mb_substr($haystack, $pos + mb_strlen($needle, 'UTF-8'), NULL, 'UTF-8');
        // Return it
        return $new;
    }

    // If the word wasn't found, return $haystack as-is
    return $haystack;
}

echo remove_word("Hola, Máster Diseño Producción", "Master");
// "Hola,  Diseño Producción"

Note that:

This assumes your strings are UTF-8.
The code relies on the mb_* functions to handle multi-byte characters.
This only removes the first occurrence of the word (you may call remove_word repeatedly until the string no longer changes if you want to remove all occurrences).
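
The last note can be sketched as a loop. This is not the answer's code: fold_accents and remove_all_words are hypothetical names, the accent map is deliberately abbreviated, and the sketch assumes UTF-8 input with the mbstring extension loaded.

```php
<?php
// Hypothetical helper: folds accents one character to one character
// (abbreviated map) and lowercases the ASCII letters.
function fold_accents($text) {
    $map = array(
        'á' => 'a', 'é' => 'e', 'í' => 'i', 'ó' => 'o', 'ú' => 'u', 'ñ' => 'n',
        'Á' => 'A', 'É' => 'E', 'Í' => 'I', 'Ó' => 'O', 'Ú' => 'U', 'Ñ' => 'N',
    );
    return strtolower(strtr($text, $map));
}

// Hypothetical helper: removes every accent-insensitive occurrence of $needle.
function remove_all_words($haystack, $needle) {
    $needle_san = fold_accents($needle);
    $needle_len = mb_strlen($needle_san, 'UTF-8');
    if ($needle_len === 0) {
        return $haystack; // nothing to remove; also avoids an endless loop
    }
    // Search the folded copy, then cut the original at the same character offset.
    // Each pass shortens $haystack, so the loop terminates.
    while (($pos = mb_strpos(fold_accents($haystack), $needle_san, 0, 'UTF-8')) !== false) {
        $haystack = mb_substr($haystack, 0, $pos, 'UTF-8')
                  . mb_substr($haystack, $pos + $needle_len, null, 'UTF-8');
    }
    return $haystack;
}

$result = remove_all_words("Máster Diseño master Producción", "Master");
echo "[$result]\n";
```

Note that the surrounding spaces are left in place, as in the single-removal example above, so you may want to collapse repeated whitespace afterwards.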

This answer was selected by the asker as the best answer.
