douwen9343 2018-09-24 19:22

preg_replace does not remove all whitespace characters from a string

I've got the following code, which should compare two strings after stripping all whitespace. Here is a simplified version of the function:

function not_same($type, $org_str1, $str2) {

    $str1 = preg_replace('/\s+/', '', $org_str1);
    $str2 = preg_replace('/\s+/', '', $str2);

    $tries = [];
    $tries[] = ["str1" => $str1, "str2" => $str2, "encoded1" => urlencode($str1), "encoded2" => urlencode($str2)];        

    if($str1 == $str2) {
        return true;
    } else {
        return false;
    }

}

I'm using this to check whether a computer's processor matches the model recorded in my database: $org_str1 is the CPU that my client reports for the computer it is running on, and $str2 is the CPU that the model should have according to my database.

Sometimes these strings contain unneeded spaces, so I remove all of the whitespace before comparing, so that only the text itself is compared.

Now some computers are being reported as having the wrong CPU. The match fails because some whitespace is not removed.

In this specific case, I'm trying to compare the strings Client: Celeron® N3050 vs Server: Celeron® N3050. I log what is actually being compared on my server each time, and for this client the log says it is comparing Client: Celeron® N3050 vs Server: Celeron®N3050

I tried copying and pasting this whitespace into a str_replace() call, but that did not solve the issue. I then got the idea of logging the strings with urlencode(), which lets me see exactly what this mysterious whitespace character is, but I am still at a loss as to how to fix the issue.

The strings after urlencode() are Client: Celeron%C2%AE%C2%A0N3050 vs Server: Celeron%C2%AEN3050
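
For reference, a minimal sketch of another way to expose the hidden bytes in a log line, alongside urlencode(). The helper name show_hidden_bytes() is hypothetical and not part of my real code; it escapes every byte outside printable ASCII as \xHH.

```php
<?php
// Hypothetical helper, not from the original post: renders every byte outside
// printable ASCII (0x21-0x7E) as \xHH so invisible characters become visible.
function show_hidden_bytes(string $s): string {
    // No u modifier on purpose: the subject is matched byte by byte.
    return preg_replace_callback(
        '/[^\x21-\x7E]/',
        function (array $m): string {
            return sprintf('\\x%02X', ord($m[0]));
        },
        $s
    );
}

// The client string from the log, rebuilt from its urlencode() form.
$client = urldecode('Celeron%C2%AE%C2%A0N3050');

echo show_hidden_bytes($client), "\n"; // Celeron\xC2\xAE\xC2\xA0N3050
echo bin2hex($client), "\n";           // raw bytes as hex
```

Either view shows the same thing as urlencode(): the two bytes C2 A0 sit between ® and N3050.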

As you can see, there is still a whitespace character in my client string, encoded as %C2%A0. Why does preg_replace not remove this whitespace, and how can I remove it programmatically?

1 answer

  • dongwen7423 2018-09-24 19:34

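    \xC2\xA0 is the UTF-8 encoding of U+00A0, the Unicode non-breaking space. Without the u modifier the string is matched byte by byte, and by default \s does not match either of those bytes. Add the u modifier to your regex so that the pattern and subject are treated as UTF-8 and \s also matches the non-breaking space.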

    $raw = urldecode('Celeron%C2%AE%C2%A0N3050');
    
    var_dump(
        preg_replace('/\s+/', '', $raw),
        preg_replace('/\s+/u', '', $raw),
        urlencode($raw),
        urlencode(preg_replace('/\s+/u', '', $raw))
    );
    

    Output:

    string(16) "Celeron® N3050"
    string(14) "Celeron®N3050"
    string(24) "Celeron%C2%AE%C2%A0N3050"
    string(18) "Celeron%C2%AEN3050"
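
Applied to the comparison in the question, the fix might look like the sketch below. The names strip_whitespace() and same_cpu() are hypothetical, and the fallback branch assumes byte-wise stripping is acceptable when the input is not valid UTF-8 (preg_replace() returns null in that case when the u modifier is used).

```php
<?php
// Hypothetical helper name, not from the original post.
function strip_whitespace(string $s): string {
    // u modifier: pattern and subject are UTF-8, so \s also matches U+00A0.
    $out = preg_replace('/\s+/u', '', $s);
    if ($out === null) {
        // preg_replace() returns null on error, e.g. invalid UTF-8 input.
        // Assumption: falling back to byte-wise matching is acceptable here.
        $out = preg_replace('/\s+/', '', $s);
    }
    return $out;
}

// Hypothetical replacement for the comparison in the question.
function same_cpu(string $client, string $server): bool {
    return strip_whitespace($client) === strip_whitespace($server);
}

$client = urldecode('Celeron%C2%AE%C2%A0N3050'); // contains a non-breaking space
$server = urldecode('Celeron%C2%AEN3050');

var_dump(same_cpu($client, $server)); // bool(true)
```

Checking for null matters because a null result would otherwise be compared as if it were an empty string.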
    
    This answer was accepted as the best answer by the question's author.
