dsunj08246 2014-05-21 12:57

Remove multiple punctuation marks and spaces with a single regular expression?

What I got:

array(4) {
  [0]=>
  string(7) "text???"
  [1]=>
  string(7) "???text"
  [2]=>
  string(11) "text???text"
  [3]=>
  string(24) "text ? ? ?    ? ?   text"
}

What I want:

array(4) {
  [0]=>
  string(5) "text?"
  [1]=>
  string(6) "? text"
  [2]=>
  string(10) "text? text"
  [3]=>
  string(10) "text? text"
}

My approach:

<?php

$array = array (
  "text???",
  "???text",
  "text???text",
  "text ? ? ?    ? ?   text"
);

foreach ($array as &$string) {
  $string = preg_replace('!(\s|\?|\!|\.|:|,|;)+!', '$1 ', $string);
}

var_dump($array);

Result:

array(4) {
  [0]=>
  string(6) "text? "
  [1]=>
  string(6) "? text"
  [2]=>
  string(10) "text? text"
  [3]=>
  &string(9) "text text"
}

Conclusion: My approach has two flaws I'm aware of. First, it adds a space after every replacement, even at the end of the string. I assume I could call trim after preg_replace, but I'd rather have the regular expression handle it if possible. Second, it breaks on strings like the last one in the example above: the punctuation is dropped entirely, and I don't know why.

1 answer

  • dongtiannai0654 2014-05-21 13:05

    Ignoring your last example, text ? ? ? ? ? text, there is a very simple regex that can remove repeating characters in a defined set:

    ([?!.:,;]|\s)\1+
    

    This will match any of the punctuation or whitespace characters that are immediately followed by one or more of the same characters. Used in PHP's preg_replace():

    $value = preg_replace('/([?!.:,;]|\s)\1+/', '$1 ', $value);
    

    Codepad Example of the above.
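
To see what this first pattern does and does not cover, here is a minimal sketch that runs it over the four inputs from the question. The variable names `$inputs` and `$collapsed` are only illustrative and do not come from the original post.

```php
<?php
// Inputs copied from the question.
$inputs = array(
    'text???',
    '???text',
    'text???text',
    'text ? ? ?    ? ?   text',
);

// Replace each run of one repeated character (punctuation or whitespace)
// with a single copy of that character followed by a space.
$collapsed = array_map(function ($value) {
    return preg_replace('/([?!.:,;]|\s)\1+/', '$1 ', $value);
}, $inputs);

var_dump($collapsed);
// [0] "text? "      (note the trailing space)
// [1] "? text"
// [2] "text? text"
// [3] still contains several question marks, because they are
//     separated by spaces and so never repeat back to back
```

The backreference `\1` only matches the exact character that was captured, which is why alternating `?` and space characters are left alone.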

    Now, this regex won't work for your last example because in your last example the only repeating characters you have are a few spaces; however, if I go off of the assumption that you would be okay with removing any punctuation that follows other punctuation (such as hi!? becoming hi!), we can use the following:

    ([?!.:,;])[?!.:,;\s]+
    

    This regex will find any punctuation mark followed by any number of punctuation or whitespace characters. Used in the preg_replace like above:

    $value = preg_replace('/([?!.:,;])[?!.:,;\s]+/', '$1 ', $value);
    

    Codepad Example of the expanded regex.
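
The same four inputs can be run through the expanded pattern as a quick check. This is a sketch rather than a reproduction of the linked Codepad example; `$inputs` and `$expanded` are illustrative names.

```php
<?php
// Inputs copied from the question.
$inputs = array(
    'text???',
    '???text',
    'text???text',
    'text ? ? ?    ? ?   text',
);

// Keep the first punctuation mark of a run and swallow every
// punctuation or whitespace character that follows it.
$expanded = array_map(function ($value) {
    return preg_replace('/([?!.:,;])[?!.:,;\s]+/', '$1 ', $value);
}, $inputs);

var_dump($expanded);
// [0] "text? "      (trailing space is still there)
// [1] "? text"
// [2] "text? text"
// [3] "text ? text" (the space before the first "?" is kept,
//                    because the match starts at the punctuation mark)
```

The last result differs from the wanted `text? text` by one space, and the first result still ends in a space, so some follow-up cleanup is needed if the exact output from the question is required.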

    Note: this second regex won't remove repeating whitespace if the whitespace is the "first" thing, such as in the text text ?text. The reason is that, in your example, you have it "use" the first punctuation mark it finds, as opposed to the first repeating character it finds. If this is a problem, I would recommend a follow-up regex to replace all repeating whitespace:

    $value = preg_replace('/\s\s+/', ' ', $value);
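
Putting the pieces together, one possible way to reach the exact output requested in the question is to pass several patterns to a single preg_replace() call. The sketch below makes assumptions that go beyond the answer: the helper name `collapse_punctuation` is hypothetical, the first pattern gains a leading `\s*` so whitespace before the first mark is swallowed too, and a third pattern strips trailing whitespace instead of calling trim().

```php
<?php
// Hypothetical helper, not part of the original answer.
function collapse_punctuation($value)
{
    $patterns = array(
        '/\s*([?!.:,;])[?!.:,;\s]*/', // run containing a mark: keep the first mark
        '/\s{2,}/',                   // leftover whitespace runs (same idea as /\s\s+/)
        '/\s+$/',                     // whitespace at the end of the string
    );
    $replacements = array('$1 ', ' ', '');

    // preg_replace() applies the patterns in order; it returns null on
    // error, in which case the input is returned unchanged.
    $result = preg_replace($patterns, $replacements, $value);
    return $result === null ? $value : $result;
}

$inputs = array(
    'text???',
    '???text',
    'text???text',
    'text ? ? ?    ? ?   text',
);

var_dump(array_map('collapse_punctuation', $inputs));
// [0] "text?"
// [1] "? text"
// [2] "text? text"
// [3] "text? text"
```

One caveat with this variant: because the first pattern also matches a single punctuation mark, it inserts a space after every mark from the set, so a string like `3.14` would become `3. 14`. If that matters, the narrower patterns above are the safer choice.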
    
    This answer was selected by the asker as the best answer.
