dsunj08246 2014-05-21 12:57

Remove multiple punctuation marks and whitespace with a single regular expression?

What I got:

array(4) {
  [0]=>
  string(7) "text???"
  [1]=>
  string(7) "???text"
  [2]=>
  string(11) "text???text"
  [3]=>
  string(24) "text ? ? ?    ? ?   text"
}

What I want:

array(4) {
  [0]=>
  string(5) "text?"
  [1]=>
  string(6) "? text"
  [2]=>
  string(10) "text? text"
  [3]=>
  string(10) "text? text"
}

My approach:

<?php

$array = array (
  "text???",
  "???text",
  "text???text",
  "text ? ? ?    ? ?   text"
);

foreach ($array as &$string) {
  $string = preg_replace('!(\s|\?|\!|\.|:|,|;)+!', '$1 ', $string);
}

var_dump($array);

Result:

array(4) {
  [0]=>
  string(6) "text? "
  [1]=>
  string(6) "? text"
  [2]=>
  string(10) "text? text"
  [3]=>
  &string(9) "text text"
}

Conclusion: My approach has two flaws that I'm aware of. Firstly, it appends a space after every replacement, even at the end of the string. I assume I could call trim after preg_replace, but I'd rather have the regular expression handle it if possible. Secondly, it breaks on strings like the last one in the example above, for some reason.

1 answer

  • dongtiannai0654 2014-05-21 13:05

    Ignoring your last example, text ? ? ? ? ? text, there is a very simple regex that can remove repeating characters in a defined set:

    ([?!.:,;]|\s)\1+
    

    This matches any of the punctuation or whitespace characters when it is immediately followed by one or more copies of that same character. Used in PHP's preg_replace():

    $value = preg_replace('/([?!.:,;]|\s)\1+/', '$1 ', $value);
    

    Codepad Example of the above.
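
As a rough check of the backreference pattern above, the sketch below runs it over the four strings from the question. The helper name collapseRepeats is made up for illustration and is not part of the original answer.

```php
<?php

// Hypothetical helper (the name is illustrative): replace a run of one
// repeated punctuation or whitespace character with that character
// followed by a single space.
function collapseRepeats(string $value): string
{
    return preg_replace('/([?!.:,;]|\s)\1+/', '$1 ', $value);
}

$samples = array(
    "text???",
    "???text",
    "text???text",
    "text ? ? ?    ? ?   text",
);

foreach ($samples as $sample) {
    var_dump(collapseRepeats($sample));
}
```

Because the replacement is '$1 ', a run at the end of the string still leaves a trailing space ("text???" becomes "text? "). In the last string only the runs of spaces repeat, so each of them is shortened to two spaces and the question marks are left alone.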

    Now, this regex won't work for your last example, because the only repeating characters in it are a few spaces. However, if I assume that you are okay with removing any punctuation that follows other punctuation (such as hi!? becoming hi!), we can use the following:

    ([?!.:,;])[?!.:,;\s]+
    

    This regex finds any punctuation mark followed by one or more punctuation or whitespace characters, and keeps only that first mark. Used in preg_replace() like above:

    $value = preg_replace('/([?!.:,;])[?!.:,;\s]+/', '$1 ', $value);
    

    Codepad Example of the expanded regex.
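
To see what the expanded pattern does with the question's input, here is a minimal sketch that applies it to the same four strings. The helper name collapsePunctuation is an assumption made for this example.

```php
<?php

// Hypothetical helper (the name is illustrative): keep the first
// punctuation mark of a run and replace the punctuation/whitespace that
// follows it with a single space.
function collapsePunctuation(string $value): string
{
    return preg_replace('/([?!.:,;])[?!.:,;\s]+/', '$1 ', $value);
}

$samples = array(
    "text???",
    "???text",
    "text???text",
    "text ? ? ?    ? ?   text",
);

foreach ($samples as $sample) {
    var_dump(collapsePunctuation($sample));
}
```

Two details differ from the output the question asks for: a run at the end of the string still ends in a space ("text? "), and the match starts at the first punctuation mark, so the space in front of it in the last string is kept ("text ? text").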

    Note: this second regex won't remove repeated whitespace if the whitespace comes first, such as in the text "text  ?text". The reason is that, as in your example, it keeps the first punctuation mark it finds, as opposed to the first repeating character it finds. If this is a problem, I would recommend a follow-up regex to replace all repeated whitespace:

    $value = preg_replace('/\s\s+/', ' ', $value);
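
If a single preg_replace() call is preferred, the punctuation pattern and the whitespace follow-up can be passed together as arrays, which PHP applies in order. The sketch below does that. The leading \s* and the final /\s+\z/ pattern are additions that are not part of the answer above; they are assumed from the output the question asks for. The helper name normalizePunctuation is likewise made up.

```php
<?php

// Hypothetical helper (the name is illustrative). preg_replace() accepts
// arrays and applies each pattern in turn to the result of the previous one.
function normalizePunctuation(string $value): string
{
    $patterns = array(
        // Assumption: the leading \s* also swallows whitespace in front of
        // the first punctuation mark of a run.
        '/\s*([?!.:,;])[?!.:,;\s]+/',
        // The whitespace follow-up, written with an explicit {2,} quantifier.
        '/\s{2,}/',
        // Assumption: drop whitespace left at the very end of the string.
        '/\s+\z/',
    );
    $replacements = array('$1 ', ' ', '');

    return preg_replace($patterns, $replacements, $value);
}

$samples = array(
    "text???",
    "???text",
    "text???text",
    "text ? ? ?    ? ?   text",
);

foreach ($samples as $sample) {
    var_dump(normalizePunctuation($sample));
}
```

With these assumptions the four strings come out as "text?", "? text", "text? text" and "text? text". A punctuation mark that is not followed by more punctuation or whitespace, such as the dot in 3.14, is left untouched because the first pattern still requires at least one following character from the set.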
    