dongtan2603 2013-11-06 01:53
Answer accepted

Regex to remove "groups of characters" with fewer than 3 characters

I am trying to remove any 'groups of characters' with less than 3 characters.

This is the source:

1.29 Cancels part plan C/5879 2030. in i i.r e9g6Pop Iatian Area ProcH 22.4.93 Suburban Lands n f 53dv 3 N014 3.5.98. PLAN or any from 01 53 under M R.5I B.L.1laY98 E35. P0 RT I 0 N S At Maroubrajuncti p /I .z. .0 / .L .I. .I

Setting bounds on word characters with a repetition between 1 and 3, e.g. /\b\w{1,3}\b/, does not work, because "C/5879" would become "/5879".

The desired output would be as follows:

1.29 Cancels part plan C/5879 2030. e9g6Pop Iatian Area ProcH 22.4.93 Suburban Lands 53dv N014 3.5.98. PLAN from under R.5I B.L.1laY98 E35. Maroubrajuncti

An alternative that could also work would be to create larger 'groups of characters' by joining whitespace-delimited 'groups of characters' that have 2 or fewer characters.

For example:

1.29 Cancels part plan C/5879 2030. inii.r e9g6Pop Iatian Area ProcH 22.4.93 Suburban Lands nf 53dv 3N014 3.5.98. PLAN orany from 0153 under MR.5I B.L.1laY98 E35. P0RTI0NS AtMaroubrajuncti p/I.z. .0/.L.I..I

I would be open to either solution to rescue me from Regex Hell.


1 answer

  • duanpanbo9476 2013-11-06 02:13

    Your definition of "words" is "whitespace-delimited", which differs from regex's definition of a word boundary (a transition between word and non-word characters), so use a lookahead:

    \s+\S{1,3}(?=\s)
    

    Note that the match includes (consumes) the leading whitespace, so removing the matches will not leave double spaces in the result.
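
The difference between the two definitions of a "word" can be seen on a short sample. The following is only a minimal sketch in Python (the question does not name a language, and the `sample` string is made up for illustration); it compares the `\b\w{1,3}\b` attempt from the question with the whitespace-based pattern above.

```python
import re

# Hypothetical sample, built from fragments of the source text.
sample = "plan C/5879 in i 2030."

# \b sees a boundary between "C" and "/", so "C" is removed on its own,
# and the spaces around removed words are left behind.
word_based = re.sub(r"\b\w{1,3}\b", "", sample)

# \S{1,3} only matches whole whitespace-delimited groups, and the leading
# \s+ is removed together with each group.
space_based = re.sub(r"\s+\S{1,3}(?=\s)", "", sample)

print(repr(word_based))   # 'plan /5879   2030.'
print(repr(space_based))  # 'plan C/5879 2030.'
```

The lookahead `(?=\s)` checks for the following whitespace without consuming it, which is what lets several short groups in a row each match with their own leading space.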

    When tested on regextester, the result is:

    1.29 Cancels part plan C/5879 2030. e9g6Pop Iatian Area ProcH 22.4.93 Suburban Lands 53dv N014 3.5.98. PLAN from under R.5I B.L.1laY98 E35. Maroubrajuncti .I
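
The trailing ".I" survives because it is the last group in the text, so there is no whitespace after it for `(?=\s)` to find. One possible adjustment, which is not part of the original answer, is to let the lookahead also accept the end of the string. The sketch below assumes Python's `re` module; the names `ANSWER`, `VARIANT` and `strip_short_groups` are made up for illustration.

```python
import re

SOURCE = (
    "1.29 Cancels part plan C/5879 2030. in i i.r e9g6Pop Iatian Area ProcH "
    "22.4.93 Suburban Lands n f 53dv 3 N014 3.5.98. PLAN or any from 01 53 "
    "under M R.5I B.L.1laY98 E35. P0 RT I 0 N S At Maroubrajuncti "
    "p /I .z. .0 / .L .I. .I"
)

# Pattern from the answer: a short group must be followed by whitespace.
ANSWER = re.compile(r"\s+\S{1,3}(?=\s)")

# Assumed variant: a short group may also be followed by the end of the text.
VARIANT = re.compile(r"\s+\S{1,3}(?=\s|$)")


def strip_short_groups(text, pattern):
    """Delete every match of `pattern` (leading whitespace included)."""
    return pattern.sub("", text)


print(strip_short_groups(SOURCE, ANSWER))   # ends with "Maroubrajuncti .I"
print(strip_short_groups(SOURCE, VARIANT))  # ends with "Maroubrajuncti"
```

Neither pattern can remove a short group at the very start of the text, since both require leading whitespace; that does not matter here because the text starts with "1.29".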
