dongyue0225 2017-10-24 17:43

Regular expression to match a line-separated list of size strings

I am writing a regular expression to validate an input string that is a newline-separated list of sizes ([width]x[height]).

Valid input example:

300x200
50x80


100x100

The regular expression I initially came up with is (https://regex101.com/r/H9JDjA/1):

^(\d+x\d+[\r\n|\r|\n]*)+$

This regular expression matches my input, but it also matches this invalid input (a size can't be 100x100x200):

300x200
50x80
100x100x200
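To reproduce this, here is a minimal Python sketch using the standard `re` module. It assumes the line breaks in the sample inputs are `\n` and that the initial pattern reads `^(\d+x\d+[\r\n|\r|\n]*)+$`. Inside `[...]` the `|` signs are literal, and the trailing `*` means "zero or more", so nothing forces a line break between two sizes.

```python
import re

# Initial pattern from the question. Inside [...] the "|" characters are
# literal, so the class means "any one of \r, \n or |".
INITIAL = r"^(\d+x\d+[\r\n|\r|\n]*)+$"

valid = "300x200\n50x80\n\n\n100x100"
invalid = "300x200\n50x80\n100x100x200"

def accepts(pattern: str, text: str) -> bool:
    """Return True if the anchored pattern matches starting at the beginning of text."""
    return re.match(pattern, text) is not None

print(accepts(INITIAL, valid))    # True
print(accepts(INITIAL, invalid))  # True, although 100x100x200 is not a size
```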

Adding a word boundary at the end seems to have fixed this issue:

^(\d+x\d+[\r\n|\r|\n]*\b)+$
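A small Python sketch of the same check with the word boundary added (same assumed `\n` line endings). Digits and the letter `x` are all word characters, so `\b` can never succeed inside `100x100x200`; the engine can no longer split that token into two "sizes".

```python
import re

# Pattern with \b appended inside the repeated group.
WITH_BOUNDARY = r"^(\d+x\d+[\r\n|\r|\n]*\b)+$"

valid = "300x200\n50x80\n\n\n100x100"
invalid = "300x200\n50x80\n100x100x200"

def accepts(pattern: str, text: str) -> bool:
    """Return True if the anchored pattern matches starting at the beginning of text."""
    return re.match(pattern, text) is not None

print(accepts(WITH_BOUNDARY, valid))    # True
print(accepts(WITH_BOUNDARY, invalid))  # False
```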

My questions:

  1. Why does the initial regular expression without the word boundary fail? It looks like I am matching one or more instances of a number (\d+), followed by the character 'x', followed by a number (\d+), followed by one or more newlines from various operating systems.
  2. How can I validate input that has multiple trailing newline characters? The following doesn't work for some inputs, like this one:

    500x500
    100x100
    384384

    ^(\d+x\d+[\r\n|\r|\n]\b)+|[\r\n|\r|\n]$
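A Python sketch of the trailing-newline problem with the word-boundary pattern `^(\d+x\d+[\r\n|\r|\n]*\b)+$`. The input is a hypothetical example with two trailing newlines. After the last size, `\b` can only succeed right after the final digit, so the trailing line breaks are never consumed and `$` fails. (Python's `$` also matches just before a single final newline, which is why two are used here.)

```python
import re

WITH_BOUNDARY = r"^(\d+x\d+[\r\n|\r|\n]*\b)+$"

# Hypothetical input: valid sizes followed by two trailing newlines.
trailing = "500x500\n100x100\n\n"

m = re.match(WITH_BOUNDARY, trailing)
print(m is not None)  # False: the trailing "\n\n" is left unmatched
```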


  • douye2020 2017-10-24 19:25

    Isolate the problem using this target string: 100x100x200

    For now, forget about the anchors in the regex.

    The minimum regex is \d+x\d+, since it only has to be satisfied once
    for a match to take place.

    The maximum is something like this: \d+x\d+ (?: (?:\r?\n|\r)* \d+x\d+ )*

    Since the line-break part (?:\r?\n|\r)* is optional, it can be reduced to this: \d+x\d+ (?: \d+x\d+ )*

    When you apply it to the target string, the result is:

    100x100x200 matches.
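A quick Python check of this unanchored step, using the reduced pattern with its whitespace removed. `re.search` finds the match inside the longer string.

```python
import re

# Reduced pattern from the answer, unanchored.
REDUCED = r"\d+x\d+(?:\d+x\d+)*"

m = re.search(REDUCED, "100x100x200")
print(m.group())  # 100x100
```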

    But since you've anchored the regex with ^ and $, the engine is forced
    to break up the middle 100 to make the whole string match:

    100x10 from \d+x\d+
    0x200 from (?: \d+x\d+ )*

    So, that is why the first regex seemingly matches 100x100x200.
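The split can be made visible in Python by anchoring the reduced pattern and adding two capture groups. The groups are added only for illustration; they are not part of the original regex.

```python
import re

# Anchored reduced pattern; groups 1 and 2 expose where backtracking splits the input.
ANCHORED = r"^(\d+x\d+)((?:\d+x\d+)*)$"

m = re.match(ANCHORED, "100x100x200")
print(m.group(1))  # 100x10
print(m.group(2))  # 0x200
```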

    To avoid all of that, just require a line break between sizes and
    make the trailing line breaks optional (if you need to validate the whole
    string; otherwise, leave them and the end anchor off).

    ^\d+x\d+(?:(?:\r?\n|\r)+\d+x\d+)*(?:\r?\n|\r)*$
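A minimal Python sketch that runs this pattern against the question's inputs plus two illustrative ones (Windows line endings with trailing breaks, and a token without `x`). The `is_valid` helper and the sample table are hypothetical, for demonstration only.

```python
import re

# Pattern from the answer: at least one line break is required between sizes,
# and trailing line breaks (\r\n, \n or \r) are optional.
SIZES = re.compile(r"^\d+x\d+(?:(?:\r?\n|\r)+\d+x\d+)*(?:\r?\n|\r)*$")

def is_valid(text: str) -> bool:
    """Hypothetical helper: True if text is a line-separated list of sizes."""
    return SIZES.match(text) is not None

samples = {
    "300x200\n50x80\n\n\n100x100": True,    # valid example from the question
    "300x200\n50x80\n100x100x200": False,   # 100x100x200 is not a size
    "500x500\r\n100x100\r\n\r\n": True,     # CRLF line endings, trailing breaks
    "500x500\n100x100\n384384": False,      # 384384 has no 'x'
}
for text, expected in samples.items():
    print(repr(text), is_valid(text), expected)
```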

    A better view of it:

     ^ 
     \d+ x \d+ 
     (?:
          (?: \r? \n | \r )+
          \d+ x \d+ 
     )*
     (?: \r? \n | \r )*
     $
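The spread-out layout can be used directly in Python with `re.VERBOSE`, which ignores unescaped whitespace outside character classes while keeping escapes like `\r` and `\n`. A short sketch, assuming the same sample inputs as in the question:

```python
import re

# The answer's layout compiled as-is; whitespace is ignored under re.VERBOSE.
SIZES_VERBOSE = re.compile(
    r"""
    ^
    \d+ x \d+
    (?:
        (?: \r? \n | \r )+
        \d+ x \d+
    )*
    (?: \r? \n | \r )*
    $
    """,
    re.VERBOSE,
)

print(SIZES_VERBOSE.match("300x200\n50x80\n\n\n100x100") is not None)  # True
print(SIZES_VERBOSE.match("300x200\n50x80\n100x100x200") is not None)  # False
```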
    
    This answer was accepted by the asker as the best answer.