dongxiaoshe0737 2014-05-12 18:54
Views: 30
Accepted

Constructing a regex pattern that handles recursion

I'm not sure recursion is the correct way to characterize what's occurring in this pattern, but I'm too new to regex to build something that adapts to how this pattern can vary while avoiding nested groups.

So the pattern is basically defined as:

@param {item} {label}:{text} {labeln}:{textn}

where labeln and textn are the Nth instance of the label:text group.

So an example would be

/**
 *
 * @param name1 test1:this is text for test1 test2:this is text for test2
 * @param name2 test3:this is text for test3 test4:this is text for test4 test5:this is text for test5
 *
 */

Ideally, I'm trying to capture name1, test1:this is text for test1, and test2:this is text for test2 as matching groups. The same goes for the name2 line. Of course there can be many more lines like name1, and the number of pseudo "named parameters" can vary from none to many. Edit: Colons would not be permitted within the label text since they're reserved as delimiters. Label is strictly alphanumeric, label would probably be restricted to a-zA-Z0-9_,'"-

First question is... is this a recursion problem or did I mischaracterize this?

Second question is... is it possible and if so, how can I achieve this?


  • doumei9589 2014-05-12 19:19

    Preface:

    For the sake of explanation, I decided to clarify your "labels" by preceding them with a %. This can be any reserved symbol or other pattern that helps clear up what is a label/text:

    /**
     * @param variable_a %label:This is variable: a %required:true
     * @param variable_b %required:false %pattern:/[a-zA-Z:]/
     */
    

    Problem:

    The problem with capturing repeated patterns in regular expressions is that you can't have an unknown number of capture groups. You either run a global match and get one match per repetition, or you capture a fixed number of groups in each match:

    @param    (?# find a param)
    \s*       (?# whitespace)
    (\w+)     (?# capture the variable)
    \s*       (?# whitespace)
    (?:       (?# start non capturing group)
    %(\w+):   (?# capture the label)
    ([^%\n]+) (?# capture the text)
    )+        (?# repeat the non-capturing group)
    

    In this example, I put the label/text capturing code in a non-capturing group that repeats one or more times. This lets us match the whole string, but only the last label/text pair is captured, since we only have 3 groups: variable, label, and text.
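A minimal sketch of that limitation, assuming Python's built-in `re` module rather than PCRE (the `(?# ...)` comments are dropped and the sample line is taken from the preface above). It only illustrates that a repeated group keeps its final repetition:

```python
import re

# Same shape as the pattern above: a variable, then one or more %label:text pairs.
pattern = re.compile(r"@param\s*(\w+)\s*(?:%(\w+):([^%\n]+))+")

line = " * @param variable_a %label:This is variable: a %required:true"
match = pattern.search(line)

# The overall match spans every label/text pair on the line...
print(match.group(0))
# ...but groups 2 and 3 only hold the last repetition of the group.
print(match.groups())
```

The first pair (`label` / `This is variable: a`) is consumed by the match but is no longer available through the capture groups.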


    Straightforward Solution:

    Instead of this, you can just match the whole string and then parse the label/text string after-the-fact:

    (?# match the whole string)
    @param    (?# find a param)
    \s*       (?# whitespace)
    (\w+)     (?# capture the variable)
    \s*       (?# whitespace)
    (.*)      (?# capture the labels/texts)
    
    (?# parse the label/text string)
    %         (?# the start of a label)
    (\w+)     (?# capture label)
    :         (?# end of label)
    ([^%]+)   (?# capture text)
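One way the two-step approach could look, sketched in Python with the built-in `re` module. The names `PARAM_RE`, `PAIR_RE`, and `parse_params` are hypothetical, and `[ \t]*` is used in place of `\s*` as an assumption, so that a `@param` with no labels does not swallow the following line:

```python
import re

# Step 1: grab the variable and the rest of the line ('.' stops at a newline).
PARAM_RE = re.compile(r"@param[ \t]*(\w+)[ \t]*(.*)")
# Step 2: split the captured remainder into %label:text pairs.
PAIR_RE = re.compile(r"%(\w+):([^%]+)")

def parse_params(doc):
    """Return {variable: [(label, text), ...]} for every @param line in doc."""
    result = {}
    for variable, rest in PARAM_RE.findall(doc):
        pairs = PAIR_RE.findall(rest)
        # Trim the space that separates one text from the next % marker.
        result[variable] = [(label, text.strip()) for label, text in pairs]
    return result

doc = """/**
 * @param variable_a %label:This is variable: a %required:true
 * @param variable_b %required:false %pattern:/[a-zA-Z:]/
 */"""

parsed = parse_params(doc)
for variable, pairs in parsed.items():
    print(variable, pairs)
```

Because the second pattern runs on each captured remainder separately, every pair stays associated with its variable without any extra bookkeeping.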
    

    Awesome Solution:

    Finally, we can use some regular expression magic to do a global match of all label/text combinations. This means we will have a defined set of 3 capture groups (variable, label, text) and we'll have a variable amount of matches. I think this one is best to show and then explain, so here is the crazy awesome regex magic:

    (?:       (?# start non-capturing group)
      @param  (?# find a param)
      \s*     (?# whitespace)
      (\w+)   (?# capture the variable)
      \s*     (?# whitespace)
     |        (?# OR)
      \G      (?# start back over from our last match)
    )         (?# end non-capturing group)
    %(\w+):   (?# capture the label)
    ([^%\n]+) (?# capture the text)
    

    This one revolves around the PCRE magic of \G, which matches at the position where the previous match ended. We start a non-capturing group that contains the "prefix" of a @param definition: it either matches @param and captures the variable, OR it resumes from the end of our last match. Then we match and capture one label/text group. On the next match we pick up where we left off, so the variable capture group is empty (it doesn't exist that far into the string, so you'll need your own logic to track which variable you are on) and we capture another label/text group. This continues until we hit a new line, since the text can't contain % or \n. The match attempt after that finds a new variable introduced by @param. I think this is your best option; it just takes some more logic on your end.
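Python's built-in `re` module does not support `\G`, so the sketch below is an emulation rather than the PCRE pattern itself: `Pattern.match(text, pos)` plays the role of the anchored "continue from the last match" branch, and `Pattern.search` plays the role of the `@param` branch. The names `HEAD_RE`, `PAIR_RE`, and `iter_pairs` are hypothetical, and the loop also carries the current variable forward, which is the extra logic mentioned above:

```python
import re

HEAD_RE = re.compile(r"@param[ \t]*(\w+)[ \t]*")   # the "@param variable" prefix
PAIR_RE = re.compile(r"%(\w+):([^%\n]+)")          # one %label:text pair

def iter_pairs(doc):
    """Yield (variable, label, text) for each label/text pair in doc."""
    pos = 0
    variable = None
    while True:
        # Emulated \G: a pair must start exactly where the last match ended.
        pair = PAIR_RE.match(doc, pos) if variable is not None else None
        if pair is None:
            # No pair here, so look for the next @param prefix instead.
            head = HEAD_RE.search(doc, pos)
            if head is None:
                return
            variable = head.group(1)
            pos = head.end()
            continue
        yield variable, pair.group(1), pair.group(2).strip()
        pos = pair.end()

doc = """/**
 * @param variable_a %label:This is variable: a %required:true
 * @param variable_b %required:false %pattern:/[a-zA-Z:]/
 */"""

triples = list(iter_pairs(doc))
for triple in triples:
    print(triple)
```

In an engine that does support `\G` (PCRE, for example), the single pattern above can be used with a global match instead, at the cost of tracking the current variable yourself.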
