dongxue7306 2010-05-18 09:16
Views: 32
Accepted

Need regular expressions to match special cases

I'm desperately searching for regular expressions that match these scenarios:

1) Match alternating chars

I have a string like "This is my foobababababaf string", and I want to match "babababa".

The only thing I know is the length of the fragment to search for. I don't know which chars/digits it will contain, only that they alternate.

I've really no clue where to start :(

2) Match combined groups

In a string like "This is my foobaafoobaaaooo string", I want to match "aaaooo". As in 1), I don't know which chars/digits they will be. I only know that they will appear in two groups.

I experimented using (.)\1\1\1(.)\1\1\1 and things like this...


4 answers

  • doukong9316 2010-05-18 14:14

    I think something like this is what you want.

    For alternating characters:

    (?=(.)(?!\1)(.))(?:\1\2){2,}
    

    \0 will be the entire alternating sequence, \1 and \2 are the two (distinct) alternating characters.
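
    Since the question says the fragment length is known, the open-ended {2,} could be swapped for a fixed repetition count. A minimal sketch, assuming the length is even and at least 4; the class and method names (FixedLengthAlternating, findAlternating) are made up for illustration and are not part of the harness further down:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FixedLengthAlternating {
    // Returns the first alternating fragment of `length` characters,
    // or null if there is none. Assumes `length` is even and >= 4.
    static String findAlternating(String text, int length) {
        if (length < 4 || length % 2 != 0) {
            throw new IllegalArgumentException("length must be even and >= 4");
        }
        // Same pattern as above, with {2,} replaced by a fixed count of pairs.
        String regex = "(?=(.)(?!\\1)(.))(?:\\1\\2){" + (length / 2) + "}";
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // prints: babababa
        System.out.println(findAlternating("This is my foobababababaf string", 8));
    }
}
```

    Note that this returns the leftmost window of that length, even when the alternating sequence in the text is longer (as it is here, where the full sequence has 10 characters).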

    For a run of N characters and a run of M characters, possibly separated by other characters (replace N and M with numbers here):

    (?=(.))\1{N}.*?(?=(?!\1)(.))\2{M}
    

    \0 will be the entire match, including the infix. \1 is the character repeated (at least) N times, \2 is the character repeated (at least) M times.
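
    As an aside: if the two groups must sit directly next to each other (as in "aaaooo" from the question), dropping the .*? infix should be enough. A rough sketch under that assumption; AdjacentRuns, adjacentRuns and findAll are hypothetical names used only for this illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AdjacentRuns {
    // Builds the run pattern without the ".*?" infix, so the second run
    // must start immediately after the first n characters.
    static String adjacentRuns(int n, int m) {
        return "(?=(.))\\1{" + n + "}(?=(?!\\1)(.))\\2{" + m + "}";
    }

    // Collects group 0 of every non-overlapping match.
    static List<String> findAll(String text, String regex) {
        List<String> out = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            out.add(matcher.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // prints: [aaaooo]
        System.out.println(findAll("This is my foobaafoobaaaooo string", adjacentRuns(3, 3)));
        // prints: [xxyyyy]
        System.out.println(findAll("xxyyyy axxayyyya zzzzzzzzzzzzzz", adjacentRuns(2, 4)));
    }
}
```

    With the infix removed, "xxayyyy" no longer matches for run(2,4), because the a sits between the two runs.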

    Here's a test harness in Java.

    import java.util.regex.*;
    
    public class Regex3 {
        // Builds the run pattern by substituting the run lengths for N and M.
        static String runNrunM(int N, int M) {
            return "(?=(.))\\1{N}.*?(?=(?!\\1)(.))\\2{M}"
                .replace("N", String.valueOf(N))
                .replace("M", String.valueOf(M));
        }
        // Prints every non-overlapping match of pattern in text,
        // listing group 0 (the whole match) and each capture group.
        static void dumpMatches(String text, String pattern) {
            Matcher m = Pattern.compile(pattern).matcher(text);
            System.out.println(text + " <- " + pattern);
            while (m.find()) {
                System.out.println("  match");
                for (int g = 0; g <= m.groupCount(); g++) {
                    System.out.format("    %d: [%s]%n", g, m.group(g));
                }
            }
        }
        public static void main(String[] args) {
            String[] tests = {
                "foobababababaf foobaafoobaaaooo",
                "xxyyyy axxayyyya zzzzzzzzzzzzzz"
            };
            // Alternating characters
            for (String test : tests) {
                dumpMatches(test, "(?=(.)(?!\\1)(.))(?:\\1\\2){2,}");
            }
            // Runs of 3 and 3
            for (String test : tests) {
                dumpMatches(test, runNrunM(3, 3));
            }
            // Runs of 2 and 4
            for (String test : tests) {
                dumpMatches(test, runNrunM(2, 4));
            }
        }
    }
    

    This produces the following output:

    foobababababaf foobaafoobaaaooo <- (?=(.)(?!\1)(.))(?:\1\2){2,}
      match
        0: [bababababa]
        1: [b]
        2: [a]
    xxyyyy axxayyyya zzzzzzzzzzzzzz <- (?=(.)(?!\1)(.))(?:\1\2){2,}
    foobababababaf foobaafoobaaaooo <- (?=(.))\1{3}.*?(?=(?!\1)(.))\2{3}
      match
        0: [aaaooo]
        1: [a]
        2: [o]
    xxyyyy axxayyyya zzzzzzzzzzzzzz <- (?=(.))\1{3}.*?(?=(?!\1)(.))\2{3}
      match
        0: [yyyy axxayyyya zzz]
        1: [y]
        2: [z]
    foobababababaf foobaafoobaaaooo <- (?=(.))\1{2}.*?(?=(?!\1)(.))\2{4}
    xxyyyy axxayyyya zzzzzzzzzzzzzz <- (?=(.))\1{2}.*?(?=(?!\1)(.))\2{4}
      match
        0: [xxyyyy]
        1: [x]
        2: [y]
      match
        0: [xxayyyy]
        1: [x]
        2: [y]
    

    Explanation

    • (?=(.)(?!\1)(.))(?:\1\2){2,} has two parts
      • (?=(.)(?!\1)(.)) establishes \1 and \2 using lookahead
        • Nested negative lookahead ensures that \1 != \2
        • Using lookahead to capture lets \0 have the entire match (instead of just the "tail" end)
      • (?:\1\2){2,} matches the \1\2 sequence, which must repeat at least twice (the group itself is non-capturing).
    • (?=(.))\1{N}.*?(?=(?!\1)(.))\2{M} has three parts
      • (?=(.))\1{N} captures \1 in a lookahead, and then matches it N times
        • Using lookahead to capture means the repetition can be N instead of N-1
      • .*? allows an infix to separate the two runs, reluctant to keep it as short as possible
      • (?=(?!\1)(.))\2{M}
        • Similar to first part
        • Nested negative lookahead ensures that \1 != \2
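
    To illustrate the point about N versus N-1: capturing in a lookahead consumes nothing, so the backreference has to match all N characters, whereas a plain capture already consumes the first one. A small sketch comparing the two forms for a run of 3; the class name CaptureStyles and its findAll helper are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureStyles {
    // Collects group 0 of every non-overlapping match.
    static List<String> findAll(String text, String regex) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "foobaafoobaaaooo";
        // Plain capture: (.) consumes one character, so repeat \1 only N-1 = 2 times.
        System.out.println(findAll(text, "(.)\\1{2}"));      // prints: [aaa, ooo]
        // Lookahead capture: nothing is consumed, so repeat \1 the full N = 3 times.
        System.out.println(findAll(text, "(?=(.))\\1{3}"));  // prints: [aaa, ooo]
    }
}
```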

    The run regex also matches inside longer runs, e.g. run(2,2) matches in "xxxyyy" (the third x is absorbed by the infix, and only two of the three y's are consumed):

    xxxyyy <- (?=(.))\1{2}.*?(?=(?!\1)(.))\2{2}
      match
        0: [xxxyy]
        1: [x]
        2: [y]
    

    Also, it does not allow overlapping matches. That is, there is only one run(2,3) in "xx11yyy222".

    xx11yyy222 <- (?=(.))\1{2}.*?(?=(?!\1)(.))\2{3}
      match
        0: [xx11yyy]
        1: [x]
        2: [y]
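
    If overlapping matches are wanted, one possible workaround (not part of the regex above) is to restart the search one character after the start of each match, using Matcher.find(int). A sketch under that assumption; OverlappingRuns and findOverlapping are made-up names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OverlappingRuns {
    // Collects one match per starting position, allowing matches to overlap.
    static List<String> findOverlapping(String text, String regex) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        int from = 0;
        // find(int) resets the matcher and searches from the given index,
        // so each new search begins one character after the previous match start.
        while (from <= text.length() && m.find(from)) {
            out.add(m.group());
            from = m.start() + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        // prints: [xx11yyy, 11yyy, yyy222, yy222]
        System.out.println(findOverlapping("xx11yyy222", "(?=(.))\\1{2}.*?(?=(?!\\1)(.))\\2{3}"));
    }
}
```

    This still reports at most one match per starting position (the one with the shortest infix), so it does not enumerate every possible pairing of runs.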
    