dqvy87517
2016-07-25 06:40
Views: 399
Accepted

Golang regular expression to match multiple patterns between keyword pairs

I have a string that contains two keywords, "CURRENT NAME(S)" and "NEW NAME(S)", and each keyword is followed by a list of words. I want to extract the words that follow each keyword. To illustrate with code:

    s := `"CURRENT NAME(S)
 Name1, Name2",,"NEW NAME(S)
NewName1,NewName2"`
    re := regexp.MustCompile(`"CURRENT NAME(S).*",,"NEW NAME(S).*"`)

    segs := re.FindAllString(s, -1)
    fmt.Println("segs:", segs)

    segs2 := re.FindAllStringSubmatch(s, -1)
    fmt.Println("segs2:", segs2)

As you can see, the string `s` holds the input. "Name1, Name2" is the current names list and "NewName1,NewName2" is the new names list. I want to extract these two lists. The two lists are separated by commas. Each keyword begins right after a double quote, and its list ends at the matching closing double quote.

How can I use regexp so that the program prints "Name1, Name2" and "NewName1,NewName2"?




3 answers

  • doushai7225 2016-07-25 08:29
    Accepted

    The issue with your regex is that the input string contains newline characters, and . in a Go regex does not match a newline by default. Another issue is that .* is a greedy pattern and will match as many characters as it can, up to the last occurrence of the second keyword. Also, you need to escape the parentheses in the regex pattern to match the literal ( and ) characters.

    The best way to solve the issue is to change .* into a negated character class pattern [^"]* and place it inside a pair of non-escaped ( and ) to form a capturing group (a construct to get submatches from the match).

    Here is a Go demo:

    package main
    
    import (
        "fmt"
        "regexp"
    )
    
    func main() {
        s := `"CURRENT NAME(S)
     Name1, Name2",,"NEW NAME(S)
    NewName1,NewName2"`
        re := regexp.MustCompile(`"CURRENT NAME\(S\)\s*([^"]*)",,"NEW NAME\(S\)\s*([^"]*)"`)
    
        segs2 := re.FindAllStringSubmatch(s,-1)
        fmt.Printf("segs2: [%s; %s]", segs2[0][1], segs2[0][2])
    }
    

    Now, the regex matches:

    • "CURRENT NAME\(S\) - a literal string "CURRENT NAME(S)
    • \s* - zero or more whitespaces
    • ([^"]*) - Group 1 capturing 0+ chars other than "
    • ",,"NEW NAME\(S\) - a literal string ",,"NEW NAME(S)
    • \s* - zero or more whitespaces
    • ([^"]*) - Group 2 capturing 0+ chars other than "
    • " - a literal "
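Note that indexing segs2[0] panics when the input does not match. One possible variant, sketched here under the assumption that only the first record is needed, wraps the same pattern in a helper that reports whether a match was found. The name extractNameLists is hypothetical and not part of the original answer.

```go
package main

import (
	"fmt"
	"regexp"
)

// nameListRe is the pattern from the answer, compiled once at start-up.
var nameListRe = regexp.MustCompile(`"CURRENT NAME\(S\)\s*([^"]*)",,"NEW NAME\(S\)\s*([^"]*)"`)

// extractNameLists is a hypothetical helper: it returns the two captured
// lists, and ok == false when the input does not match the pattern.
func extractNameLists(s string) (current, updated string, ok bool) {
	m := nameListRe.FindStringSubmatch(s)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	s := `"CURRENT NAME(S)
 Name1, Name2",,"NEW NAME(S)
NewName1,NewName2"`
	if current, updated, ok := extractNameLists(s); ok {
		fmt.Printf("current: %q\nnew: %q\n", current, updated)
	} else {
		fmt.Println("no match")
	}
}
```

FindStringSubmatch returns nil when nothing matches, so the nil check is enough to avoid the out-of-range index.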
  • dqkv0603 2016-07-25 07:07

    If your input doesn't change then the simplest way would be to use submatches (groups). You can try something like this:

    // (?s) is a flag that enables '.' to match newlines
    var r = regexp.MustCompile(`(?s)CURRENT NAME\(S\)(.*)",,"NEW NAME\(S\)(.*)"`)
    fmt.Println(r.MatchString(s))
    m := r.FindSubmatch([]byte(s)) // FindSubmatch requires []byte
    
    for i, match := range m {
        s := string(match)
        // strip the surrounding newlines from each match
        fmt.Printf("Match - %d: %s\n", i, strings.Trim(s, "\n"))
    }
    

    Output (note that the first match, index 0, is the whole text matched by the regex; see https://golang.org/pkg/regexp/#Regexp.FindSubmatch):

    Match - 0: CURRENT NAME(S)
    Name1, Name2",,"NEW NAME(S)
    NewName1,NewName2"
    Match - 1: Name1, Name2
    Match - 2: NewName1,NewName2
    

    Example: https://play.golang.org/p/0cgBOMumtp
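The greedy (.*) groups work when the input holds a single record. If the input may contain several records (an assumption, the question only shows one), a lazy quantifier keeps each group inside its own record. A minimal sketch, where extractAll is a hypothetical helper and the second record is made up for illustration:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// lazyRe uses (?s) so '.' matches newlines, and .*? so each group ends at
// the first position where the rest of the pattern can match.
var lazyRe = regexp.MustCompile(`(?s)CURRENT NAME\(S\)(.*?)",,"NEW NAME\(S\)(.*?)"`)

// extractAll is a hypothetical helper returning one [current, new] pair
// per record found in s, with surrounding whitespace trimmed.
func extractAll(s string) [][2]string {
	var pairs [][2]string
	for _, m := range lazyRe.FindAllStringSubmatch(s, -1) {
		pairs = append(pairs, [2]string{strings.TrimSpace(m[1]), strings.TrimSpace(m[2])})
	}
	return pairs
}

func main() {
	// Two records in one input; the second one is invented for this sketch.
	s := "\"CURRENT NAME(S)\n Name1, Name2\",,\"NEW NAME(S)\nNewName1,NewName2\"\n" +
		"\"CURRENT NAME(S)\nName3\",,\"NEW NAME(S)\nNewName3\""
	for i, p := range extractAll(s) {
		fmt.Printf("record %d: current=%q new=%q\n", i, p[0], p[1])
	}
}
```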

  • doucan8246326 2016-07-25 11:23

    For a fixed format like in the example, you can also avoid regular expressions and perform explicit parsing as in this example - https://play.golang.org/p/QDIyYiWJHt:

    package main
    
    import (
        "fmt"
        "strings"
    )
    
    func main() {
        s := `"CURRENT NAME(S)
     Name1, Name2",,"NEW NAME(S)
    NewName1,NewName2"`
    
        names := []string{}
        parts := strings.Split(s, ",,")
        for _, part := range parts {
            part = strings.Trim(part, `"`)
            part = strings.TrimPrefix(part, "CURRENT NAME(S)")
            part = strings.TrimPrefix(part, "NEW NAME(S)")
            part = strings.TrimSpace(part)
            names = append(names, part)
        }
        fmt.Println("Names:")
        for _, name := range names {
            fmt.Println(name)
        }
    }
    

    Output:

    Names:
    Name1, Name2
    NewName1,NewName2
    

    It uses a few more lines of code but may make the processing logic easier to understand at first glance.
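In the same spirit of avoiding regular expressions: the sample input looks like a single CSV row with two quoted fields and an empty column between them. If that assumption holds for the real data (the question does not say so), the standard encoding/csv package can handle the quoting, including the newlines inside the quoted fields. A minimal sketch, where parseNameRow is a hypothetical helper:

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parseNameRow is a hypothetical helper. It assumes s is one CSV row whose
// quoted fields have the form "KEYWORD\nnames" and returns the names part
// of every non-empty field.
func parseNameRow(s string) ([]string, error) {
	r := csv.NewReader(strings.NewReader(s))
	fields, err := r.Read()
	if err != nil {
		return nil, err
	}
	var lists []string
	for _, f := range fields {
		if f == "" {
			continue // the empty column between the two quoted fields
		}
		// Separate the keyword line from the names that follow it.
		parts := strings.SplitN(f, "\n", 2)
		if len(parts) != 2 {
			return nil, fmt.Errorf("field %q has no keyword line", f)
		}
		lists = append(lists, strings.TrimSpace(parts[1]))
	}
	return lists, nil
}

func main() {
	s := `"CURRENT NAME(S)
 Name1, Name2",,"NEW NAME(S)
NewName1,NewName2"`
	lists, err := parseNameRow(s)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, l := range lists {
		fmt.Println(l)
	}
}
```

Unlike the prefix-trimming version, this sketch does not check which keyword a field starts with; it only drops the first line of each field.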

