duanhao1004
2019-09-14 10:42
148 views
Accepted

Using a regex wildcard to get a tag value without the surrounding text

I'm trying to extract the value "done" from the following line, which arrives in a byte slice at the end of a chunked HTTP stream.

X-sync-status: done

This is the Go regex I have so far:

syncStatusRegex = regexp.MustCompile("(?i)X-sync-status:(.*)\n")

I just want it to return this bit

(.*)

This is the code to get the status

syncStatus := strings.TrimSpace(string(syncStatusRegex.Find(body)))
fmt.Println(syncStatus)

How do I get it to just return "done" and not the header?

Thanks

1 answer

  • dri98076 2019-09-14 11:42
    Accepted

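    What you want is access to the capturing groups. Find returns the entire match, header name included, whereas the Submatch variants (FindSubmatch, FindStringSubmatch) also return the text of each group. I prefer named capturing groups, and a very simple helper function is enough to deal with them: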

    package main
    
    import (
        "fmt"
        "regexp"
    )
    
    // Our example input
    const input = "X-sync-status: done\n"
    
    // "^" anchors the regex to the beginning of the input. To match the header
    // at the start of any line of a larger body, add the "m" flag: (?im).
    // Then we have a fixed string until our capturing group begins.
    // Within our capturing group, we take all consecutive word characters
    // (letters, digits and underscore) that follow.
    const regexString = `(?i)^X-sync-status: (?P<status>\w*)`
    
    // MustCompile panics at startup if the expression is invalid.
    var syncStatusRegexp = regexp.MustCompile(regexString)
    
    
    // The helper function...
    func namedResults(re *regexp.Regexp, in string) map[string]string {
    
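        // ... does the matching
        match := re.FindStringSubmatch(in)
        if match == nil {
            // No match: a nil map yields "" on lookup instead of panicking below.
            return nil
        }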
    
        result := make(map[string]string)
    
        // and puts the value for each named capturing group
        // into the result map
        for i, name := range re.SubexpNames() {
            if i != 0 && name != "" {
                result[name] = match[i]
            }
        }
        return result
    }
    
    func main() {
        fmt.Println(namedResults(syncStatusRegexp, input)["status"])
    }
    


    Note: your current regexp is also slightly off, because the group (.*) captures the whitespace after the colon as well. With it, the group would contain " done" instead of "done".
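
    To make the note above concrete, here is a minimal sketch that works directly on the byte slice from the question. It assumes the header may sit on any line of the body (hence the "m" flag) and that the value runs to the end of that line. The function name syncStatus and the sample body are made up for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
)

// (?i) makes the header name case-insensitive, (?m) lets ^ match at the
// start of every line, and [ \t]* keeps leading blanks out of the group.
var syncStatusRegex = regexp.MustCompile(`(?im)^X-sync-status:[ \t]*([^\r\n]*)`)

// syncStatus returns the header value and whether the header was found.
// Name and signature are illustrative, not taken from the question.
func syncStatus(body []byte) (string, bool) {
	m := syncStatusRegex.FindSubmatch(body)
	if m == nil {
		return "", false
	}
	// m[0] is the whole match, m[1] the first capturing group.
	return string(bytes.TrimSpace(m[1])), true
}

func main() {
	// Hypothetical tail of the stream: payload followed by the status line.
	body := []byte("last chunk of data\r\nX-Sync-Status: done\r\n")
	status, ok := syncStatus(body)
	fmt.Println(status, ok) // prints: done true
}
```

    Returning a boolean alongside the value lets the caller tell a missing header apart from an empty one, which a bare string result cannot do.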

    Edit: Of course, you can do this much more cheaply without a regexp:

    fmt.Print(strings.Trim(strings.Split(input, ":")[1], " \n"))
    

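
    The one-liner above assumes the input is exactly one header line and that the value contains no colon. If the status line is only one of several lines in the body, a line-by-line scan is one possible way to stay regexp-free. This is a rough sketch: headerValue is a hypothetical helper, and bufio.Scanner's default line-length limit is assumed to be large enough for the body.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// headerValue scans in line by line and returns the value of the first
// line whose name (the part before the first colon) equals name,
// ignoring case. It is an illustrative helper, not a library function.
func headerValue(in, name string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(in))
	for sc.Scan() {
		// SplitN with n=2 keeps colons inside the value intact.
		parts := strings.SplitN(sc.Text(), ":", 2)
		if len(parts) == 2 && strings.EqualFold(strings.TrimSpace(parts[0]), name) {
			return strings.TrimSpace(parts[1]), true
		}
	}
	return "", false
}

func main() {
	in := "some payload\r\nX-sync-status: done\r\n"
	fmt.Println(headerValue(in, "x-sync-status")) // prints: done true
}
```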

    Edit 2: I was curious how much cheaper the split method is, so I put together this very crude comparison:

    package main
    
    import (
        "fmt"
        "log"
        "regexp"
        "strings"
    )
    
    // Our example input
    const input = "X-sync-status: done\n"
    
    // "^" anchors the regex to the beginning of the input. To match the header
    // at the start of any line of a larger body, add the "m" flag: (?im).
    // Then we have a fixed string until our capturing group begins.
    // Within our capturing group, we take all consecutive word characters
    // (letters, digits and underscore) that follow.
    const regexString = `(?i)^X-sync-status: (?P<status>\w*)`
    
    // MustCompile panics at startup if the expression is invalid.
    var syncStatusRegexp = regexp.MustCompile(regexString)
    
    func statusBySplit(in string) string {
        // Split the parameter, not the package-level constant.
        return strings.Trim(strings.Split(in, ":")[1], " \n")
    }
    
    func statusByRegexp(re *regexp.Regexp, in string) string {
        return re.FindStringSubmatch(in)[1]
    }
    
    [...]
    

    and a little benchmark:

    package main
    
    import "testing"
    
    func BenchmarkRegexp(b *testing.B) {
        for i := 0; i < b.N; i++ {
            statusByRegexp(syncStatusRegexp, input)
        }
    }
    
    func BenchmarkSplit(b *testing.B) {
        for i := 0; i < b.N; i++ {
            statusBySplit(input)
        }
    }
    

    I then ran each benchmark five times with one, two and four CPUs available. The result is, in my opinion, pretty convincing:

    go test -run=^$ -test.bench=.  -test.benchmem -test.cpu 1,2,4 -test.count=5
    goos: darwin
    goarch: amd64
    pkg: github.com/mwmahlberg/so-regex
    BenchmarkRegexp          5000000               383 ns/op              32 B/op          1 allocs/op
    BenchmarkRegexp          5000000               382 ns/op              32 B/op          1 allocs/op
    BenchmarkRegexp          5000000               382 ns/op              32 B/op          1 allocs/op
    BenchmarkRegexp          5000000               382 ns/op              32 B/op          1 allocs/op
    BenchmarkRegexp          5000000               384 ns/op              32 B/op          1 allocs/op
    BenchmarkRegexp-2        5000000               384 ns/op              32 B/op          1 allocs/op
    BenchmarkRegexp-2        5000000               382 ns/op              32 B/op          1 allocs/op
    BenchmarkRegexp-2        5000000               384 ns/op              32 B/op          1 allocs/op
    BenchmarkRegexp-2        5000000               382 ns/op              32 B/op          1 allocs/op
    BenchmarkRegexp-2        5000000               382 ns/op              32 B/op          1 allocs/op
    BenchmarkRegexp-4        5000000               382 ns/op              32 B/op          1 allocs/op
    BenchmarkRegexp-4        5000000               382 ns/op              32 B/op          1 allocs/op
    BenchmarkRegexp-4        5000000               380 ns/op              32 B/op          1 allocs/op
    BenchmarkRegexp-4        5000000               380 ns/op              32 B/op          1 allocs/op
    BenchmarkRegexp-4        5000000               377 ns/op              32 B/op          1 allocs/op
    BenchmarkSplit          10000000               161 ns/op              80 B/op          3 allocs/op
    BenchmarkSplit          10000000               161 ns/op              80 B/op          3 allocs/op
    BenchmarkSplit          10000000               164 ns/op              80 B/op          3 allocs/op
    BenchmarkSplit          10000000               165 ns/op              80 B/op          3 allocs/op
    BenchmarkSplit          10000000               162 ns/op              80 B/op          3 allocs/op
    BenchmarkSplit-2        10000000               159 ns/op              80 B/op          3 allocs/op
    BenchmarkSplit-2        10000000               167 ns/op              80 B/op          3 allocs/op
    BenchmarkSplit-2        10000000               161 ns/op              80 B/op          3 allocs/op
    BenchmarkSplit-2        10000000               159 ns/op              80 B/op          3 allocs/op
    BenchmarkSplit-2        10000000               159 ns/op              80 B/op          3 allocs/op
    BenchmarkSplit-4        10000000               159 ns/op              80 B/op          3 allocs/op
    BenchmarkSplit-4        10000000               161 ns/op              80 B/op          3 allocs/op
    BenchmarkSplit-4        10000000               159 ns/op              80 B/op          3 allocs/op
    BenchmarkSplit-4        10000000               160 ns/op              80 B/op          3 allocs/op
    BenchmarkSplit-4        10000000               160 ns/op              80 B/op          3 allocs/op
    PASS
    ok      github.com/mwmahlberg/so-regex  61.340s
    

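    The numbers show that, for extracting a value like this, a plain split is more than twice as fast as a precompiled regexp (about 160 ns/op versus about 380 ns/op), although it allocates more (80 B and 3 allocations per call versus 32 B and 1). For your use case, I would go for the split.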
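
    The allocation columns in the benchmark output suggest a further option that was not part of the measurements above: slicing the value out by index instead of building the slice that strings.Split returns. The following is only a sketch under the same single-line assumption as the benchmark input. statusByIndex is a hypothetical name, and testing.AllocsPerRun is used merely as a quick way to observe allocations outside a benchmark.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// Same sample input as in the benchmark above.
const input = "X-sync-status: done\n"

// statusByIndex returns everything after the first colon, trimmed,
// without creating the intermediate []string of strings.Split.
func statusByIndex(in string) string {
	i := strings.IndexByte(in, ':')
	if i < 0 {
		return "" // no colon: not a "name: value" line
	}
	return strings.TrimSpace(in[i+1:])
}

// sink keeps the compiler from discarding the call as dead code.
var sink string

func main() {
	fmt.Printf("%q\n", statusByIndex(input))
	allocs := testing.AllocsPerRun(1000, func() { sink = statusByIndex(input) })
	fmt.Println("allocations per call:", allocs)
}
```

    Whether this is measurably faster than the split version would have to be checked with the same benchmark setup; the sketch only shows the technique.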
