duanhao1004 2019-09-14 10:42
Views: 168
Accepted

Using a regex wildcard to get a tag without the surrounding text

I'm trying to get the value "done" from the following header, which is in a byte slice returned at the end of a chunked HTTP stream.

X-sync-status: done

This is the Go regex I have so far:

syncStatusRegex = regexp.MustCompile("(?i)X-sync-status:(.*)\r\n")

I just want it to return this bit

(.*)

This is the code to get the status

syncStatus := strings.TrimSpace(string(syncStatusRegex.Find(body)))
fmt.Println(syncStatus)

How do I get it to just return "done" and not the header?

Thanks

1 answer

  • dri98076 2019-09-14 11:42

    What you want is access to the capturing groups. I prefer named capturing groups, and a very simple helper function is enough to deal with them:

    package main
    
    import (
        "fmt"
        "regexp"
    )
    
    // Our example input
    const input = "X-sync-status: done\r\n"
    
    // We anchor the regex to the beginning of the input with "^"
    // (add the "m" flag to match at the beginning of any line).
    // Then we have a fixed string until our capturing group begins.
    // Within our capturing group, we match all consecutive word
    // characters (letters, digits and underscores) that follow.
    const regexString = `(?i)^X-sync-status: (?P<status>\w*)`
    
    // MustCompile panics at startup if the pattern is invalid.
    var syncStatusRegexp = regexp.MustCompile(regexString)
    
    
    // The helper function...
    func namedResults(re *regexp.Regexp, in string) map[string]string {
    
        // ... does the matching
        match := re.FindStringSubmatch(in)
    
        result := make(map[string]string)
    
        // No match: return the empty map instead of indexing a nil slice.
        if match == nil {
            return result
        }
    
        // and puts the value for each named capturing group
        // into the result map
        for i, name := range re.SubexpNames() {
            if i != 0 && name != "" {
                result[name] = match[i]
            }
        }
        return result
    }
    
    func main() {
        fmt.Println(namedResults(syncStatusRegexp, input)["status"])
    }
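
    If only one group is needed, the map can be skipped. The following is a minimal sketch, assuming Go 1.15 or later for `SubexpIndex`; the names `statusRe` and `statusByName` are hypothetical and not part of the code above.

```go
package main

import (
	"fmt"
	"regexp"
)

// Same pattern as in the answer; "status" is the name of the capturing group.
var statusRe = regexp.MustCompile(`(?i)^X-sync-status: (?P<status>\w*)`)

// statusByName (hypothetical helper) returns the "status" group,
// and false if the input does not match the pattern.
func statusByName(in string) (string, bool) {
	m := statusRe.FindStringSubmatch(in)
	if m == nil {
		return "", false
	}
	// SubexpIndex returns the index of the first group with that name.
	return m[statusRe.SubexpIndex("status")], true
}

func main() {
	status, ok := statusByName("X-sync-status: done\r\n")
	fmt.Println(status, ok)
}
```

    Returning a boolean alongside the value lets the caller tell "no header found" apart from an empty status.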
    


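    Note: `Find` returns the whole match, header name included; the groups are only available through the `Submatch` functions. Your current regexp is also somewhat faulty, since the group captures the leading whitespace as well: it would contain " done" instead of "done".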
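
    Since the question works on a byte slice, the group can also be read without converting the whole body to a string first. This is a sketch only, assuming the header may sit anywhere in `body`; the function name `syncStatus` and the sample body are made up for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
)

// Close to the pattern from the question, without the trailing line break.
// "." does not match "\n", so the group stops at the end of the line.
var syncStatusRegex = regexp.MustCompile(`(?i)X-sync-status:(.*)`)

// syncStatus (hypothetical helper) returns the header value,
// or "" if the header is not present in body.
func syncStatus(body []byte) string {
	// FindSubmatch returns the whole match at index 0 and the groups after it.
	m := syncStatusRegex.FindSubmatch(body)
	if m == nil {
		return ""
	}
	// The group still holds the leading space and a possible "\r".
	return string(bytes.TrimSpace(m[1]))
}

func main() {
	body := []byte("last chunk\r\nX-sync-status: done\r\n")
	fmt.Println(syncStatus(body))
}
```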

    Edit: Of course, you can do this much cheaper without regexp:

    fmt.Print(strings.Trim(strings.Split(input, ":")[1], " \r\n"))
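
    The one-liner assumes the input is exactly one header line and that the value contains no colon. A slightly more defensive sketch, still without regexp, could walk the body line by line and split only at the first colon; `headerValue` is a hypothetical helper, not something from the standard library.

```go
package main

import (
	"bytes"
	"fmt"
)

// headerValue (hypothetical helper) looks for a "name: value" line in body
// and returns the trimmed value. The name is compared case-insensitively.
func headerValue(body []byte, name string) (string, bool) {
	for len(body) > 0 {
		// Cut off the next line; the last line may lack a trailing "\n".
		line := body
		if i := bytes.IndexByte(body, '\n'); i >= 0 {
			line, body = body[:i], body[i+1:]
		} else {
			body = nil
		}
		// Split at the first colon only, so values may contain colons.
		i := bytes.IndexByte(line, ':')
		if i < 0 {
			continue
		}
		if bytes.EqualFold(bytes.TrimSpace(line[:i]), []byte(name)) {
			// TrimSpace also removes the "\r" of a CRLF line ending.
			return string(bytes.TrimSpace(line[i+1:])), true
		}
	}
	return "", false
}

func main() {
	body := []byte("last chunk\r\nX-sync-status: done\r\n")
	fmt.Println(headerValue(body, "x-sync-status"))
}
```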
    


    Edit 2: I was curious how much cheaper the split method is, so I came up with this very crude comparison:

    package main
    
    import (
        "fmt"
        "log"
        "regexp"
        "strings"
    )
    
    // Our example input
    const input = "X-sync-status: done\r\n"
    
    // We anchor the regex to the beginning of the input with "^"
    // (add the "m" flag to match at the beginning of any line).
    // Then we have a fixed string until our capturing group begins.
    // Within our capturing group, we match all consecutive word
    // characters (letters, digits and underscores) that follow.
    const regexString = `(?i)^X-sync-status: (?P<status>\w*)`
    
    // MustCompile panics at startup if the pattern is invalid.
    var syncStatusRegexp = regexp.MustCompile(regexString)
    
    func statusBySplit(in string) string {
        // Use the parameter, not the package-level constant.
        return strings.Trim(strings.Split(in, ":")[1], " \r\n")
    }
    
    func statusByRegexp(re *regexp.Regexp, in string) string {
        return re.FindStringSubmatch(in)[1]
    }
    
    [...]
    

    and a little benchmark:

    package main
    
    import "testing"
    
    func BenchmarkRegexp(b *testing.B) {
        for i := 0; i < b.N; i++ {
            statusByRegexp(syncStatusRegexp, input)
        }
    }
    
    func BenchmarkSplit(b *testing.B) {
        for i := 0; i < b.N; i++ {
            statusBySplit(input)
        }
    }
    

    Then I ran each benchmark 5 times with 1, 2 and 4 CPUs available. The result is, in my opinion, pretty convincing:

    go test -run=^$ -test.bench=.  -test.benchmem -test.cpu 1,2,4 -test.count=5
    goos: darwin
    goarch: amd64
    pkg: github.com/mwmahlberg/so-regex
    BenchmarkRegexp          5000000               383 ns/op              32 B/op          1 allocs/op
    BenchmarkRegexp          5000000               382 ns/op              32 B/op          1 allocs/op
    BenchmarkRegexp          5000000               382 ns/op              32 B/op          1 allocs/op
    BenchmarkRegexp          5000000               382 ns/op              32 B/op          1 allocs/op
    BenchmarkRegexp          5000000               384 ns/op              32 B/op          1 allocs/op
    BenchmarkRegexp-2        5000000               384 ns/op              32 B/op          1 allocs/op
    BenchmarkRegexp-2        5000000               382 ns/op              32 B/op          1 allocs/op
    BenchmarkRegexp-2        5000000               384 ns/op              32 B/op          1 allocs/op
    BenchmarkRegexp-2        5000000               382 ns/op              32 B/op          1 allocs/op
    BenchmarkRegexp-2        5000000               382 ns/op              32 B/op          1 allocs/op
    BenchmarkRegexp-4        5000000               382 ns/op              32 B/op          1 allocs/op
    BenchmarkRegexp-4        5000000               382 ns/op              32 B/op          1 allocs/op
    BenchmarkRegexp-4        5000000               380 ns/op              32 B/op          1 allocs/op
    BenchmarkRegexp-4        5000000               380 ns/op              32 B/op          1 allocs/op
    BenchmarkRegexp-4        5000000               377 ns/op              32 B/op          1 allocs/op
    BenchmarkSplit          10000000               161 ns/op              80 B/op          3 allocs/op
    BenchmarkSplit          10000000               161 ns/op              80 B/op          3 allocs/op
    BenchmarkSplit          10000000               164 ns/op              80 B/op          3 allocs/op
    BenchmarkSplit          10000000               165 ns/op              80 B/op          3 allocs/op
    BenchmarkSplit          10000000               162 ns/op              80 B/op          3 allocs/op
    BenchmarkSplit-2        10000000               159 ns/op              80 B/op          3 allocs/op
    BenchmarkSplit-2        10000000               167 ns/op              80 B/op          3 allocs/op
    BenchmarkSplit-2        10000000               161 ns/op              80 B/op          3 allocs/op
    BenchmarkSplit-2        10000000               159 ns/op              80 B/op          3 allocs/op
    BenchmarkSplit-2        10000000               159 ns/op              80 B/op          3 allocs/op
    BenchmarkSplit-4        10000000               159 ns/op              80 B/op          3 allocs/op
    BenchmarkSplit-4        10000000               161 ns/op              80 B/op          3 allocs/op
    BenchmarkSplit-4        10000000               159 ns/op              80 B/op          3 allocs/op
    BenchmarkSplit-4        10000000               160 ns/op              80 B/op          3 allocs/op
    BenchmarkSplit-4        10000000               160 ns/op              80 B/op          3 allocs/op
    PASS
    ok      github.com/mwmahlberg/so-regex  61.340s
    

    It clearly shows that for splitting a tag like this, an actual split is more than twice as fast as a precompiled regexp (roughly 160 ns/op versus 380 ns/op), although it allocates more (80 B/op in 3 allocations versus 32 B/op in 1). For your use case, I would go for the split.
