Get a tag without the surrounding text using a regex wildcard

I'm trying to get the value "done" from the following header, which sits in a byte slice returned at the end of a chunked HTTP stream.

X-sync-status: done

This is the Go regex I have so far:

syncStatusRegex = regexp.MustCompile("(?i)X-sync-status:(.*)\r\n")

I just want it to return this bit

(.*)

This is the code to get the status

syncStatus := strings.TrimSpace(string(syncStatusRegex.Find(body)))
fmt.Println(syncStatus)

How do I get it to just return "done" and not the header?

Thanks

1 answer

dri98076 2019-09-14 11:42

What you want to achieve is to access the capturing groups. I prefer named capturing groups and there is an extremely simple helper function to deal with that:

package main

import (
    "fmt"
    "regexp"
)

// Our example input
const input = "X-sync-status: done\r\n"

// "^" anchors the regex to the beginning of the input; add the "m" flag,
// as in `(?im)`, to match at the beginning of every line instead.
// Then we have a fixed string until our capturing group begins.
// Within our capturing group, we want to have all consecutive word
// characters (letters, digits and underscores) following.
const regexString = `(?i)^X-sync-status: (?P<status>\w*)`

// We ensure our regexp is valid and can be used.
var syncStatusRegexp *regexp.Regexp = regexp.MustCompile(regexString)


// The helper function...
func namedResults(re *regexp.Regexp, in string) map[string]string {

    result := make(map[string]string)

    // ... does the matching
    match := re.FindStringSubmatch(in)
    if match == nil {
        // No match: return an empty map instead of panicking below.
        return result
    }

    // and puts the value for each named capturing group
    // into the result map
    for i, name := range re.SubexpNames() {
        if i != 0 && name != "" {
            result[name] = match[i]
        }
    }
    return result
}

func main() {
    fmt.Println(namedResults(syncStatusRegexp, input)["status"])
}


Note: Your current regexp is somewhat faulty, since it captures the whitespace after the colon as well. With your current regexp, the result would be " done" instead of "done".
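Since the body in the question is a byte slice, here is a minimal sketch of the same idea with FindSubmatch, which returns the capturing groups as byte slices. It assumes the header line ends in "\r\n". The names looseRe, strictRe and firstGroup, as well as the stricter pattern, are made up for illustration and are not from the question.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
)

// Same pattern as in the question: the group also captures the leading space.
var looseRe = regexp.MustCompile(`(?i)X-sync-status:(.*)\r\n`)

// Stricter variant (an assumption, not from the question): skip spaces and
// tabs after the colon and capture only non-whitespace characters.
var strictRe = regexp.MustCompile(`(?i)X-sync-status:[ \t]*(\S+)`)

// firstGroup returns the first capturing group of re in body,
// or nil when re does not match.
func firstGroup(re *regexp.Regexp, body []byte) []byte {
	m := re.FindSubmatch(body)
	if m == nil {
		return nil
	}
	return m[1]
}

func main() {
	body := []byte("payload\r\nX-sync-status: done\r\n")

	fmt.Printf("%q\n", firstGroup(looseRe, body))                  // " done"
	fmt.Printf("%q\n", bytes.TrimSpace(firstGroup(looseRe, body))) // "done"
	fmt.Printf("%q\n", firstGroup(strictRe, body))                 // "done"
}
```

Index 0 of the submatch slice is the whole match (what Find returns), and index 1 is the first group, which is why Find alone always includes the header name.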

Edit: Of course, you can do this much more cheaply without a regexp:

fmt.Print(strings.Trim(strings.Split(input, ":")[1], " \r\n"))

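The one-liner above assumes the input is exactly one header line and that the value contains no colon. A rough sketch of how the split idea might be applied to a whole byte-slice body could look like the following. The function name syncStatus and the line-by-line scan are assumptions for illustration, not part of the original answer.

```go
package main

import (
	"bytes"
	"fmt"
)

// syncStatus looks for an "X-sync-status" line in body and returns its
// value. The second result is false when no such line exists.
func syncStatus(body []byte) (string, bool) {
	for _, line := range bytes.Split(body, []byte("\n")) {
		// SplitN with n=2 cuts at the first colon only, so a value that
		// itself contains ":" stays intact.
		parts := bytes.SplitN(line, []byte(":"), 2)
		if len(parts) != 2 {
			continue
		}
		// Header names are compared case-insensitively, like (?i) in the regexp.
		if bytes.EqualFold(bytes.TrimSpace(parts[0]), []byte("X-sync-status")) {
			// TrimSpace also removes a trailing "\r".
			return string(bytes.TrimSpace(parts[1])), true
		}
	}
	return "", false
}

func main() {
	body := []byte("payload\r\nx-sync-status: done\r\n")
	status, ok := syncStatus(body)
	fmt.Printf("%q %v\n", status, ok) // "done" true
}
```

Returning a boolean alongside the value avoids the index-out-of-range panic that Split(...)[1] would cause on input without a colon.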

Edit 2: I was curious how much cheaper the split method is, so I came up with this very crude comparison:

package main

import (
    "fmt"
    "log"
    "regexp"
    "strings"
)

// Our example input
const input = "X-sync-status: done\r\n"

// "^" anchors the regex to the beginning of the input; add the "m" flag,
// as in `(?im)`, to match at the beginning of every line instead.
// Then we have a fixed string until our capturing group begins.
// Within our capturing group, we want to have all consecutive word
// characters (letters, digits and underscores) following.
const regexString = `(?i)^X-sync-status: (?P<status>\w*)`

// We ensure our regexp is valid and can be used.
var syncStatusRegexp *regexp.Regexp = regexp.MustCompile(regexString)

func statusBySplit(in string) string {
    // Use the parameter "in", not the package-level constant "input".
    return strings.Trim(strings.Split(in, ":")[1], " \r\n")
}

func statusByRegexp(re *regexp.Regexp, in string) string {
    return re.FindStringSubmatch(in)[1]
}

[...]

and a little benchmark:

package main

import "testing"

func BenchmarkRegexp(b *testing.B) {
    for i := 0; i < b.N; i++ {
        statusByRegexp(syncStatusRegexp, input)
    }
}

func BenchmarkSplit(b *testing.B) {
    for i := 0; i < b.N; i++ {
        statusBySplit(input)
    }
}

Then I let those run 5 times each with one, two and four CPUs available. The result, IMHO, is pretty convincing:

go test -run=^$ -test.bench=.  -test.benchmem -test.cpu 1,2,4 -test.count=5
goos: darwin
goarch: amd64
pkg: github.com/mwmahlberg/so-regex
BenchmarkRegexp          5000000               383 ns/op              32 B/op          1 allocs/op
BenchmarkRegexp          5000000               382 ns/op              32 B/op          1 allocs/op
BenchmarkRegexp          5000000               382 ns/op              32 B/op          1 allocs/op
BenchmarkRegexp          5000000               382 ns/op              32 B/op          1 allocs/op
BenchmarkRegexp          5000000               384 ns/op              32 B/op          1 allocs/op
BenchmarkRegexp-2        5000000               384 ns/op              32 B/op          1 allocs/op
BenchmarkRegexp-2        5000000               382 ns/op              32 B/op          1 allocs/op
BenchmarkRegexp-2        5000000               384 ns/op              32 B/op          1 allocs/op
BenchmarkRegexp-2        5000000               382 ns/op              32 B/op          1 allocs/op
BenchmarkRegexp-2        5000000               382 ns/op              32 B/op          1 allocs/op
BenchmarkRegexp-4        5000000               382 ns/op              32 B/op          1 allocs/op
BenchmarkRegexp-4        5000000               382 ns/op              32 B/op          1 allocs/op
BenchmarkRegexp-4        5000000               380 ns/op              32 B/op          1 allocs/op
BenchmarkRegexp-4        5000000               380 ns/op              32 B/op          1 allocs/op
BenchmarkRegexp-4        5000000               377 ns/op              32 B/op          1 allocs/op
BenchmarkSplit          10000000               161 ns/op              80 B/op          3 allocs/op
BenchmarkSplit          10000000               161 ns/op              80 B/op          3 allocs/op
BenchmarkSplit          10000000               164 ns/op              80 B/op          3 allocs/op
BenchmarkSplit          10000000               165 ns/op              80 B/op          3 allocs/op
BenchmarkSplit          10000000               162 ns/op              80 B/op          3 allocs/op
BenchmarkSplit-2        10000000               159 ns/op              80 B/op          3 allocs/op
BenchmarkSplit-2        10000000               167 ns/op              80 B/op          3 allocs/op
BenchmarkSplit-2        10000000               161 ns/op              80 B/op          3 allocs/op
BenchmarkSplit-2        10000000               159 ns/op              80 B/op          3 allocs/op
BenchmarkSplit-2        10000000               159 ns/op              80 B/op          3 allocs/op
BenchmarkSplit-4        10000000               159 ns/op              80 B/op          3 allocs/op
BenchmarkSplit-4        10000000               161 ns/op              80 B/op          3 allocs/op
BenchmarkSplit-4        10000000               159 ns/op              80 B/op          3 allocs/op
BenchmarkSplit-4        10000000               160 ns/op              80 B/op          3 allocs/op
BenchmarkSplit-4        10000000               160 ns/op              80 B/op          3 allocs/op
PASS
ok      github.com/mwmahlberg/so-regex  61.340s

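It clearly shows that for extracting the status value, a plain split is more than twice as fast as a precompiled regexp (about 160 ns/op versus about 382 ns/op), although it allocates more per call (80 B in 3 allocations versus 32 B in 1). For your use case, I would clearly go for split.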
