dtt2012
2018-10-01 15:08
Views: 609
Accepted

Golang: replace all newlines

Usually, when I'm replacing newlines I jump to Regexp, like in this PHP:

preg_replace('/\R/u', "\n", $String);

Because I know that to be a very durable way to replace any kind of Unicode newline (be it \n, \r, \r\n, etc.)

I was trying to do something like this in Go as well, but I get

error parsing regexp: invalid escape sequence: \R

On this line

msg = regexp.MustCompilePOSIX("\\R").ReplaceAllString(html.EscapeString(msg), "<br>\n")

I tried using (?:(?>\r\n)|\v) from https://stackoverflow.com/a/4389171/728236, but it looks like Go's regex implementation doesn't support that either. It panics with: invalid or unsupported Perl syntax: '(?>'

What's a good, safe way to replace newlines in Go, Regex or not?


I see this answer here, "Golang: Issues replacing newlines in a string from a text file", saying to use \r?\n, but I'm hesitant to believe that it would get all Unicode newlines, mainly because of this question, which has an answer listing many more newline codepoints than the 3 that \r?\n covers.

2 answers

  • dragon071111 2018-10-01 15:52
    Accepted

    You may "decode" the \R pattern as

    U+000D U+000A | [U+000A U+000B U+000C U+000D U+0085 U+2028 U+2029]
    

    See the Java regex docs explaining the \R shorthand:

    Linebreak matcher
    \R  Any Unicode linebreak sequence, is equivalent to \u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029]

    In Go, you may use the following:

    func removeLBR(text string) string {
        re := regexp.MustCompile(`\x{000D}\x{000A}|[\x{000A}\x{000B}\x{000C}\x{000D}\x{0085}\x{2028}\x{2029}]`)
        return re.ReplaceAllString(text, ``)
    }
    

    Here is a Go demo.

    Some of the Unicode codes can be replaced with regex escape sequences supported by Go regexp:

    re := regexp.MustCompile(`\r\n|[\r\n\v\f\x{0085}\x{2028}\x{2029}]`)
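As a rough, self-contained sketch of how the pattern above could be applied to the question's use case (HTML-escape first, then turn every line break into `<br>` plus a newline): the names `lineBreak` and `escapeWithBreaks` are hypothetical and not part of the original answer.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
)

// lineBreak tries CRLF first, then any single line-break code point,
// mirroring the \R expansion quoted above. Compiled once at package level.
var lineBreak = regexp.MustCompile(`\r\n|[\r\n\v\f\x{0085}\x{2028}\x{2029}]`)

// escapeWithBreaks is a hypothetical helper: it HTML-escapes msg and then
// replaces every line break with "<br>\n".
func escapeWithBreaks(msg string) string {
	return lineBreak.ReplaceAllString(html.EscapeString(msg), "<br>\n")
}

func main() {
	// %q makes the remaining "\n" characters visible in the output.
	fmt.Printf("%q\n", escapeWithBreaks("a<b\r\nc\u2028d\re"))
}
```

Because `\r\n` is the first alternative, a CRLF pair is consumed as one line break rather than two.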
    
  • doukun1450 2018-10-01 23:24

    While using regexp usually yields an elegant and compact solution, often it's not the fastest.

    For tasks where you have to replace certain substrings with others, the standard library provides a really efficient solution in the form of strings.Replacer:

    Replacer replaces a list of strings with replacements. It is safe for concurrent use by multiple goroutines.

    You may create a reusable replacer with strings.NewReplacer(), where you list the pairs of strings to be replaced and their replacements. To perform a replacement, you simply call Replacer.Replace().

    Here's what it would look like:

    const replacement = "<br>\n"
    
    var replacer = strings.NewReplacer(
        "\r\n", replacement,
        "\r", replacement,
        "\n", replacement,
        "\v", replacement,
        "\f", replacement,
        "\u0085", replacement,
        "\u2028", replacement,
        "\u2029", replacement,
    )
    
    func replaceReplacer(s string) string {
        return replacer.Replace(s)
    }
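One detail worth spelling out: the order of the pairs matters. The strings package documents that the old-string comparisons are done in argument order, so "\r\n" has to be listed before "\r" and "\n" for a CRLF to count as a single line break. A small sketch comparing the two orderings (the names `crlfFirst` and `crlfLast` are made up for illustration):

```go
package main

import (
	"fmt"
	"strings"
)

const replacement = "<br>\n"

// crlfFirst lists "\r\n" before the single-character breaks.
var crlfFirst = strings.NewReplacer(
	"\r\n", replacement,
	"\r", replacement,
	"\n", replacement,
)

// crlfLast is a deliberately misordered variant, for comparison only.
var crlfLast = strings.NewReplacer(
	"\r", replacement,
	"\n", replacement,
	"\r\n", replacement,
)

func main() {
	const in = "a\r\nb"
	fmt.Printf("%q\n", crlfFirst.Replace(in)) // "a<br>\nb"
	fmt.Printf("%q\n", crlfLast.Replace(in))  // "a<br>\n<br>\nb"
}
```

With the misordered variant, "\r" wins at the first position and the following "\n" is replaced separately, so one CRLF yields two `<br>` tags.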
    

    Here's what the regexp solution from Wiktor's answer (the accepted one) looks like:

    var re = regexp.MustCompile(`\r\n|[\r\n\v\f\x{0085}\x{2028}\x{2029}]`)
    
    func replaceRegexp(s string) string {
        return re.ReplaceAllString(s, "<br>\n")
    }
    

    The strings.Replacer implementation is actually quite fast. Here's a simple benchmark comparing it to the above pre-compiled regexp solution:

    const input = "1st\nsecond\r\nthird\r4th\u0085fifth\u2028sixth"
    
    func BenchmarkReplacer(b *testing.B) {
        for i := 0; i < b.N; i++ {
            replaceReplacer(input)
        }
    }
    
    func BenchmarkRegexp(b *testing.B) {
        for i := 0; i < b.N; i++ {
            replaceRegexp(input)
        }
    }
    

    And the benchmark results:

    BenchmarkReplacer-4      3000000               495 ns/op
    BenchmarkRegexp-4         500000              2787 ns/op
    

    For our test input, strings.Replacer was more than 5 times faster.

    There's also another advantage. In the example above we obtain the result as a new string value (in both solutions), which requires a new string allocation. If we need to write the result to an io.Writer (e.g. we're creating an HTTP response or writing the result to a file), strings.Replacer lets us avoid creating the new string: its handy Replacer.WriteString() method takes an io.Writer and writes the result into it without allocating and returning it as a string. This further increases the performance gain compared to the regexp solution.
