2018-10-01 15:08
Usually, when I'm replacing newlines I jump to Regexp, like in this PHP:

preg_replace('/\R/u', "\n", $String);

Because I know that to be a very durable way to replace any kind of Unicode newline (be it \n, \r, \r\n, etc.).

I was trying to do something like this in Go as well, but I get

error parsing regexp: invalid escape sequence: \R

On this line

msg = regexp.MustCompilePOSIX("\\R").ReplaceAllString(html.EscapeString(msg), "<br>\n")

I tried using (?:(?>\r\n)|\v) from https://stackoverflow.com/a/4389171/728236, but it looks like Go's regex implementation doesn't support that either, panicking with invalid or unsupported Perl syntax: '(?>'

What's a good, safe way to replace newlines in Go, Regex or not?

I see this answer here, Golang: Issues replacing newlines in a string from a text file, saying to use \r?\n, but I'm hesitant to believe that it would get all Unicode newlines, mainly because of this question, which has an answer listing many more newline codepoints than the 3 that \r?\n covers.

  • dragon071111 2018-10-01 15:52

    You may "decode" the \R pattern into an explicit alternation of line break sequences. See the Java regex docs explaining the \R shorthand:

    Linebreak matcher
    \R  Any Unicode linebreak sequence, is equivalent to \u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029]

    In Go, you may use the following:

    func removeLBR(text string) string {
        re := regexp.MustCompile(`\x{000D}\x{000A}|[\x{000A}\x{000B}\x{000C}\x{000D}\x{0085}\x{2028}\x{2029}]`)
        return re.ReplaceAllString(text, ``)
    }

    Here is a Go demo.

    Some of the Unicode codes can be replaced with regex escape sequences supported by Go regexp:

    re := regexp.MustCompile(`\r\n|[\r\n\v\f\x{0085}\x{2028}\x{2029}]`)
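
    For the use case in the question (escape the HTML first, then turn every line break into a <br>), a minimal sketch might look like the following. The names lineBreakRe and nl2br are made up for illustration; the pattern is the escape-sequence form of the \R equivalent quoted above.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
)

// lineBreakRe matches CRLF as a single unit first, then any one of the
// single-codepoint line breaks from the \R definition.
// It is compiled once at package level so it can be reused.
var lineBreakRe = regexp.MustCompile(`\r\n|[\r\n\v\f\x{0085}\x{2028}\x{2029}]`)

// nl2br is a hypothetical helper name: it HTML-escapes msg and then
// replaces every line break sequence with "<br>\n".
func nl2br(msg string) string {
	return lineBreakRe.ReplaceAllString(html.EscapeString(msg), "<br>\n")
}

func main() {
	fmt.Printf("%q\n", nl2br("a<b\r\nc\u2028d"))
	// prints "a&lt;b<br>\nc<br>\nd"
}
```

    Listing \r\n as the first alternative keeps a Windows line ending from being turned into two <br> tags.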
  • doukun1450 2018-10-01 23:24

    While using regexp usually yields an elegant and compact solution, often it's not the fastest.

    For tasks where you have to replace certain substrings with others, the standard library provides a really efficient solution in the form of strings.Replacer:

    Replacer replaces a list of strings with replacements. It is safe for concurrent use by multiple goroutines.

    You may create a reusable replacer with strings.NewReplacer(), where you list the pairs containing the replaceable parts and their replacements. When you want to perform a replacement, you simply call Replacer.Replace().

    Here's what it would look like:

    const replacement = "<br>\n"

    var replacer = strings.NewReplacer(
        "\r\n", replacement,
        "\r", replacement,
        "\n", replacement,
        "\v", replacement,
        "\f", replacement,
        "\u0085", replacement,
        "\u2028", replacement,
        "\u2029", replacement,
    )

    func replaceReplacer(s string) string {
        return replacer.Replace(s)
    }

    Here's what the regexp solution from Wiktor's answer looks like:

    var re = regexp.MustCompile(`\r\n|[\r\n\v\f\x{0085}\x{2028}\x{2029}]`)

    func replaceRegexp(s string) string {
        return re.ReplaceAllString(s, "<br>\n")
    }

    The strings.Replacer implementation is actually quite fast. Here's a simple benchmark comparing it to the above pre-compiled regexp solution:

    const input = "1st
    func BenchmarkReplacer(b *testing.B) {
        for i := 0; i < b.N; i++ {
            replaceReplacer(input)
        }
    }
    func BenchmarkRegexp(b *testing.B) {
        for i := 0; i < b.N; i++ {
            replaceRegexp(input)
        }
    }

    And the benchmark results:

    BenchmarkReplacer-4      3000000               495 ns/op
    BenchmarkRegexp-4         500000              2787 ns/op

    For our test input, strings.Replacer was more than 5 times faster.

    There's also another advantage. In the example above we obtain the result as a new string value (in both solutions). This requires a new string allocation. If we need to write the result to an io.Writer (e.g. we're creating an HTTP response or writing the result to a file), we can avoid having to create the new string in case of strings.Replacer as it has a handy Replacer.WriteString() method which takes an io.Writer and writes the result into it without allocating and returning it as a string. This further significantly increases the performance gain compared to the regexp solution.

