dream0776
2019-07-31 09:47

How do I match a complete string that contains Unicode characters?

I want to validate a string, e.g. a name: a string without spaces. For plain ASCII the regex "^\w+$" would suffice, where ^ and $ anchor the match to the whole string. I tried to achieve the same result for Unicode characters, to support multiple languages, using the \pL character class. But for some reason $ does not seem to match the end of the string. What am I doing wrong?

Code sample is here: https://play.golang.org/p/SPDEbWmqx0N

I copy pasted random characters from: http://www.columbia.edu/~fdc/utf8/

go version go1.12.5 darwin/amd64

package main

import (
    "fmt"
    "regexp"
)

func main() {

    // Unicode character class

    fmt.Println(regexp.MatchString(`^\pL+$`, "testuser"))  // expected true
    fmt.Println(regexp.MatchString(`^\pL+$`, "user with space")) // expected false 


    // Hindi script
    fmt.Println(regexp.MatchString(`^\pL+$`, "सकता")) // expected true, but prints false

    // Hindi script, without the $ anchor
    fmt.Println(regexp.MatchString(`^\pL+`, "सकता")) // expected true

    // Chinese
    fmt.Println(regexp.MatchString(`^\pL+$`, "我能")) // expected true

    // French
    fmt.Println(regexp.MatchString(`^\pL+$`, "ægithaleshâtifs")) // expected true 

}
actual result:
true <nil>
false <nil>
false <nil>
true <nil>
true <nil>
true <nil>

expected result:
true <nil>
false <nil>
true <nil>
true <nil>
true <nil>
true <nil>

1 answer

  • doushi3715 2019-07-31 09:51
    Accepted

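    The $ anchor works as expected. The match fails because सकता ends with the vowel sign ा (U+093E), which is a combining mark (Unicode category M), not a letter (category L), so \pL+ cannot reach the end of the string. You may use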

    ^[\p{L}\p{M}]+$
    

    See Go demo.

    Details

    • ^ - start of string
    • [ - start of a character class that matches
      • \p{L} - any Unicode letter
      • \p{M} - any combining mark (diacritics, vowel signs)
    • ]+ - end of the character class, repeat 1+ times
    • $ - end of string.
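
As a minimal sketch of the pattern described above, the program below first prints, for each rune of सकता, whether the standard unicode package reports it as a letter or a mark, and then runs the question's inputs through ^[\p{L}\p{M}]+$. The names nameRe, isName and category are hypothetical helpers introduced only for this illustration; they do not appear in the original post.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode"
)

// nameRe is the pattern from the answer: one or more letters or
// combining marks, anchored to the whole string.
var nameRe = regexp.MustCompile(`^[\p{L}\p{M}]+$`)

// isName is a hypothetical helper that reports whether s consists
// only of letters and combining marks.
func isName(s string) bool {
	return nameRe.MatchString(s)
}

// category is a hypothetical helper that classifies a rune as
// "letter", "mark", or "other" using the unicode package.
func category(r rune) string {
	switch {
	case unicode.IsLetter(r):
		return "letter"
	case unicode.IsMark(r):
		return "mark"
	default:
		return "other"
	}
}

func main() {
	// Show why `^\pL+$` cannot match: the last rune is a mark, not a letter.
	for _, r := range "सकता" {
		fmt.Printf("%U %s\n", r, category(r))
	}

	// Apply the corrected pattern to the inputs from the question.
	inputs := []string{"testuser", "user with space", "सकता", "我能", "ægithaleshâtifs"}
	for _, s := range inputs {
		fmt.Printf("%q %t\n", s, isName(s))
	}
}
```

Compiling the pattern once with regexp.MustCompile, rather than calling regexp.MatchString on every check, avoids re-parsing the expression each time; that is a design choice of this sketch, not something the original answer prescribes.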

    If you also want to match digits and _, as \w does, add them to the character class: ^[\p{L}\p{M}0-9_]+$ accepts ASCII digits only, while ^[\p{L}\p{M}\p{N}_]+$ accepts any Unicode number.
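
The two digit-aware variants behave differently on non-ASCII digits. The sketch below compares them on a few sample strings; the variable names asciiDigitsRe and unicodeNumRe and the sample inputs are assumptions made for illustration, not part of the original answer.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// asciiDigitsRe allows letters, combining marks, ASCII digits and underscore.
	asciiDigitsRe = regexp.MustCompile(`^[\p{L}\p{M}0-9_]+$`)
	// unicodeNumRe allows letters, combining marks, any Unicode number and underscore.
	unicodeNumRe = regexp.MustCompile(`^[\p{L}\p{M}\p{N}_]+$`)
)

func main() {
	// "१२" are Devanagari digits, which are numbers but not ASCII digits.
	samples := []string{"user_42", "सकता_१२", "user 42"}
	for _, s := range samples {
		fmt.Printf("%q ascii=%t unicode=%t\n",
			s, asciiDigitsRe.MatchString(s), unicodeNumRe.MatchString(s))
	}
}
```

Which variant fits depends on whether names with digits from other scripts should be accepted.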
