How to match a complete string containing Unicode characters?

I want to validate a string such as a name: a string without spaces. For plain ASCII, the regex "^\w+$" would suffice, where ^ and $ anchor the match to the whole string. I tried to get the same result for Unicode characters, to support multiple languages, using the \pL character class. But for some reason $ doesn't seem to match the end of the string. What am I doing wrong?
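For reference, a minimal sketch of the ASCII-only baseline described above. In Go's regexp package (RE2 syntax), \w is defined as [0-9A-Za-z_], so it never matches non-ASCII letters. The variable name asciiWord is just for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// asciiWord anchors \w+ to the whole string. In Go's RE2 syntax, \w is
// ASCII-only: it is equivalent to [0-9A-Za-z_].
var asciiWord = regexp.MustCompile(`^\w+$`)

func main() {
	fmt.Println(asciiWord.MatchString("test_user1"))      // true
	fmt.Println(asciiWord.MatchString("user with space")) // false: contains spaces
	fmt.Println(asciiWord.MatchString("सकता"))            // false: \w does not match Devanagari
}
```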

Code sample is here: https://play.golang.org/p/SPDEbWmqx0N

I copy-pasted random characters from: http://www.columbia.edu/~fdc/utf8/

go version go1.12.5 darwin/amd64

package main

import (
    "fmt"
    "regexp"
)

func main() {

    // Unicode character class

    fmt.Println(regexp.MatchString(`^\pL+$`, "testuser"))        // expected true
    fmt.Println(regexp.MatchString(`^\pL+$`, "user with space")) // expected false

    // Hindi script
    fmt.Println(regexp.MatchString(`^\pL+$`, "सकता")) // expected true, but returns false: $ doesn't seem to match the end of the string

    // Hindi script, without the $ anchor
    fmt.Println(regexp.MatchString(`^\pL+`, "सकता")) // expected true

    // Chinese
    fmt.Println(regexp.MatchString(`^\pL+$`, "我能")) // expected true

    // French
    fmt.Println(regexp.MatchString(`^\pL+$`, "ægithaleshâtifs")) // expected true

}
Actual result:
true <nil>
false <nil>
false <nil>
true <nil>
true <nil>
true <nil>

Expected result:
true <nil>
false <nil>
true <nil>
true <nil>
true <nil>
true <nil>
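To see why the third case fails while the fourth succeeds, it can help to print the Unicode category of each rune in the input. A minimal sketch using the standard unicode package; runeCategory is a hypothetical helper name, not part of any library.

```go
package main

import (
	"fmt"
	"unicode"
)

// runeCategory is a hypothetical helper: it reports whether r is a Unicode
// letter (what \pL matches), a mark (what \pM matches), or something else.
func runeCategory(r rune) string {
	switch {
	case unicode.IsLetter(r):
		return "L"
	case unicode.IsMark(r):
		return "M"
	default:
		return "other"
	}
}

func main() {
	// Ranging over a string yields runes (code points), not bytes.
	for _, r := range "सकता" {
		fmt.Printf("%U %q %s\n", r, r, runeCategory(r))
	}
}
```

The last rune, U+093E (DEVANAGARI VOWEL SIGN AA), is reported as M rather than L, so \pL+ cannot consume it.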
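dqvy87517: I trimmed the question down to illustrate the problem I was having at the end of the line. Basically, \pM is needed as well, as described in the answer below. I understand the problem now.
about a year ago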
duandai6373: If you really want to check for "a string without spaces", you can do it with strings.ContainsAny or strings.IndexFunc. For example: play.golang.org/p/oTCqcPrJkcb
about a year ago
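A minimal sketch of the strings.IndexFunc approach from the comment above, assuming "no spaces" means no Unicode whitespace as defined by unicode.IsSpace. It does not reproduce the linked playground, and hasNoSpace is a hypothetical name. Unlike the regex, it also accepts digits and punctuation.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// hasNoSpace (hypothetical helper) reports whether s is non-empty and
// contains no rune for which unicode.IsSpace is true.
func hasNoSpace(s string) bool {
	return s != "" && strings.IndexFunc(s, unicode.IsSpace) == -1
}

func main() {
	fmt.Println(hasNoSpace("सकता"))            // true
	fmt.Println(hasNoSpace("user with space")) // false
}
```

strings.ContainsAny(s, " \t\r\n") is a simpler alternative when only a fixed set of ASCII whitespace characters matters.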

1 answer

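The $ anchor is not the problem. The last character of "सकता" is ा (U+093E DEVANAGARI VOWEL SIGN AA), which belongs to Unicode category M (marks), not L (letters). So \pL+ stops before it, and $ fails because that character is left unmatched. That is also why ^\pL+ without $ returns true.

To allow marks as well, you may use

^[\p{L}\p{M}]+$

See Go demo.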

Details

  • ^ - start of string
  • [ - start of a character class that matches
    • \p{L} - any Unicode letter
    • \p{M} - any combining mark (diacritics, Indic vowel signs, etc.)
  • ]+ - end of the character class, repeat 1+ times
  • $ - end of string.
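A minimal sketch applying the pattern above to the inputs from the question; nameRe is just an illustrative variable name.

```go
package main

import (
	"fmt"
	"regexp"
)

// nameRe accepts one or more Unicode letters (\p{L}) or combining marks
// (\p{M}), and the anchors force the class to cover the whole string.
var nameRe = regexp.MustCompile(`^[\p{L}\p{M}]+$`)

func main() {
	inputs := []string{
		"testuser",
		"user with space",
		"सकता",
		"我能",
		"ægithaleshâtifs",
	}
	for _, s := range inputs {
		fmt.Println(s, nameRe.MatchString(s))
	}
}
```

Compiling the pattern once with regexp.MustCompile avoids re-parsing it on every call, which regexp.MatchString does.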

If you plan to also match digits and _ as \w does, add them to the character class, ^[\p{L}\p{M}0-9_]+$ or ^[\p{L}\p{M}\p{N}_]+$.
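A minimal sketch contrasting the two variants above; the variable names are illustrative. The 0-9 version accepts only ASCII digits, while \p{N} also accepts other number characters such as Devanagari digits (and Roman numerals or superscripts, which may or may not be wanted).

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Letters, marks, ASCII digits and underscore.
	asciiDigits = regexp.MustCompile(`^[\p{L}\p{M}0-9_]+$`)
	// Letters, marks, any Unicode number (\p{N}) and underscore.
	anyDigits = regexp.MustCompile(`^[\p{L}\p{M}\p{N}_]+$`)
)

func main() {
	for _, s := range []string{"user_42", "सकता१२", "user-42"} {
		fmt.Println(s, asciiDigits.MatchString(s), anyDigits.MatchString(s))
	}
}
```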
