duanma8207 2016-02-04 04:50

Golang regex word boundary with Latin characters

I have a tricky little issue with Go regular expressions. The \b word-boundary assertion doesn't seem to work when the input contains accented Latin characters, like this:

I expected é to be treated as an ordinary word character, but it is treated as a word boundary.

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // MustCompile panics on an invalid pattern instead of silently discarding the error.
    r := regexp.MustCompile(`\b(vis)\b`)
    fmt.Println(r.MatchString("re vis e"))
    fmt.Println(r.MatchString("revise"))
    fmt.Println(r.MatchString("révisé"))
}

The result was:

true
false
true

Any suggestions on how to make r.MatchString("révisé") return false?

Thank you.


  • duandou8120 2016-02-04 05:09

    The issue is that \b only matches at ASCII word boundaries, as stated in the regexp/syntax documentation:

    at ASCII word boundary (\w on one side and \W, \A, or \z on the other)

    Since é is not ASCII, \w does not match it, so in "révisé" there is a boundary on each side of "vis". You can, however, build your own \b replacement by combining other regex constructs. Here is a simple solution that handles the cases in the question, though you may want more thorough matching:

    package main
    
    import (
        "fmt"
        "regexp"
    )
    
    func main() {
        // Start of text or whitespace, then "vis", then whitespace or end of text.
        r := regexp.MustCompile(`(?:\A|\s)(vis)(?:\s|\z)`)
        fmt.Println(r.MatchString("vis")) // added this case
        fmt.Println(r.MatchString("re vis e"))
        fmt.Println(r.MatchString("revise"))
        fmt.Println(r.MatchString("révisé"))
    }
    

    Running this gives:

    true
    true
    false
    false
    

    This solution replaces the leading \b with (?:\A|\s) and the trailing \b with (?:\s|\z): each is a non-capturing group matching either the start (or end) of the string or a whitespace character. You may want to add other possibilities here, such as punctuation.
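Following that suggestion, one option is to swap \s for a negated Unicode class, so that punctuation and the string edges both act as boundaries while é does not. This is a minimal sketch assuming that letters, digits and underscore in any script should count as word characters (a Unicode analogue of \w); the variable name visWord is illustrative only.

```go
package main

import (
	"fmt"
	"regexp"
)

// visWord matches "vis" only when it is not adjacent to a Unicode letter,
// digit, or underscore. Without the (?m) flag, ^ and $ match only at the
// start and end of the whole string. The word itself is captured in group 1.
var visWord = regexp.MustCompile(`(?:^|[^\pL\pN_])(vis)(?:[^\pL\pN_]|$)`)

func main() {
	inputs := []string{"vis", "re vis e", "revise", "révisé", "(vis).", "vis_x"}
	for _, s := range inputs {
		fmt.Println(s, visWord.MatchString(s))
	}
}
```

This should report true for "vis", "re vis e" and "(vis)." and false for "revise", "révisé" and "vis_x". The neighboring character is consumed rather than asserted because Go's RE2-based syntax has no lookahead or lookbehind; use FindStringSubmatch if you need just the word.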

    This answer was accepted by the asker as the best answer.