dqoqnmb163241 2016-06-07 08:16
Views: 148
Accepted

Regex and spell checking with UTF-8 (umlauts)

I'm having trouble with this piece of code. It should take a string, split it into words, and check each word against a dictionary. However, when the string contains an umlaut (ÄäÖöÜü), the word gets split there.

I'm pretty sure the problem is [A-ZäöüÄÖÜ\']. It seems I'm including the special characters incorrectly, but how should I do it?

$string = "Rechtschreibprüfung";      
preg_match_all("/[A-ZäöüÄÖÜ\']{1,16}/i", $string, $words);
for ($i = 0; $i < count($words[0]); ++$i) {
    if (!pspell_check($pspell_link, $words[0][$i])) {
        $array[] = $words[0][$i];            
    }
}

result:

$array[0] = "Rechtschreibprü"
$array[1] = "fung"

1 answer

  • dpfwhb7470 2016-06-07 08:46

    To match a chunk of Unicode letters, you can use

    '/\p{L}+/u'
    

    The \p{L} matches any Unicode letter, + matches one or more occurrences of the preceding subpattern, and the /u modifier makes the regex engine treat both the pattern and the subject string as UTF-8.
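
As a rough illustration of the difference, the sketch below runs the question's pattern and `\p{L}+` with `/u` on the same word. It assumes the file is saved as UTF-8. Without `/u` the engine works on bytes, so "ü" counts as two units and `{1,16}` appears to be what cuts the word: "Rechtschreibpr" is 14 bytes and "ü" adds 2 more. Adding `/u` alone to the original pattern would still cut this 19-letter word after 16 characters, which is why the open-ended `+` is used here.

```php
<?php
// Assumes this source file is saved as UTF-8.
$string = "Rechtschreibprüfung";

// Pattern from the question, no /u: the {1,16} quantifier counts bytes.
preg_match_all("/[A-ZäöüÄÖÜ']{1,16}/i", $string, $byteMatches);

// Pattern from this answer: /u makes the engine work on code points,
// and + places no upper limit on the word length.
preg_match_all('/\p{L}+/u', $string, $letterMatches);

echo implode(' | ', $byteMatches[0]), PHP_EOL;   // Rechtschreibprü | fung
echo implode(' | ', $letterMatches[0]), PHP_EOL; // Rechtschreibprüfung
```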

    To only match whole words, use word boundaries:

    '/\b\p{L}+\b/u'
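
A small sketch of what the word boundaries change, using a made-up sample string: a run of letters glued to digits (or an underscore) is no longer reported as a word. This relies on `\b` being Unicode-aware under `/u`, which is how PHP's PCRE integration behaves as far as I know.

```php
<?php
// Sample text is hypothetical; assumes the file is saved as UTF-8.
$text = "Prüfung abc123 über";

// Any run of letters, even when it is only part of a longer token.
preg_match_all('/\p{L}+/u', $text, $runs);

// Only runs of letters that are delimited by word boundaries.
preg_match_all('/\b\p{L}+\b/u', $text, $words);

echo implode(' | ', $runs[0]), PHP_EOL;  // Prüfung | abc | über
echo implode(' | ', $words[0]), PHP_EOL; // Prüfung | über
```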
    

    If the text may contain combining diacritical marks (for example "u" followed by a separate combining diaeresis), also add \p{M}:

    '/\b[\p{M}\p{L}]+\b/u'
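
The pattern above can be dropped into the question's loop roughly as follows. This is only a sketch: `misspelledWords()` and the `$isCorrect` callback are hypothetical names, and the callback stands in for the question's `pspell_check()` call so the example runs without the pspell extension. The sample text spells "ü" in decomposed form ("u" plus U+0308), which `\p{L}+` alone would split at the combining mark.

```php
<?php
/**
 * Hypothetical helper: return the words in $text that $isCorrect rejects.
 * $isCorrect stands in for the question's pspell_check() call.
 */
function misspelledWords(string $text, callable $isCorrect): array
{
    // Letters plus combining marks, so "u" + U+0308 stays inside one word.
    preg_match_all('/\b[\p{M}\p{L}]+\b/u', $text, $matches);

    $misspelled = [];
    foreach ($matches[0] as $word) {
        if (!$isCorrect($word)) {
            $misspelled[] = $word;
        }
    }
    return $misspelled;
}

// Decomposed spelling: "u" followed by COMBINING DIAERESIS (PHP 7+ escape).
$text = "Die Pru\u{0308}fung ist shwer";

// Toy dictionary used only for this illustration.
$dictionary = ["Die", "Pru\u{0308}fung", "ist"];
$isCorrect = fn(string $word): bool => in_array($word, $dictionary, true);

print_r(misspelledWords($text, $isCorrect)); // only "shwer" is reported
```

With the pspell extension loaded, the callback could presumably be `fn($word) => pspell_check($pspell_link, $word)`, reusing `$pspell_link` from the question.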
    
    This answer was selected as the best answer by the asker.