doujizhong8352 2013-10-17 16:55

Finding every word with preg_match_all

I'd like to extract every word separately from any phrase. I also need to match non-ASCII letters, such as umlauts.

Currently, I use this:

preg_match_all('/\b([a-zA-ZäöüåÄÖÜÅ]*)\b/', $string, $matches);

However, this gives me redundant and empty matches. For example, "zu spät" returns

Array ( [0] => Array ( [0] => zu [1] => [2] => spät [3] => ) 
        [1] => Array ( [0] => zu [1] => [2] => spät [3] => ) ) 

What is the correct expression to match "any letter"? What can I do about the double and empty matches?

1 answer

  • doubeng3216 2013-10-17 16:58

    You can try this:

    preg_match_all('/\b\p{L}+\b/u', $string, $matches);
    

    Here \p{L} matches any Unicode letter, and the u modifier tells PCRE to treat both the pattern and the subject string as UTF-8.
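    As a minimal sketch of the pattern above applied to the phrase from the question (this assumes the script file is saved as UTF-8, so that the string literal is valid UTF-8; the variable names simply mirror the question):

```php
<?php
// Assumption: this file is saved as UTF-8, so 'zu spät' is a valid UTF-8 string.
$string = 'zu spät';

// \p{L}+ matches one or more Unicode letters; the u modifier enables UTF-8 mode.
preg_match_all('/\b\p{L}+\b/u', $string, $matches);

// $matches[0] holds the full-pattern matches: one entry per word.
print_r($matches[0]);
```

    Since the pattern has no capture group, $matches contains only index 0.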

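    In your code sample every result appears twice because $matches[0] holds the matches of the whole pattern and $matches[1] holds the matches of the capture group. Since the group spans the entire pattern, both arrays are identical, which is why I removed the capture group. The empty results come from the * quantifier (zero or more times), which can also match the empty string at a word boundary; replacing it with the + quantifier (one or more times) avoids them.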
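    The two effects can be seen side by side in a short sketch. The first pattern here is an illustrative variant (capture group plus *, written with \p{L} and the u modifier rather than the question's literal character class); it again assumes a UTF-8 source file, and $withGroup and $fixed are names chosen for this example:

```php
<?php
// Assumption: this file is saved as UTF-8.
$string = 'zu spät';

// Capture group + * quantifier: index 0 (whole pattern) and index 1 (group)
// are both filled, and * permits zero-length matches at word boundaries.
preg_match_all('/\b(\p{L}*)\b/u', $string, $withGroup);

// No capture group, + quantifier: only index 0, and no zero-length matches.
preg_match_all('/\b\p{L}+\b/u', $string, $fixed);

// Compare the shape of both results.
var_dump(count($withGroup), count($fixed));
var_dump(in_array('', $withGroup[0], true), in_array('', $fixed[0], true));
```

    If the capture group is needed for other reasons, switching only the quantifier to + should already remove the empty entries, while the duplicate array at index 1 remains.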

    Accepted by the asker as the best answer.
