dqpwdai095465 2017-08-22 09:26 Acceptance rate: 100%

preg_match matching groups for first name/last name

I'm using this PHP regexp to check whether a field contains a name consisting of at least a first and a last name, plus optional middle names or initials.

$success = preg_match("/([\x{00c0}-\x{01ff}a-zA-Z'-]){2,}(\s([\x{00c0}-\x{01ff}a-zA-Z'-]{1,})*)?\s([\x{00c0}-\x{01ff}a-zA-Z'-]{2,})/ui",$user['name'],$matches);

$output[($success ? 'hits' : 'misses')][] = ['id' => $user['id'],'email' => $user['email'],'name' => $user['name'],'matches' => $matches];

It seems to work fine in terms of hits/misses, i.e. whether the field matches or not.

But then I'm trying to use the same pattern to extract the first and last names using groups, which I'm struggling to get right.

Get lots of results like:

  "name": "Jonny Nott",
  "matches": [
    "Jonny Nott",
    "y",
    "",
    "",
    "Nott"
  ]

  "name": "Name Here",
  "matches": [
    "Name Here",
    "e",
    "",
    "",
    "Here"
  ]

  "matches": [
    "Jonathan M Notty",
    "n",
    " M",
    "M",
    "Notty"
  ]

But what I really want is for one of the 'matches' to always contain just the first name, and another to always contain just the last name.

Any pointers as to what's wrong?


  • duanchun1881 2017-08-22 09:59

    Whenever you define a capturing group in a regular expression, the part of the string it matches is added as a separate item in the resulting array. There are two strategies to get rid of these extra items:

    • Optimize the pattern and get rid of the redundant groups (e.g. groups around single atoms - (a)+ => a+)
    • Turn capturing groups into non-capturing ((\s+\w+)+ => (?:\s+\w+)+)
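
The "y" and "e" entries in the question's output follow from this: a quantifier applied to a capturing group re-captures on every repetition, so only the last repetition is kept. The sketch below illustrates that behaviour and both strategies. The sample strings come from the question; the simplified ASCII-only character class and the variable names are assumptions made for brevity.

```php
<?php
// A quantified capturing group keeps only its final repetition.
preg_match("/([a-z'-]){2,}/i", 'Jonny Nott', $repeated);
// $repeated[0] is the full match "Jonny"; $repeated[1] is only the last letter, "y".

// Strategy 1: drop the redundant group around the single atom.
preg_match("/[a-z'-]{2,}/i", 'Jonny Nott', $noGroup);
// $noGroup holds just the full match.

// Strategy 2: keep the grouping but make it non-capturing.
preg_match('/(?:\s+\w+)+/', 'Jonathan M Notty', $nonCapturing);
// $nonCapturing holds just the full match, with no per-group entries.

// If the whole word is wanted in a group, put the quantifier inside the group.
preg_match("/([a-z'-]{2,})/i", 'Jonny Nott', $whole);
// $whole[1] is now the complete word "Jonny".
```
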

    Also, in your case, you may improve the pattern by replacing the letter-matching part with the \p{L} Unicode property class, which matches any letter.

    Use

    /[\p{L}'-]{2,}(?:\s[\p{L}'-]+)?\s[\p{L}'-]{2,}/u
    


    Here, only one group is left, (?:...), and it is optional: the ? after it makes it match 1 or 0 times.

    Details

    • [\p{L}'-]{2,} - 2 or more letters, ' or -
    • (?:\s[\p{L}'-]+)? - 1 or 0 occurrences of a whitespace and then 1 or more letters, ' or -
    • \s - a whitespace
    • [\p{L}'-]{2,} - 2 or more letters, ' or -
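
Because the pattern above has no capturing groups, $matches would hold only the full match. To get the first and last names out, which is what the question asks for, one option is to wrap exactly those two parts in named groups. The following is a sketch under stated assumptions, not part of the original answer: the helper name splitName, the group names first and last, the ^ and $ anchors, and the use of * to allow several middle names (as the question's pattern appears to intend) are all illustrative choices.

```php
<?php
// Hypothetical helper (not from the original answer): returns the first and
// last name, or null when the input does not look like a full name.
function splitName(string $name): ?array
{
    // first: 2+ letters, ' or -; then any number of middle names/initials
    // in a non-capturing group; then a whitespace and the last name.
    $pattern = "/^(?<first>[\p{L}'-]{2,})(?:\s[\p{L}'-]+)*\s(?<last>[\p{L}'-]{2,})$/u";

    if (preg_match($pattern, $name, $m) !== 1) {
        return null; // no match (or a regex error)
    }

    // Named groups appear in $m under their names as well as numeric keys.
    return ['first' => $m['first'], 'last' => $m['last']];
}
```

With this sketch, "Jonathan M Notty" yields first = "Jonathan" and last = "Notty", and a single word such as "Jonny" yields null. Middle names are matched but deliberately not captured, so they never shift the positions of the two groups of interest.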
    This answer was selected as the best answer by the asker.