dongxuan2015 2013-08-17 20:16
Answer accepted

Extracting capitalized and camel-cased words with a regular expression [closed]

I have the following string:

Beyonce Knowles is married to Jay-Z and KANYE WEST is awesome and San Antonio Texas is great but not as good as West Palm Beach, FL

I need to extract each of the following as a separate match: Beyonce Knowles; Jay-Z; KANYE WEST; West Palm Beach, FL; and San Antonio Texas.

I'm still new to regex, but so far I've got '/^[A-Z]+/'.

How do I fix my regex so that it extracts these names?

Thanks

1 answer

  • duanluanlang8501 2013-08-17 20:26

    You could try this:

    /\p{Lu}+\p{L}*(?:[\s\p{P}]+\p{Lu}+\p{L}*)*/u
    

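    This matches one or more uppercase letters followed by zero or more letters of any case (\p{L} is not limited to lowercase, which is why KANYE and WEST match), optionally repeated, with the repetitions separated by one or more whitespace or punctuation characters. Note that a lone capitalized word, such as the first word of a sentence, also matches. Because the pattern uses Unicode character properties together with the u modifier, it can also handle text in other languages.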
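
    As a reading aid, the same pattern can be written with PCRE's extended (x) modifier so that each piece carries its own comment. This is only a sketch: the variable names and the sample string are illustrative assumptions, not part of the original answer, and the file is assumed to be saved as UTF-8.

    ```php
    <?php
    // Same pattern as above, written with the x (extended) modifier:
    // whitespace in the pattern is ignored and # starts a comment.
    $pattern = '~
        \p{Lu}+            # one or more uppercase letters
        \p{L}*             # then zero or more letters of any case
        (?:                # optionally followed by more such words:
            [\s\p{P}]+     #   whitespace or punctuation as the separator
            \p{Lu}+\p{L}*  #   another word starting with an uppercase letter
        )*                 # repeated zero or more times
    ~xu';

    // Hypothetical sample, chosen to include non-ASCII capital letters.
    $sample = 'we met Édith Piaf and Jay-Z in Łódź';
    preg_match_all($pattern, $sample, $matches);
    print_r($matches[0]);
    ```

    On this sample the matches are Édith Piaf, Jay-Z and Łódź. The ~ delimiter is used here only so that the comments inside the pattern cannot collide with the delimiter.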

    Or use this to match exactly two such words in a row:

    /\p{Lu}+\p{L}*[\s\p{P}]+\p{Lu}+\p{L}*/u
    

    For example:

    $input = 'Beyonce Knowles is married to Jay-Z and KANYE WEST is awesome and San Antonio Texas is great but not as good as West Palm Beach, FL';
    $pattern = '/\p{Lu}+\p{L}*(?:[\s\p{P}]+\p{Lu}+\p{L}*)*/u';
    // Collect every match; the full matches end up in $output_array[0].
    preg_match_all($pattern, $input, $output_array);
    print_r($output_array);
    

    Produces the array:

    Array
    (
        [0] => Array 
            (
                [0] => Beyonce Knowles
                [1] => Jay-Z
                [2] => KANYE WEST
                [3] => San Antonio Texas
                [4] => West Palm Beach, FL
            )
    )
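
    For comparison, here is a sketch of what the two-word pattern returns on the same input. Because each match stops after the second word, a three-word name is cut short and a leftover single word is skipped. The names $pairPattern and $pairs are illustrative only.

    ```php
    <?php
    $input = 'Beyonce Knowles is married to Jay-Z and KANYE WEST is awesome and San Antonio Texas is great but not as good as West Palm Beach, FL';

    // Exactly two capitalized words joined by whitespace or punctuation.
    $pairPattern = '/\p{Lu}+\p{L}*[\s\p{P}]+\p{Lu}+\p{L}*/u';
    preg_match_all($pairPattern, $input, $pairs);
    print_r($pairs[0]);
    ```

    Here "San Antonio Texas" comes back as just "San Antonio" (the trailing "Texas" has no partner and is dropped), and "West Palm Beach, FL" is split into "West Palm" and "Beach, FL", so the repeating version above is the better fit for the names in the question.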
    
    The asker selected this answer as the best answer.
