dongnan1899 2010-04-20 13:05

PHP regular expression to match all-caps lines with an occasional hyphen

I'm trying to convert an existing PHP regular expression so that it applies to a slightly different style of document.

Here's the original style of the document:

**FOODS - TYPE A** 
___________________________________ 
**PRODUCT** 
1) Mi Pueblito Queso Fresco Authentic Mexican Style Fresh Cheese; 
2) La Fe String Cheese 
**CODE** 
Sell by date going back to February 1, 2009 

And here is the working PHP regex match code. It only succeeds if the line is surrounded by asterisks, and it stores the text on each side of the "-" as $m[1] and $m[2], respectively.

    if ( preg_match('#^\*\*([^-]+)(?:-(.*))?\*\*$#', $line, $m) ) {
        // only for **header - subheader** $m[2] is set.
        if ( isset($m[2]) ) {
            return array(TYPE_HEADER, array(trim($m[1]), trim($m[2])));
        }
        else {
            return array(TYPE_KEY, array($m[1]));
        }
    }

So, for line 1, $m[1] = "FOODS" and $m[2] = "TYPE A" (after trimming); line 2 is skipped; for line 3, $m[1] = "PRODUCT"; and so on.
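
To make that behaviour concrete, here is a minimal sketch that runs the same pattern over a few of the sample lines. The `$lines` and `$results` names are only for illustration, and each line is trimmed before matching on the assumption that the real input may carry trailing spaces.

```php
<?php
// Sample lines taken from the asterisk-delimited document above.
$lines = [
    '**FOODS - TYPE A** ',
    '___________________________________ ',
    '**PRODUCT** ',
    '1) Mi Pueblito Queso Fresco Authentic Mexican Style Fresh Cheese; ',
];

$results = [];
foreach ($lines as $line) {
    // trim() first: the pattern is anchored with $, so trailing spaces would block a match.
    if (preg_match('#^\*\*([^-]+)(?:-(.*))?\*\*$#', trim($line), $m)) {
        // $m[2] exists only when the optional "-subheader" part matched.
        $results[] = isset($m[2])
            ? [trim($m[1]), trim($m[2])]
            : [trim($m[1])];
    } else {
        $results[] = null; // not a header line
    }
}
```

With these inputs, the first entry holds both halves of the header, the third holds only "PRODUCT", and the other two lines are rejected.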

The question: how would I rewrite the above regex match if the headers did not have the asterisks, but were still all-caps and at least 4 characters long? For example:

FOODS - TYPE A 
___________________________________ 
PRODUCT
1) Mi Pueblito Queso Fresco Authentic Mexican Style Fresh Cheese; 
2) La Fe String Cheese 
CODE
Sell by date going back to February 1, 2009 

Thank you.


  • dqydp44800 2010-04-20 13:14

    Something along these lines should work (don't forget the "u" flag for Unicode regexes):

    ^(?:\*\*)?(?=[^*]{4,})(\p{Lu}+)(?:\s*-\s*(\p{Lu}+))?(?:\*\*)?\s*$
    
    ^               # start of line
    (?:\*\*)?       # two stars, optional
    (?=[^*]{4,})    # followed by at least 4 non-star characters
    (\p{Lu}+)       # group 1, Unicode upper case letters
    (?:             # start no capture group
      \s*-\s*       #   space*, dash, space*
      (\p{Lu}+)     #   group 2, Unicode upper case letters
    )?              # end no capture group, make optional
    (?:\*\*)?       # two stars, optional
    \s*             # optional trailing spaces
    $               # end of line
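
    As a rough illustration of how this pattern might be called from PHP, the sketch below wraps it in `/.../u` delimiters and tries a few inputs. The `$pattern` and `$samples` names and the sample strings are assumptions made for the example, not part of the question.

    ```php
    <?php
    // The "u" modifier makes PCRE treat pattern and subject as UTF-8,
    // so \p{Lu} can match non-ASCII upper case letters such as "É".
    $pattern = '/^(?:\*\*)?(?=[^*]{4,})(\p{Lu}+)(?:\s*-\s*(\p{Lu}+))?(?:\*\*)?\s*$/u';

    $samples = [
        "\u{C9}PICERIE - FROMAGE", // hypothetical non-ASCII header
        '**PRODUCT**',             // stars are optional, so the old style still matches
        'CODE',                    // exactly 4 characters
        'ABC',                     // too short for the lookahead
        'FOODS - TYPE A',          // "TYPE A" contains a space, which \p{Lu} excludes
    ];

    $matched = [];
    foreach ($samples as $sample) {
        // preg_match() returns 1 on a match and 0 otherwise.
        $matched[] = preg_match($pattern, $sample, $m) === 1 ? array_slice($m, 1) : null;
    }
    ```

    Note that the last sample is rejected: because `\p{Lu}` only matches letters, a multi-word part such as "TYPE A" cannot be captured unless a space is added to the character classes.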
    

    EDIT: Simplified, as per the comments:

    ^(?=[A-Z -]{4,})([A-Z ]+)(?:-([A-Z ]+))?\s*$
    
    ^               # start of line
    (?=[A-Z -]{4,}) # followed by at least 4 upper case characters, spaces or dashes
    ([A-Z ]+)       # group 1, upper case letters or space
    (?:             # start no capture group
      -             #   a dash
      ([A-Z ]+)     #   group 2, upper case letters or space
    )?              # end no capture group, make optional
    \s*             # optional trailing spaces
    $               # end of line
    

    Contents of groups 1 and 2 must be trimmed before use.
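
    Putting this together with the code from the question, a minimal sketch could look like the following. It uses the lookahead from the commented breakdown (`[A-Z -]{4,}`). The function name `classifyLine` and the values given to `TYPE_HEADER` and `TYPE_KEY` are placeholders, since the question does not show how those constants are defined.

    ```php
    <?php
    // TYPE_HEADER and TYPE_KEY appear in the question's code; these values are placeholders.
    const TYPE_HEADER = 'header';
    const TYPE_KEY = 'key';

    // Hypothetical helper: returns [type, parts] for a header line, or null otherwise.
    function classifyLine($line)
    {
        if (!preg_match('/^(?=[A-Z -]{4,})([A-Z ]+)(?:-([A-Z ]+))?\s*$/', $line, $m)) {
            return null;
        }
        // Both groups may include surrounding spaces, so trim before use.
        $header = trim($m[1]);
        if ($header === '') {
            return null; // a line of 4 or more spaces also satisfies the pattern
        }
        if (isset($m[2])) {
            return [TYPE_HEADER, [$header, trim($m[2])]];
        }
        return [TYPE_KEY, [$header]];
    }
    ```

    For example, `classifyLine('FOODS - TYPE A ')` yields the header type with the parts "FOODS" and "TYPE A", while the product and date lines from the sample document return null. The extra empty-string check is a design choice of this sketch: whitespace-only lines pass the regex, so they are filtered out after trimming.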

    This answer was accepted by the asker as the best answer.