douxi2011 2015-10-09 14:52

Split a string on spaces and colons, but not inside quotes

Given a string like this:

$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";

the desired result is:

[0] => Array (
    [0] => dateto:'2015-10-07 15:05'
    [1] => xxxx
    [2] => datefrom:'2015-10-09 15:05'
    [3] => yyyy
    [4] => asdf
)

what I get with:

preg_match_all("/\'(?:[^()]|(?R))+\'|'[^']*'|[^(),\s]+/", $str, $m);

is:

[0] => Array (
    [0] => dateto:'2015-10-07
    [1] => 15:05'
    [2] => xxxx
    [3] => datefrom:'2015-10-09
    [4] => 15:05'
    [5] => yyyy
    [6] => asdf
)

I also tried preg_split("/[\s]+/", $str), but I have no idea how to avoid splitting when a value is between quotes. Can anyone show me how, and also please explain the regex? Thank you!

  • duanbing6955 2015-10-09 16:53

    Often, when you want to split a string, preg_split isn't the best approach (that seems a little counterintuitive, but it's true most of the time). A more efficient way is to find all the items (with preg_match_all) using a pattern that describes everything that is not the delimiter (white-space here):

    $pattern = <<<'EOD'
    ~(?=\S)[^'"\s]*(?:'[^']*'[^'"\s]*|"[^"]*"[^'"\s]*)*~
    EOD;
    
    if (preg_match_all($pattern, $str, $m))
        $result = $m[0];
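
    To check the snippet against the string from the question, a minimal self-contained run might look like the sketch below. The `print_r` call is added only for illustration, and the indented nowdoc closing marker assumes PHP 7.3 or later.

    ```php
    <?php
    // The sample string from the question.
    $str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";

    // Nowdoc syntax keeps the quotes in the pattern free of PHP escaping.
    $pattern = <<<'EOD'
    ~(?=\S)[^'"\s]*(?:'[^']*'[^'"\s]*|"[^"]*"[^'"\s]*)*~
    EOD;

    $result = [];
    if (preg_match_all($pattern, $str, $m)) {
        // $m[0] holds the full matches, one per item.
        $result = $m[0];
    }

    print_r($result);
    ```

    This should print the five items listed in the question, with each quoted date kept in one piece.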
    

    pattern details:

    ~                    # pattern delimiter
    
    (?=\S)               # the lookahead only succeeds if the current position holds a
                         # non-white-space character.
                         # (This lookahead is useful for two reasons:
                         #    - it lets the regex engine quickly find the start of
                         #      the next item without having to test each branch of the
                         #      following alternation at each position in the string
                         #      until one succeeds.
                         #    - it ensures that there's at least one non-white-space
                         #      character. Without it, the pattern may match an empty
                         #      string.
                         # )
    
    [^'"\s]*          #"'# all that is not a quote or a white-space
    
    (?:                  # optional quoted parts
        '[^']*' [^'"\s]*  #"# single quotes
      |
        "[^"]*" [^'"\s]*    # double quotes
    )*
    ~
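
    The annotated layout above contains spaces and comments, so it can only be used as written if the pattern is compiled with the `x` (extended) modifier. That modifier is an assumption added here, since the snippet in the answer uses the compact one-line form. A sketch of the commented version as a working pattern (again assuming PHP 7.3 or later for the indented nowdoc):

    ```php
    <?php
    $str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";

    // With the x modifier, white-space outside character classes and
    // "#" comments inside the pattern are ignored by the regex engine.
    // The comments must not contain the pattern delimiter itself.
    $verbose = <<<'EOD'
    ~
    (?=\S)                  # an item must start on a non-white-space character
    [^'"\s]*                # leading unquoted characters
    (?:                     # zero or more quoted parts
        '[^']*' [^'"\s]*    # single-quoted part, then unquoted characters
      |
        "[^"]*" [^'"\s]*    # double-quoted part, then unquoted characters
    )*
    ~x
    EOD;

    $items = preg_match_all($verbose, $str, $m) ? $m[0] : [];
    print_r($items);
    ```

    Keeping the comments inside the pattern this way is a readability choice; the matches should be the same as with the compact form.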
    

    Note that with this slightly longer pattern, the five items of your example string are found in only 60 steps. You can also use this shorter, simpler pattern:

    ~(?:[^'"\s]+|'[^']*'|"[^"]*")+~
    

    but it's a little less efficient.
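
    As a rough check that the two patterns are interchangeable, the sketch below runs both on an input that mixes single and double quotes. The helper name `split_outside_quotes` and the test input are made up for illustration and are not part of the original answer.

    ```php
    <?php
    /**
     * Hypothetical helper: returns the white-space-separated items of $input,
     * keeping quoted parts intact, using the given pattern.
     */
    function split_outside_quotes(string $input, string $pattern): array
    {
        // preg_match_all returns the number of matches (0 or false means none).
        return preg_match_all($pattern, $input, $m) ? $m[0] : [];
    }

    // The longer pattern from the answer.
    $long = <<<'EOD'
    ~(?=\S)[^'"\s]*(?:'[^']*'[^'"\s]*|"[^"]*"[^'"\s]*)*~
    EOD;

    // The shorter pattern from the answer.
    $short = <<<'EOD'
    ~(?:[^'"\s]+|'[^']*'|"[^"]*")+~
    EOD;

    // Made-up input: a double-quoted value, then a single-quoted value
    // that itself contains double quotes.
    $input = <<<'EOD'
    a:"b c" d:'e "f" g' h
    EOD;

    $fromLong  = split_outside_quotes($input, $long);
    $fromShort = split_outside_quotes($input, $short);

    var_dump($fromLong === $fromShort); // the two patterns should agree here
    print_r($fromShort);
    ```

    Neither pattern handles an escaped quote inside a quoted part or a quote that is never closed, so input with those cases would need a different pattern.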

    This answer was selected by the asker as the best answer.