douxi2011 2015-10-09 14:52

Split a string by spaces and colons, but not when inside quotes

Given a string like this:

$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";

The desired result is:

[0] => Array (
    [0] => dateto:'2015-10-07 15:05'
    [1] => xxxx
    [2] => datefrom:'2015-10-09 15:05'
    [3] => yyyy
    [4] => asdf
)

What I get with:

preg_match_all("/\'(?:[^()]|(?R))+\'|'[^']*'|[^(),\s]+/", $str, $m);

is:

[0] => Array (
    [0] => dateto:'2015-10-07
    [1] => 15:05'
    [2] => xxxx
    [3] => datefrom:'2015-10-09
    [4] => 15:05'
    [5] => yyyy
    [6] => asdf
)

I also tried preg_split("/[\s]+/", $str), but I have no idea how to avoid splitting when a value is between quotes. Can anyone show me how, and also explain the regex? Thank you!


3 answers

  • duanbing6955 2015-10-09 16:53

    When you want to split a string, preg_split is often not the best approach. That seems a little counterintuitive, but it is true most of the time. A more efficient approach is to find all the items with preg_match_all, using a pattern that describes everything that is not the delimiter (whitespace here):

    $pattern = <<<'EOD'
    ~(?=\S)[^'"\s]*(?:'[^']*'[^'"\s]*|"[^"]*"[^'"\s]*)*~
    EOD;
    
    if (preg_match_all($pattern, $str, $m))
        $result = $m[0];
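Here is a minimal, self-contained sketch that wraps the pattern above in a function and runs it on the example string. The helper name `splitOutsideQuotes` is hypothetical and not a PHP built-in. The sketch assumes that every quote in the input is closed; an unmatched quote can produce an empty item.

```php
<?php
// Minimal sketch: split a string on white-space while keeping single- or
// double-quoted runs together. splitOutsideQuotes() is a hypothetical helper
// name, not a PHP built-in; it assumes every quote in the input is closed.
function splitOutsideQuotes(string $str): array
{
    // The answer's pattern: it describes the items themselves, not the
    // white-space between them (nowdoc, so backslashes stay literal).
    $pattern = <<<'EOD'
~(?=\S)[^'"\s]*(?:'[^']*'[^'"\s]*|"[^"]*"[^'"\s]*)*~
EOD;

    // preg_match_all() returns the number of full matches (0 if none,
    // false on error); $m[0] holds the matched items in order.
    if (preg_match_all($pattern, $str, $m)) {
        return $m[0];
    }
    return [];
}

$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";
print_r(splitOutsideQuotes($str));
```

This should print the same five items as the desired result in the question.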
    

    pattern details:

    ~                    # pattern delimiter
    
    (?=\S)               # the lookahead assertion only succeeds if there is a non-
                         # white-space character at the current position.
                         # (This lookahead is useful for two reasons:
                         #    - it allows the regex engine to quickly find the start of
                         #      the next item without having to test each branch of the
                         #      following alternation at each position in the string
                         #      until one succeeds.
                         #    - it ensures that a match always starts at a non-white-space
                         #      character. Without it, the pattern would match an empty
                         #      string at every white-space position.
                         # )
    
    [^'"\s]*             # all that is not a quote or a white-space
    
    (?:                  # optional quoted parts
        '[^']*' [^'"\s]*    # single quotes
      |
        "[^"]*" [^'"\s]*    # double quotes
    )*
    ~
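The annotated layout above only works as an actual pattern when it is compiled with the `x` (extended) modifier. That modifier tells PCRE to ignore white-space and `#` comments inside the pattern. The following sketch uses only PHP's standard `preg_match_all`. It checks that a commented form and the compact one-line form return the same items for the example string. The variable names `$verbose` and `$compact` are illustrative.

```php
<?php
// Sketch: the commented form compiled with the x (extended) modifier,
// so literal white-space and # comments in the pattern are ignored.
$verbose = <<<'EOD'
~
(?=\S)                 # start only at a non-white-space character
[^'"\s]*               # run of characters that are neither quotes nor white-space
(?:                    # optional quoted parts, each followed by more plain characters
    '[^']*' [^'"\s]*   # single-quoted part
  |
    "[^"]*" [^'"\s]*   # double-quoted part
)*
~x
EOD;

// The same pattern written on one line, without the x modifier.
$compact = <<<'EOD'
~(?=\S)[^'"\s]*(?:'[^']*'[^'"\s]*|"[^"]*"[^'"\s]*)*~
EOD;

$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";
preg_match_all($verbose, $str, $a);
preg_match_all($compact, $str, $b);

// Both forms should produce identical item lists.
var_dump($a[0] === $b[0]);
```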
    


    Note that with this somewhat long pattern, the five items of your example string are found in only 60 steps. You can also use this shorter, simpler pattern:

    ~(?:[^'"\s]+|'[^']*'|"[^"]*")+~
    

    but it's a little less efficient.
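The following sketch runs the shorter pattern on the example string. It also runs it on a second input with a double-quoted value, `name:"John Smith" age:42`, which is made up for illustration. Each repetition of the outer group consumes one of three things: a run of plain characters, a complete single-quoted part, or a complete double-quoted part. The final `+` requires at least one of them. As a result, this variant cannot return empty matches and does not need the lookahead.

```php
<?php
// Sketch of the shorter alternative from the answer. Each repetition of the
// outer group consumes a run of plain characters, a '...' part, or a "..."
// part; the + requires at least one, so every match is non-empty.
$short = <<<'EOD'
~(?:[^'"\s]+|'[^']*'|"[^"]*")+~
EOD;

$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";
preg_match_all($short, $str, $m);
print_r($m[0]);

// Hypothetical input with a double-quoted value, made up for illustration.
preg_match_all($short, 'name:"John Smith" age:42', $m2);
print_r($m2[0]);
```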

    This answer was selected by the asker as the best answer.