douxi2011 2015-10-09 14:52

Split a string on spaces and colons, but not inside quotes

Given a string like this:

$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";

the desired result is:

[0] => Array (
    [0] => dateto:'2015-10-07 15:05'
    [1] => xxxx
    [2] => datefrom:'2015-10-09 15:05'
    [3] => yyyy
    [4] => asdf
)

what I get with:

preg_match_all("/\'(?:[^()]|(?R))+\'|'[^']*'|[^(),\s]+/", $str, $m);

is:

[0] => Array (
    [0] => dateto:'2015-10-07
    [1] => 15:05'
    [2] => xxxx
    [3] => datefrom:'2015-10-09
    [4] => 15:05'
    [5] => yyyy
    [6] => asdf
)

I also tried preg_split("/[\s]+/", $str), but I have no idea how to avoid splitting when a value is between quotes. Can anyone show me how, and also please explain the regex? Thank you!


  • duanbing6955 2015-10-09 16:53

    Often, when you are looking to split a string, preg_split isn't the best approach (that seems a little counterintuitive, but it's true most of the time). A more efficient way is to find all the items (with preg_match_all) using a pattern that describes everything that is not the delimiter (white-space here):

    $pattern = <<<'EOD'
    ~(?=\S)[^'"\s]*(?:'[^']*'[^'"\s]*|"[^"]*"[^'"\s]*)*~
    EOD;
    
    if (preg_match_all($pattern, $str, $m))
        $result = $m[0];
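
As a minimal, self-contained sketch of the snippet above, the following applies the same pattern to the string from the question. The `$result = []` default and the `print_r` call are additions for illustration only.

```php
<?php
// Input string taken from the question.
$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";

// Nowdoc syntax keeps the quotes and backslashes in the pattern literal.
$pattern = <<<'EOD'
~(?=\S)[^'"\s]*(?:'[^']*'[^'"\s]*|"[^"]*"[^'"\s]*)*~
EOD;

// Default to an empty list so $result is always defined (illustrative addition).
$result = [];
if (preg_match_all($pattern, $str, $m)) {
    $result = $m[0]; // $m[0] holds the full matches, one per item
}

print_r($result);
```

With the question's input, `$result` should contain the five items listed in the desired result: the two `key:'value'` pairs stay whole because the quoted parts are consumed as a unit.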
    

    pattern details:

    ~                    # pattern delimiter
    
    (?=\S)               # the lookahead assertion only succeeds if there is a non-
                         # white-space character at the current position.
                         # (This lookahead is useful for two reasons:
                         #    - it allows the regex engine to quickly find the start of
                         #      the next item without having to test each branch of the
                         #      following alternation at each position in the string
                         #      until one succeeds.
                         #    - it ensures that there's at least one non-white-space
                         #      character. Without it, the pattern may match an empty string.
                         # )
    
    [^'"\s]*          #"'# all that is not a quote or a white-space
    
    (?:                  # optional quoted parts
        '[^']*' [^'"\s]*  #"# single quotes
      |
        "[^"]*" [^'"\s]*    # double quotes
    )*
    ~
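
The commented layout above can also be run as written if the pattern is compiled with the `x` (extended) modifier, which makes PCRE ignore unescaped white-space outside character classes and treat `#` as the start of a comment. The following is a sketch under that assumption; the comments are reworded so that they never contain the `~` delimiter, which would end the pattern early.

```php
<?php
// Free-spacing version of the pattern; the trailing "x" enables extended mode.
$pattern = <<<'EOD'
~
(?=\S)              # an item must start with a non-white-space character
[^'"\s]*            # leading unquoted characters
(?:                 # optional quoted parts, each followed by unquoted characters
    '[^']*' [^'"\s]*
  |
    "[^"]*" [^'"\s]*
)*
~x
EOD;

$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";
preg_match_all($pattern, $str, $m);

print_r($m[0]);
```

Keeping the comments inside the pattern makes it longer, but it keeps the explanation next to the code it describes.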
    


    Note that with this somewhat long pattern, the five items of your example string are found in only 60 steps. You can also use this shorter, simpler pattern:

    ~(?:[^'"\s]+|'[^']*'|"[^"]*")+~
    

    but it's a little less efficient.
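
To check that the two patterns agree, a quick comparison can be sketched as follows. The second sample string and the variable names (`$long`, `$short`, `$samples`, `$same`) are made up for illustration, and the check only covers inputs whose quotes are balanced.

```php
<?php
// The longer pattern with the lookahead.
$long = <<<'EOD'
~(?=\S)[^'"\s]*(?:'[^']*'[^'"\s]*|"[^"]*"[^'"\s]*)*~
EOD;

// The shorter alternation-based pattern.
$short = <<<'EOD'
~(?:[^'"\s]+|'[^']*'|"[^"]*")+~
EOD;

// First sample is from the question; the second is a hypothetical
// input that uses double quotes instead.
$samples = [
    "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf",
    'name:"John Smith" age:42',
];

$same = [];
foreach ($samples as $sample) {
    preg_match_all($long, $sample, $a);
    preg_match_all($short, $sample, $b);
    $same[] = ($a[0] === $b[0]); // true when both patterns find the same items
}

var_dump($same);
```

Both entries of `$same` should be `true` for these samples. The patterns can diverge on malformed input: with an unclosed quote such as `'abc`, the longer pattern can still produce an empty match at the quote, while the shorter one cannot match an empty string.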
