douxi2011 2015-10-09 14:52

Split a string on spaces and colons, but not inside quotes

Given a string like this:

$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";

The desired result is:

[0] => Array (
    [0] => dateto:'2015-10-07 15:05'
    [1] => xxxx
    [2] => datefrom:'2015-10-09 15:05'
    [3] => yyyy
    [4] => asdf
)

What I get with:

preg_match_all("/\'(?:[^()]|(?R))+\'|'[^']*'|[^(),\s]+/", $str, $m);

is:

[0] => Array (
    [0] => dateto:'2015-10-07
    [1] => 15:05'
    [2] => xxxx
    [3] => datefrom:'2015-10-09
    [4] => 15:05'
    [5] => yyyy
    [6] => asdf
)

I also tried preg_split("/[\s]+/", $str), but I have no idea how to avoid splitting when the value is between quotes. Can anyone show me how, and also explain the regex? Thank you!


3 answers

  • duanbing6955 2015-10-09 16:53

    When you want to split a string, preg_split is often not the best approach (counterintuitive as that may seem, it holds most of the time). A more efficient approach is to find all the items with preg_match_all, using a pattern that describes everything that is not the delimiter (here, whitespace):

    $pattern = <<<'EOD'
    ~(?=\S)[^'"\s]*(?:'[^']*'[^'"\s]*|"[^"]*"[^'"\s]*)*~
    EOD;
    
    if (preg_match_all($pattern, $str, $m))
        $result = $m[0];
    

    pattern details:

    ~                    # pattern delimiter
    
    (?=\S)               # the lookahead assertion succeeds only if there is a
                         # non-whitespace character at the current position.
                         # (This lookahead is useful for two reasons:
                         #    - it lets the regex engine quickly find the start of
                         #      the next item without having to test each branch
                         #      of the following alternation at every position in
                         #      the string until one succeeds.
                         #    - it ensures that a match can only start where a
                         #      non-whitespace character follows. Without it, the
                         #      pattern may match an empty string.
                         # )
    
    [^'"\s]*             # anything that is not a quote or whitespace
    
    (?:                  # optional quoted parts
        '[^']*' [^'"\s]*   # single quotes
      |
        "[^"]*" [^'"\s]*   # double quotes
    )*
    ~
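    The annotated layout above is documentation: without PCRE's `x` (extended) modifier, the spaces and `#` comments would be matched literally. A minimal sketch, assuming you want to keep the comments in your source, is to add the `x` modifier, which ignores unescaped whitespace outside character classes and treats `#` up to the end of the line as a comment. The second part checks the lookahead's role described above by dropping `(?=\S)` (the `$noLookahead` variable is only for illustration).

```php
<?php
// Sketch: the answer's pattern in PCRE extended mode ("x" modifier), so the
// whitespace and "#" comments inside the pattern are ignored by the engine.
$pattern = <<<'EOD'
~
(?=\S)                 # start only where a non-whitespace character follows
[^'"\s]*               # anything that is not a quote or whitespace
(?:                    # optional quoted parts, each followed by more plain text
    '[^']*' [^'"\s]*   # a single-quoted part
  |
    "[^"]*" [^'"\s]*   # a double-quoted part
)*
~x
EOD;

$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";

preg_match_all($pattern, $str, $m);
print_r($m[0]); // the five items from the question

// Same pattern without the lookahead: it can now match an empty string
// at whitespace positions and at the end of the subject.
$noLookahead = '~[^\'"\s]*(?:\'[^\']*\'[^\'"\s]*|"[^"]*"[^\'"\s]*)*~';
preg_match_all($noLookahead, $str, $m2);
var_dump(in_array('', $m2[0], true)); // bool(true)
```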
    


    Note that, although this pattern is a bit long, it finds the five items of your example string in only 60 steps. You can also use this shorter, simpler pattern:

    ~(?:[^'"\s]+|'[^']*'|"[^"]*")+~
    

    but it's a little less efficient.
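    To check that the shorter pattern finds the same items, here is a small sketch that runs both patterns on the question's string and on a second, made-up string containing double quotes (the `$samples` array is hypothetical test data, not from the question).

```php
<?php
// Sketch: compare the answer's longer pattern with the shorter alternative.
$long = <<<'EOD'
~(?=\S)[^'"\s]*(?:'[^']*'[^'"\s]*|"[^"]*"[^'"\s]*)*~
EOD;
$short = <<<'EOD'
~(?:[^'"\s]+|'[^']*'|"[^"]*")+~
EOD;

// Hypothetical inputs: the question's string, plus one with double quotes.
$samples = [
    "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf",
    'title:"hello world" tag:x \'a b\'c',
];

foreach ($samples as $s) {
    preg_match_all($long, $s, $a);
    preg_match_all($short, $s, $b);
    // Both patterns should yield the same whitespace-separated items.
    var_dump($a[0] === $b[0]);
    print_r($b[0]);
}
```

    With an unbalanced quote (for example `it's`), the two patterns can return different lists, so it is worth testing both against your real data.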

    This answer was accepted by the asker as the best answer.