double0201 2018-10-19 05:17
Accepted

PHP preg_split keeps the delimiters in the wrong elements

I'm trying to split a string into an array of parts.

String Example...

The quick brown fox [[random text here]] and then [[a different text here]]

Text between the square brackets will change and cannot be determined ahead of time. The preg_split I have so far will split, but it places the delimiters in other elements in the produced array, not the element I want it to be in.

$page_widget_split = preg_split('@(?<=\[\[)(.*?)(?=\]\])@', $page_content, -1, PREG_SPLIT_DELIM_CAPTURE);

This produces something like this...

[0] => "The quick brown fox [[",
[1] => "random text here]]",
[2] => " and then [[",
[3] => "a different text here]]"

The desired result would look like this...

[0] => "The quick brown fox",
[1] => "[[random text here]]",
[2] => " and then ",
[3] => "[[a different text here]]"

As I'm far from understanding regex, could someone please take a look and tell me what I'm missing in the pattern?


2 answers

  • dongling2545 2018-10-19 05:24

    This will get you pretty close:

     $page_content = 'the quick brown fox [[random text here]] and then [[a different text here]]';
    
     print_r(preg_split('/(\[\[[^\]]+\]\])/', $page_content, -1, PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY));
    

    The thing to remember is that the whole [[...]] block is the delimiter, and it sits inside a capture group: (\[\[[^\]]+\]\])

    Output:

    Array
    (
        [0] => the quick brown fox 
        [1] => [[random text here]]
        [2] =>  and then 
        [3] => [[a different text here]]
    )
    


    When I say pretty close, I do mean really close...

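    The regex is pretty straightforward: match two [, then one or more characters that are not a ], then two ]. That is our delimiter. Wrapping it in a capture group and passing PREG_SPLIT_DELIM_CAPTURE keeps the delimiter in the result as its own element, and PREG_SPLIT_NO_EMPTY drops any empty pieces.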
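
    As a rough illustration of that breakdown, here is the same delimiter written with the x modifier so each part can carry a comment, plus a comparison of what the flags change. The variable names $pattern, $plain and $parts are only for this sketch.

```php
<?php
// Same delimiter as above, written with the x modifier so it can carry comments.
$pattern = '/
    (           # capture group, so PREG_SPLIT_DELIM_CAPTURE returns the delimiter
      \[\[      # two literal opening brackets
      [^\]]+    # one or more characters that are not a closing bracket
      \]\]      # two literal closing brackets
    )
/x';

$page_content = 'the quick brown fox [[random text here]] and then [[a different text here]]';

// No flags: the delimiters are dropped and a trailing empty piece is kept,
// because the string ends with a delimiter.
$plain = preg_split($pattern, $page_content);

// Both flags: delimiters are returned as elements, empty pieces are removed.
$parts = preg_split($pattern, $page_content, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

print_r($plain);
print_r($parts);
```

    The surrounding text keeps its spaces ("the quick brown fox " rather than "the quick brown fox"); if that matters, running trim() over the pieces is one option.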

    Enjoy!

    UPDATE

    but it fails on " here is my table [[{"widget":"table","id":"1","title": "Views Table", "columns": []}]] and this is more text"...Note the "[]" under the 'columns'

    To handle that you will need a recursive regex pattern using (?R), like this:

    $page_content = 'here is my table [[{"widget":"table","id":"1","title": "Views Table", "columns": []}]] and this is more text [someother bracket]';
    
    print_r(preg_split('/(\[(?:[^\[\]]|(?R))*\])/', $page_content, -1, PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY));
    

    Output:

    Array
    (
        [0] => here is my table 
        [1] => [[{"widget":"table","id":"1","title": "Views Table", "columns": []}]]
        [2] =>  and this is more text 
        [3] => [someother bracket] //single bracket capture
    )
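
    To see what (?R) is doing in isolation, here is a minimal sketch using preg_match_all on a made-up string (the input and the $balanced name are illustrations, not from the question). Each time the pattern meets a nested [, it re-enters itself and must find the matching ] before continuing.

```php
<?php
// (?R) re-enters the whole pattern, so each nested "[" must find its own "]".
$balanced = '/\[(?:[^\[\]]|(?R))*\]/';

// The stray "]" at the end has no opening partner, so it is not matched.
preg_match_all($balanced, 'a [b [c] d] e [f] g ]', $matches);

print_r($matches[0]);
// Array
// (
//     [0] => [b [c] d]
//     [1] => [f]
// )
```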
    


    I won't pretend otherwise: this is at the edge of my knowledge of regex. I should note that this pattern matches single bracket pairs too, not specifically double ones. You could try something like /(\[(\[(?:[^\[\]]|(?2))*\])\])/ instead. The (?2) works like (?R), but it recurses into a specific capture group rather than the whole pattern. That matches only [[ ... ]] while still allowing inner nesting. The catch is that the pattern now has two capture groups, so PREG_SPLIT_DELIM_CAPTURE returns the delimiter twice and you wind up with this:

    Array
    (
        [0] => here is my table 
        [1] => [[{"widget":"table","id":"1","title": "Views Table", "columns": []}]]
        [2] => [{"widget":"table","id":"1","title": "Views Table", "columns": []}]
        [3] =>  and this is more text [someother bracket]
    )
    

    Notice how it no longer captures [someother bracket], but it returns the widget twice (once for each capture group). There may be a way around that, but I can't think of one.
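
    One possible way to avoid the duplicate, offered as a sketch rather than as part of the original answer: keep a single capture group that matches one balanced bracket pair, and put the "must start with [[" requirement in a lookahead outside that group. The (?1) call then recurses into group 1 only, so the lookahead is not re-applied to the inner [].

```php
<?php
// Assumption (not from the original answer): a lookahead requires "[[" at the
// start, while the single capture group matches one balanced bracket pair.
$pattern = '/(?=\[\[)(\[(?:[^\[\]]|(?1))*\])/';

$page_content = 'here is my table [[{"widget":"table","id":"1","title": "Views Table", "columns": []}]] and this is more text [someother bracket]';

$parts = preg_split($pattern, $page_content, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

// Three pieces: leading text, the [[...]] widget once, and the trailing text
// with [someother bracket] left inside it.
print_r($parts);
```

    This only checks the opening side, so a string such as [[a] b] would also be treated as a delimiter. Whether that is acceptable depends on the input.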

    Whether or not capturing single bracket pairs is an issue, I don't know.

    I have used this technique before, mainly for matching balanced pairs of " or ( ), and it's the same concept.

    The only other solution would be to write a lexer/parser for it; I have some examples of how to do that on my GitHub account. Regex by itself is not well suited to nested elements, and most regex solutions will fail on nesting.
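
    For a sense of what the non-regex route could look like, here is a minimal sketch of a hand-written scanner that counts bracket depth. It is not the code from the GitHub account mentioned above; splitWidgets() is a hypothetical helper written for this example, and it does not try to ignore brackets that appear inside JSON strings.

```php
<?php
// Hypothetical helper (not from the original answer): split text into plain
// runs and balanced [[ ... ]] widgets by counting bracket depth.
function splitWidgets(string $text): array
{
    $parts = [];
    $buffer = '';
    $depth = 0;
    $length = strlen($text);

    for ($i = 0; $i < $length; $i++) {
        $char = $text[$i];

        if ($depth === 0) {
            // Outside a widget: only "[[" opens one.
            if ($char === '[' && $i + 1 < $length && $text[$i + 1] === '[') {
                if ($buffer !== '') {
                    $parts[] = $buffer; // flush the plain text seen so far
                }
                $buffer = '[[';
                $depth = 2;
                $i++; // skip the second "["
            } else {
                $buffer .= $char;
            }
            continue;
        }

        // Inside a widget: track nesting until the depth returns to zero.
        $buffer .= $char;
        if ($char === '[') {
            $depth++;
        } elseif ($char === ']') {
            $depth--;
            if ($depth === 0) {
                $parts[] = $buffer;
                $buffer = '';
            }
        }
    }

    if ($buffer !== '') {
        $parts[] = $buffer; // trailing text, or an unclosed widget
    }

    return $parts;
}

$page_content = 'here is my table [[{"widget":"table","id":"1","title": "Views Table", "columns": []}]] and this is more text [someother bracket]';
$parts = splitWidgets($page_content);
print_r($parts);
```

    Single bracket pairs such as [someother bracket] stay in the surrounding text, and each widget appears once.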

    This answer was selected by the asker as the best answer.