Regex to split some HTML content on a custom tag

I need to split my HTML based on a custom HTML tag.

This is what my HTML looks like:

<div>
    <div id="header">
        <h1>Document Title</h1>
    </div>

    <div id="content">
        <p>Lorem ipsum dolar sit</p>
        <magicheader type="2" class="someClass">Header</magicheader>
        <p>Lorem ipsum dolar sit</p>
        <span><magicheader type="3" class="someClass">Header</magicheader></span>
    </div>

    <div id="footer">

    </div>
</div>

This is what I need:

Array
(
    [0] => <div>
    <div id="header">
        <h1>Document Title</h1>
    </div>

    <div id="content">
        <p>Lorem ipsum dolar sit</p>
    [1] => <magicheader type="2" class="someClass">Header</magicheader>
    [2] => <p>Lorem ipsum dolar sit</p>
        <span>
    [3] => <magicheader type="3" class="someClass">Header</magicheader>
    [4] => </span>
    </div>

    <div id="footer">

    </div>
</div>
)

Can anybody help me with the pattern?

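dsour68888: It's wrong to say regex can't break up HTML, but it's quite right to say regex can't parse HTML reliably and accurately. Unless you're attempting a quick-and-dirty fix for one specific, limited problem, it simply isn't a wise approach. Even then, there is usually a better or more appropriate solution.
(more than 7 years ago)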
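dsdv76767671: There doesn't seem to be any pattern to the way you're splitting the HTML. Can you explain the idea behind how the split you describe is supposed to work?
(more than 7 years ago)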
dtgvl48608: Regex can't parse HTML.
(more than 7 years ago)
douzhi1937: Regular expressions are no use for parsing HTML; see the answers to this question: stackoverflow.com/questions/590747/...
(more than 7 years ago)

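1 answer

Use preg_split with the PREG_SPLIT_DELIM_CAPTURE flag. Because the whole <magicheader> element is wrapped in a capturing group, each matched element is returned as its own array entry alongside the text between matches: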

$text=<<<EOD
<div>
    <div id="header">
        <h1>Document Title</h1>
    </div>

    <div id="content">
        <p>Lorem ipsum dolar sit</p>
        <magicheader type="2" class="someClass">Header</magicheader>
        <p>Lorem ipsum dolar sit</p>
        <span><magicheader type="3" class="someClass">Header</magicheader></span>
    </div>

    <div id="footer">

    </div>
</div>
EOD;

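// The parentheses capture each whole <magicheader ...>Header</magicheader> element,
// so PREG_SPLIT_DELIM_CAPTURE keeps it in the result instead of discarding it.
$regexp = '%(<magicheader [^>]*>Header</magicheader>)%';
// -1 means "no limit" on the number of pieces returned.
$value = preg_split($regexp, $text, -1, PREG_SPLIT_DELIM_CAPTURE);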

Then print_r($value) outputs:

Array
(
    [0] => <div>
    <div id="header">
        <h1>Document Title</h1>
    </div>

    <div id="content">
        <p>Lorem ipsum dolar sit</p>

    [1] => <magicheader type="2" class="someClass">Header</magicheader>
    [2] => 
        <p>Lorem ipsum dolar sit</p>
        <span>
    [3] => <magicheader type="3" class="someClass">Header</magicheader>
    [4] => </span>
    </div>

    <div id="footer">

    </div>
</div>
)
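The pattern above only matches elements whose text is literally "Header". Below is a minimal sketch of a more general variant, assuming <magicheader> elements are never nested inside each other; the function name splitOnMagicHeader and the sample input are hypothetical. It uses a non-greedy .*? for the element body, plus the s modifier (so the body may span lines) and i (case-insensitive tag name).

```php
<?php
// Hypothetical helper: splits $html around every <magicheader>...</magicheader>
// element, keeping each element as its own array entry.
function splitOnMagicHeader(string $html): array
{
    // \b stops the tag name from matching e.g. <magicheaderfoo>;
    // [^>]* skips over the attributes; .*? is non-greedy, so each element
    // ends at its own closing tag; s lets . match newlines; i ignores case.
    $pattern = '%(<magicheader\b[^>]*>.*?</magicheader>)%si';

    // The capturing group plus PREG_SPLIT_DELIM_CAPTURE keeps the matched
    // elements in the result between the surrounding pieces of text.
    return preg_split($pattern, $html, -1, PREG_SPLIT_DELIM_CAPTURE);
}

$html = '<p>a</p><magicheader type="2">Intro</magicheader>'
      . '<span><magicheader type="3">More</magicheader></span>';

print_r(splitOnMagicHeader($html));
```

If a <magicheader> can appear at the very start or end of the input, preg_split returns an empty string there; combining flags as PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY drops such empty pieces. Nested <magicheader> elements would still be split incorrectly, because .*? stops at the first closing tag it finds.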