dongyan3018 2015-01-25 04:11
浏览 77
已采纳

正则表达式:如何提取HTML标题标签[重复]

This question already has an answer here:

Extract all heading tags (h1, h2, h3, ...) and it's content. For example :

<h1 id="title">This is the title</h1>
<h2 id="subtitle">This is the subtitle</h2>
<p>And this is the paragraph</p>

Will be extracted as :

<h1 id="title">This is the title</h1> and <h2 id="subtitle">This is the subtitle</h2>

I'm using PHP and using regex as the title say.

</div>
  • 写回答

1条回答 默认 最新

  • dongye1143 2015-01-25 04:24
    关注

    It is recommended to use the right tool for the task.

    $doc = DOMDocument::loadHTML('
        <h1 id="title">This is the title</h1>
        <h2 id="subtitle">This is the subtitle</h2>
        <p>And this is the paragraph</p>
        <p>another tag</p>
    ');
    
    $xpath = new DOMXPath($doc);  
    $heads = $xpath->query('//h1|//h2|//h3|//h4|//h5|//h6');
    
    foreach ($heads as $tag) {
       echo $doc->saveHTML($tag), "
    ";
    }
    

    Output

    <h1 id="title">This is the title</h1>
    <h2 id="subtitle">This is the subtitle</h2>
    
    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥15 Vue3 大型图片数据拖动排序
  • ¥15 划分vlan后不通了
  • ¥15 GDI处理通道视频时总是带有白色锯齿
  • ¥20 用雷电模拟器安装百达屋apk一直闪退
  • ¥15 算能科技20240506咨询(拒绝大模型回答)
  • ¥15 自适应 AR 模型 参数估计Matlab程序
  • ¥100 角动量包络面如何用MATLAB绘制
  • ¥15 merge函数占用内存过大
  • ¥15 使用EMD去噪处理RML2016数据集时候的原理
  • ¥15 神经网络预测均方误差很小 但是图像上看着差别太大