dtpoius74857 2015-08-21 11:13
Views: 49

Selecting HTML that contains more than one pair of tags with a regular expression

<div class="apple">

    <a href="..." > ... </a>

    <div class="boy">
        (some content here)
    </div>

    <div class="cat">
        <b>Text One.</b> <br> <i>Text Two.</i>
    </div>

    <div class="dog">
        <b>Text One.</b> <br> <i>Text Two.</i>
    </div>

</div>

.
. (there are a couple more structures with the cat class inside, but not necessarily under the apple class)
.

<div class="zoo">
.
    <div class="cat">
        <b>Text One.</b> <br> <i>Text Two.</i>
    </div>
.
</div>
.
.
.

I am working with PHP. I want to know how to select exactly "Text One." from the div class="cat" that is under div class="apple" in the HTML, and not from any other div.

Currently I am doing something like this:

$html=file_get_contents('xxx.html');

$a=preg_match_all("/\<div class\=\"apple\"(.*)\<div class\=\"cat\"\>(.*)<\/b\>/s",$html,$b);

foreach ($b[1] as $value) {
    echo strip_tags("$value");
}

I found this online. It may work, but it is probably not the best way to deal with this situation.

A lot of irrelevant content was also selected (I got everything up to the last tag, which is more content than I want).

Please suggest an appropriate regular expression, or a better way to solve this.

1 answer

  • doudong3570 2015-08-21 11:27

    Since you mention a better way, I would suggest going with the simple html dom library, http://simplehtmldom.sourceforge.net.

    In your example you would use it like this:

    <?php
    
    include 'simple_html_dom.php';
    
    $html = str_get_html('<div class="apple">
    
        <a href="..." > ... </a>
    
        <div class="boy">
            (some content here)
        </div>
    
        <div class="cat">
            <b>Text One.</b> <br> <i>Text Two.</i>
        </div>
    
        <div class="dog">
            <b>Text One.</b> <br> <i>Text Two.</i>
        </div>
    
    </div>
    
    .
    . (and there are a couple more <div class="apple"> structures with the cat class inside)
    .
    
    <div class="apple">
    .
    .
    .
    </div>
    .
    .
    .');
    
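    // Scope the selector to div.apple so that div.cat blocks elsewhere in the page are ignored
    $text = $html->find('div.apple div.cat b', 0)->innertext;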
    
    print $text . PHP_EOL;
    
    // it will print this
    // Text One.
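
If adding a third-party library is not an option, the same scoped lookup can probably be done with PHP's bundled DOM extension (DOMDocument and DOMXPath). The following is a minimal sketch, not a drop-in solution: `$markup` is a shortened, hypothetical stand-in for the contents of xxx.html, and the XPath expression assumes each class attribute holds exactly one class name.

```php
<?php
// Sketch using PHP's bundled DOM extension instead of Simple HTML DOM.
// $markup is a shortened, hypothetical stand-in for the contents of xxx.html.
$markup = <<<'HTML'
<div class="apple">
    <div class="cat"><b>Text One.</b> <br> <i>Text Two.</i></div>
    <div class="dog"><b>Dog text.</b></div>
</div>
<div class="zoo">
    <div class="cat"><b>Zoo text.</b></div>
</div>
HTML;

$doc = new DOMDocument();
// Real-world HTML is rarely well-formed, so collect parser warnings quietly.
libxml_use_internal_errors(true);
$doc->loadHTML($markup);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Every <b> that is a direct child of a div.cat nested anywhere under a div.apple.
$nodes = $xpath->query('//div[@class="apple"]//div[@class="cat"]/b');

$texts = [];
foreach ($nodes as $node) {
    $texts[] = trim($node->textContent);
}

print implode(PHP_EOL, $texts) . PHP_EOL;
```

With the sample markup above, only the `<b>` inside the apple block is selected; the cat block under zoo is skipped. If an element carries several classes (for example class="cat small"), the exact-match test `@class="cat"` would miss it and the expression would need adjusting.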
    