duanhuantong8278 2011-07-24 05:54

Parsing web page source code with regular expressions

I can't seem to figure out the regular expression I need in order to parse the following.

<div id="MustBeInThisId">
   <div class="ValueFromThisClass">
      The Value I need
   </div>
</div>

As you can see, I have a wrapping div with an id. That div contains multiple other divs, but I need the value from only one of them.

4 answers

  • douzi1991 2011-07-24 06:03

    If you are trying to extract some data from an HTML document, you should not use regular expressions.

    Instead, you should use a DOM parser: that is exactly what they are made for.


    In PHP, you would use the DOMDocument class, and its DOMDocument::loadHTML() method, to load the HTML content.


    Then, you can work with methods such as DOMDocument::getElementById() and DOMDocument::getElementsByTagName().

    You can even work with DOMXPath to execute XPath queries on your HTML content, which will allow you to search for pretty much anything in it.
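
For comparison with the XPath route, here is a minimal sketch that uses only the plain DOM methods getElementById() and getElementsByTagName(). The sample markup and the extra "Other" div are made up for illustration, and the sketch assumes the class attribute holds exactly one class name.

```php
<?php
// Hypothetical sample markup, mirroring the structure from the question.
$html = '<div id="MustBeInThisId">'
      . '<div class="Other">Not this one</div>'
      . '<div class="ValueFromThisClass"> The Value I need </div>'
      . '</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$value = null;
// getElementById() returns the matching element, or null if there is none.
$wrapper = $dom->getElementById('MustBeInThisId');
if ($wrapper !== null) {
    // Called on an element, getElementsByTagName() walks its descendants only.
    foreach ($wrapper->getElementsByTagName('div') as $div) {
        // Exact comparison: assumes the attribute contains a single class name.
        if ($div->getAttribute('class') === 'ValueFromThisClass') {
            $value = trim($div->nodeValue);
            break;
        }
    }
}

var_dump($value);
```

This is more verbose than a single XPath query, but it can be easier to follow if you are not comfortable with XPath syntax.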


    In your case, I suppose that something like this should do the trick.

    First, get your HTML content into a string (or use DOMDocument::loadHTMLFile()):

    $html = <<<HTML
    <p>hello</p>
    <div>
        <div id="MustBeInThisId">
        <div class="ValueFromThisClass">
            The Value I need
        </div>
        </div>
    </div>
    HTML;
    

    Then, load it into a DOMDocument instance:

    $dom = new DOMDocument();
    $dom->loadHTML($html);
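
Real web pages are rarely clean HTML, and loadHTML() tends to emit PHP warnings when the markup is malformed. A possible way to keep those warnings out of your output is to let libxml collect them internally. This is only a sketch: the sample markup below is invented, and how many messages the parser reports depends on the libxml version.

```php
<?php
// Hypothetical markup with an unclosed inner <div> and an HTML5 element,
// the kind of input that may trigger parser warnings.
$html = '<section><div id="MustBeInThisId">'
      . '<div class="ValueFromThisClass">The Value I need</div>'
      . '</section>';

// Collect parse errors internally instead of emitting PHP warnings.
// The function returns the previous setting, so it can be restored later.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($html);

// Read what the parser complained about, then empty the error buffer
// and restore the previous error handling mode.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

printf("loaded: %s, parser messages: %d\n", var_export($loaded, true), count($errors));
```

The parser still builds a usable tree from imperfect markup in most cases, so the rest of the steps work the same way.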
    

    Instantiate a DOMXPath object, and use it to query your DOM object.
    My XPath expression might be a bit more complex than necessary, as I'm not really good with those:

    $xpath = new DOMXPath($dom);
    $items = $xpath->query('//div[@id="MustBeInThisId"]/div[@class="ValueFromThisClass"]');
    

    And, finally, work with the results of that query:

    if ($items->length > 0) {
        var_dump( trim( $items->item(0)->nodeValue ) );
    }
    

    And here is your result:

    string 'The Value I need' (length=16)
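
If you need this in more than one place, the steps above could be wrapped in a small helper. The function name extractValue() and its parameters are hypothetical, not part of any library. Compared with the query above, this sketch makes two assumptions of its own: the target div may be nested at any depth inside the wrapper, and it may carry several class names.

```php
<?php
/**
 * Hypothetical helper: returns the trimmed text of the first <div> that has
 * class $class somewhere inside the element with id $id, or null if none.
 * Assumes $id and $class contain no double quotes (they are placed into the
 * XPath expression as-is) and that $html is a non-empty string.
 */
function extractValue(string $html, string $id, string $class): ?string
{
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Treat @class as a space-separated list and match $class as one token;
    // the second // lets the div sit at any depth below the wrapper.
    $query = sprintf(
        '//*[@id="%s"]//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $id,
        $class
    );
    $items = $xpath->query($query);

    if ($items === false || $items->length === 0) {
        return null;
    }
    return trim($items->item(0)->nodeValue);
}

// Usage with made-up markup: the target div has two class names.
$sample = '<div id="MustBeInThisId">'
        . '<div class="box ValueFromThisClass"> The Value I need </div>'
        . '</div>';
var_dump(extractValue($sample, 'MustBeInThisId', 'ValueFromThisClass'));
```

Returning null when nothing matches lets the caller tell "not found" apart from an empty string.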
    