dsgwii4867 2016-04-30 12:29

Regular expression to match a price <p> tag block with PHP [duplicate]

I am trying to scrape the price block out of a webpage, and I want to match everything between the opening and closing paragraph tags that contain the prices. The problem is that in the HTML source this block is split across multiple lines with a lot of whitespace. Here is a sample of the output: http://pastebin.com/hfeuHqTN

I am trying to use:

$pricesClass = '/<p class="price-wrap">
(.*)/';

preg_match_all($pricesClass, $page, $pricesMatches);

How can I match the whole of the paragraph with the class of price-wrap until the closing paragraph tag?

At the moment it just matches the first two lines up to:

<p class="price-wrap"><strong class="product-price" itemprop="price">

I would like to match the whole thing e.g.

 <p class="price-wrap"><strong class="product-price" itemprop="price"> £120</strong> was&nbsp;<del>£186.00</del></p>
</div>

1 answer

  • dpwh11290 2016-04-30 12:40

    Use a proper HTML parser such as DOMDocument to find the paragraph, and use preg_replace with \s+ only to strip the "whitespace characters" (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed) from its text:

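    $dom = new DOMDocument();
    // Load the remote page into the DOM (the URL is a placeholder).
    $dom->loadHTML(file_get_contents("http://thesite.com"));
    $xpath = new DOMXPath($dom);
    // Select every <p> whose class attribute is exactly "price-wrap".
    foreach ($xpath->query("//p[@class='price-wrap']") as $pText) {
        // Strip all whitespace from the paragraph's text content.
        echo preg_replace("/\s+/", "", $pText->textContent);
    }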
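
A self-contained variant of the snippet above may be easier to experiment with. The sketch below rests on several assumptions: the inline sample markup is modelled on the excerpt in the question rather than on the real page, `extractPrices` is a hypothetical helper name, and whitespace is collapsed to single spaces (instead of being removed outright) so that the two prices stay readable.

```php
<?php
// Sample markup modelled on the question's excerpt (an assumption about the real page).
$html = <<<'HTML'
<div>
 <p class="price-wrap"><strong class="product-price" itemprop="price">
      £120</strong>
   was&nbsp;<del>£186.00</del></p>
</div>
HTML;

// Hypothetical helper: returns the text of every <p class="price-wrap">.
function extractPrices(string $html): array
{
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    // The XML encoding prefix is a common trick to make libxml treat the input as UTF-8.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $prices = [];
    foreach ($xpath->query("//p[@class='price-wrap']") as $p) {
        // Collapse runs of whitespace (including the non-breaking space) to one space.
        $text = preg_replace('/[\s\x{00A0}]+/u', ' ', $p->textContent);
        $prices[] = trim($text);
    }
    return $prices;
}

print_r(extractPrices($html));
```

Note that `@class='price-wrap'` only matches when the class attribute is exactly that string; a paragraph carrying additional classes would need a `contains()` test instead.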
    

    Ideone Demo
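
For comparison, the pattern in the question stops early because `.` does not match a newline by default and because nothing in the pattern names the closing tag. If a regex is still wanted for a quick one-off, a minimal sketch could look like the following; the sample markup, including the second paragraph, is hypothetical, and the pattern assumes the class attribute is written exactly as shown and that the paragraphs are not nested.

```php
<?php
// Hypothetical sample: two price paragraphs split across several lines.
$page = <<<'HTML'
<div>
 <p class="price-wrap"><strong class="product-price" itemprop="price">
      £120</strong>
   was&nbsp;<del>£186.00</del></p>
</div>
<div>
 <p class="price-wrap"><strong class="product-price" itemprop="price">
      £45</strong></p>
</div>
HTML;

// s modifier: "." also matches newlines.
// .*? is lazy, so each match ends at the nearest </p> rather than the last one.
$pricesClass = '/<p class="price-wrap">(.*?)<\/p>/s';

preg_match_all($pricesClass, $page, $pricesMatches);

// $pricesMatches[0] holds the whole paragraphs, $pricesMatches[1] their inner HTML.
print_r($pricesMatches[0]);
```

This remains brittle against changes in attribute order, quoting, or extra classes, which is the main reason the parser-based approach is usually preferred.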

    This answer was selected as the best answer by the asker.
