dsgwii4867 2016-04-30 12:29

Regex to match a price <p> tag block with PHP [duplicate]


I am trying to scrape the prices block out of a webpage, and I want to match the contents between the opening and closing paragraph tags that contain the prices. The problem is that in the HTML source this block is split across multiple lines with multiple whitespace characters. Here is a sample of the output: http://pastebin.com/hfeuHqTN

I am trying to use:

$pricesClass = '/<p class="price-wrap">
(.*)/';

preg_match_all($pricesClass, $page, $pricesMatches);

How can I match the whole of the paragraph with the class of price-wrap until the closing paragraph tag?

At the moment it just matches the first two lines up to:

<p class="price-wrap"><strong class="product-price" itemprop="price">

I would like to match the whole thing e.g.

 <p class="price-wrap"><strong class="product-price" itemprop="price"> £120</strong> was&nbsp;<del>£186.00</del></p>
</div>

1 answer

  • dpwh11290 2016-04-30 12:40

    Use a proper HTML parser such as DOMDocument to find the paragraph, and use preg_replace with \s+ only to remove the "whitespace characters" (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed):

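    // Load the page into a DOM tree (the URL is the answer's placeholder).
    $dom = new DOMDocument();
    $dom->loadHTML(file_get_contents("http://thesite.com"));
    $xpath = new DOMXPath($dom);
    // Select every <p> whose class attribute is exactly "price-wrap".
    foreach ($xpath->query("//p[@class='price-wrap']") as $pText) {
        // Strip all whitespace from the paragraph's text content.
        echo preg_replace("/\s+/", "", $pText->textContent);
    }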
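
The question asks for the whole paragraph, tags included, rather than only its text. The following is a minimal sketch of one way to get that with the same parser. It assumes the sample markup shown in the code (modelled on the question, not taken from the real page), and the names $html and $prices are illustrative only. It relies on DOMDocument::saveHTML() accepting a node argument and returning that node's markup, and it collapses whitespace runs to a single space instead of deleting them.

```php
<?php
// Sample markup modelled on the question; the real page will differ.
$html = <<<HTML
<div>
  <p class="price-wrap"><strong class="product-price" itemprop="price">
      £120</strong>
      was&nbsp;<del>£186.00</del></p>
</div>
HTML;

// Real-world HTML is rarely valid, so collect parser warnings silently.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
// The XML encoding hint is a common workaround so loadHTML() reads UTF-8.
$dom->loadHTML('<?xml encoding="UTF-8">' . $html);
$xpath = new DOMXPath($dom);

$prices = [];
foreach ($xpath->query("//p[@class='price-wrap']") as $p) {
    // saveHTML($node) serializes the node itself, tags included.
    $outer = $dom->saveHTML($p);
    // Collapse each whitespace run to one space instead of removing it.
    $prices[] = trim(preg_replace('/\s+/u', ' ', $outer));
}

print_r($prices);
```

Each entry in $prices then holds one price-wrap paragraph on a single line, from its opening <p> tag to the closing </p>. Note that the XPath test @class='price-wrap' only matches when that is the element's sole class; a paragraph with additional classes would need a different predicate.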
    


    Accepted by the asker as the best answer.
