douquanhui5735 2012-03-27 20:10

How do I extract only certain tags from an HTML document using PHP?

I'm using a crawler to retrieve the HTML content of certain pages on the web. I currently have the entire HTML stored in a single PHP variable:

$string = "<PRE>".htmlspecialchars($crawler->results)."</PRE>\n";

What I want to do is select all "p" tags (for example) and store their contents in an array. What is the proper way to do that?

I've tried the following using XPath, but it doesn't output anything, most probably because the document itself isn't valid XML (I just copy-pasted the example given in the documentation).

$xml = new SimpleXMLElement ($string);

    $result=$xml->xpath('/p');
    while(list( , $node)=each($result)){
        echo '/p: ' , $node, "\n";
    }

Hopefully someone with (a lot) more experience in PHP will be able to help me out :D

3 answers

  • douxu5233 2012-03-27 21:56

    Check out Simple HTML DOM. It can fetch external pages and parse them fairly accurately.

    http://simplehtmldom.sourceforge.net/

    It can be used like this:

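    // Load the library (simple_html_dom.php is the file shipped with Simple HTML DOM)
    include 'simple_html_dom.php';

    // Create DOM from URL or file
    $html = file_get_html('http://www.google.com/');
    
    // Find all images
    foreach($html->find('img') as $element)
       echo $element->src . '<br>';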
    
    Selected as the best answer by the asker.
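For the original goal of collecting every "p" tag into an array, here is a minimal sketch that swaps in PHP's bundled DOM extension (DOMDocument and DOMXPath) instead of Simple HTML DOM. The helper name `extractTagText` and the sample markup are hypothetical, not from the thread. It assumes the raw HTML is parsed, not the `htmlspecialchars()`-escaped `$string` from the question: once escaped, the string contains no real tags to match. It also uses `//p` rather than `/p`, since `/p` only matches a "p" element at the document root.

```php
<?php
// Hypothetical helper (not from the thread): return the text of every
// element with the given tag name, using PHP's bundled DOM extension.
function extractTagText(string $html, string $tag = 'p'): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely well-formed, so collect parser warnings
    // internally instead of emitting them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    $texts = [];
    // '//' matches the tag at any depth; '/p' would only match a root <p>.
    foreach ($xpath->query('//' . $tag) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Sample input standing in for the raw (unescaped) $crawler->results.
$html = '<html><body><h1>Title</h1><p>First paragraph.</p>'
      . '<div><p>Second <b>paragraph</b>.</p></div></body></html>';

$paragraphs = extractTagText($html);
print_r($paragraphs);
```

With the crawler from the question, the call would presumably be `extractTagText($crawler->results)`. Pages with non-ASCII text may also need a charset declaration for `loadHTML()` to decode them correctly.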