douquanhui5735 2012-03-27 12:10

How can I extract only certain tags from an HTML document with PHP?

I'm using a crawler to retrieve the HTML content of certain pages on the web. I currently have the entire HTML stored in a single PHP variable:

$string = "<PRE>".htmlspecialchars($crawler->results)."</PRE>\n";
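One likely reason nothing is found later: `htmlspecialchars()` converts `<` and `>` into `&lt;` and `&gt;`. After that, `$string` holds a single `<PRE>` element whose content is plain text, so there are no `p` tags left for any parser to match. The minimal sketch below shows this. `$raw` is a hypothetical stand-in for `$crawler->results`.

```php
<?php
// Hypothetical stand-in for $crawler->results (the raw HTML the crawler returned).
$raw = '<p>Hello</p>';

// Escaping is meant for display: the tags become entities and stop being markup.
$escaped = htmlspecialchars($raw);
echo $escaped, "\n"; // &lt;p&gt;Hello&lt;/p&gt;

// Any HTML/XML parser should therefore be given $raw, not $escaped.
```

Parse the raw markup. Call `htmlspecialchars()` only when you echo HTML back into a page as visible text.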

What I want to do is select all "p" tags (for example) and store their contents in an array. What is the proper way to do that?

I've tried the following with XPath, but it doesn't output anything (most probably because the document itself isn't XML; I just copy-pasted the example from the documentation).

$xml = new SimpleXMLElement($string);

$result = $xml->xpath('/p');
foreach ($result as $node) {
    echo '/p: ', $node, "\n";
}

Hopefully someone with (a lot) more experience in PHP will be able to help me out :D
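There are two separate problems with the attempt. `SimpleXMLElement` expects well-formed XML, and most real-world HTML is not well-formed XML. Also, the XPath `/p` matches only a root element named `p`. The sketch below shows one common alternative. It assumes PHP's bundled DOM extension is available, which it usually is. `DOMDocument::loadHTML()` tolerates sloppy HTML, and `DOMXPath` with `//p` finds paragraphs at any depth. The sample markup is hypothetical.

```php
<?php
// Hypothetical sample; in the question this would be the raw HTML
// ($crawler->results) before any htmlspecialchars() call.
$html = '<html><body><p>First</p><div><p>Second</p></div></body></html>';

$doc = new DOMDocument();
// Real-world HTML often triggers libxml warnings; collect them instead of printing.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$paragraphs = [];
// '//p' matches <p> elements at any depth; '/p' would only match a root <p>.
foreach ($xpath->query('//p') as $node) {
    $paragraphs[] = $node->textContent;
}

print_r($paragraphs);
```

To get each paragraph's markup instead of its text, use `$doc->saveHTML($node)`, which returns that node serialized as HTML.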


3 answers

  • douxu5233 2012-03-27 13:56

    Check out Simple HTML DOM. It can fetch external pages and parse them quite accurately.

    http://simplehtmldom.sourceforge.net/

    It can be used like this:

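    // Load the library (adjust the path to wherever simple_html_dom.php lives)
    include 'simple_html_dom.php';

    // Create DOM from URL or file
    $html = file_get_html('http://www.google.com/');

    // Find all images and print their src attributes
    foreach ($html->find('img') as $element) {
        echo $element->src . '<br>';
    }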
    
    This answer was selected by the asker as the best answer.
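For comparison, you can write the same image loop with PHP's bundled DOM extension instead of the third-party Simple HTML DOM library. This is a sketch under two assumptions. First, the sample markup is hypothetical. Second, for a live page you would fetch the HTML yourself first, for example with `file_get_contents()`, which needs `allow_url_fopen` enabled for URLs.

```php
<?php
// Hypothetical sample markup standing in for a fetched page.
$html = '<body><img src="a.png"><p>text</p><img src="b.png"></body>';

libxml_use_internal_errors(true); // keep libxml parse warnings quiet
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Collect and print the src attribute of every <img> element.
$sources = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    $sources[] = $img->getAttribute('src');
    echo $img->getAttribute('src'), '<br>';
}
```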