dongzhouhao4316 2013-02-02 02:31

Extracting image elements from HTML

I am trying to extract the img tags from some HTML code.

I currently have:

   $parser = new DOMDocument;

   $parser->loadHTML($this->html);
   foreach ($parser->getElementsByTagName('img') as $imgNode) {
       echo $parser->saveHTML($imgNode);
   }

$this->html contains a large amount of HTML and JavaScript code.

For example:

<div id='someid'>
<button id='bt' onclick='clickme()'>click me</button>
<img src='test.jpg'/>
.....
.....
more...

</div>

<div>
.....
.....
more...

I get a warning saying:

DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity,

I am not sure how to fix this, and I don't know whether there is a better way to extract all the images from such a large amount of HTML.

Any ideas? Thanks a lot!


  • dongpai2468 2013-02-02 02:55

    I am in no way an expert on these matters (yet), but I hope this helps in some way.

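    According to this answer by troelskn, you can stop the DOM parser from emitting warnings for badly formed HTML by calling libxml_use_internal_errors(true) before loadHTML(). libxml then collects the parse errors internally (you can inspect them with libxml_get_errors()) instead of raising them as PHP warnings, which should get rid of that warning. The warning itself is usually caused by a bare & in the markup, for example in a URL query string or in inline JavaScript, that is not written as &amp;.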
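To see this in isolation, here is a minimal, self-contained sketch. The sample markup is made up for illustration, and whether libxml actually reports the entity error for it can depend on the libxml2 version, so the loop simply prints whatever errors were collected (possibly none).

```php
<?php
// Hypothetical sample markup: the bare "&" in the query string is the kind of
// input that can make loadHTML() report "htmlParseEntityRef: expecting ';'".
$html = "<div><a href='page.php?a=1&b=2'>link</a><img src='test.jpg'/></div>";

// Collect libxml errors internally instead of emitting PHP warnings;
// the return value is the previous setting, so it can be restored later.
$previous = libxml_use_internal_errors(true);

$document = new DOMDocument();
$document->loadHTML($html);

// The parser recovers on its own; any problems it found can be inspected here.
foreach (libxml_get_errors() as $error) {
    echo 'libxml: ', trim($error->message), "\n";
}

// Empty the error buffer and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The document is still usable despite the malformed input.
echo $document->getElementsByTagName('img')->item(0)->getAttribute('src'), "\n";
```

Restoring the previous setting keeps the change local, so other code that relies on libxml warnings is not affected.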

    Finding all the images in a document can be done with DOMXPath. Its constructor takes a DOMDocument, and the resulting object lets you run XPath queries against that document.

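    // Collect parse errors internally instead of emitting warnings.
    // This has to be called before loadHTML().
    libxml_use_internal_errors(true);

    $document = new DOMDocument();
    $document->loadHTML($your_html);

    // Discard the collected parse errors.
    libxml_clear_errors();

    $xpath = new DOMXPath($document);

    // Find all img tags.
    $img_nodes = $xpath->query('//img');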
    

    DOMXPath::query returns a DOMNodeList, which can be looped through using DOMNodeList::item; each call returns a DOMNode.

    for ($i = 0; $i < $img_nodes->length; $i++)
    {
        $node = $img_nodes->item($i);
        // Manipulate the node, e.g. $node->getAttribute('src').
    }
    

    Disclaimer: The code I posted is untested and was put together using the manual.
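Putting these steps together, here is a self-contained sketch that returns the src attribute of every image. The function name extract_image_sources and the sample markup (modelled on the question's example) are hypothetical. It uses foreach instead of item($i), which also works because DOMNodeList is Traversable, and it restores the previous libxml error setting afterwards.

```php
<?php
// Hypothetical helper: returns the src attribute of every <img> in $html.
function extract_image_sources(string $html): array
{
    // Keep malformed-HTML warnings out of the output.
    $previous = libxml_use_internal_errors(true);

    $document = new DOMDocument();
    $document->loadHTML($html);

    // Drop the collected errors and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($document);
    $sources = [];

    // DOMNodeList is Traversable, so foreach works as well as item($i).
    foreach ($xpath->query('//img') as $img) {
        $sources[] = $img->getAttribute('src');
    }

    return $sources;
}

// Sample markup modelled on the example in the question.
$html = "<div id='someid'>
<button id='bt' onclick='clickme()'>click me</button>
<img src='test.jpg'/>
</div>
<div><img src='other.png'></div>";

print_r(extract_image_sources($html));
```

The images are returned in document order; if you need the full tag instead of just the src, $document->saveHTML($img) (as in the question) can be used inside the loop.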

    This answer was accepted by the asker.
