dongzhouhao4316 2013-02-02 02:31
浏览 17
已采纳

从html中提取图像元素

I am trying to get the image tag out of html codes.

I have

   $parser=new DOMDocument;   

   $parser->loadHTML($this->html);
        foreach($parser->getElementsByTagName('img') as $imgNode){
         echo $parser->saveHTML($imgNode);
       }

$this->html contains massive html code and javascripts.

for example:

<div id='someid'>
<button id='bt' onclick='clickme()'>click me</button>
<img src='test.jpg'/>
.....
.....
more...

</div>

<div>
.....
.....
more...

I got an warning saying

DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity,

I am not sure how to fix this and don't know if there are a better way to extract all the images from the massive html codes.

Any ideas? Thanks a lot!

  • 写回答

1条回答 默认 最新

  • dongpai2468 2013-02-02 02:55
    关注

    I am in no way an expert on these matters (yet), but I hope this helps in some way.

    According to this answer by troelskn you can make the DOM parser more tolerant to badly formed HTML by using libxml_use_internal_errors. That might help you getting rid of that error.

    Parsing all images of a document can be done by using DOMXPath. It takes a DOMDocument as a parameter and lets you run XPath queries on the document.

    $document = new DOMDocument();
    $document->loadHTML($your_html);
    
    // Suppress parse errors.
    libxml_use_internal_errors(false);
    
    $xpath = new DOMXPath($document)
    
    // Find all img tags.
    $img_nodes = $xpath->query('//img')
    

    DOMXPath::query returns a DOMNodeList which can be looped through using DOMNodeList::item, which returns a DOMNode.

    for($i = 0; $i > $img_nodes->length; $i++)
    {
        $node = $img_nodes->item($i);
        // Manipulate the node.
    }
    

    Disclaimer: The code I posted is untested and was put together using the manual.

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥60 求一个简单的网页(标签-安全|关键词-上传)
  • ¥35 lstm时间序列共享单车预测,loss值优化,参数优化算法
  • ¥15 基于卷积神经网络的声纹识别
  • ¥15 Python中的request,如何使用ssr节点,通过代理requests网页。本人在泰国,需要用大陆ip才能玩网页游戏,合法合规。
  • ¥100 为什么这个恒流源电路不能恒流?
  • ¥15 有偿求跨组件数据流路径图
  • ¥15 写一个方法checkPerson,入参实体类Person,出参布尔值
  • ¥15 我想咨询一下路面纹理三维点云数据处理的一些问题,上传的坐标文件里是怎么对无序点进行编号的,以及xy坐标在处理的时候是进行整体模型分片处理的吗
  • ¥15 CSAPPattacklab
  • ¥15 一直显示正在等待HID—ISP