duanju8308 2014-11-30 05:43

Parsing an HTML document with anchor tags

Say I have:

<a href="www.myurl/point.html" class="l" style="color:#436DBA;" onclick="return rs(this,'8 Stunning Linguistic Miracles of The Holy Quran | Kinetic Typography 144p (Video Only).mp4');">&raquo; Download MP4 &laquo;</a> - <b>144p (Video Only)</b> - <span> 19.1</span> MB<br />

My HTML page looks like this. I want to parse it with the Simple HTML DOM PHP parser and get "Download MP4 144p 19.1" as output. I tried this code:

foreach ($displaybody->find('a') as $element) {
    // prints only the text inside each <a> tag
    echo $element->innertext . '<br/>';
}

It returned only "Download MP4". How do I parse the remaining values so that I get "Download MP4 144p 19.1"? Please help me out.

1 answer

  • dsgdhtr_43654 2014-11-30 05:48

    You can't target only the <a> tag, since some of the text you're trying to access isn't inside it. Target the document itself and then use ->plaintext:

    $html = <<<EOT
    <a href="www.myurl/point.html" class="l" style="color:#436DBA;" onclick="return rs(this,'8 Stunning Linguistic Miracles of The Holy Quran | Kinetic Typography 144p (Video Only).mp4');">&raquo; Download MP4 &laquo;</a> - <b>144p (Video Only)</b> - <span> 19.1</span> MB<br />
    EOT;
    
    $displaybody = str_get_html($html);
    echo $displaybody->plaintext;
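
If the individual values are needed rather than one line of text, the plain text can be split with a regular expression. The block below is only a sketch under assumptions: it uses PHP's built-in strip_tags() and html_entity_decode() in place of Simple HTML DOM's ->plaintext so that it runs without the library, the onclick attribute is omitted for brevity, and the pattern and the array keys (label, quality, size_mb) are hypothetical and fit only rows shaped like the one in the question.

```php
<?php
// Row copied from the question (onclick attribute omitted for brevity).
$html = '<a href="www.myurl/point.html" class="l">&raquo; Download MP4 &laquo;</a>'
      . ' - <b>144p (Video Only)</b> - <span> 19.1</span> MB<br />';

// strip_tags() removes the markup; html_entity_decode() turns &raquo; / &laquo; into the actual characters.
$text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Collapse runs of whitespace so the pattern below stays simple.
$text = trim(preg_replace('/\s+/u', ' ', $text));

// Hypothetical pattern for "<label> - <quality> ... - <size> MB".
$row = null;
if (preg_match('/^\W*(.+?)\W* - (\d+p)\b.* - ([\d.]+) MB$/u', $text, $m)) {
    $row = [
        'label'   => $m[1],
        'quality' => $m[2],
        'size_mb' => (float) $m[3],
    ];
}

print_r($row);
```

If the page's row layout differs from this one, the pattern will not match and $row stays null, so the result should be checked before use.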
    

    Here is another way of accessing each row, using DOMDocument with XPath:

    // load the site's HTML page into DOMDocument
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $html_page = file_get_contents('http://www.mohammediatechnologies.in/download/downloadtest.php?name=8KPEiGqDQHg');
    $dom->loadHTML(mb_convert_encoding($html_page, 'HTML-ENTITIES', 'UTF-8'));
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);
    
    $data = array();
    // select every element that has a preceding <br> sibling and a following <a> sibling (each <br> ends a row)
    $links = $xpath->query('//*[following-sibling::a and preceding-sibling::br]');
    
    $temp = '';
    foreach($links as $link) { // walk the selected elements in document order
    
        $temp .= $link->textContent . ' '; // get all text contents
    
        if($link->tagName == 'br') {
            $unit = $xpath->evaluate('string(./preceding-sibling::text()[1])', $link);
            $data[] = $temp . $unit; // push them inside an array
            $temp = '';
        }
    }
    
    echo '<pre>';
    print_r($data);
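
For the single row shown in the question, the same DOMDocument/XPath idea can be reduced to reading the siblings that follow each anchor. This is a minimal sketch, not the code from the answer above: it assumes the anchors carry class="l" as in the question, that each anchor is followed by a <b> and a <span> in the same parent, and the array keys are hypothetical names chosen for illustration.

```php
<?php
// Row copied from the question (onclick attribute omitted for brevity).
$html = '<a href="www.myurl/point.html" class="l">&raquo; Download MP4 &laquo;</a>'
      . ' - <b>144p (Video Only)</b> - <span> 19.1</span> MB<br />';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings about the fragment quiet
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$rows = [];
foreach ($xpath->query('//a[@class="l"]') as $a) {
    $rows[] = [
        // strip the surrounding » « decoration and spaces from the link text
        'label'   => preg_replace('/^\W+|\W+$/u', '', $a->textContent),
        // first <b> after the anchor holds the quality
        'quality' => trim($xpath->evaluate('string(following-sibling::b[1])', $a)),
        // first <span> after the anchor holds the size
        'size'    => trim($xpath->evaluate('string(following-sibling::span[1])', $a)),
        // the text node right after that <span> holds the unit
        'unit'    => trim($xpath->evaluate(
            'string(following-sibling::span[1]/following-sibling::text()[1])',
            $a
        )),
    ];
}

print_r($rows);
```

Because each value is looked up relative to its own anchor, this version does not depend on where the <br> elements fall, but it will return empty strings if a row lacks the <b> or <span>.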
    

    Sample Output

    This answer was selected by the asker as the best answer.
