dongmou2389 2013-05-27 23:53

How do I batch-parse HTML with XPath in PHP?

I tried all sorts of things but couldn't find a solution. I want to retrieve elements from HTML code using XPath in PHP.

Ex:

<div class='student'>
 <div class='name'>Michael</div>
 <div class='age'>26</div>
</div>
<div class='student'>
 <div class='name'>Joseph</div>
 <div class='age'>27</div>
</div>

I want to retrieve the information and put it in an array as follows:

$student[0]['name'] = 'Michael';
$student[0]['age'] = 26;
$student[1]['name'] = 'Joseph';
$student[1]['age'] = 27;

In other words, I want the matching ages to stay with the names.

I tried the following:

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpathDom = new DOMXPath($dom);
$homepostcontentNodes = $xpathDom->query("//*[contains(@class, 'student')]//*[contains(@class, 'name')]");

However, this only gives me the 'name' nodes. How can I get the matching 'age' nodes?

1 answer

  • douken0530 2013-05-28 00:06

    Of course it is only grabbing the name nodes - you are telling it to!

    You will need to do this in two steps:

    1. Pick out all the student nodes
    2. For each student node, pick out the columns

    This is a pretty standard step in linearization of data, and the XPath queries are simple:

    Step 1

    You pretty much have it:

     $studentNodes = $xpathDom->query("//div[contains(@class, 'student')]");
    

    This will return all your student nodes.
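
    A side note that is not part of the original answer: contains(@class, 'student') is a substring test, so it would also match a class such as 'students-list'. Below is a minimal sketch of the usual XPath 1.0 workaround, which pads the class list with spaces and looks for the whole token. The sample markup and the variable names $loose and $strict are made up for illustration.

    ```php
    <?php
    // Hypothetical sample markup: the outer div is a container, not a student.
    $html = "<div class='students-list'>"
          . "<div class='student'><div class='name'>Michael</div></div>"
          . "<div class='student active'><div class='name'>Joseph</div></div>"
          . "</div>";

    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    // Substring test: also matches class='students-list'.
    $loose = $xpath->query("//div[contains(@class, 'student')]");

    // Whole-token test: pad the normalized class list with spaces,
    // then search for ' student ' surrounded by spaces.
    $strict = $xpath->query(
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' student ')]"
    );

    echo $loose->length, " loose matches, ", $strict->length, " strict matches\n";
    ```

    For the markup in the question both queries select the same two nodes, so the simpler form is enough there.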

    Step 2

    This is where the magic happens. We have our nodes, and we can loop through them (DOMNodeList is Traversable, so it works with foreach). What we need to figure out is how to find each node's children...

    ...Oh wait. DOMNode has a method called getNodePath, which returns the full, direct XPath path to the node. This allows us to simply append /div to select all the div elements that are direct children of that node!
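
    To see what getNodePath gives you before building on it, here is a small sketch using the markup from the question. The variable names are illustrative. It prints the path of each student node and counts the divs found by appending /div to that path.

    ```php
    <?php
    // Same markup as in the question, written on two lines.
    $html = "<div class='student'><div class='name'>Michael</div><div class='age'>26</div></div>"
          . "<div class='student'><div class='name'>Joseph</div><div class='age'>27</div></div>";

    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    $paths = array();
    foreach ($xpath->query("//div[contains(@class, 'student')]") as $node) {
        // getNodePath() returns an absolute XPath expression for this one node.
        $path = $node->getNodePath();
        $paths[] = $path;
        // Appending "/div" selects only this student's direct div children.
        echo $path, " -> ", $xpath->query($path . "/div")->length, " child divs\n";
    }
    ```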

    Another quick foreach, and we get this code:

    $studentNodes = $xpathDom->query("//div[contains(@class, 'student')]");
    $result = array();
    foreach ($studentNodes as $v) {
        // One row per student node
        $r = array();
        // Direct div children of this student: its absolute path plus "/div"
        $columns = $xpathDom->query($v->getNodePath() . "/div");
        foreach ($columns as $v2) {
            // Use the node's 'class' attribute as the array key.
            // A bit clunky; $v2->getAttribute("class") would also work.
            $r[$v2->attributes->getNamedItem("class")->textContent] = $v2->textContent;
        }
        $result[] = $r;
    }
    var_dump($result);
    

    Full fiddle: http://codepad.viper-7.com/t868Wh
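
    As an alternative sketch, not taken from the original answer: DOMXPath::query accepts a context node as its second argument, so a relative expression such as ./div can be evaluated against each student node without building a path string. The example below assumes the markup from the question and produces the array shape the question asks for. Values come back as strings, and trim() is added here only as a precaution against stray whitespace.

    ```php
    <?php
    // Markup from the question.
    $html = "<div class='student'><div class='name'>Michael</div><div class='age'>26</div></div>"
          . "<div class='student'><div class='name'>Joseph</div><div class='age'>27</div></div>";

    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    $student = array();
    foreach ($xpath->query("//div[contains(@class, 'student')]") as $studentNode) {
        $row = array();
        // "./div" is evaluated relative to $studentNode, the context node.
        foreach ($xpath->query("./div", $studentNode) as $column) {
            // $column is a DOMElement, so getAttribute() reads its class directly.
            $row[$column->getAttribute("class")] = trim($column->textContent);
        }
        $student[] = $row;
    }

    print_r($student);
    ```

    Each name stays paired with its age because both are read from the same student node.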

    Accepted by the asker as the best answer.