dongmou2389 2013-05-27 23:53
Views: 50
Accepted

How to batch-parse HTML with XPath [PHP]?

I tried all sorts of things but couldn't find a solution. I want to retrieve elements from HTML code using XPath in PHP.

Ex:

<div class='student'>
 <div class='name'>Michael</div>
 <div class='age'>26</div>
</div>
<div class='student'>
 <div class='name'>Joseph</div>
 <div class='age'>27</div>
</div>

I want to retrieve the information and put it in an array as follows:

$student[0]['name'] = 'Michael';
$student[0]['age'] = 26;
$student[1]['name'] = 'Joseph';
$student[1]['age'] = 27;

In other words, I want the matching ages to stay with the names.

I tried the following:

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpathDom = new DOMXPath($dom);
$homepostcontentNodes = $xpathDom->query("//*[contains(@class, 'student')]//*[contains(@class, 'name')]");

However, this only grabs the 'name' nodes. How can I get the matching age nodes?

1 answer

  • douken0530 2013-05-28 00:06

    Of course it is only grabbing the name nodes - that is what you are telling it to do!

    You will need to do this in two steps:

    1. Pick out all the student nodes
    2. For each student node, pick out the columns

    This is a pretty standard step in linearization of data, and the XPath queries are simple:

    Step 1

    You pretty much have it:

     $studentNodes = $xpathDom->query("//div[contains(@class, 'student')]");
    

    This will return all your student nodes.
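
One caveat worth illustrating: contains(@class, 'student') is a substring test, so it would also match a class such as 'students-list'. The sketch below compares it with the common XPath 1.0 idiom for matching a whole class token. The sample markup (including the 'students-list' and 'active' classes) is hypothetical and only there to show the difference.

```php
<?php
// Hypothetical sample markup: the second div has class "students-list",
// which contains the substring "student" but is not a student record.
$html = "<div class='student'><div class='name'>Michael</div></div>"
      . "<div class='students-list'>not a student</div>"
      . "<div class='student active'><div class='name'>Joseph</div></div>";

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Substring match: also picks up "students-list".
$loose = $xpath->query("//div[contains(@class, 'student')]");

// Token match: pad the class list with spaces and look for " student ".
$strict = $xpath->query(
    "//div[contains(concat(' ', normalize-space(@class), ' '), ' student ')]"
);

echo $loose->length, " loose matches, ", $strict->length, " strict matches\n";
```

For the markup in the question the two queries return the same nodes, so the simpler form is fine there; the stricter form only matters when other class names share the substring.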

    Step 2

    This is where the magic happens. We have our nodes, and we can loop through them (DOMNodeList implements Traversable, so we can foreach over it). What we need to figure out is how to find each node's children...

    ...Oh wait. DOMNode has a method called getNodePath which returns the full, direct XPath path to the node. This allows us to simply append /div to get all the div elements that are direct children of the node!
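
To make the getNodePath idea concrete, here is a minimal sketch that prints the path of each student node. It assumes the markup from the question is loaded as a bare fragment, in which case loadHTML wraps it in html and body elements; the variable names are illustrative.

```php
<?php
// The markup from the question, as a bare fragment.
$html = "<div class='student'><div class='name'>Michael</div><div class='age'>26</div></div>"
      . "<div class='student'><div class='name'>Joseph</div><div class='age'>27</div></div>";

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Collect the absolute XPath of every student node.
$paths = array();
foreach ($xpath->query("//div[contains(@class, 'student')]") as $node) {
    $paths[] = $node->getNodePath();
}

print_r($paths); // paths of the form /html/body/div[1], /html/body/div[2]
```

Appending "/div" to one of these paths gives a query that selects only that student's direct child divs.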

    Another quick foreach, and we get this code:

    $studentNodes = $xpathDom->query("//div[contains(@class, 'student')]");
    $result = array();
    foreach ($studentNodes as $v) {
        // Child nodes of the current student
        $r = array();
        $columns = $xpathDom->query($v->getNodePath()."/div");
        foreach ($columns as $v2) {
            // attributes gives access to the node's 'class' attribute. A bit clunky;
            // $v2->getAttribute("class") would also work, since $v2 is a DOMElement.
            $r[$v2->attributes->getNamedItem("class")->textContent] = $v2->textContent;
        }
        $result[] = $r;
    }
    var_dump($result);
    

    Full fiddle: http://codepad.viper-7.com/t868Wh
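
As an alternative sketch, not the code from the fiddle: DOMXPath::query() accepts a context node as its second argument, so the inner query can be written relative to each student node instead of building an absolute path. The variable names ($students, $row, $column) are illustrative, and the markup is the example from the question.

```php
<?php
// The markup from the question, as a bare fragment.
$html = "<div class='student'><div class='name'>Michael</div><div class='age'>26</div></div>"
      . "<div class='student'><div class='name'>Joseph</div><div class='age'>27</div></div>";

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$students = array();
foreach ($xpath->query("//div[contains(@class, 'student')]") as $studentNode) {
    $row = array();
    // "./div" is evaluated relative to $studentNode (the context node),
    // so it selects only this student's direct child divs.
    foreach ($xpath->query("./div", $studentNode) as $column) {
        // Use the class name as the key and the trimmed text as the value.
        $row[$column->getAttribute('class')] = trim($column->textContent);
    }
    $students[] = $row;
}

print_r($students);
```

This keeps each age next to its name, as in the accepted approach. Note that the values come back as strings, so cast the age with (int) if a number is needed.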

    This answer was selected by the asker as the best answer.