doucong7963 2019-02-27 07:51

XPath loop problem: parsing a simple HTML table into a PHP array

My previous question, "PHP Convert html table to JSON", was quickly closed as a duplicate, and I'm still struggling to get what I need. I think it's mostly a logic problem in my loops, and I need someone else to take a look at it.

Given this table as an example:

<table id="Details" class="DATA_TABLE DATA_TABLE_WO_TOTAL">
  <tr>
    <th>Application</th>
    <th>Version number</th>
    <th>Virtual Administration Server</th>
    <th>Group</th>
    <th>Device</th>
    <th>Installed</th>
    <th>Last visible time</th>
    <th>Last connection to Administration Server</th>
    <th>IP address</th>
  </tr>
  <tr>
    <td class="sD">some text</td>
    <td class="sD">10.2.5.3201</td>
    <td class="sD"></td>
    <td class="sD">Thin PC</td>
    <td class="sD">PC#</td>
    <td class="sD">date</td>
    <td class="sD">date</td>
    <td class="sD">date</td>
    <td class="sD">ip address</td>
  </tr>
  <tr>
    <td class="sD">some more text</td>
    <td class="sD">10.2.5.3201</td>
    <td class="sD"></td>
    <td class="sD">Thin PC</td>
    <td class="sD">PC#</td>
    <td class="sD">date</td>
    <td class="sD">date</td>
    <td class="sD">date</td>
    <td class="sD">ip address</td>
  </tr>
</table>

I need to build an array (which I can later convert to JSON) in which the text of the th tags provides the keys, and the td cells of each subsequent tr provide the values for those keys. I have the following PHP code:

<?php
$dom = new DOMDocument;
$dom->loadHTML($cleantable2); //this is the table above
$xpath = new DOMXPath($dom);

foreach($xpath->query('//table/tr') as $tr){
    $tmp = [];
    foreach($xpath->query('//table/tr/th', $tr) as $th){
        $key = $th->textContent;
        foreach($xpath->query('td', $tr) as $td){
            $tmp[$key] = trim($td->textContent);
        }
    }
    $result[]=$tmp;
}
var_dump($result);

?>

It gets the keys right, but not the data. Sample output:

  [89]=>
  array(9) {
    ["Application"]=>
    string(13) "192.168.6.104"
    ["Version number"]=>
    string(13) "192.168.6.104"
    ["Virtual Administration Server"]=>
    string(13) "192.168.6.104"
    ["Group"]=>
    string(13) "192.168.6.104"
    ["Device"]=>
    string(13) "192.168.6.104"
    ["Installed"]=>
    string(13) "192.168.6.104"
    ["Last visible time"]=>
    string(13) "192.168.6.104"
    ["Last connection to Administration Server"]=>
    string(13) "192.168.6.104"
    ["IP address"]=>
    string(13) "192.168.6.104"
  }

As you can see, every key receives the IP address instead of its own value. What am I doing wrong? Can someone help out rather than just dismissing this as a duplicate? I've been trying to figure this out for over a day. I'm fairly sure the problem is in how I'm looping, but I can't see it.

Thanks


  • doumou3883 2019-02-27 08:41
    $strhtml='
    <table id="Details" class="DATA_TABLE DATA_TABLE_WO_TOTAL">
      <tr>
        <th>Application</th>
        <th>Version number</th>
        <th>Virtual Administration Server</th>
        <th>Group</th>
        <th>Device</th>
        <th>Installed</th>
        <th>Last visible time</th>
        <th>Last connection to Administration Server</th>
        <th>IP address</th>
      </tr>
      <tr>
        <td class="sD">some text</td>
        <td class="sD">10.2.5.202</td>
        <td class="sD">Plato</td>
        <td class="sD">Thin PC</td>
        <td class="sD">PC#</td>
        <td class="sD">date a</td>
        <td class="sD">date b</td>
        <td class="sD">date c</td>
        <td class="sD">10.25.100.1</td>
      </tr>
      <tr>
        <td class="sD">some more text</td>
        <td class="sD">10.2.5.321</td>
        <td class="sD">Socrates</td>
        <td class="sD">Thick PC</td>
        <td class="sD">PC#</td>
        <td class="sD">date x</td>
        <td class="sD">date y</td>
        <td class="sD">date z</td>
        <td class="sD">10.25.100.2</td>
      </tr>
    </table>';
    

    Given the HTML snippet above, perhaps the following does what you need. The comments should help show what I have done:

    libxml_use_internal_errors( true );
    $dom=new DOMDocument;
    $dom->loadHTML( $strhtml );
    libxml_clear_errors();
    
    $xp=new DOMXPath( $dom );
    /* find the `th` elements */
    $col = $xp->query( '//tr/th' );
    
    /* temp arrays */
    $tmp=$out=$keys=array();
    
    
    if( $col->length > 0 ){
        /* get all headers as keys */
        foreach( $col as $node )$keys[]=$node->nodeValue;
    
        /* get all table cell data - store in single array */
        $col=$xp->query( '//tr/td[ @class="sD" ]' );
        foreach( $col as $node )$tmp[]=$node->nodeValue;
    
        /* split data into chunks according to number of columns */
        $rows=array_chunk( $tmp, count( $keys ) );
    
        /* combine keys and chunks */
        foreach( $rows as $row ){
            $tmp=array();
            foreach( $row as $i => $value ) $tmp[ $keys[ $i ] ]=$value;
            $out[]=$tmp;
        }
    
        echo json_encode( $out );
    }
    

    Output (reformatted here for readability):

    [
        {
            "Application":"some text",
            "Version number":"10.2.5.202",
            "Virtual Administration Server":"Plato",
            "Group":"Thin PC",
            "Device":"PC#",
            "Installed":"date a",
            "Last visible time":"date b",
            "Last connection to Administration Server":"date c",
            "IP address":"10.25.100.1"
        },
        {
            "Application":"some more text",
            "Version number":"10.2.5.321",
            "Virtual Administration Server":"Socrates",
            "Group":"Thick PC",
            "Device":"PC#",
            "Installed":"date x",
            "Last visible time":"date y",
            "Last connection to Administration Server":"date z",
            "IP address":"10.25.100.2"
        }
    ]
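
A note on why the code in the question fails: the path `//table/tr/th` starts with `//`, so it is evaluated from the document root and the `$tr` context node is ignored, which is why every row ends up with all nine keys. For each of those keys, the inner loop then assigns every `td` of the row to the same `$tmp[$key]`, so only the last cell (the IP address) survives. A minimal per-row sketch that avoids this, assuming each data row has one `td` per header, could look like the following. The shortened three-column table and the variable names are illustrative assumptions, not part of the original code.

```php
<?php
// Hypothetical sample input: a shortened version of the table from the question.
$html = '
<table id="Details">
  <tr><th>Application</th><th>Version number</th><th>IP address</th></tr>
  <tr><td class="sD">some text</td><td class="sD">10.2.5.202</td><td class="sD">10.25.100.1</td></tr>
  <tr><td class="sD">some more text</td><td class="sD">10.2.5.321</td><td class="sD">10.25.100.2</td></tr>
</table>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Read the header cells once; they become the keys of every row.
$keys = [];
foreach ($xpath->query('//table//tr/th') as $th) {
    $keys[] = trim($th->textContent);
}

$result = [];
// Select only rows that contain data cells, so the header row and empty rows are skipped.
foreach ($xpath->query('//table//tr[td]') as $tr) {
    $values = [];
    // "td" is a relative path, so it is evaluated against the current row only.
    foreach ($xpath->query('td', $tr) as $td) {
        $values[] = trim($td->textContent);
    }
    // Pair keys and values by position; skip rows whose cell count does not match.
    if (count($values) === count($keys)) {
        $result[] = array_combine($keys, $values);
    }
}

echo json_encode($result, JSON_PRETTY_PRINT);
```

Unlike the `array_chunk` approach, this pairs cells with headers row by row, so a row with missing cells is skipped rather than shifting every later value into the wrong column.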
    
    This answer was accepted by the asker as the best answer.
