dpbsy60000 2015-11-12 22:30
Accepted

Scraping the contents of a table with DOMDocument in PHP

I am struggling to scrape a table, either via XPath or any of the 'getElement' methods. I have searched around and tried several approaches to the problem below, but have come up short and would really appreciate any help.

First, the HTML portion I am trying to scrape is the 2nd table in the document and looks like:

<table class="table2" border="1" cellspacing="0" cellpadding="3">
<tbody>
<tr><th colspan="8" align="left">Status Information</th></tr>
<tr><th align="left">Status</th><th align="left">Type</th><th align="left">Address</th><th align="left">LP</th><th align="left">Agent Info</th><th align="left">Agent Email</th><th align="left">Phone</th><th align="center">Email Tmplt</th></tr>
<tr></tr>
<tr>
<td align="left">Active</td>
<td align="left">Resale</td>
<td align="center">*Property Address*</td>
<td align="right">*Price*</td>
<td align="center">*Agent Info*</td>
<td align="center">*Agent Email*</td>
<td align="center">*Agent Phone*</td>
<td align="center">&nbsp;</td>
</tr>
<tr>
<td align="left">Active</td>
<td align="left">Resale</td>
<td align="center">*Property Address*</td>
<td align="right">*Price*</td>
<td align="center">*Agent Info*</td>
<td align="center">*Agent Email*</td>
<td align="center">*Agent Phone*</td>
<td align="center">&nbsp;</td>
</tr>
...etc

Additional trs follow, each containing 8 tds with the same kind of information as detailed above.

What I need to do is iterate through the trs and internal tds to pick up each piece of information (inside the td) for each entry (inside of the tr).

Here is the code I have been struggling with:

<?php

$payload = array(
  'http'=>array(
     'method'=>"POST",
     'content'=>'key=value'
   )
);
stream_context_set_default($payload);
$dom = new DOMDocument();
libxml_use_internal_errors(TRUE);
$dom->loadHTMLFile('website-scraping-from.com');
libxml_clear_errors();

foreach ($dom->getElementsByTagName('tr') as $row){
    foreach($dom->$row->getElementsByTagName('td') as $node){
        echo $node->textContent . "<br/>";
    }

}


?>

This code does not return anything close to what I need, and I am having a lot of trouble figuring out how to fix it. Perhaps XPath is a better route to find the table and the information I need, but I have come up empty with that method as well. Any information is much appreciated.

If it matters, my end goal is to be able to take the table data and dump it into a database if the first td has a value of "Active".

1 answer

  • duan1979768678 2015-11-12 22:51

    Can this be of any help?

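    // item(1) is the second table in the document (indexes start at 0).
    $table = $dom->getElementsByTagName('table')->item(1);
    foreach ($table->getElementsByTagName('tr') as $row) {
        $cells = $row->getElementsByTagName('td');
        // Header rows contain only th cells, so skip rows without any td.
        if ($cells->length > 0 && trim($cells->item(0)->nodeValue) === 'Active') {
            foreach ($cells as $node) {
                echo $node->nodeValue . "<br/>";
            }
        }
    }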
    

    This fetches the second table in the document and prints the cells of every row whose first cell is "Active".
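
    Since the question also mentions XPath, here is a minimal sketch of the same filter written as an XPath query. The $html string is a made-up stand-in for the real page, and selecting the table by its class "table2" (instead of by position) is an assumption based on the markup in the question.

    ```php
    <?php
    // Hypothetical stand-in for the scraped page; the real markup would come from loadHTMLFile().
    $html = '<table><tr><td>first table</td></tr></table>'
          . '<table class="table2"><tbody>'
          . '<tr><th colspan="2">Status Information</th></tr>'
          . '<tr><th>Status</th><th>Type</th></tr><tr></tr>'
          . '<tr><td>Active</td><td>Resale</td></tr>'
          . '<tr><td>Pending</td><td>Resale</td></tr>'
          . '</tbody></table>';

    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Rows inside the table with class "table2" whose first td reads "Active".
    $query = '//table[@class="table2"]//tr[normalize-space(td[1]) = "Active"]';

    $active = [];
    foreach ($xpath->query($query) as $row) {
        $values = [];
        // Evaluate 'td' relative to the current row to get its direct td children.
        foreach ($xpath->query('td', $row) as $cell) {
            $values[] = trim($cell->textContent);
        }
        $active[] = $values;
    }
    print_r($active);
    ```

    Rows that contain only th cells, or no cells at all, never match the predicate, so no separate header check is needed.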

    Edit: Here is a more complete example that collects each matching row into an object:

    $arr = array();
    $table = $dom->getElementsByTagName('table')->item(1);
    foreach ($table->getElementsByTagName('tr') as $row) {
        $cells = $row->getElementsByTagName('td');
        // Skip header and empty rows, which have no td cells.
        if ($cells->length > 0 && trim($cells->item(0)->nodeValue) === 'Active') {
            $obj = new stdClass;
            $obj->type    = $cells->item(1)->nodeValue;
            $obj->address = $cells->item(2)->nodeValue;
            $obj->price   = $cells->item(3)->nodeValue;
            $obj->agent   = $cells->item(4)->nodeValue;
            $obj->email   = $cells->item(5)->nodeValue;
            $obj->phone   = $cells->item(6)->nodeValue;
            $arr[] = $obj;
        }
    }
    print_r( $arr );
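
    For the stated end goal of putting the "Active" rows into a database, a rough sketch with PDO prepared statements could look like this. Everything about the storage side is an assumption: the in-memory SQLite DSN (which needs the pdo_sqlite extension), the table name listings, and its columns are made up for illustration, and the sample object stands in for the scraped $arr.

    ```php
    <?php
    // Stand-in for the scraped array of objects; the values are placeholders.
    $listing = new stdClass;
    $listing->type    = 'Resale';
    $listing->address = '*Property Address*';
    $listing->price   = '*Price*';
    $listing->agent   = '*Agent Info*';
    $listing->email   = '*Agent Email*';
    $listing->phone   = '*Agent Phone*';
    $arr = [$listing];

    // Assumption: in-memory SQLite for the demo; swap the DSN for your own database.
    $pdo = new PDO('sqlite::memory:');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    // Hypothetical table layout mirroring the object properties.
    $pdo->exec('CREATE TABLE listings (type TEXT, address TEXT, price TEXT, agent TEXT, email TEXT, phone TEXT)');

    // Prepare once, execute once per row; bound values are escaped by the driver.
    $stmt = $pdo->prepare(
        'INSERT INTO listings (type, address, price, agent, email, phone)
         VALUES (:type, :address, :price, :agent, :email, :phone)'
    );
    foreach ($arr as $obj) {
        $stmt->execute([
            ':type'    => $obj->type,
            ':address' => $obj->address,
            ':price'   => $obj->price,
            ':agent'   => $obj->agent,
            ':email'   => $obj->email,
            ':phone'   => $obj->phone,
        ]);
    }

    echo $pdo->query('SELECT COUNT(*) FROM listings')->fetchColumn(), " row(s) stored\n";
    ```

    Binding the scraped values as parameters, rather than concatenating them into the SQL string, keeps unexpected characters in the page content from breaking the query.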
    
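
    As for the loop in the question: the inner foreach calls $dom->$row->getElementsByTagName('td'), which asks PHP to use the $row object as a property name on $dom, and that should fail rather than return cells. The method needs to be called on $row directly. A small sketch of that loop, using a made-up HTML string in place of the real page:

    ```php
    <?php
    // Hypothetical stand-in for the scraped page.
    $html = '<table><tr><th>Status</th><th>Type</th></tr><tr></tr>'
          . '<tr><td>Active</td><td>Resale</td></tr>'
          . '<tr><td>Sold</td><td>Resale</td></tr></table>';

    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $rows = [];
    foreach ($dom->getElementsByTagName('tr') as $row) {
        // Call the method on $row itself, not on $dom->$row.
        $cells = $row->getElementsByTagName('td');
        if ($cells->length === 0) {
            continue; // header rows (th only) and empty rows have no td
        }
        $values = [];
        foreach ($cells as $cell) {
            $values[] = trim($cell->textContent);
        }
        $rows[] = $values;
    }
    print_r($rows);
    ```
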
    This answer was selected by the asker as the best answer.
