douwen4178 2011-08-08 23:48

Parsing page source to retrieve table data, then exporting to xls

I need to paste the source of a page into a form and have it spit out an xls file containing the contents of the page's tables.

The page I want to parse has several tables on it, each with a varying number of rows and 11 columns. Each table has a header, which I don't need. I have researched using DOM, but I couldn't figure out how to use it for my application. I also thought about using preg_replace(), but since I am dealing with raw source code, I don't think that will work.

Once I get the parsing right, I know how to write the result to an xls file in PHP. I just can't figure out how to do the parsing itself in PHP. Thanks in advance.

If it helps, this is what each table's structure looks like:

<table>
  <thead>
    <tr>
      <td>
      </td>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>
      </td>
    </tr>
  </tbody>
</table>

1 answer

  • dongyong8098 2011-08-09 01:10

    This should at least get you started:

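    // $htmlString holds the page source submitted through the form
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep libxml parse warnings quiet
    libxml_use_internal_errors(true);
    $doc->loadHTML($htmlString);
    libxml_clear_errors();

    $data = array();

    // Get all table bodies (this skips the <thead> header rows)
    $tables = $doc->getElementsByTagName('tbody');

    foreach ($tables as $table) {
        $rows = $table->getElementsByTagName('tr');
        foreach ($rows as $row) {
            $rowData = array();
            $cells = $row->getElementsByTagName('td');
            foreach ($cells as $cell) {
                $rowData[] = trim($cell->nodeValue);
            }
            $data[] = $rowData;
        }
    }
    // $data is now an array of rows, each an array of cell strings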
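    A minimal end-to-end sketch of the same idea, assuming every table has an explicit <tbody> as in the question's markup (libxml's HTML parser does not insert a missing <tbody>, so such tables would need a different query, e.g. '//table/tr'). It swaps the nested getElementsByTagName() loops for DOMXPath, reads only direct <td> children of each row, and writes the rows out with fputcsv(). The function names extractTableRows() and writeRowsAsCsv() are hypothetical, and CSV is used here as a simple stand-in for a real .xls writer: Excel opens CSV files, but the output is not a native .xls file. Passing '' as the escape argument to fputcsv() requires PHP 7.4 or later.

```php
<?php
// Hypothetical helpers: extractTableRows() and writeRowsAsCsv() are
// illustrative names, not part of any library. CSV stands in for .xls.

function extractTableRows(string $html): array
{
    $doc = new DOMDocument();
    // Silence libxml warnings about malformed real-world HTML,
    // then restore the caller's previous setting
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = array();
    // Only rows inside <tbody>, so the <thead> header rows are skipped
    foreach ($xpath->query('//table/tbody/tr') as $tr) {
        $cells = array();
        // Direct <td> children only, so cells of a nested table are not pulled in
        foreach ($xpath->query('./td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        $rows[] = $cells;
    }
    return $rows;
}

function writeRowsAsCsv(array $rows, $handle): void
{
    foreach ($rows as $row) {
        // Comma separator, double-quote enclosure, escape mechanism disabled
        fputcsv($handle, $row, ',', '"', '');
    }
}
```

    In a form handler, the handle could be fopen('php://output', 'w') after sending a Content-Disposition: attachment header, so the browser downloads the result; the form field that carries the pasted source is up to you.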
    
    This answer was accepted by the asker.