douyou7797 2011-02-04 21:19

Joining HTML tables with PHP DOMDocument

I have a large number of big HTML documents with tables of data inside. I'm looking to write a script that can process an HTML file, isolate the <table> tags and their contents, and concatenate all the rows within those tables into one large data table. I then want to loop through the rows and columns of the new large table.

After some research I've started trying out PHP's DOMDocument class to parse the HTML but I just wanted to know, is that the best way to do something like this?

This is what I've got so far...

$dom = new DOMDocument();
$dom->preserveWhiteSpace = FALSE;
@$dom->loadHTMLFile('exrate.html');
$tables = $dom->getElementsByTagName('table');

How do I chop out everything other than the tables and their contents? Then I'd actually like to remove the first table since it's a table of contents. Then loop through all the table rows and build them into one large table.

Anyone got any hints on how to do this? I've been digging through the docs for DOMDocument on php.net but I'm finding the syntax pretty baffling!

Cheers, B

EDIT: Here is a sample of an HTML file with the data tables I'd like to join: http://thenetzone.co.uk/exrates/exrate.html

1 answer

  • duandian8110 2011-02-05 12:47

    OK, got it sorted with phpQuery and lots of trial and error.
    The script removes the first (contents) table and the header rows, moves the rows of all the remaining tables into the first remaining one, then removes the emptied tables.
    It then loops through each table row and extracts the text from specific columns, in this case the 2nd and 3rd td of each row (:eq() is zero-based, so td:eq(1) is the 2nd cell).

    require('phpQuery/phpQuery.php');
    $doc = phpQuery::newDocumentFileHTML('exrates_code.html');
    pq('table:first')->remove(); // remove the first table: it is only a table of contents
    pq('tr:has(th)')->remove(); // remove header rows
    pq('table:not(:first) tr')->appendTo('table:first'); // move the rows of the other tables into the first
    pq('table:empty')->remove(); // remove the now-empty tables
    pq('br')->remove(); // strip line breaks

    $rows = pq('table tr');
    foreach ($rows as $row) {
        // :eq() is zero-based, so these are the 2nd and 3rd cells of the row
        $currency = pq($row)->find('td:eq(1)')->text();
        $value = pq($row)->find('td:eq(2)')->text();
        // use $currency and $value here, e.g. store or print them
    }
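
For anyone who would rather stay with DOMDocument, as the question originally asked, here is a rough sketch of the same steps using DOMDocument and DOMXPath only. It is not what I ended up using, so treat it as a starting point: collectRows() is a made-up helper name, the sample HTML and its values are invented for illustration, and it assumes PHP 7 or later. Instead of physically moving rows into one table, it gathers the cell text of every non-header row from every table except the first into one array.

```php
<?php
// Sketch only: collectRows() is a hypothetical helper, not part of phpQuery or DOMDocument.

/**
 * Collect the cell text of every data row from all tables except the first.
 *
 * @param string $html Raw HTML containing several <table> elements.
 * @return array One array of trimmed cell strings per data row.
 */
function collectRows(string $html): array
{
    $dom = new DOMDocument();

    // Buffer parser warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // Every <tr> without a <th> child, from every table after the first one
    // (the first table is assumed to be the table of contents).
    $query = '(//table)[position() > 1]//tr[not(th)]';

    $rows = [];
    foreach ($xpath->query($query) as $tr) {
        $cells = [];
        // 'td' is evaluated relative to the current row.
        foreach ($xpath->query('td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }

    return $rows;
}

// Invented sample data, only there to make the sketch runnable.
$html = <<<HTML
<table><tr><td>Contents</td></tr></table>
<table>
  <tr><th>Country</th><th>Currency</th><th>Rate</th></tr>
  <tr><td>Australia</td><td>Dollar</td><td>1.59</td></tr>
</table>
<table>
  <tr><td>Japan</td><td>Yen</td><td>131.2</td></tr>
</table>
HTML;

foreach (collectRows($html) as $cells) {
    // Indexes 1 and 2 correspond to td:eq(1) and td:eq(2) in the phpQuery version.
    echo $cells[1], ' = ', $cells[2], PHP_EOL;
}
```

To read from a file instead, swap loadHTML($html) for loadHTMLFile() with the file path. Collecting into an array avoids modifying the document at all, which seemed simpler than moving nodes around if the merged table is only ever looped over.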

    Hope this helps someone out!

    Selected by the asker as the best answer.
