douya2982 2013-04-16 23:46

Extracting a table from HTML using XSLTProcessor

I am trying to get the contents of a table with the class sticky-enabled in XML format.

My PHP code is:

<?php

// Load the XML source
$xml = new DOMDocument;
$out = $xml->load("collection.html");

$xsl = new DOMDocument;
$xsl->load('collection.xsl');

// Configure the transformer
$proc = new XSLTProcessor;
$proc->importStyleSheet($xsl); // attach the xsl rules

$xml = $proc->transformToXML($xml);

$xml = simplexml_load_string($xml);

print_r($xml);

?>

And the collection.html HTML is:

<table>
    <thead>
        <tr>
            <th>A</th>
        </tr>
        <tbody>
        <tr>
            <td>B</td>
        </tr>
        </tbody>
    </thead>
</table>

<table class="sticky-enabled">
 <thead><tr><th>Date</th><th>Time</th><th>Location</th><th>Tracking Event</th> </tr></thead>
<tbody>
 <tr class="odd"><td>16-04-2013</td><td>19:20</td><td>International Hub</td><td>Forwarded for export</td> </tr>
 <tr class="even"><td>16-04-2013</td><td>18:53</td><td>International Hub</td><td>Received and processed</td> </tr>
 <tr class="odd"><td>15-04-2013</td><td>17:28</td><td>Manchester Piccadilly Depot</td><td>Collected from customer</td> </tr>
 <tr class="even"><td>15-04-2013</td><td>00:00</td><td>WDM Online</td><td></td> </tr>
</tbody>
</table>

<table>
    <thead>
        <tr>
            <th>A</th>
        </tr>
        <tbody>
        <tr>
            <td>B</td>
        </tr>
        </tbody>
    </thead>
</table>

And finally collection.xsl is:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
  <output>
    <xsl:for-each select="table[@class='sticky-enabled']/tbody/tr">
      <tracking>
        <date><xsl:value-of select="td[1]" /></date>
        <time><xsl:value-of select="td[2]" /></time>
        <event><xsl:value-of select="td[3]" /></event>
        <extra><xsl:value-of select="td[4]" /></extra>        
      </tracking> 
    </xsl:for-each>
  </output>    
  </xsl:template>
</xsl:stylesheet>

If I run this then $xml is empty. If I edit collection.html and remove the first and last tables (i.e. just leaving the one I am trying to access) then it works. I suspect the problem is therefore with:

<xsl:for-each select="table[@class='sticky-enabled']/tbody/tr">

1 answer

  • dpwqicw157673 2013-04-17 00:20

    Your "XML" is not well-formed, so it cannot be parsed and transformed with XSLT. An XML document must have exactly one document element, but yours has three sibling <table> elements at the top level. Removing the other two tables leaves a well-formed XML file, which is why the transform works in that case.
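
One way to confirm this diagnosis is to ask libxml why the load failed rather than passing an empty document on to the transformer. The following is only a minimal sketch: the `$fragment` string is a shortened, hypothetical stand-in for collection.html, and it uses `loadXML()` on a string where the question uses `load()` on a file.

```php
<?php
// Collect parse errors in a buffer instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical stand-in for collection.html: two sibling top-level elements.
$fragment = '<table><tr><td>B</td></tr></table>'
          . '<table class="sticky-enabled"><tbody><tr><td>x</td></tr></tbody></table>';

$doc = new DOMDocument;
$loaded = $doc->loadXML($fragment); // returns false when the input is not well-formed

if (!$loaded) {
    foreach (libxml_get_errors() as $error) {
        // Each entry is a LibXMLError exposing message, line and column.
        echo trim($error->message), " (line {$error->line})", PHP_EOL;
    }
    libxml_clear_errors();
}
```

In the question's script the same check would apply to the return value of `$xml->load("collection.html")`, which is stored in `$out` but never inspected.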

    Try wrapping the tables with an XML element.

    For example:

    <doc>
      <table>
        <thead>
            <tr>
                <th>A</th>
            </tr>
            <tbody>
            <tr>
                <td>B</td>
            </tr>
            </tbody>
        </thead>
    </table>
    
    <table class="sticky-enabled">
     <thead><tr><th>Date</th><th>Time</th><th>Location</th><th>Tracking Event</th> </tr></thead>
    <tbody>
     <tr class="odd"><td>16-04-2013</td><td>19:20</td><td>International Hub</td><td>Forwarded for export</td> </tr>
     <tr class="even"><td>16-04-2013</td><td>18:53</td><td>International Hub</td><td>Received and processed</td> </tr>
     <tr class="odd"><td>15-04-2013</td><td>17:28</td><td>Manchester Piccadilly Depot</td><td>Collected from customer</td> </tr>
     <tr class="even"><td>15-04-2013</td><td>00:00</td><td>WDM Online</td><td></td> </tr>
    </tbody>
    </table>
    
    <table>
        <thead>
            <tr>
                <th>A</th>
            </tr>
            <tbody>
            <tr>
                <td>B</td>
            </tr>
            </tbody>
        </thead>
      </table>
    </doc>
    

    Then adjust your stylesheet to account for the change to the structure, matching on the document element instead of root node:

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output indent="yes"/>
        <xsl:template match="doc">
            <output>
                <xsl:for-each select="table[@class='sticky-enabled']/tbody/tr">
                    <tracking>
                        <date><xsl:value-of select="td[1]" /></date>
                        <time><xsl:value-of select="td[2]" /></time>
                        <event><xsl:value-of select="td[3]" /></event>
                        <extra><xsl:value-of select="td[4]" /></extra>
                    </tracking>
                </xsl:for-each>
            </output>
        </xsl:template>
    </xsl:stylesheet>
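
If editing collection.html by hand is not an option, the wrapping can also be done in PHP before parsing. The sketch below is an illustration rather than a definitive implementation: it assumes the fragment is otherwise well-formed XML, that the xsl and simplexml extensions are enabled, and it inlines shortened, hypothetical copies of the input and stylesheet so that it runs on its own.

```php
<?php
// Shortened stand-in for collection.html (several sibling <table> elements).
$fragment = <<<'HTML'
<table><thead><tr><th>A</th></tr></thead></table>
<table class="sticky-enabled">
  <tbody>
    <tr class="odd"><td>16-04-2013</td><td>19:20</td><td>International Hub</td><td>Forwarded for export</td></tr>
  </tbody>
</table>
<table><thead><tr><th>A</th></tr></thead></table>
HTML;

// Same rules as the stylesheet above, matching on the <doc> wrapper element.
$stylesheet = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="doc">
    <output>
      <xsl:for-each select="table[@class='sticky-enabled']/tbody/tr">
        <tracking>
          <date><xsl:value-of select="td[1]"/></date>
          <time><xsl:value-of select="td[2]"/></time>
          <event><xsl:value-of select="td[3]"/></event>
          <extra><xsl:value-of select="td[4]"/></extra>
        </tracking>
      </xsl:for-each>
    </output>
  </xsl:template>
</xsl:stylesheet>
XSL;

// Wrap the sibling tables in a single document element before parsing.
$xml = new DOMDocument;
if (!$xml->loadXML('<doc>' . $fragment . '</doc>')) {
    exit("Input is still not well-formed XML" . PHP_EOL);
}

$xsl = new DOMDocument;
$xsl->loadXML($stylesheet);

$proc = new XSLTProcessor;
$proc->importStylesheet($xsl);

// Transform, then parse the result so individual fields can be read.
$result = simplexml_load_string($proc->transformToXml($xml));
echo $result->tracking[0]->date, PHP_EOL;
```

With the files from the question, `$fragment` would come from `file_get_contents('collection.html')` and the stylesheet from `collection.xsl`. The wrapper only helps when the markup inside it is well-formed XML; real-world HTML with unclosed tags would still fail to load this way.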
    