dongqiao9583 2015-10-27 11:14

Scraping a table with XPath

I have a table in an HTML page that looks like this (pastebin url).

The code I'm currently using to try to grab the content of the table is:

$html = htmlspecialchars("https://localhost/table.php");

$doc = new \DOMDocument();

if($doc->loadHTML($html))
{
    $result = new \DOMDocument();
    $result->formatOutput = true;
    $table = $result->appendChild($result->createElement("table"));
    $thead = $table->appendChild($result->createElement("thead"));
    $tbody = $table->appendChild($result->createElement("tbody"));

    $xpath = new \DOMXPath($doc);

    $newRow = $thead->appendChild($result->createElement("tr"));

    foreach($xpath->query("//table[@id='kurstabell']/thead/tr/th[position()>1]") as $header)
    {
        $newRow->appendChild($result->createElement("th", trim($header->nodeValue)));
    }

    foreach($xpath->query("//table[@id='kurstabell']/tbody/tr") as $row)
    {
        $newRow = $tbody->appendChild($result->createElement("tr"));

        foreach($xpath->query("./td[position()>1]", $row) as $cell)
        {
            $newRow->appendChild($result->createElement("td", trim($cell->nodeValue)));
        }
    }

    echo $result->saveXML($result->documentElement);
}

print_r($result);

(I'm using htmlspecialchars because libxml_use_internal_errors(true); produces this error: Europe/Berlin] PHP Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity, line: I read somewhere that htmlspecialchars is OK to use.)

The current output of this snippet looks like this:

DOMDocument Object ( [doctype] => [implementation] => (object value omitted) [documentElement] => (object value omitted) [actualEncoding] => [encoding] => [xmlEncoding] => [standalone] => 1 [xmlStandalone] => 1 [version] => 1.0 [xmlVersion] => 1.0 [strictErrorChecking] => 1 [documentURI] => [config] => [formatOutput] => 1 [validateOnParse] => [resolveExternals] => [preserveWhiteSpace] => 1 [recover] => [substituteEntities] => [nodeName] => #document [nodeValue] => [nodeType] => 9 [parentNode] => [childNodes] => (object value omitted) [firstChild] => (object value omitted) [lastChild] => (object value omitted) [previousSibling] => [attributes] => [ownerDocument] => [namespaceURI] => [prefix] => [localName] => [baseURI] => [textContent] => )

php_error.log doesn't give me any errors.

The expected result is the same table, echoed as HTML, but with all "unnecessary" code removed.

My question: What is wrong with the current piece of code?

1 answer

  • doufa5001 2015-10-28 16:57

    The problem is with the first line:

    $html = htmlspecialchars("https://localhost/table.php");
    

    It should simply be:

    $html = file_get_contents("https://localhost/table.php");
    

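    The function htmlspecialchars() does not fetch anything: it only escapes the string it is given, which here is the URL itself. loadHTML() then parses that plain text and returns a single text node rather than the expected DOM. Applying htmlspecialchars() to the fetched markup would not help either, because it escapes every HTML tag.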
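
To illustrate the difference, here is a minimal sketch that parses the same markup twice, once raw and once after htmlspecialchars(). The inline $html string and the countCells() helper are hypothetical stand-ins (they are not from the question); in the real script $html would come from file_get_contents(). The sketch also assumes that the "expecting ';' in Entity" warning comes from a bare & in the page, which libxml_use_internal_errors(true) can silence without escaping anything.

```php
<?php
// Hypothetical stand-in for the markup that file_get_contents() would return.
// The bare "&" in "Q&A" is the kind of input that triggers htmlParseEntityRef.
$html = '<table id="kurstabell"><thead><tr><th>#</th><th>Name</th><th>Q&A</th></tr></thead>'
      . '<tbody><tr><td>1</td><td>Alpha</td><td>10</td></tr></tbody></table>';

// Hypothetical helper: parse the markup and count the <td> cells XPath can see.
function countCells(string $markup): int
{
    // Collect libxml parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($markup);

    // Discard the collected errors and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    return $xpath->query("//table[@id='kurstabell']//td")->length;
}

$rawCount = countCells($html);                       // parsed as real elements
$escapedCount = countCells(htmlspecialchars($html)); // parsed as plain text

echo "raw: $rawCount, escaped: $escapedCount\n";
```

With the raw markup the query should find the three cells; with the escaped markup it should find none, because the parser only sees text.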

    This answer was accepted by the asker as the best answer.
