duanjiao6730 2017-02-21 18:16

PHP XML DOM: Why is my large HTML file being truncated?

I am trying to process a large HTML file using DOM. I read it in and immediately write it out to another file without making any changes, but the output file is much smaller (and shorter) than the input.

This is particularly puzzling, because I could swear I did this previously while learning to use DOM and the output looked okay.

Here is my code:

<?
    // ini_set("memory_limit", -1);
    require_once("inc/common.inc");

    $acad = "../inprogress/academy/";
    $htmFName = "$acad/mf/humanacad.htm";
    $sz = filesize($htmFName);
    echo "fname: $htmFName, $sz bytes\n";

    $dom = new DOMDocument();
    $dom->loadHTML($htmFName);
    $dom->save("z");
    $sz = filesize("z");
    echo "fname: z: $sz bytes\n";

And the output:

fname: ../inprogress/academy//mf/humanacad.htm, 2621622 bytes
fname: z: 219 bytes

Here is the beginning of the input file:

<html>
<head>
<meta http-equiv=Content-Type content="text/html; charset=utf-8">
<meta name=Generator content="Microsoft Word 11 (filtered)">
<title> The Hanging Academy</title>
<style>
<!--
...
 -->
</style>
</head>
<body lang=EN-US link=blue vlink=blue>
<div class=Section1>
<p class=SectionHd>THE HANGING ACADEMY -- Part 1: Miranda</p>

And here is the entirety of the output file:

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>../inprogress/academy//mf/humanacad.htm</p></body></html>

1 answer

  • doutandusegang2961 2017-02-21 18:22

    I think it is because you meant to use loadHTMLFile($filename), not loadHTML($html). loadHTML($html) expects the string passed to it to be HTML content, not the name of a file to read the content from. That is why your output file contains only the path itself, wrapped in <html><body><p>.
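
The difference between the two calls can be seen in a minimal sketch. The temporary file and its contents are hypothetical stand-ins for the large humanacad.htm from the question, and the variable names are illustrative only:

```php
<?php
// Collect libxml parse warnings internally instead of printing them
// (real-world Word-exported HTML tends to trigger many).
libxml_use_internal_errors(true);

// Hypothetical sample file standing in for the question's input.
$path = tempnam(sys_get_temp_dir(), 'dom');
file_put_contents(
    $path,
    '<html><head><title>Demo</title></head><body><p>Hello</p></body></html>'
);

// loadHTML() treats its argument as markup, so the path string itself
// becomes the document's only text content.
$wrong = new DOMDocument();
$wrong->loadHTML($path);
$wrongText = $wrong->getElementsByTagName('p')->item(0)->textContent;

// loadHTMLFile() opens the named file and parses its contents.
$right = new DOMDocument();
$right->loadHTMLFile($path);
$rightText = $right->getElementsByTagName('p')->item(0)->textContent;

echo "loadHTML:     $wrongText\n";
echo "loadHTMLFile: $rightText\n";

unlink($path);
```

Separately, save() writes an XML serialization, which is where the <?xml ...?> declaration in the output file comes from. If HTML output is wanted, saveHTMLFile() is the HTML counterpart.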

    This answer was selected by the asker as the best answer.
