Verifying a saved HTML file's size against cURL's size_download / download_content_length?

It always seems to be slightly off.

Whilst downloading an HTML file using CURL, I'm attempting to verify that the saved HTML file is the same size as the headers indicate.

Minified:

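    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $URL);
    // Return the body as a string instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $result = curl_exec($ch);
    // Transfer statistics, including size_download.
    $curlinfo = curl_getinfo($ch);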

Among other things, $curlinfo provides the following information:

    [size_download] => 331650
    [download_content_length] => 331650

(these are always the same value in my experience)

I've tried using mb_strlen:

    mb_strlen($result, 'utf8');  // 331495
    mb_strlen($result);          // 331495

Slightly off.

Using DOM to save the file:

    $DOM = new DOMDocument();
    $DOM->preserveWhiteSpace = FALSE;
    libxml_use_internal_errors(true);
    $DOM->LoadHTML($result);
    $DOM->encoding = 'utf-8';
    $SaveHTMLfile = $DOM->saveHTMLFile($filename);

Checking this with filesize($filename);

Slightly more... FileSize: 332295

Of course, if I modify the encoding or the preserveWhiteSpace setting, the filesize($filename) value skews one way or the other, but it never matches the size indicated in the curl headers (331650).

Is there a way or a method I am missing that will allow me to verify the HTML file downloaded from the external source down to the actual byte?

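dongzengzai4567: Good, glad the strlen solution worked out! You may eventually run into scenarios where you can't rely on the resulting content size. For example, if the transfer encoding is gzip or deflate, the Content-Length header will be the size of the compressed content, and the bytes downloaded will be fewer than the uncompressed content. In that case, just trust that curl won't return a result without an error and also return incomplete content. Happy coding.
Over a year ago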
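
To illustrate the compression point in the comment above, here is a minimal sketch that simulates it locally rather than over a real transfer. The sample HTML and the variable names are assumptions for illustration, and it relies on PHP's zlib functions (gzencode, gzdecode) being available.

```php
<?php
// Stand-in for an HTML response body (assumed content, deliberately repetitive).
$html = str_repeat('<p>Hello, world!</p>', 1000);

// Roughly what a server using gzip compression would put on the wire.
$compressed = gzencode($html);

// Size a Content-Length header would describe for the compressed response.
$wireBytes = strlen($compressed);

// Size of the body once it has been decompressed.
$decodedBytes = strlen(gzdecode($compressed));

// The two sizes differ, so comparing them directly would report a mismatch.
var_dump($wireBytes < $decodedBytes);
```

When compression is involved, a byte-for-byte comparison against the header value only makes sense on the body exactly as it was received, before any decoding.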
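dongxuan2577: There's no Content-Encoding or Transfer-Encoding that I can see... good idea though, I hadn't thought of that as a cause. You are right, strlen gives the correct size in bytes! So... `$download_size = (int)$getPage['CurlInfo']['download_content_length']; $downloaded_size = strlen($HTML);` and a comparison to verify, `if ($download_size === $downloaded_size) {`, seems to be a good working solution for now. Thanks!
Over a year ago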
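dqsa17330: size_download and download_content_length are in bytes, so use strlen rather than mb_strlen. Also, are any Content-Encoding or Transfer-Encoding headers being sent? Running the content through something that modifies it, such as DOM, will probably throw your results off further.
Over a year ago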
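
The byte-versus-character distinction in the comment above can be sketched as follows. The helper name bodyMatchesCurlInfo, the sample string, and the simulated curl_getinfo() array are all hypothetical and not from the original post; the character count falls back to a PCRE-based count in case mbstring is not installed.

```php
<?php
// Hypothetical helper (not from the original post): compares the byte length
// of a downloaded body with the length reported by curl_getinfo().
function bodyMatchesCurlInfo(string $body, array $info): bool
{
    // download_content_length is -1 when the server sent no Content-Length.
    $expected = (int) ($info['download_content_length'] ?? -1);
    if ($expected < 0) {
        return false; // nothing to compare against
    }
    // strlen() counts bytes; mb_strlen() would count characters instead.
    return strlen($body) === $expected;
}

// Stand-in for a downloaded body: two of its characters take two bytes each in UTF-8.
$body = "<p>caf\u{E9} na\u{EF}ve</p>";

// Byte count, the unit cURL reports.
$bytes = strlen($body);

// Character count; falls back to a PCRE count if mbstring is not installed.
$chars = function_exists('mb_strlen')
    ? mb_strlen($body, 'UTF-8')
    : preg_match_all('/./su', $body);

// Simulated curl_getinfo() result for this body (assumed values).
$info = ['download_content_length' => (float) $bytes];

var_dump($bytes > $chars);
var_dump(bodyMatchesCurlInfo($body, $info));
```

The check is done on the raw string returned by curl_exec, before any DOM parsing or re-serialization, since those steps can change the byte count.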