donte1234567 2019-02-02 21:13

Verifying a saved HTML file's size against cURL's size_download / download_content_length?

It always seems to be slightly off.

Whilst downloading an HTML file with cURL, I'm trying to verify that the saved HTML file is the same size as the headers indicate.

Minimal example:

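    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $URL);
    // Return the body as a string instead of printing it
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $result = curl_exec($ch);
    $curlinfo = curl_getinfo($ch);
    curl_close($ch);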

Among other things, $curlinfo provides the following information:

    [size_download] => 331650
    [download_content_length] => 331650

(In my experience these two values are always the same.)
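One way to compare the body against these two fields is to count its bytes with strlen(), since cURL reports sizes in bytes. The sketch below is a minimal illustration, not a definitive check: the function name bodyMatchesCurlInfo is hypothetical, it assumes CURLOPT_RETURNTRANSFER was set so the body is a string, and it casts the fields to int because curl_getinfo() may return them as floats. download_content_length is -1 when the server sent no Content-Length, so that case is skipped.

```php
<?php
/**
 * Hypothetical helper: true when the body's byte length matches what
 * curl_getinfo() reports. strlen() counts bytes, like cURL does.
 */
function bodyMatchesCurlInfo(string $body, array $info): bool
{
    $bytes = strlen($body);

    // size_download: body bytes cURL reports as received
    if ((int) $info['size_download'] !== $bytes) {
        return false;
    }

    // download_content_length is -1 when no Content-Length header was sent
    $declared = (int) $info['download_content_length'];
    return $declared === -1 || $declared === $bytes;
}

// Usage after curl_exec() with CURLOPT_RETURNTRANSFER enabled:
// var_dump(bodyMatchesCurlInfo($result, curl_getinfo($ch)));
```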

I've tried using mb_strlen:

    mb_strlen($result, 'utf8');  // 331495
    mb_strlen($result);          // 331495

Both are slightly off: 155 short of the reported 331650.
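mb_strlen() counts characters in the given encoding, not bytes, so every multi-byte UTF-8 character in the page makes its result smaller than the byte count cURL reports; a gap like the 155 here is what multi-byte characters would produce. strlen() counts bytes (assuming mbstring.func_overload is not enabled; that setting was removed in PHP 8.0). A small demonstration with an illustrative string:

```php
<?php
// é (U+00E9) takes 2 bytes in UTF-8 and € (U+20AC) takes 3.
$html = "<p>caf\u{e9} \u{20ac}5</p>";

$bytes = strlen($html);             // byte count: the unit cURL reports in
$chars = mb_strlen($html, 'UTF-8'); // character count: smaller for multi-byte text

echo "bytes: $bytes, characters: $chars\n"; // bytes: 17, characters: 14
```

By the same reasoning, strlen($result) rather than mb_strlen($result) is the figure to compare with size_download.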

Using DOMDocument to save the file:

    $DOM = new DOMDocument();
    $DOM->preserveWhiteSpace = false;
    libxml_use_internal_errors(true); // suppress warnings about malformed HTML
    $DOM->loadHTML($result);
    $DOM->encoding = 'utf-8';
    $SaveHTMLfile = $DOM->saveHTMLFile($filename); // bytes written, or false on failure

Checking this with filesize($filename) gives a slightly larger value: 332295 bytes.

If I change the encoding or the preserveWhiteSpace setting, filesize($filename) shifts one way or the other, but it never matches the size indicated in the cURL headers (331650).
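DOMDocument::loadHTML() parses the markup into a tree and saveHTMLFile() serializes that tree back out, so the saved file is libxml's re-serialization (it may add a DOCTYPE or html/body wrappers and rewrite entities or whitespace), not the bytes that were received. No combination of parser settings is guaranteed to reproduce the original. A minimal sketch of a byte-level check, assuming the goal is to store the body unmodified: write the raw string with file_put_contents() and compare sizes. The function name saveRawAndVerify is hypothetical.

```php
<?php
/**
 * Hypothetical helper: writes the response body to disk unchanged and
 * checks the on-disk size against an expected byte count
 * (for example size_download from curl_getinfo()).
 */
function saveRawAndVerify(string $body, string $filename, int $expected): bool
{
    // file_put_contents() writes the string as-is and returns the bytes written
    $written = file_put_contents($filename, $body);
    if ($written === false) {
        return false;
    }

    // filesize() results are cached, so clear the cache for this path first
    clearstatcache(true, $filename);
    return $written === $expected && filesize($filename) === $expected;
}

// Usage: saveRawAndVerify($result, $filename, (int) $curlinfo['size_download']);
```

If a DOM-processed copy is also needed, it can be produced separately; its size just should not be expected to match the header.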

Is there a method I'm missing that would let me verify, down to the byte, the HTML file downloaded from the external source?
