donte1234567 2019-02-02 21:13

Verifying the size of a saved HTML file against CURL size_download / download_content_length?

It always seems to be slightly off.

Whilst downloading an HTML file using CURL, I'm attempting to verify that the saved HTML file is the same size as the headers indicate.

Minified:

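    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $URL);
    // Return the body as a string; without this, curl_exec() prints it and returns true
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $result = curl_exec($ch);
    $curlinfo = curl_getinfo($ch);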

Among other things, $curlinfo provides the following information:

    [size_download] => 331650
    [download_content_length] => 331650

(these are always the same value in my experience)

I've tried using mb_strlen:

    mb_strlen($result, 'utf8'); // 331495
    mb_strlen($result);         // 331495

Slightly off.
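
One likely cause of the gap is that mb_strlen counts characters while the CURL figures count bytes, so every multi-byte UTF-8 character makes the character count fall short. A minimal sketch of the difference, using a hypothetical `$sample` string as a stand-in for `$result` (the shortfall of 155 above would be consistent with this, though that is an assumption about the page content):

```php
<?php
// Hypothetical stand-in for $result: "\u{e9}" (é) is 2 bytes and "\u{20ac}" (€) is 3 bytes in UTF-8.
$sample = "<p>caf\u{e9} costs 3 \u{20ac}</p>";

$bytes = strlen($sample);             // strlen() counts raw bytes
$chars = mb_strlen($sample, 'UTF-8'); // mb_strlen() counts characters

echo "bytes: $bytes, characters: $chars\n";
```

If that is what is happening here, strlen($result) would be the value to compare against size_download.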

Using DOM to save the file

    $DOM = new DOMDocument();
    $DOM->preserveWhiteSpace = FALSE;
    libxml_use_internal_errors(true);
    $DOM->LoadHTML($result);
    $DOM->encoding = 'utf-8';
    $SaveHTMLfile = $DOM->saveHTMLFile($filename);

Checking this with filesize($filename);

Slightly more... FileSize: 332295

Of course, if I change the encoding or the preserveWhiteSpace setting, the filesize($filename) value skews one way or the other, but it never matches the size indicated in the curl info (331650).

Is there a way or a method I am missing that will allow me to verify the HTML file downloaded from the external source down to the actual byte?
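
For reference, a byte-exact check seems possible only if the response body is written to disk untouched, since DOMDocument re-serializes the markup and can change its length. The sketch below assumes an uncompressed transfer (with compression the counts may not line up) and uses a hypothetical helper `saveAndVerify()`; in real use the expected size would be `(int) $curlinfo['size_download']` and the body would be `$result`.

```php
<?php
// Hypothetical helper: writes the raw body unchanged and compares byte counts.
function saveAndVerify(string $body, int $expectedBytes, string $path): bool
{
    // file_put_contents() returns the number of bytes written, or false on failure
    $written = file_put_contents($path, $body);
    if ($written === false) {
        return false;
    }
    // filesize() results are cached, so clear the cache before re-reading
    clearstatcache(true, $path);
    return $written === $expectedBytes && filesize($path) === $expectedBytes;
}

// Usage with stand-in data instead of a real download
$body = "<html><body>caf\u{e9}</body></html>";
$path = tempnam(sys_get_temp_dir(), 'html');
$ok = saveAndVerify($body, strlen($body), $path);
$mismatch = saveAndVerify($body, strlen($body) + 1, $path);
echo $ok ? "sizes match\n" : "sizes differ\n";
unlink($path);
```

The DOM step can still run afterwards on a copy of the string; the size check just has to happen against the raw bytes.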
