Whilst downloading an HTML file using cURL, I'm attempting to verify that the saved HTML file is the same size as the headers indicate. It always seems to be slightly off.
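A minified version of the request:

```php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $URL);
// Return the response body as a string instead of printing it.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
$curlinfo = curl_getinfo($ch);
```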
Among other things, `$curlinfo` provides the following information:

```
[size_download] => 331650
[download_content_length] => 331650
```

(In my experience these two values are always the same.)
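Both `size_download` and `download_content_length` are byte counts. libcurl reports them as floats, and `download_content_length` is -1 when the server sends no Content-Length header. A minimal sketch of a byte-level comparison, assuming `$result` holds the raw body returned by `curl_exec()` with `CURLOPT_RETURNTRANSFER` enabled; the helper name `body_matches_reported_size` is hypothetical:

```php
<?php
// Hypothetical helper: compare the raw response body against the byte
// counts cURL reported in the array returned by curl_getinfo($ch).
function body_matches_reported_size(string $body, array $info): bool
{
    // strlen() counts bytes, not characters, which is what cURL reports.
    $bytes = strlen($body);

    // size_download is the number of body bytes cURL received (a float).
    if ($bytes !== (int) $info['size_download']) {
        return false;
    }

    // download_content_length is -1 when no Content-Length header was sent,
    // so only compare against it when it is known.
    $declared = (int) $info['download_content_length'];
    return $declared < 0 || $bytes === $declared;
}
```

In the code above this would be called as `body_matches_reported_size($result, $curlinfo)`.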
I've tried using `mb_strlen()`:

```php
mb_strlen($result, 'utf8'); // 331495
mb_strlen($result);         // 331495
```

Both are slightly off: 155 short of the reported 331650.
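`mb_strlen()` counts characters, not bytes. Every multibyte UTF-8 character (accented letters, curly quotes, a literal non-breaking space, and so on) counts as one character but occupies two to four bytes, so a UTF-8 page measured this way comes out shorter than its byte count, which would explain a gap like the 155 here. A small illustration using an assumed sample string rather than the real page:

```php
<?php
// A short UTF-8 string: "é" (U+00E9) is 2 bytes, "€" (U+20AC) is 3 bytes.
$text = "caf\u{00E9} \u{20AC}5";

$bytes = strlen($text);               // byte count, comparable to size_download
$chars = mb_strlen($text, 'UTF-8');   // character count

echo "bytes: $bytes, characters: $chars\n"; // prints "bytes: 10, characters: 7"
```

On PHP 7 with the deprecated `mbstring.func_overload` setting enabled, `strlen()` itself is replaced by its multibyte counterpart; that setting was removed in PHP 8.0.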
Using DOM to save the file:

```php
$DOM = new DOMDocument();
$DOM->preserveWhiteSpace = FALSE;
libxml_use_internal_errors(true);
$DOM->loadHTML($result);
$DOM->encoding = 'utf-8';
$SaveHTMLfile = $DOM->saveHTMLFile($filename);
```

Checking this with `filesize($filename)` gives 332295, slightly more (645 bytes over).
Of course, if I change the encoding or the `preserveWhiteSpace` setting, the `filesize($filename)` value skews one way or the other, but never to the size indicated in the cURL headers (331650).
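Saving through `DOMDocument` does not write the downloaded bytes back out: libxml parses the markup into a tree, and `saveHTMLFile()` serializes a new document from that tree, which can add or normalise things such as the doctype, implied `<html>`/`<body>` elements, attribute quoting and whitespace. The resulting file size therefore reflects the serializer rather than the server's response. A small demonstration, using an assumed fragment in place of the real page:

```php
<?php
// An assumed HTML fragment standing in for the downloaded page.
$original = '<p class=note>Hello</p>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($original);
libxml_clear_errors();

// saveHTML() returns the re-serialized document as a string; saveHTMLFile()
// writes comparable serializer output to disk.
$serialized = $dom->saveHTML();

echo strlen($original), " bytes in, ", strlen($serialized), " bytes out\n";
```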
Is there a method I am missing that would let me verify, down to the exact byte, the HTML file downloaded from the external source?
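One possible approach, sketched under the assumption that the raw body is what should be stored: keep `CURLOPT_RETURNTRANSFER` on, compare `strlen()` of the body with `size_download`, and write the string unchanged with `file_put_contents()`, which returns the number of bytes written. The function name `save_and_verify` is hypothetical, and the usage lines use a stand-in string and a temporary file rather than a real download:

```php
<?php
// Hypothetical helper: write the raw response body to disk unchanged and
// confirm the file holds exactly $expectedBytes bytes.
function save_and_verify(string $body, string $filename, int $expectedBytes): bool
{
    if (strlen($body) !== $expectedBytes) {
        return false; // body already differs from what cURL reported
    }

    // file_put_contents() writes the string byte-for-byte (no re-encoding)
    // and returns the number of bytes written, or false on failure.
    $written = file_put_contents($filename, $body);
    if ($written !== $expectedBytes) {
        return false;
    }

    // filesize() results are cached per path; clear the cache before checking.
    clearstatcache(true, $filename);
    return filesize($filename) === $expectedBytes;
}

// Usage with a stand-in body instead of a real download.
$body = "<p>caf\u{00E9}</p>";                  // 12 bytes, 11 characters
$file = tempnam(sys_get_temp_dir(), 'html');
$ok = save_and_verify($body, $file, strlen($body));
unlink($file);

var_dump($ok); // bool(true)
```

With the request above, the call would look like `save_and_verify($result, $filename, (int) $curlinfo['size_download'])`. Parsing with `DOMDocument` can still be done afterwards on the in-memory string; it just should not be the step that produces the file being measured.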