When parsing an HTML document with DOMDocument, nodeValue comes back in a different encoding depending on where the script runs. In my dev environment I get UTF-8, but after uploading the script to the web server I get ISO-8859-1.
Can anyone explain this behaviour, and how do I get the same encoding in both places?
<?php
header('Content-Type:text/html; charset=UTF-8');
$strHtml = file_get_contents("http://www.aftonbladet.se/senastenytt/ttnyheter/inrikes/article13397806.ab");
$objDOM = new DOMDocument();
@$objDOM->loadHTML($strHtml);
echo "Encoding: ". $objDOM->encoding."<br/>";
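// Parse the heading from the DOMDocument
$strNodeValue = ''; // stays empty if the page contains no <h1>
$objNodelist = $objDOM->getElementsByTagName('h1');
foreach ($objNodelist as $objElem)
{
    $strNodeValue = $objElem->nodeValue; // take the value of the first <h1> only
    break;
}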
echo 'nodeValue: "'.$strNodeValue.'"<br/>';
echo 'utf8_decode: "'.utf8_decode($strNodeValue).'"<br/>';
echo 'utf8_encode: "'.utf8_encode($strNodeValue).'"<br/>';
// Parse the heading with substr() directly from the raw HTML
$strOpenTag = '<h1 class="abS32">';
$intStart = strpos($strHtml, $strOpenTag) + strlen($strOpenTag);
$strHeading = substr($strHtml, $intStart, strpos($strHtml, '</h1>') - $intStart);
echo 'Heading from substring: "'.$strHeading.'"';
?>
Output when run in the development environment:
Encoding: utf-8
nodeValue: "När semestern inleds vankas åska"
utf8_decode: "N�r semestern inleds vankas �ska"
utf8_encode: "När semestern inleds vankas åska"
Heading from substring: "När semestern inleds vankas åska"
Output when run on the public web server:
Encoding: utf-8
nodeValue: "När semestern inleds vankas åska"
utf8_decode: "När semestern inleds vankas åska"
utf8_encode: "När semestern inleds vankas ÃÂ¥ska"
Heading from substring: "När semestern inleds vankas åska"
Apparently utf8_decode is needed on the public web server but not in my dev environment. I would like the same behaviour on both systems. Any ideas?
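To compare the two environments at the byte level rather than by how the browser renders the text, something like the following could be run on both machines. This is only a rough diagnostic sketch: describeEncoding is a hypothetical helper that is not part of the script above, the sample strings are hand-written stand-ins for nodeValue, and it assumes the mbstring extension is available.

```php
<?php
// Hypothetical helper (not from the original script): reports whether a
// string is valid UTF-8 and dumps its raw bytes as hex.
// Assumption: the mbstring extension is installed.
function describeEncoding(string $strValue): string
{
    $blnUtf8 = mb_check_encoding($strValue, 'UTF-8');
    return sprintf('valid UTF-8: %s, bytes: %s', $blnUtf8 ? 'yes' : 'no', bin2hex($strValue));
}

// "När" as proper UTF-8: ä is the two bytes C3 A4
echo describeEncoding("N\xC3\xA4r"), PHP_EOL;         // valid UTF-8: yes, bytes: 4ec3a472

// "När" as ISO-8859-1: ä is the single byte E4
echo describeEncoding("N\xE4r"), PHP_EOL;             // valid UTF-8: no, bytes: 4ee472

// "När" encoded to UTF-8 twice, which renders as "När"
echo describeEncoding("N\xC3\x83\xC2\xA4r"), PHP_EOL; // valid UTF-8: yes, bytes: 4ec383c2a472
```

Calling describeEncoding($strNodeValue) in the script on both systems would show which of these three byte patterns each environment actually produces for the "ä" in the heading.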