dsvd407787736 2011-07-30 21:09

PHP DOMDocument nodeValue returns different encodings

When parsing an HTML document with DOMDocument, I get different encodings from nodeValue. In my dev environment I get UTF-8, but after uploading the script to the web server I get ISO-8859-1.

Can anyone explain this behaviour and how to get the same encoding on both?

<?php
header('Content-Type:text/html; charset=UTF-8');
$strHtml = file_get_contents("http://www.aftonbladet.se/senastenytt/ttnyheter/inrikes/article13397806.ab");

$objDOM= new DOMDocument();
@$objDOM->loadHTML($strHtml);
echo "Encoding: ". $objDOM->encoding."<br/>";

//Parse heading from DOMDocument
$strNodeValue = ''; //stays empty if the page has no h1
$objNodelist = $objDOM->getElementsByTagName('h1');
foreach ($objNodelist as $objElem)
{
    $strNodeValue = $objElem->nodeValue; //get the text of the first h1
    break;
}
echo 'nodeValue: "'.$strNodeValue.'"<br/>';
echo 'utf8_decode: "'.utf8_decode($strNodeValue).'"<br/>';
echo 'utf8_encode: "'.utf8_encode($strNodeValue).'"<br/>';

//Parse heading using substring from html
$strOpenTag = '<h1 class="abS32">';
$intStart = strpos($strHtml, $strOpenTag) + strlen($strOpenTag);
$strHeading = substr($strHtml, $intStart, strpos($strHtml, '</h1>') - $intStart);
echo 'Heading from substring: "'.$strHeading.'"';
?>

Output when run in development environment
Encoding: utf-8
nodeValue: "När semestern inleds vankas åska"
utf8_decode: "N�r semestern inleds vankas �ska"
utf8_encode: "När semestern inleds vankas åska"
Heading from substring: "När semestern inleds vankas åska"

Output when run on public web server
Encoding: utf-8
nodeValue: "När semestern inleds vankas åska"
utf8_decode: "När semestern inleds vankas åska"
utf8_encode: "När semestern inleds vankas ÃÂ¥ska"
Heading from substring: "När semestern inleds vankas åska"

Apparently utf8_decode needs to be used on the public web server, but not in my dev environment. I would like to have the same behaviour on both systems. Any ideas?

2 answers

  • dougu2036 2011-07-30 21:25

    I can think of two possible reasons for this behaviour.

    First, take a look at default_charset in the two php.ini files. I think you will find that one is set to "iso-8859-1" (the charset several PHP functions assumed by default at the time) and the other to "UTF-8".
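
    To compare the two machines, it may help to print the encoding-related settings side by side. The sketch below is only an illustration: the function name encodingReport() is made up for this example, and it assumes the mbstring and libxml extensions are loaded.

    ```php
    <?php
    // Hypothetical helper (not part of PHP): collects the settings that most
    // often differ between two servers and affect character encoding.
    function encodingReport(): array
    {
        return [
            'php_version'       => PHP_VERSION,
            // Value from php.ini; an empty string means it is not set.
            'default_charset'   => (string) ini_get('default_charset'),
            // Encoding the mbstring functions assume when none is passed.
            'internal_encoding' => mb_internal_encoding(),
            // Version of the libxml2 library that DOMDocument is built on.
            'libxml_version'    => LIBXML_DOTTED_VERSION,
        ];
    }

    foreach (encodingReport() as $name => $value) {
        echo $name . ': ' . $value . "\n";
    }
    ```

    Running this on both servers and diffing the output should show quickly whether the configuration or the library versions differ.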

    Second, check the code used to connect from PHP to your database, and the database connection defaults. These might also differ.

    You can use the following code to switch a MySQL connection to UTF-8.

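    // mysql_set_charset() is not available on every PHP/MySQL combination,
    // so test for the function instead of comparing version strings.
    // Note: the mysql_* extension is legacy and was removed in PHP 7.
    if (function_exists('mysql_set_charset')) {
        $result = mysql_set_charset('utf8');
    } else {
        $result = mysql_query("SET NAMES 'utf8' COLLATE 'utf8_unicode_ci'");
    }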
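
    Whatever the two php.ini files say, the parsing step itself can be made independent of them. DOMDocument hands back nodeValue as UTF-8, but how loadHTML() interprets the input bytes can depend on the charset the page declares and on the libxml version. A minimal sketch, assuming the source charset is known (for example from the HTTP Content-Type header) and mbstring is available: convert the markup to UTF-8, then turn every non-ASCII character into a numeric entity so that libxml only ever sees ASCII. The names loadHtmlAsUtf8() and firstHeading() are invented for this example.

    ```php
    <?php
    // Hypothetical helper: parse HTML so the result does not depend on
    // libxml's own charset detection. $sourceEncoding is an assumption the
    // caller must supply (e.g. taken from the HTTP Content-Type header).
    function loadHtmlAsUtf8(string $html, string $sourceEncoding): DOMDocument
    {
        // Step 1: normalise the raw bytes to UTF-8.
        $utf8 = mb_convert_encoding($html, 'UTF-8', $sourceEncoding);

        // Step 2: replace every code point >= U+0080 with a numeric entity
        // (&#228; for "ä"), leaving pure ASCII markup.
        $ascii = mb_encode_numericentity($utf8, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

        // Buffer parser warnings internally instead of silencing them with "@".
        $previous = libxml_use_internal_errors(true);
        $dom = new DOMDocument();
        $dom->loadHTML($ascii);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        return $dom;
    }

    // Hypothetical helper: text of the first <h1>, or null if there is none.
    function firstHeading(DOMDocument $dom): ?string
    {
        $heading = $dom->getElementsByTagName('h1')->item(0);
        return $heading === null ? null : $heading->nodeValue;
    }

    // The same heading, once as UTF-8 bytes and once as ISO-8859-1 bytes.
    $utf8Page  = "<html><body><h1>N\xC3\xA4r \xC3\xA5ska</h1></body></html>";
    $latinPage = "<html><body><h1>N\xE4r \xE5ska</h1></body></html>";

    var_dump(firstHeading(loadHtmlAsUtf8($utf8Page, 'UTF-8'))
        === firstHeading(loadHtmlAsUtf8($latinPage, 'ISO-8859-1')));
    ```

    Both calls should yield the same UTF-8 string, so no utf8_decode() or utf8_encode() step is needed afterwards. One caveat: text inside script or style elements is not entity-decoded by HTML parsers, so non-ASCII characters there would stay as entities; for extracting headings that should not matter.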
    