douya2006 2010-11-17 04:48
Views: 23
Accepted

PHP DOMElement->nodeValue returns gobbledygook

I'm parsing a third-party web page with PHP's DOM classes. When I open the page in my browser and view the source, it's clean, but when I read some of the nodes through the DOMElement->nodeValue property, the HTML tags are missing, and the result contains several newlines and the character Â. According to this answer, that character shows up when there's an encoding issue.

I also get that gobbledygook using:

  • simplexml_import_dom($node)->asXML();
  • $doc->saveXML($node);

My question is: how can I simply get the clean HTML code inside the DOMElement?

Here is the clean HTML code:

<b>Author:</b> AUTHOR<br>
            <b>ISBN:</b> 9780684857220 <br>
            <b>Edition/Copyright:</b> 7<br>
            <b>Publisher:</b> J+M<br>
            <b>Published Date:</b>  1989<br>

Here is what nodeValue gives:

                    Â 
                    Author:Â AUTHOR      ISBN:Â 9780684857220 Edition/Copyright:Â 7     Publisher:Â J+M       Published Date:Â 
                    1989

2 answers

  • dsgdfg30210 2010-11-17 08:33

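    It turns out this wasn't an encoding issue; I was simply using the wrong methods. nodeValue returns only a node's text content, so the tags are dropped. Importing the node into a fresh DOMDocument and serializing that document with saveHTML() works: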

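    // $second_td is the DOMElement taken from the parsed page.
    $doc = new DOMDocument();
    // Deep-copy the node (true = include its children) into the new document.
    $doc->appendChild($doc->importNode($second_td, true));
    echo $doc->saveHTML();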
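
For comparison, here is a minimal, self-contained sketch of the difference between reading nodeValue and serializing a node. The sample markup and the way `$second_td` is located are assumptions made for illustration, not the original page. It relies on passing a node to `saveHTML()`, which needs PHP 5.3.6 or later and is an alternative to the import approach above. It also assumes the stray Â in the question came from a non-breaking space (`&nbsp;`), which DOM returns as the UTF-8 bytes C2 A0.

```php
<?php
// Hypothetical sample markup standing in for the third-party page.
$html = '<table><tr><td>first</td>'
      . '<td><b>Author:</b>&nbsp;AUTHOR<br><b>ISBN:</b>&nbsp;9780684857220<br></td>'
      . '</tr></table>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$page = new DOMDocument();
$page->loadHTML($html);
libxml_clear_errors();

// Hypothetical lookup: the second <td> in the document.
$second_td = $page->getElementsByTagName('td')->item(1);

// nodeValue is the text content only, so the tags are gone.
$text = $second_td->nodeValue;

// Passing the node to saveHTML() serializes just that node, markup included.
$markup = $page->saveHTML($second_td);

// DOM strings are UTF-8; &nbsp; arrives as the bytes C2 A0.
// Replacing them with a plain space is one way to tidy the text.
$clean = str_replace("\xC2\xA0", ' ', $text);

echo $markup, "\n";
echo $clean, "\n";
```

If the output is shown on a page that is not declared as UTF-8, those two bytes can appear as Â followed by a space, which would match the symptom described in the question.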
    