douya2006 2010-11-17 04:48

PHP DOMElement->nodeValue has gobbledygook

I'm parsing a third-party web page using PHP's DOMElement class. When I open the page in my browser and view the source, it's clean, but when I access some of the nodes through the DOMElement->nodeValue property, the HTML tags aren't there, and there are several newlines and the character Â. According to this answer, that character shows up when there's an encoding issue.

I also get the same gobbledygook using:

  • simplexml_import_dom($node)->asXML();
  • $doc->saveXML($node);
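To illustrate how these accessors differ, here is a minimal sketch run against a small hypothetical fragment modeled on the page (the `$html` string and the choice of the second `<td>` are assumptions, not the real third-party page). On an element, `nodeValue` returns only the concatenated text of its descendants, so the tags are expected to disappear; `saveXML($node)` serializes the node as XML (empty elements become `<br/>`), and `saveHTML($node)`, which accepts a node argument in PHP 5.3.6 and later, serializes it as HTML.

```php
<?php
// Minimal sketch: compare three ways of reading the same element.
// $html is a hypothetical stand-in for the third-party page.
$html = '<table><tr><td>Title</td>'
      . '<td><b>Author:</b> AUTHOR<br><b>ISBN:</b> 9780684857220<br></td>'
      . '</tr></table>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
$td = $doc->getElementsByTagName('td')->item(1); // the second <td>

// Text content only: the <b> and <br> tags are dropped.
echo $td->nodeValue, "\n";
// XML serialization of the node: empty elements come out as <br/>.
echo $doc->saveXML($td), "\n";
// HTML serialization of the node (PHP 5.3.6+): <br> stays <br>.
echo $doc->saveHTML($td), "\n";
```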

How can I simply get the clean HTML code inside the DOMElement?

Here is the clean HTML code:

    <b>Author:</b> AUTHOR<br>
    <b>ISBN:</b> 9780684857220 <br>
    <b>Edition/Copyright:</b> 7<br>
    <b>Publisher:</b> J+M<br>
    <b>Published Date:</b>  1989<br>

Here is what nodeValue gives:

                    Â 
                    Author:Â AUTHOR      ISBN:Â 9780684857220 Edition/Copyright:Â 7     Publisher:Â J+M       Published Date:Â 
                    1989

2 answers

  • dsgdfg30210 2010-11-17 08:33

    It turns out it wasn't an encoding issue; I was simply using the wrong methods. This works:

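    $doc = new DOMDocument();
    // Deep-copy the <td> node, including all of its children, into the new document
    $doc->appendChild($doc->importNode($second_td, true));
    // Serialize the new document as HTML rather than XML
    echo $doc->saveHTML();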
    
    This answer was selected by the asker as the best answer.
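The snippet in the accepted answer assumes `$second_td` already holds the table cell. The sketch below makes it self-contained by parsing a hypothetical fragment first (the markup and the `innerHTML()` helper name are illustrative assumptions, not part of the original answer). Note that the answer's approach returns the `<td>` element itself, tags included; if only the markup inside the cell is wanted, one option is to serialize each child node with `saveHTML($child)` and concatenate the results.

```php
<?php
// Hypothetical helper (not a built-in): concatenate the HTML of a node's children,
// giving the markup inside the node without the node's own tags.
function innerHTML(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child); // node argument: PHP 5.3.6+
    }
    return $html;
}

// $page is a hypothetical stand-in for the third-party HTML.
$page = '<table><tr><td>Title</td>'
      . '<td><b>Author:</b> AUTHOR<br><b>Publisher:</b> J+M<br></td>'
      . '</tr></table>';

libxml_use_internal_errors(true); // real-world HTML often triggers parser warnings
$source = new DOMDocument();
$source->loadHTML($page);
$second_td = $source->getElementsByTagName('td')->item(1);

// The accepted answer's approach: deep-import the cell into a fresh document
// and serialize that whole document as HTML (the <td> tags are included).
$doc = new DOMDocument();
$doc->appendChild($doc->importNode($second_td, true));
$withTd = $doc->saveHTML();

// Children only, without the surrounding <td>.
$inner = innerHTML($second_td);

echo $withTd, "\n", $inner, "\n";
```

On PHP 5.3.6 and later, `$source->saveHTML($second_td)` should also return the cell's markup directly, without building the temporary document.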
