doushouxie7064
2015-02-08 21:37
Views: 45

PHP: How to use &quot; XML entities with DOMDocument

I am working on modifying the contents of an XML file generated by some other library. I'm making some DOM modifications with PHP (5.3.10) and reinserting a replacement node.

The XML data I'm working with contains &quot; entities before I do the manipulation, and I want to keep those entities, as per http://www.w3.org/TR/REC-xml/, when I'm done with the modifications.

However, PHP is changing the &quot; entities into literal double quotes. See my example.

$temp = 'Hello &quot;XML&quot;.';
$doc = new DOMDocument('1.0', 'utf-8');
$newelement = $doc->createElement('description', $temp);
$doc->appendChild($newelement);
echo $doc->saveXML() . PHP_EOL; // shows " instead of &quot;
$node = $doc->getElementsByTagName('description')->item(0);
echo $node->nodeValue . PHP_EOL; // also shows "

Output

<?xml version="1.0" encoding="utf-8"?> 
<description>Hello "XML".</description>

Hello "XML".

Is this a PHP error or am I doing something wrong? I hope it isn't necessary to use createEntityReference in every char location.

Similar Question: PHP XML Entity Encoding issue


EDIT: Here is an example showing that saveXML() converts the &quot; entities while &amp; is left intact, which is the behavior I expect for both. The $temp string should be output by saveXML() exactly as it was entered, entities included.

$temp = 'Hello &quot;XML&quot; &amp;.';
$doc = new DOMDocument('1.0', 'utf-8');
$newelement = $doc->createElement('description', $temp);
$doc->appendChild($newelement);
echo $doc->saveXML() . PHP_EOL; // shows " instead of &quot;, while &amp; is kept
$node = $doc->getElementsByTagName('description')->item(0);
echo $node->nodeValue . PHP_EOL; // also shows " &

Output

<?xml version="1.0" encoding="utf-8"?>
<description>Hello "XML" &amp;.</description>

Hello "XML" &.
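
Both outputs above are consistent with createElement() treating its second argument as content in which entity references are resolved, rather than as plain text: &quot; becomes a literal " in the DOM, and the serializer then writes it back unescaped because element content does not require it. The following is a minimal sketch that contrasts this with createTextNode(), which takes the string literally. The element names root, viaValue and viaText are made up for the illustration.

```php
<?php
// Same input string as in the EDIT example above.
$input = 'Hello &quot;XML&quot; &amp;.';

$doc = new DOMDocument('1.0', 'utf-8');
$root = $doc->createElement('root');   // hypothetical wrapper element
$doc->appendChild($root);

// 1) Value passed to createElement(): entity references are resolved,
//    so the DOM holds: Hello "XML" &.
$viaValue = $doc->createElement('viaValue', $input);
$root->appendChild($viaValue);

// 2) Value passed to createTextNode(): the string is taken literally,
//    so every & is escaped again on output.
$viaText = $doc->createElement('viaText');
$viaText->appendChild($doc->createTextNode($input));
$root->appendChild($viaText);

$xml = $doc->saveXML();
echo $xml;
```

In this sketch viaValue should serialize as Hello "XML" &amp;. while viaText should serialize as Hello &amp;quot;XML&amp;quot; &amp;amp;., so neither call preserves the original &quot; spelling.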

1 answer

  • douke1942 2015-02-09 04:39
    Accepted

    The answer is that a double quote in element content doesn't actually need any escaping according to the spec (skipping the mentions of CDATA):

    The ampersand character (&) and the left angle bracket (<) must not appear in their literal form (...) If they are needed elsewhere, they must be escaped using either numeric character references or the strings " &amp; " and " &lt; " respectively. The right angle bracket (>) may be represented using the string " &gt; " (...)

    To allow attribute values to contain both single and double quotes, the apostrophe or single-quote character (') may be represented as " &apos; ", and the double-quote character (") as " &quot; ".

    You can verify this easily by using createTextNode() to perform the correct escaping:

    $dom = new DOMDocument;
    $e = $dom->createElement('description');
    $content = 'single quote: \', double quote: ", opening tag: <, ampersand: &, closing tag: >';
    $t = $dom->createTextNode($content);
    $e->appendChild($t);
    $dom->appendChild($e);
    
    echo $dom->saveXML();
    

    Output:

    <?xml version="1.0"?>
    <description>single quote: ', double quote: ", opening tag: &lt;, ampersand: &amp;, closing tag: &gt;</description>
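
If a downstream consumer really does insist on the literal &quot; spelling, the createEntityReference() route mentioned in the question can be wrapped in a small helper so it does not have to be called by hand at every character position. This is only a sketch under assumptions: the helper name appendTextWithQuotEntities is hypothetical, and it relies on saveXML() writing an entity reference node back as &quot;. The text between the quotes still goes through createTextNode(), so & and < are escaped as usual.

```php
<?php
/**
 * Hypothetical helper: appends $text to $parent, emitting an entity
 * reference node for every double quote and text nodes for the rest.
 */
function appendTextWithQuotEntities(DOMDocument $doc, DOMElement $parent, $text)
{
    $parts = explode('"', $text);
    $last = count($parts) - 1;
    foreach ($parts as $i => $part) {
        if ($part !== '') {
            // Plain text: createTextNode() escapes & and < on output.
            $parent->appendChild($doc->createTextNode($part));
        }
        if ($i < $last) {
            // One entity reference per double quote in the input.
            $parent->appendChild($doc->createEntityReference('quot'));
        }
    }
}

$doc = new DOMDocument('1.0', 'utf-8');
$description = $doc->createElement('description');
$doc->appendChild($description);
appendTextWithQuotEntities($doc, $description, 'Hello "XML" & more.');

$xml = $doc->saveXML();
echo $xml;

// Re-parsing shows that a consumer sees ordinary double quotes either way.
$check = new DOMDocument();
$check->loadXML($xml);
echo $check->documentElement->textContent . PHP_EOL;
```

Since a conforming parser reads " and &quot; in element content as the same character, this extra work only matters for tools that compare the serialized text byte for byte.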
    