Using PHP, I need to parse an XML document that I receive from a third party. I am not able to ask the maintainers of the document to fix its structure. When I parse the document using simplexml_load_file, the resulting XML document appears to be empty.
Here is a stripped-down example of what I am seeing.
my-file.xml:
<?xml version="1.0" encoding="utf-8"?>
<DataSet>
<diffgr:diffgram xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1">
aaa
</diffgr:diffgram>
</DataSet>
And I process it like this (from the command line):
php > $xml = simplexml_load_file('my-file.xml');
php > print_r($xml);
SimpleXMLElement Object
(
)
I was expecting print_r to display the XML structure.
Indeed, when I remove the namespace declaration, things seem to work (despite some expected XML parse warnings):
my-file-nonamespace.xml:
<?xml version="1.0" encoding="utf-8"?>
<DataSet>
<diffgr:diffgram>
aaa
</diffgr:diffgram>
</DataSet>
Processing it the same way on the command line (with warnings removed):
php > $xml = simplexml_load_file('my-file-nonamespace.xml');
// a bunch of xml parse warnings
php > print_r($xml);
SimpleXMLElement Object
(
[diffgr:diffgram] =>
aaa
)
So the problem appears to be related to the namespace declaration, which I suspect is invalid. I could probably run a regular expression over the file to remove the namespace declaration before parsing, but that is not a direction I want to go.
What is the best way to properly parse the first document in PHP?
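For reference, here is a minimal sketch of one approach that leaves the file untouched. It assumes the namespace declaration is in fact well-formed and that SimpleXML is only hiding the namespaced child from print_r rather than discarding it. The document is inlined with simplexml_load_string so the snippet is self-contained, and the variable names ($source, $ns, $diffgram, $text) are only illustrative.

```php
<?php
// Same content as my-file.xml, inlined so the sketch runs on its own.
$source = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<DataSet>
<diffgr:diffgram xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1">
aaa
</diffgr:diffgram>
</DataSet>
XML;

$xml = simplexml_load_string($source);

// Namespace URI copied from the xmlns:diffgr declaration in the document.
$ns = 'urn:schemas-microsoft-com:xml-diffgram-v1';

// children() with a namespace URI returns the child elements in that
// namespace; property access without it only sees un-namespaced children.
$diffgram = $xml->children($ns)->diffgram;

// Casting an element to string yields its text content.
$text = trim((string) $diffgram);
echo $text, PHP_EOL; // prints "aaa"
```

If this assumption holds, the same call should work on the object returned by simplexml_load_file('my-file.xml'), and $xml->children('diffgr', true) would select by prefix instead of by URI.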