You can use XPath. It may be simpler than SimpleXML when you have namespaces. You will also have to register the namespace, whose declaration is not present in the feed excerpt you included as an example.
I found an arbitrary feed here: http://www.google.com/alerts/feeds/01662123773360489091/16526224428036307178
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:idx="urn:atom-extension:indexing">
<id>
tag:google.com,2005:reader/user/01662123773360489091/state/com.google/alerts/16526224428036307178
</id>
<title>Google Alert - test</title>
<link href="http://www.google.com/alerts/feeds/01662123773360489091/16526224428036307178" rel="self"/>
<updated>2014-06-15T17:30:04Z</updated>
<entry>
<id>
tag:google.com,2013:googlealerts/feed:5957360885559055905
</id>
<title type="html">
Dad's &lt;b&gt;Test&lt;/b&gt; Out Products Made For the Family
</title>
<link href="https://www.google.com/url?q=http://gma.yahoo.com/video/dads-test-products-made-family-141428658.html&amp;ct=ga&amp;cd=CAIyAA&amp;usg=AFQjCNHHBPoS6Poz-Y5A3vFfbsGL3fkrBA"/>
<published>2014-06-15T17:30:04Z</published>
<updated>2014-06-15T17:30:04Z</updated>
<content type="html">
Watch the video Dad's &lt;b&gt;Test&lt;/b&gt; Out Products Made For the Family on Yahoo Good Morning America . Becky Worley enlists a group of fathers to see if "As ...
</content>
<author>
<name/>
</author>
</entry>
<entry>
...
I will use it to illustrate the answer.
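In the first line there is a default namespace declaration, xmlns="http://www.w3.org/2005/Atom". You have to register it in PHP to use that namespace in XPath. XPath 1.0 has no notion of a default namespace (an unprefixed name such as entry only matches elements in no namespace), so you must map the URI to a prefix (any prefix will do) even though the original file uses none. This is how you would initialize the parser.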
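To see why the prefix is needed, here is a minimal, self-contained sketch. It assumes a small inline Atom document (invented for illustration, not the real feed) and compares an unprefixed query with a prefixed one:

```php
<?php
// Minimal inline Atom document (invented for illustration, not the real feed).
$xml = <<<'XML'
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>First</title></entry>
  <entry><title>Second</title></entry>
</feed>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

// XPath 1.0 has no default namespace: an unprefixed name only matches
// elements in no namespace, so this finds nothing here.
$unprefixed = $xpath->evaluate("//entry")->length;

// Map the Atom namespace URI to a prefix, then use the prefix in the query.
$xpath->registerNamespace("atom", "http://www.w3.org/2005/Atom");
$prefixed = $xpath->evaluate("//atom:entry")->length;

echo "unprefixed: $unprefixed, prefixed: $prefixed\n"; // unprefixed: 0, prefixed: 2
```

With the default namespace in place, //entry matches nothing; only the prefixed form finds the two entries.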
These two lines initialize the DOM parser and parse the file, loading it from the Internet:
$document = new DOMDocument();
$document->load( "http://www.google.com/alerts/feeds/01662123773360489091/16526224428036307178" );
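If the feed URL is unreachable or returns something that is not well-formed XML, load() returns false and PHP emits warnings. A sketch of one way to handle that, assuming you parse from a string with loadXML() instead; parseFeed() is a hypothetical helper name and the example.com URLs are placeholders:

```php
<?php
// Hypothetical helper: parse an XML string, returning null (and printing
// libxml's messages) when the input is not well-formed.
function parseFeed(string $xml): ?DOMDocument
{
    $previous = libxml_use_internal_errors(true); // buffer errors instead of emitting warnings
    $document = new DOMDocument();
    $ok = $document->loadXML($xml);
    foreach (libxml_get_errors() as $error) {
        echo "XML error: ", trim($error->message), "\n";
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the previous setting
    return $ok ? $document : null;
}

// Well-formed: the ampersand in the attribute is escaped as &amp;
$good = parseFeed('<feed xmlns="http://www.w3.org/2005/Atom"><link href="https://example.com/?a=1&amp;b=2"/></feed>');

// Not well-formed: a bare & is not allowed in XML
$bad = parseFeed('<feed xmlns="http://www.w3.org/2005/Atom"><link href="https://example.com/?a=1&b=2"/></feed>');

var_dump($good instanceof DOMDocument, $bad === null);
```

The second call fails because a bare & in an attribute value is not well-formed XML; it has to be written as &amp;.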
These two lines initialize the XPath environment and register the default namespace of your file with a prefix (I chose atom):
$xpath = new DOMXPath($document);
$xpath->registerNamespace("atom", "http://www.w3.org/2005/Atom");
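Hard-coding the URI is fine when you know the format. As a sketch of an optional alternative (my assumption, not something the feed requires), you can read the default namespace from the root element and register it under any prefix you like; the two prefixes below are interchangeable. The inline document is invented for illustration:

```php
<?php
$xml = '<feed xmlns="http://www.w3.org/2005/Atom"><entry/><entry/></feed>';

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

// The root element's namespaceURI is the default namespace declared on <feed>.
$atomUri = $document->documentElement->namespaceURI;

// The prefix is only a local alias: "atom" and "a" select the same nodes.
$xpath->registerNamespace("atom", $atomUri);
$xpath->registerNamespace("a", $atomUri);

$viaAtom = $xpath->evaluate("//atom:entry")->length;
$viaA = $xpath->evaluate("//a:entry")->length;

echo "$atomUri: $viaAtom / $viaA\n";
```

This only works if the root element is in the namespace you want to query, so for a known format like Atom the literal URI is the safer choice.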
Once that is set up, you can select nodes with the evaluate() method, passing an XPath expression that can be absolute or relative. To get all entry nodes, you can use an absolute expression:
$entries = $xpath->evaluate("//atom:entry");
The XPath expression is //atom:entry. It returns the set of entry nodes in the "http://www.w3.org/2005/Atom" namespace, which is what you want.
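A short sketch, using a made-up two-entry document, of what evaluate() gives back for absolute location paths versus a function call such as count():

```php
<?php
$xml = <<<'XML'
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>first</id></entry>
  <entry><id>second</id></entry>
</feed>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);
$xpath->registerNamespace("atom", "http://www.w3.org/2005/Atom");

// "//" searches the whole document; "/atom:feed/atom:entry" spells out the exact path.
$anywhere = $xpath->evaluate("//atom:entry");
$exact = $xpath->evaluate("/atom:feed/atom:entry");

// Location paths come back as a DOMNodeList, which has a length and can be iterated.
echo get_class($anywhere), " with ", $anywhere->length, " nodes\n";
foreach ($exact as $entry) {
    echo $entry->localName, " in ", $entry->namespaceURI, "\n";
}

// A function expression such as count() returns a plain number instead.
$total = $xpath->evaluate("count(/atom:feed/atom:entry)");
var_dump($total);
```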
To extract the nodes and the information in the context of each entry, you can use DOM methods and properties such as firstChild, nextSibling, etc., or you can perform additional contextual XPath searches. A contextual search passes the context node as the second parameter to evaluate(). Here is a loop that gets the data in each child node of <entry> and places it in an HTML sublist:
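The context node matters only for relative expressions. This sketch (inline sample document, invented for illustration) contrasts a relative path with one that starts with //, which searches from the document root even when a context node is passed:

```php
<?php
$xml = <<<'XML'
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Google Alert - test</title>
  <entry><title>First entry</title></entry>
  <entry><title>Second entry</title></entry>
</feed>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);
$xpath->registerNamespace("atom", "http://www.w3.org/2005/Atom");

$entry = $xpath->evaluate("//atom:entry")->item(1); // the second <entry>

// Relative path + context node: only the title inside this entry.
$relative = $xpath->evaluate("atom:title", $entry);

// "//" starts at the document root regardless of the context node,
// so it also finds the feed's own <title>.
$absolute = $xpath->evaluate("//atom:title", $entry);

echo $relative->length, " ", $relative->item(0)->nodeValue, "\n"; // 1 Second entry
echo $absolute->length, "\n";                                      // 3
```

If you want descendants of the entry at any depth, use .//atom:title; the leading dot anchors the search at the context node.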
$entries = $xpath->evaluate("//atom:entry");
echo '<ul>'."\n";
foreach ($entries as $entry) {
    echo '<li><b>Entry ID: '.$xpath->evaluate("atom:id/text()", $entry)->item(0)->nodeValue.'</b></li>'."\n";
    echo '<ul>'."\n";
    echo '<li>Title: '.$xpath->evaluate("atom:title/text()", $entry)->item(0)->nodeValue.'</li>'."\n";
    echo '<li>Link: '.$xpath->evaluate("atom:link/@href", $entry)->item(0)->nodeValue.'</li>'."\n";
    echo '<li>Published: '.$xpath->evaluate("atom:published/text()", $entry)->item(0)->nodeValue.'</li>'."\n";
    echo '<li>Updated: '.$xpath->evaluate("atom:updated/text()", $entry)->item(0)->nodeValue.'</li>'."\n";
    echo '<li>Content: '.$xpath->evaluate("atom:content/text()", $entry)->item(0)->nodeValue.'</li>'."\n";
    // <name/> is empty (it has no text child), so select the element itself rather than text()
    echo '<li>Author: '.$xpath->evaluate("atom:author/atom:name", $entry)->item(0)->nodeValue.'</li>'."\n";
    echo '</ul>'."\n";
}
echo '</ul>'."\n";
Note that the expressions are relative to entry (they don't start with /), the element selectors are also prefixed (they belong to the atom namespace too), and I used item(0) and nodeValue to extract the results. Because a location path can match several nodes, evaluate() used this way returns a node list (a DOMNodeList), even when it contains a single node. If there is only one text child, it's in item(0), and its nodeValue property gives you the text as a string.
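One caveat: when the selected element is empty, like <name/> in this feed, there is no text node, so the node list is empty and item(0) returns null. A sketch of an alternative using the XPath string() function, which always returns a string (empty if nothing matches), on a minimal inline document invented for illustration:

```php
<?php
$xml = <<<'XML'
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>First entry</title>
    <author><name/></author>
  </entry>
</feed>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);
$xpath->registerNamespace("atom", "http://www.w3.org/2005/Atom");
$entry = $xpath->evaluate("//atom:entry")->item(0);

// <name/> has no text child: the node list is empty and item(0) is null,
// so reading ->nodeValue on it would raise a warning.
$textNodes = $xpath->evaluate("atom:author/atom:name/text()", $entry);
$first = $textNodes->item(0);

// string() converts the first matched node to its text, or "" if nothing matches,
// so evaluate() returns a plain PHP string with no item(0)/nodeValue step.
$title = $xpath->evaluate("string(atom:title)", $entry);
$author = $xpath->evaluate("string(atom:author/atom:name)", $entry);

var_dump($textNodes->length, $first, $title, $author);
```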
The result of running the program above will be:
<ul>
<li><b>Entry ID: tag:google.com,2013:googlealerts/feed:5957360885559055905</b></li>
<ul>
<li>Title: Dad's <b>Test</b> Out Products Made For the Family</li>
<li>Link: https://www.google.com/url?q=http://gma.yahoo.com/video/dads-test-products-made-family-141428658.html&ct=ga&cd=CAIyAA&usg=AFQjCNHHBPoS6Poz-Y5A3vFfbsGL3fkrBA</li>
<li>Published: 2014-06-15T17:30:04Z</li>
<li>Updated: 2014-06-15T17:30:04Z</li>
<li>Content: Watch the video Dad's <b>Test</b> Out Products Made For the Family on Yahoo Good Morning America . Becky Worley enlists a group of fathers to see if "As ...</li>
<li>Author: </li>
</ul>
<li><b>Entry ID: tag:google.com,2013:googlealerts/feed:11008408359408830921</b></li>
<ul>
<li>Title: Germany faces major <b>test</b> of strength in its World Cup opener against Portugal</li>
<li>Link: https://www.google.com/url?q=http://www.foxnews.com/sports/2014/06/15/germany-faces-major-test-strength-in-its-world-cup-opener-against-portugal/&ct=ga&cd=CAIyAA&usg=AFQjCNHOU94QyciRpCEdJawOwl3diEEO0A</li>
<li>Published: 2014-06-15T16:18:45Z</li>
<li>Updated: 2014-06-15T16:18:45Z</li>
<li>Content: Cristiano Ronaldo stretches during a training session of Portugal in Campinas, Brazil, Saturday, June 14, 2014. Portugal plays in group G of the Brazil ...</li>
<li>Author: </li>
</ul>
<li><b>Entry ID: tag:google.com,2013:googlealerts/feed:8664961950651004785</b></li>
...
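The loop above uses contextual XPath, but the DOM properties mentioned earlier (firstChild, nextSibling) work too. A minimal sketch, assuming the same Atom namespace and an inline sample trimmed from the feed excerpt; getElementsByTagNameNS() finds the entries without XPath:

```php
<?php
const ATOM_NS = "http://www.w3.org/2005/Atom";

// Inline sample trimmed from the feed excerpt for illustration.
$xml = <<<'XML'
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title type="html">Dad's &lt;b&gt;Test&lt;/b&gt; Out Products Made For the Family</title>
    <published>2014-06-15T17:30:04Z</published>
  </entry>
</feed>
XML;

$document = new DOMDocument();
$document->loadXML($xml);

$rows = [];
// getElementsByTagNameNS() matches on namespace URI + local name; no prefix is needed.
foreach ($document->getElementsByTagNameNS(ATOM_NS, "entry") as $entry) {
    $row = [];
    // Walk the children with firstChild/nextSibling, keeping only Atom elements
    // (whitespace text nodes between them are skipped).
    for ($child = $entry->firstChild; $child !== null; $child = $child->nextSibling) {
        if ($child->nodeType === XML_ELEMENT_NODE && $child->namespaceURI === ATOM_NS) {
            $row[$child->localName] = $child->textContent;
        }
    }
    $rows[] = $row;
}

print_r($rows);
```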
Now you can edit the code to adapt it to the data you wish to extract.
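As one way to adapt it, here is a sketch that collects each entry into an associative array instead of printing HTML. readEntries() and the $fields map are hypothetical names; the sample entry reuses data from the feed excerpt above, trimmed to a few fields:

```php
<?php
// Hypothetical helper: returns one associative array per Atom <entry>.
function readEntries(DOMDocument $document): array
{
    $xpath = new DOMXPath($document);
    $xpath->registerNamespace("atom", "http://www.w3.org/2005/Atom");

    // Field name => XPath expression evaluated relative to each <entry>.
    $fields = [
        "id" => "string(atom:id)",
        "title" => "string(atom:title)",
        "link" => "string(atom:link/@href)",
        "published" => "string(atom:published)",
    ];

    $rows = [];
    foreach ($xpath->evaluate("//atom:entry") as $entry) {
        $row = [];
        foreach ($fields as $name => $expression) {
            // trim() removes line breaks around values such as the <id> text
            $row[$name] = trim($xpath->evaluate($expression, $entry));
        }
        $rows[] = $row;
    }
    return $rows;
}

$xml = <<<'XML'
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>
      tag:google.com,2013:googlealerts/feed:5957360885559055905
    </id>
    <title type="html">Dad's &lt;b&gt;Test&lt;/b&gt; Out Products Made For the Family</title>
    <link href="https://www.google.com/url?q=http://gma.yahoo.com/video/dads-test-products-made-family-141428658.html&amp;ct=ga&amp;cd=CAIyAA&amp;usg=AFQjCNHHBPoS6Poz-Y5A3vFfbsGL3fkrBA"/>
    <published>2014-06-15T17:30:04Z</published>
  </entry>
</feed>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$rows = readEntries($document);
print_r($rows);
```

Adding a field is just another entry in $fields.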
You can see a working example of this application in this PHP Fiddle.