dongxian0320 2011-12-01 21:10
Views: 175
Accepted

Get all links from an XML sitemap and put them into an array?

I have a sitemap with many urls. Something like:

<url>
<loc>
http://site.com/
</loc>
<priority>
0.50
</priority>
<changefreq>
daily
</changefreq>
<lastmod>
2011-07-27T06:58:53+00:00
</lastmod>
</url>
<url>
<loc>
http://site.com/link

etc etc....

I need to get all the links in the sitemap, nothing else.

I've tried:

$links = file('sitemap.xml', FILE_IGNORE_NEW_LINES);

foreach($links as $link) {
    echo $link;
}

That echoes all the links and leaves out the <loc>, <priority>, etc. tags, but it still includes the change frequency, lastmod, and so on.

So the output looks like this:

http://site.com/ 0.50 daily 2011-07-27T06:58:53+00:00 http://site.com/page.html 0.40 daily 2011-07-

and so on....

I just need to get the links and put them into an array. Any ideas?

Thank you.

EDIT:

Here is the code I'm using:

$urls = array();  
$xml='sitemap.xml';
$DomDocument = new DOMDocument();
$DomDocument->preserveWhiteSpace = false;
$DomDocument->loadXML("$xml"); // $DOMDocument->load('filename.xml');
$DomNodeList = $DomDocument->getElementsByTagName('from');

foreach($DomNodeList as $url) {
    $urls[] = $url->nodeValue;
}

//display it
echo "<pre>";
print_r($urls);
echo "</pre>";

Which returns the error: Warning: DOMDocument::loadXML() [domdocument.loadxml]: Start tag expected, '<' not found in Entity, line: 1

So I tried to test whether it can even load the XML: I changed the XML file name to an invalid one ($xml='sit___emap.xml';)

I should have gotten an error saying it couldn't open the file, but instead I got the same error as before, when the correct filename was set. So I don't think it's the sitemap.

6 answers

  • drbvm26000 2011-12-01 21:50

    I couldn't get @AndreyKnupp's example to work. Here's what works for me:

    $urls = array();  
    
    $DomDocument = new DOMDocument();
    $DomDocument->preserveWhiteSpace = false;
    $DomDocument->load('filename.xml');
    $DomNodeList = $DomDocument->getElementsByTagName('loc');
    
    foreach($DomNodeList as $url) {
        $urls[] = $url->nodeValue;
    }
    
    //display it
    echo "<pre>";
    print_r($urls);
    echo "</pre>";
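
    A likely reason the code in the question fails: DOMDocument::loadXML() parses XML markup held in a string, while DOMDocument::load() reads from a file path. Passing 'sitemap.xml' to loadXML() makes the parser treat the filename itself as markup, which would explain why an invalid filename produced the same "Start tag expected" warning. The question's code also looks up 'from' rather than 'loc'. Below is a minimal sketch of the same approach; the helper name sitemapUrls() and the sample markup are made up for illustration, and trim() is added on the assumption that each URL sits on its own line inside <loc>, as in the question's sample.

```php
<?php
// Hypothetical helper (not from the original posts): return every <loc> value
// found in a sitemap that is passed in as an XML string.
function sitemapUrls(string $xml): array
{
    $doc = new DOMDocument();
    $doc->preserveWhiteSpace = false;

    // loadXML() expects XML markup, not a filename. Use $doc->load($path)
    // instead when reading the sitemap from disk.
    if (!@$doc->loadXML($xml)) {
        return []; // not parseable as XML
    }

    $urls = [];
    foreach ($doc->getElementsByTagName('loc') as $loc) {
        // Strip the newlines that surround the URL inside <loc>.
        $urls[] = trim($loc->nodeValue);
    }
    return $urls;
}

// Illustrative sample modeled on the question's sitemap.
$sample = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>
http://site.com/
</loc>
<priority>
0.50
</priority>
</url>
<url>
<loc>
http://site.com/link
</loc>
</url>
</urlset>
XML;

$urls = sitemapUrls($sample);
print_r($urls);
```

    To read from a file instead, the same loop works after $doc->load('sitemap.xml'), as in the answer above.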
    
    This answer was selected by the asker as the best answer.
