duanmi1900 2010-06-17 02:13

Fetching XML data from an external page and parsing it with PHP

I'm trying to create a database of World of Warcraft gems. If I go to this page:

http://www.wowarmory.com/search.xml?fl[source]=all&fl[type]=gems&fl[subTp]=purple&searchType=items

And go to View Source in Firefox, I see a tonne of XML data which is exactly what I want. I wrote up this quick script to try and parse some of it:

<?php

$gemUrls = array(
                 'Blue' => 'http://www.wowarmory.com/search.xml?fl[source]=all&fl[type]=gems&fl[subTp]=blue&searchType=items',
                 'Red' => 'http://www.wowarmory.com/search.xml?fl[source]=all&fl[type]=gems&fl[subTp]=red&searchType=items',
                 'Yellow' => 'http://www.wowarmory.com/search.xml?fl[source]=all&fl[type]=gems&fl[subTp]=yellow&searchType=items',
                 'Meta' => 'http://www.wowarmory.com/search.xml?fl[source]=all&fl[type]=gems&fl[subTp]=meta&searchType=items',
                 'Green' => 'http://www.wowarmory.com/search.xml?fl[source]=all&fl[type]=gems&fl[subTp]=green&searchType=items',
                 'Orange' => 'http://www.wowarmory.com/search.xml?fl[source]=all&fl[type]=gems&fl[subTp]=orange&searchType=items',
                 'Purple' => 'http://www.wowarmory.com/search.xml?fl[source]=all&fl[type]=gems&fl[subTp]=purple&searchType=items',
                 'Prismatic' => 'http://www.wowarmory.com/search.xml?fl[source]=all&fl[type]=gems&fl[subTp]=purple&searchType=items'
                 );


// Get blue gems

$blueGems = file_get_contents($gemUrls['Blue']);

$xml = new SimpleXMLElement($blueGems);

echo $xml->items[0]->item;

?>

But I get a load of errors like this:

Warning: SimpleXMLElement::__construct() [simplexmlelement.--construct]: Entity: line 20: parser error : xmlParseEntityRef: no name in C:\xampp\htdocs\WoW\index.php on line 19

Warning: SimpleXMLElement::__construct() [simplexmlelement.--construct]: if(Browser.iphone && Number(getcookie2("mobIntPageVisits")) < 3 && getcookie2( in C:\xampp\htdocs\WoW\index.php on line 19

I'm not sure what's wrong. I think file_get_contents() is bringing back data that isn't XML, maybe some JavaScript, judging by the iPhone parts in the errors.

Is there any way to just get back the XML from that page? Without any HTML or anything?

Thanks :)

1 answer

  • dsfdsfds521521 2010-06-17 02:39

    What is returned is XHTML. It looks like XML, but it is not well-formed enough for a strict XML parser, and SimpleXMLElement needs well-formed XML. From the documentation of the constructor:

    Method signature:

    __construct ( string $data [, int $options [, bool $data_is_url 
                 [, string $ns [, bool $is_prefix ]]]] )
    

    $data is described as:

    A well-formed XML string or the path or URL to an XML document if data_is_url is TRUE.
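
To see that requirement in action without a page full of warnings, here is a minimal sketch, assuming PHP 7.1 or later. The helper name `parseXmlOrReport` and both sample strings are made up for illustration; the libxml functions are standard PHP. The second sample contains a bare `&&` inside script text, which is the kind of input that triggers the `xmlParseEntityRef: no name` error quoted in the question.

```php
<?php
// Hypothetical helper (not part of the original answer): try to parse a string
// with SimpleXML and return [element or null, list of parser error messages].
function parseXmlOrReport(string $data): array
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $xml = simplexml_load_string($data);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    // Restore the previous error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$xml === false ? null : $xml, $messages];
}

// Well-formed sample (made-up data).
[$ok, $okErrors] = parseXmlOrReport('<items><item name="Sample Gem"/></items>');

// Not well-formed: a bare "&" is illegal in XML character data.
[$bad, $badErrors] = parseXmlOrReport('<html><script>if (a && b) {}</script></html>');

echo $ok === null ? "sample 1 failed\n" : "sample 1 parsed\n";
echo $bad === null ? "sample 2 failed\n" : "sample 2 parsed\n";
foreach ($badErrors as $message) {
    echo $message, "\n";
}
```

Running the fetched page through a check like this first would probably show quickly whether the response is parseable XML at all.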

    So an arbitrary web page will not satisfy this parser. You ask:

    "Is there any way to just get back the XML from that page? Without any HTML or anything?"

    You can contact the webmasters and find out whether they offer an XML view of the data. Failing that, you could use a plain HTML parser to try to extract the data. I like PHP Simple HTML DOM Parser. Also see the question "How to implement a web scraper in PHP?"
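
As a rough illustration of the HTML-parser route, the sketch below uses PHP's built-in `DOMDocument` and `DOMXPath` rather than PHP Simple HTML DOM Parser, so that it runs without a third-party library. The markup, the class names (`gem`, `name`, `color`) and the sample values are all invented; the real page's structure is not known here, so the XPath expressions would need adjusting after inspecting the actual response.

```php
<?php
// Made-up markup standing in for a fetched page.
$html = <<<'HTML'
<html>
  <body>
    <script>if (a && b) { /* this would break a strict XML parser */ }</script>
    <table>
      <tr class="gem"><td class="name">Sample Gem A</td><td class="color">Blue</td></tr>
      <tr class="gem"><td class="name">Sample Gem B</td><td class="color">Blue</td></tr>
    </table>
  </body>
</html>
HTML;

// The HTML parser tolerates sloppy markup but still records problems;
// collect them internally so they do not surface as warnings.
$previous = libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Pull one record per table row using XPath, relative to each row node.
$xpath = new DOMXPath($doc);
$gems = [];
foreach ($xpath->query('//tr[@class="gem"]') as $row) {
    $gems[] = [
        'name'  => trim($xpath->evaluate('string(td[@class="name"])', $row)),
        'color' => trim($xpath->evaluate('string(td[@class="color"])', $row)),
    ];
}

print_r($gems);
```

Scraping like this is brittle, since any change to the page layout breaks the selectors, which is why asking for a real XML feed first is the better option.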

    Accepted by the asker as the best answer.
