dongsi0625 2011-01-11 22:32
Accepted

Wikipedia API only returns a small set of data?

Hey there, I'm trying to extract data from Wikipedia articles using its API (http://en.wikipedia.org/w/api.php) from a PHP script, but I only ever seem to get a fraction of the real content. For example, when I run:

$page=get_web_page("http://en.wikipedia.org/w/api.php?action=query&titles=Cat&prop=links&format=txt");
echo $page["content"];

This is what I get:

Array ( [query] => Array ( [pages] => Array ( [6678] => Array ( [pageid] => 6678 [ns] => 0 [title] => Cat [links] => Array ( [0] => Array ( [ns] => 0 [title] => 10th edition of Systema Naturae ) [1] => Array ( [ns] => 0 [title] => 3-mercapto-3-methylbutan-1-ol ) [2] => Array ( [ns] => 0 [title] => Abyssinian (cat) ) [3] => Array ( [ns] => 0 [title] => Actinidia polygama ) [4] => Array ( [ns] => 0 [title] => Adaptive radiation ) [5] => Array ( [ns] => 0 [title] => African Wildcat ) [6] => Array ( [ns] => 0 [title] => African wildcat ) [7] => Array ( [ns] => 0 [title] => Afro-Asiatic languages ) [8] => Array ( [ns] => 0 [title] => Age of Discovery ) [9] => Array ( [ns] => 0 [title] => Agouti signalling peptide ) ) ) ) ) [query-continue] => Array ( [links] => Array ( [plcontinue] => 6678|0|Albino ) ) ) 

I was requesting the full list of links on the "Cat" article, but I only get the first 10 in alphabetical order. This happens no matter which format I choose, and even when I query the API directly (see http://en.wikipedia.org/w/api.php?action=query&titles=Cat&prop=links). What is causing this restriction, and how can I fix it?


1 answer

  • doufangxian4985 2011-01-11 22:42

    If you look at the API manual, you will see that there is a pllimit option, which specifies how many links are returned per request. It defaults to 10, which is why you only see ten links. You can request up to 500 at a time, or 5000 if you have a bot account.

    At the end of the output you posted, you will see [plcontinue] => 6678|0|Albino. Pass this value back to the server as the plcontinue parameter and you will get more links from the page, starting from that point. So the next query you make would be:

    $page=get_web_page("http://en.wikipedia.org/w/api.php?action=query&titles=Cat&prop=links&format=txt&plcontinue=6678|0|Albino");
    

    You will need to keep doing this until the server does not return a plcontinue value.
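
The loop described above could be sketched roughly as follows. This is only a minimal sketch under some assumptions: `collect_links` and the `$fetch` callable are hypothetical names (not part of the API), the response is requested with `format=json` rather than `format=txt` so it can be decoded into a PHP array, and the continuation token is read from `query-continue` as in the output shown in the question. Newer MediaWiki versions return a top-level `continue` element instead, which the sketch also checks.

```php
<?php
// Hypothetical helper: gathers every link title for $title by following plcontinue.
// $fetch is an assumed callable that takes a URL and returns the decoded JSON
// response as an associative array (or null on failure).
function collect_links(string $title, callable $fetch): array
{
    $base = "http://en.wikipedia.org/w/api.php";
    $params = [
        "action"  => "query",
        "titles"  => $title,
        "prop"    => "links",
        "pllimit" => "500",   // upper limit for a non-bot account, per the answer
        "format"  => "json",  // assumption: JSON instead of the txt format used above
    ];
    $links = [];

    do {
        // http_build_query also URL-encodes the "|" characters in plcontinue.
        $data = $fetch($base . "?" . http_build_query($params));

        foreach ($data["query"]["pages"] ?? [] as $page) {
            foreach ($page["links"] ?? [] as $link) {
                $links[] = $link["title"];
            }
        }

        // Newer responses use "continue"; the dump in the question uses "query-continue".
        $next = $data["continue"]["plcontinue"]
            ?? $data["query-continue"]["links"]["plcontinue"]
            ?? null;
        if ($next !== null) {
            $params["plcontinue"] = $next;
        }
    } while ($next !== null); // stop once the server returns no plcontinue value

    return $links;
}

// Possible usage with the asker's own get_web_page() helper (not defined here):
// $links = collect_links("Cat", function ($url) {
//     $page = get_web_page($url);
//     return json_decode($page["content"], true);
// });
```

Passing the fetcher in as a callable keeps the paging logic separate from however the HTTP request is actually made, so the same loop works with cURL, `file_get_contents`, or a stub.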

    Selected by the asker as the best answer.