Wikipedia API only returns a small subset of the data?

Hey there, I'm trying to extract data from Wikipedia articles using its API (http://en.wikipedia.org/w/api.php) from a PHP script, but I only ever seem to get a fraction of the real content. For example, when I try:

$page=get_web_page("http://en.wikipedia.org/w/api.php?action=query&titles=Cat&prop=links&format=txt");
echo $page["content"];

This is what I get:

Array ( [query] => Array ( [pages] => Array ( [6678] => Array ( [pageid] => 6678 [ns] => 0 [title] => Cat [links] => Array ( [0] => Array ( [ns] => 0 [title] => 10th edition of Systema Naturae ) [1] => Array ( [ns] => 0 [title] => 3-mercapto-3-methylbutan-1-ol ) [2] => Array ( [ns] => 0 [title] => Abyssinian (cat) ) [3] => Array ( [ns] => 0 [title] => Actinidia polygama ) [4] => Array ( [ns] => 0 [title] => Adaptive radiation ) [5] => Array ( [ns] => 0 [title] => African Wildcat ) [6] => Array ( [ns] => 0 [title] => African wildcat ) [7] => Array ( [ns] => 0 [title] => Afro-Asiatic languages ) [8] => Array ( [ns] => 0 [title] => Age of Discovery ) [9] => Array ( [ns] => 0 [title] => Agouti signalling peptide ) ) ) ) ) [query-continue] => Array ( [links] => Array ( [plcontinue] => 6678|0|Albino ) ) )

I was requesting the full list of links on the "Cat" article, but I only seem to get the first 10 in alphabetical order. This happens no matter which format I choose, and even when I call the API directly (see http://en.wikipedia.org/w/api.php?action=query&titles=Cat&prop=links). What is causing this restriction, and how can I fix it?

1 answer

doufangxian4985 2011-01-11 22:42
If you look at the API manual, you will see that there is a pllimit option, which specifies how many links are returned per request. It defaults to 10, which is why you only see the first ten links. You can request up to 500 at a time, or 5000 if you have a bot account.
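
As a minimal sketch of a request that sets pllimit explicitly, the helper below builds the query URL with http_build_query. The function name build_links_url is hypothetical and not part of the question's code; the https scheme and format=json are also assumptions, chosen here because JSON is easy to decode in PHP.

```php
<?php
// Hypothetical helper (not from the original post): builds a prop=links
// query URL with an explicit pllimit instead of relying on the default.
function build_links_url(string $title, int $limit = 500): string
{
    $params = [
        'action'  => 'query',
        'titles'  => $title,
        'prop'    => 'links',
        'pllimit' => $limit,   // links per request; 500 is the cap for normal accounts
        'format'  => 'json',   // assumption: JSON instead of the question's format=txt
    ];

    // http_build_query() URL-encodes every value, so titles with spaces are safe.
    return 'https://en.wikipedia.org/w/api.php?' . http_build_query($params, '', '&');
}

echo build_links_url('Cat'), "\n";
```

The resulting URL can be passed to the get_web_page() function from the question. A page with more links than the limit still needs the continuation step described next.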

At the end of the data dump you provided, you will see the following: [plcontinue] => 6678|0|Albino. You can send this value back to the server to get more links from the page, starting from that point. So the next query you make would be

$page=get_web_page("http://en.wikipedia.org/w/api.php?action=query&titles=Cat&prop=links&format=txt&plcontinue=6678|0|Albino");

You will need to keep doing this until the server does not return a plcontinue value.
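
The continuation loop could be sketched roughly as follows. The function name fetch_all_links and the $fetch callback are hypothetical; $fetch stands for anything that takes a URL and returns the response body, such as a thin wrapper around get_web_page(). The sketch assumes format=json and reads the continuation value either from the query-continue block shown in the dump or from a top-level continue block, which newer API responses use.

```php
<?php
// Hypothetical helper: collects all link titles for one page by repeating
// the query until the response carries no continuation value.
// $fetch is any callable that takes a URL and returns the raw response body.
function fetch_all_links(string $title, callable $fetch): array
{
    $base = 'https://en.wikipedia.org/w/api.php';
    $params = [
        'action'  => 'query',
        'titles'  => $title,
        'prop'    => 'links',
        'pllimit' => 'max',    // ask for as many links per request as the account allows
        'format'  => 'json',
    ];
    $continue = [];            // continuation parameters from the previous response
    $titles = [];

    do {
        $url = $base . '?' . http_build_query($params + $continue, '', '&');
        $data = json_decode($fetch($url), true);
        if (!is_array($data)) {
            throw new RuntimeException('Unexpected response from the API');
        }

        foreach ($data['query']['pages'] ?? [] as $page) {
            foreach ($page['links'] ?? [] as $link) {
                $titles[] = $link['title'];
            }
        }

        // Newer responses use a top-level "continue" block; older ones use
        // "query-continue" => "links", as in the dump from the question.
        $continue = $data['continue'] ?? $data['query-continue']['links'] ?? [];
    } while (!empty($continue));

    return $titles;
}
```

With the question's helper, a call might look like fetch_all_links('Cat', function ($url) { $page = get_web_page($url); return $page['content']; }). Passing the fetcher in as a callback keeps the loop independent of how the HTTP request is made.
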
This answer was accepted by the asker as the best answer.
