dongrou5254 2017-04-19 07:55
Viewed 51 times
Accepted

list=allpages does not return all pages

I have a problem: I want to fill a list with the names of all pages in my wiki. My script:

$TitleList = [];
$nsList = [];

$nsURL = 'wiki/api.php?action=query&meta=siteinfo&siprop=namespaces|namespacealiases&format=json';
$nsJson = file_get_contents($nsURL);
$nsJsonD = json_decode($nsJson, true);
foreach ($nsJsonD['query']['namespaces'] as $ns)
{
  if ( $ns['id'] >= 0 ) // skip virtual namespaces (Special: -1, Media: -2)
    array_push($nsList, $ns['id']);
}

# populate the list of all pages in each namespace
foreach ($nsList as $n)
{
  $urlGET = 'wiki/api.php?action=query&list=allpages&apnamespace='.$n.'&format=json';
  $json = file_get_contents($urlGET);
  $json_b = json_decode( $json ,true); 

  foreach ($json_b['query']['allpages'] as $page)
  {
    echo "\n" . $page['title'];
    array_push($TitleList, $page["title"]);
  }
}

But about 35% of the pages are still missing, even though I can visit them on my wiki (tested with "Random page"). Does anyone know why this could happen?


1 answer

  • duandun3178 2017-04-24 06:24

    The MediaWiki API doesn't return all results at once; it returns them in batches. A default batch is only 10 pages; you can specify aplimit to change that (up to 500 for regular users, 5,000 for bots).

    To get the next batch, you need to specify the continue= parameter; in each batch, you will also get a continue property in the returned data, which you can use to ask for the next batch. To get all pages, you must loop as long as a continue element is present.

    For example, on the English Wikipedia, this would be the first API call: https://en.wikipedia.org/w/api.php?action=query&list=allpages&apnamespace=0&format=json&aplimit=500&continue=

    ...and the continue object in the response will look like this: `"continue": { "apcontinue": "\"Cigar\"_Daisey", "continue": "-||" }`

    (Updated according to comment by OP, with example code)

    You would now want to flatten the continue array into URL parameters, for example using `http_build_query()` (which is what the code below does).
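
    For instance, here is a minimal sketch of that flattening step, using the example continue object shown above:

      // Flatten the 'continue' object of a response into URL parameters
      $continue = array( 'apcontinue' => '"Cigar"_Daisey', 'continue' => '-||' );
      echo http_build_query($continue);
      // prints: apcontinue=%22Cigar%22_Daisey&continue=-%7C%7C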

    See the more complete explanation here: https://www.mediawiki.org/wiki/API:Query#Continuing_queries

    A working version of your code should be as follows (tested against Wikipedia with slightly different code):

    # populate the list of all pages in each namespace
    foreach ($nsList as $n) {
      // Build the URL inside the loop so it uses the current namespace $n;
      // note the parameter is aplimit (not limit). Increase it up to 5,000 if you are using a bot.
      $baseUrl = 'wiki/api.php?action=query&list=allpages&apnamespace='.$n.'&format=json&aplimit=500&';
      $next = 'continue=';  // empty continue parameter on the first request
      while ($next !== null) {
        $urlGET = $baseUrl . $next;
        $json = file_get_contents($urlGET);
        $json_b = json_decode($json, true);

        foreach ($json_b['query']['allpages'] as $page) {
          echo "\n" . $page['title'];
          array_push($TitleList, $page['title']);
        }

        // The response contains a 'continue' object until the last batch is reached;
        // flatten it into URL parameters for the next request, otherwise stop.
        $next = isset($json_b['continue']) ? http_build_query($json_b['continue']) : null;
      }
    }
    
    Accepted as the best answer by the asker.
