dongshen3352 2012-01-19 21:56
Accepted

How do I get the result of a complex Wikipedia template?

This question is a bit hard to explain, but I'll do my best. First, let me present an example page:

http://en.wikipedia.org/wiki/African_bush_elephant

That's a Wikipedia page, a species page in particular, since it has the "taxobox" on the right. I'm trying to parse the attributes in that taxobox using PHP. There are two ways to create such a taxobox in Wikipedia: manually, or by using the special "automatic taxobox" template.

I can parse the manual one. I use Wikipedia's API to return the page's content in JSON format, then use some regular expressions to extract those properties.
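As a rough sketch of that extraction step (written in Python rather than PHP, and using a depth-tracking scan in place of plain regular expressions, because values can contain nested templates and links), something like the following could pull the top-level parameters out of a taxobox. The function name `split_template_params` is hypothetical, and the scan deliberately ignores edge cases such as `{{{...}}}` parameters or `<nowiki>` sections.

```python
def split_template_params(wikitext):
    """Return (template_name, {param: value}) for the first {{...}} in wikitext.

    Only pipes at nesting depth 1 separate parameters; pipes inside nested
    {{...}} templates or [[...]] links are kept as part of the value.
    """
    i = wikitext.index("{{")  # raises ValueError if no template is present
    depth = 0
    parts, current = [], []
    closed = False
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            if depth > 1:          # keep nested openers in the value
                current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            if depth == 0:         # end of the outermost template
                parts.append("".join(current))
                closed = True
                break
            current.append(pair)
            i += 2
        elif wikitext[i] == "|" and depth == 1:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(wikitext[i])
            i += 1
    if not closed:
        raise ValueError("unterminated template")

    params = {}
    for part in parts[1:]:         # parts[0] is the template name
        name, sep, value = part.partition("=")
        if sep:                    # skip positional (unnamed) parameters
            params[name.strip()] = value.strip()
    return parts[0].strip(), params
```

Run against the automatic taxobox wikitext quoted below, this returns only the parameters that are literally written in the page (`name`, `status`, `taxon`, and so on), which is why it cannot recover values such as the kingdom.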

In the case of an auto taxobox, however, the content returned is like this:

> {{automatic taxobox | name = African Bush Elephant<ref
> name=MSW3>{{MSW3 Proboscidea | id = 11500009 | page =
> 91}}</ref> | status = VU | status_system = iucn3.1 | status_ref
> = <ref name=IUCN>{{IUCN2010|assessors=Blanc, J.|year=2008|version=2010.1|id=12392|title=Loxodonta
> africana|downloaded=04 April 2010}}</ref> | trend = unknown |
> image = African Bush Elephant.jpg | taxon = Loxodonta africana |
> synonyms = ''Loxodonta africana africana'' | binomial = ''Loxodonta
> africana'' | binomial_authority = ([[Johann Friedrich
> Blumenbach|Blumenbach]], 1797) }}

If you compare this with the actual page as you see it on Wikipedia, you'll notice that several attributes are missing. For example, the property "Kingdom" is visible on the real page but is not returned here. More properties are missing in the same way.

This is likely because the template needs Wikipedia's server-side processing to turn it into actual output. I learned that the API has an "expandtemplates" action, to which you can send a snippet like the one above and get back the result as the user would see it. I'm using this for several templates and it works, but unfortunately not for the automatic taxobox template. Click this link to see what expandtemplates returns:

complete link

As you can see, the template doesn't actually expand. Instead, it shows more templates, nested and repeated several times.

So now I'm stuck trying to read these properties from pages that have the auto taxobox template. The only other direction I can think of is to not use the API and to just parse the html of the actual page. That would be doable for some properties, but others are extremely fragile to parse.

3 answers

  • dousou1967 2012-01-20 00:20

    Use action=parse instead of action=expandtemplates. As you've noticed, expandtemplates only expands a single level; additionally, it won't fully preprocess the input (e.g., it won't correctly handle certain variable references inside templates).
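
A minimal sketch of what this might look like, in Python rather than PHP. The `action=parse`, `page`, `prop=text` and `format=json` parameters are standard MediaWiki API parameters; the helper names (`build_parse_url`, `rendered_html`, `taxobox_rows`) are hypothetical, and the row extraction assumes the rendered taxobox uses plain two-cell table rows such as "Kingdom: / Animalia", which should be checked against the real markup.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def build_parse_url(page_title):
    """Build an action=parse request that returns the rendered HTML of a page."""
    params = {"action": "parse", "page": page_title, "prop": "text", "format": "json"}
    return API_URL + "?" + urlencode(params)


def rendered_html(response):
    """Pull the HTML string out of a decoded action=parse JSON response."""
    text = response["parse"]["text"]
    # Classic JSON output wraps the HTML as {"*": "..."}; formatversion=2
    # returns a plain string. Accept both.
    return text["*"] if isinstance(text, dict) else text


class TwoCellRows(HTMLParser):
    """Collect (label, value) pairs from table rows that have exactly two cells.

    Assumption: nested tables are not handled; a nested <tr> resets the row.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cells = None  # cells of the current row, None outside a row
        self._buf = None    # text pieces of the current cell, None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
        elif tag in ("td", "th") and self._cells is not None:
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._buf is not None:
            # Collapse whitespace so multi-line cells become one clean string.
            self._cells.append(" ".join("".join(self._buf).split()))
            self._buf = None
        elif tag == "tr" and self._cells is not None:
            if len(self._cells) == 2:
                label, value = self._cells
                self.rows.append((label.rstrip(":"), value))
            self._cells = None
            self._buf = None


def taxobox_rows(html):
    """Return {label: value} for every two-cell table row found in html."""
    parser = TwoCellRows()
    parser.feed(html)
    parser.close()
    return dict(parser.rows)
```

To use it, fetch `build_parse_url("African bush elephant")` with any HTTP client, decode the JSON body, and pass the result through `rendered_html` and then `taxobox_rows`. Because the server has already expanded every template, rows such as "Kingdom" should appear even for the automatic taxobox.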

    This answer was selected as the best answer by the asker.