dongwei1895 2015-10-11 18:28

Wiktionary - I can't get a working local copy

I'll be the first to admit I'm not the smartest person in the world, but I'm at a loss on this one.

I want to have access to the words and details of each word of the English Wiktionary project. I saw they do data dumps, and got all excited. That lasted all of 3 seconds. Since then, all I've done is swear and smoke in bouts of frustration and irritation.

I'm using Windows 7.
I've installed the latest version of XAMPP (64-bit, installed at root).
I've installed the latest JDK.
I've set XAMPP and the JDK to run as admin.
I've grabbed the pages-articles dump files.
I've decompressed them.
I've used the mwxml2sql tool.
I couldn't get it to run (no matter what settings/flags I tried).
I used the GUI version of the mwxml2sql tool.
It ran - and then errored at 4300 rows.
The error was about duplicate keys in name_title.

I've looked at wikokit - but that seems a few years behind.

I'm at a loss.

I've looked at the data that did get into the DB before the dupe-key error.
I can see some data in Blob format.
How am I meant to access that information via PHP?

Is there not a decent (as in "idiots" :D) guide for this?
Do I really have to grab all the files, install a wiki, parse the files?
How am I meant to handle the dupe key issues (not like I can open up the sql file and find the relevant line!)?

So, please - has anyone done this or know of a way to do it?
The only thing I can think of is to actually try and scrape the site - which I'd rather not do (and nor would the wiki group).

In case it is relevant - I'm specifically after the word-form, the PoS, the pronunciations, the definitions, any phrases and related words. Things like etymology etc. would be nice, but aren't as important.

In case it is suggested: yes, I've looked at WordNet (managed to find a MySQL dump, and got that working). I've also seen resources like MRC and the CMU dict, but none have the right permissions. That's why Wiktionary looked so attractive. But it seems the format/dumps are far from friendly :(

So, any help or ideas? Alternative sources, guides, walk-throughs ... all would help.
Alternatively, if you can tell me what is causing the error and how to get around it, and how to access the word data, that would be superb.

Sincerely yours - frustrated.


1 answer

  • dongmi1663 2015-10-12 09:12

    I've looked at wikokit - but that seems a few years behind.

    No, the wikokit project is alive :) Link: https://github.com/componavt/wikokit

    You can download the parsed English Wiktionary database here: http://whinger.krc.karelia.ru/soft/wikokit/index.html. Import the SQL dump file into MySQL and play with the definitions, synonyms, and translations extracted from the English Wiktionary.
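If you would rather skip the MySQL import altogether, the decompressed pages-articles XML can also be streamed directly. The following is only a minimal sketch, not wikokit's method: it assumes the dump uses the usual MediaWiki export layout (`<page>` elements containing `<title>` and `<text>`), and the helper names `iter_pages` and `definitions` are made up for illustration. The definition extractor assumes the common Wiktionary convention that sense lines start with `# ` under a part-of-speech heading; real entries are messier, with templates and nested senses.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Drop the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) pairs from a MediaWiki XML export.

    `source` is a filename or a binary file object. Pages are cleared
    after use so the whole dump is never held in memory at once.
    """
    title = None
    text = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title = text = None
            elem.clear()  # free the children of the finished page


def definitions(wikitext):
    """Yield (heading, definition) for every line starting with '# '.

    Assumption: sense lines look like '# ...' and the nearest preceding
    '==Heading==' line names the part of speech.
    """
    heading = None
    for line in wikitext.splitlines():
        line = line.strip()
        if line.startswith("==") and line.endswith("=="):
            heading = line.strip("= ").strip()
        elif line.startswith("# "):
            yield heading, line[2:].strip()


# Tiny hand-written sample in the assumed export layout (not real dump data).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>cat</title><revision><text>==English==
===Noun===
# A feline.
#: An example line that is skipped.</text></revision></page>
  <page><title>Cat</title><revision><text>==English==</text></revision></page>
</mediawiki>"""

if __name__ == "__main__":
    for page_title, page_text in iter_pages(io.BytesIO(SAMPLE)):
        print(page_title, list(definitions(page_text)))
```

For the real dump you would pass the path of the decompressed XML file to `iter_pages`, or a file object from `bz2.open(path, "rb")` to read the compressed file without unpacking it first. The sample also hints at one possible cause of the name_title error: Wiktionary titles are case-sensitive, and name_title is a unique key on namespace plus title, so two titles that differ only in case would collide if the title column ended up with a case-insensitive collation. That is a guess from the symptoms, not something the question confirms.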
