Extracting compressed text from a MediaWiki database using PHP

A client of ours would like to have all the content from a wiki site they ran for a while. They provided us with the complete database of the MediaWiki software. We are trying to extract the articles from the 'text' table with PHP, without using the MediaWiki engine.

MediaWiki seems to compress the content before storing it as a BLOB in the database. We can't find a way to extract it without the engine. I looked at the source code, but couldn't work out how it decodes the BLOBs.

Any suggestions on how to solve this?

douxuan3095 2012-11-26 20:15

From the MediaWiki manual page for the text table:

old_flags

Comma-separated list of flags. Contains the following possible values:

┌──────────┬──────────────────────────────────────────────────────────────────┐
│ gzip     │ Text is compressed with PHP's gzdeflate() function.              │
│          │ Note: If the $wgCompressRevisions option is on, new rows         │
│          │ (=current revisions) will be gzipped transparently at save time. │
│          │ Previous revisions can also be compressed by using the script    │
│          │ compressOld.php                                                  │
├──────────┼──────────────────────────────────────────────────────────────────┤
│ utf-8    │ Text was stored as UTF-8.                                        │
│          │ Note: If the $wgLegacyEncoding option is on, rows *without* this │
│          │ flag will be converted to UTF-8 transparently at load time.      │
├──────────┼──────────────────────────────────────────────────────────────────┤
│ object   │ Text field contained a serialized PHP object.                    │
│          │ Note: The object either contains multiple versions compressed    │
│          │ together to achieve a better compression ratio, or it refers to  │
│          │ another row where the text can be found.                         │
├──────────┼──────────────────────────────────────────────────────────────────┤
│ external │ Text was stored in an external location specified by old_text    │
└──────────┴──────────────────────────────────────────────────────────────────┘
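
Going by the flags above, a row can be decoded without the MediaWiki engine roughly as follows. This is a minimal sketch, not MediaWiki's own code: decodeOldText() and its $legacyEncoding parameter are hypothetical names introduced here, the zlib and iconv PHP extensions are assumed to be available, and rows flagged 'object' or 'external' are only reported, not resolved.

```php
<?php
/**
 * Hypothetical helper (not part of MediaWiki): decode one old_text value
 * according to its old_flags value.
 *
 * $legacyEncoding is an assumption: pass the wiki's $wgLegacyEncoding value
 * (for example 'ISO-8859-1') if it was set, otherwise leave it null.
 */
function decodeOldText(string $blob, string $flags, ?string $legacyEncoding = null): string
{
    // old_flags is a comma-separated list; it may also be empty.
    $flagList = array_filter(
        array_map('trim', explode(',', $flags)),
        function ($flag) { return $flag !== ''; }
    );

    if (in_array('external', $flagList, true)) {
        // old_text only holds a pointer to an external location.
        throw new RuntimeException('Text is stored externally at: ' . $blob);
    }

    if (in_array('gzip', $flagList, true)) {
        // gzinflate() is the inverse of gzdeflate() (raw DEFLATE, no header).
        $inflated = @gzinflate($blob);
        if ($inflated === false) {
            throw new RuntimeException('gzinflate() failed; the row may be corrupt');
        }
        $blob = $inflated;
    }

    if (in_array('object', $flagList, true)) {
        // A serialized PHP object; resolving it needs MediaWiki's classes.
        throw new RuntimeException('Row holds a serialized PHP object');
    }

    if ($legacyEncoding !== null && !in_array('utf-8', $flagList, true)) {
        // Rows without the utf-8 flag are in the legacy encoding.
        $converted = iconv($legacyEncoding, 'UTF-8', $blob);
        if ($converted === false) {
            throw new RuntimeException('Conversion from legacy encoding failed');
        }
        $blob = $converted;
    }

    return $blob;
}

// Round trip: compress the way the 'gzip' flag describes, then decode.
$stored = gzdeflate("== Heading ==\nSome wiki text.");
echo decodeOldText($stored, 'utf-8,gzip'), "\n";
```

In practice the inputs would come from the old_text and old_flags columns of each row in the text table (the table name may carry a prefix, depending on how the wiki was installed). Catching the exception per row makes it easy to list which revisions still need the 'object' or 'external' cases handled.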

This answer was accepted by the asker as the best answer.