How do I delete XML elements/nodes from an XML file that is larger than available RAM?

I'm trying to figure out how to delete an element (and its children) from a very large XML file in PHP (latest version).

I know I can use DOM or SimpleXML, but both require the whole document to be loaded into memory.

I have been looking at the XMLWriter, XMLReader, and XML Parser functions and googling, but there seems to be nothing on the subject (all answers recommend using DOM or SimpleXML). That cannot be correct. Am I missing something?

The closest thing I've found is this (C#):

You can use an XmlReader to read your XML sequentially (ReadOuterXml might be useful in your case to read a whole node at a time). Then use an XmlWriter to write out all the nodes you want to keep. (From "Deleting nodes from large XML files".)

Really? Is that the approach? I have to copy the entire huge file?

Is there really no other way?

One approach

As suggested, I could read the data using PHP's XMLReader or XML Parser, possibly buffer it, and write or append it to a new file.

But is this approach really practical?

I have experience with splitting huge XML files into smaller pieces, basically using the suggested method, and it took a very long time for the process to finish.

My dataset isn't currently big enough to give me an idea of how this would work out. I can only assume that the result will be the same (a very slow process).

Does anybody have experience of applying this in practice?

doulangdang9986 2012-08-11 22:02
There are a couple of ways to process large documents incrementally, so that you do not need to load the entire structure into memory at once. In either case, yes, you will need to write back out the elements that you wish to keep and omit those you want to remove.

PHP has an XMLReader implementation of a pull parser. An explanation:

A pull parser creates an iterator that sequentially visits the various elements, attributes, and data in an XML document. Code which uses this iterator can test the current item (to tell, for example, whether it is a start or end element, or text), and inspect its attributes (local name, namespace, values of XML attributes, value of text, etc.), and can also move the iterator to the next item. The code can thus extract information from the document as it traverses it.
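As a rough illustration of the pull approach, the sketch below streams a file through XMLReader and copies everything except one named element to a new file with XMLWriter. The function name `removeElements` and the element name used in the usage note are hypothetical, and the sketch assumes a UTF-8 document without a DOCTYPE or unresolved entity references.

```php
<?php
// Stream-copy $src to $dst, omitting every element named $skipName
// together with its subtree. Only the current node is held in memory.
function removeElements(string $src, string $dst, string $skipName): void
{
    $reader = new XMLReader();
    if (!$reader->open($src)) {
        throw new RuntimeException("Cannot open $src");
    }
    $writer = new XMLWriter();
    if (!$writer->openUri($dst)) {
        throw new RuntimeException("Cannot open $dst");
    }
    $writer->startDocument('1.0', 'UTF-8');

    $more = $reader->read();
    while ($more) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === $skipName) {
            // next() moves past the element and its whole subtree.
            $more = $reader->next();
            continue;
        }
        switch ($reader->nodeType) {
            case XMLReader::ELEMENT:
                // Capture this before visiting attributes, which moves the cursor.
                $isEmpty = $reader->isEmptyElement;
                $writer->startElement($reader->name);
                while ($reader->moveToNextAttribute()) {
                    $writer->writeAttribute($reader->name, $reader->value);
                }
                $reader->moveToElement();
                if ($isEmpty) {
                    $writer->endElement(); // <a/> produces no END_ELEMENT node
                }
                break;
            case XMLReader::END_ELEMENT:
                $writer->fullEndElement();
                break;
            case XMLReader::TEXT:
            case XMLReader::WHITESPACE:
            case XMLReader::SIGNIFICANT_WHITESPACE:
                $writer->text($reader->value);
                break;
            case XMLReader::CDATA:
                $writer->writeCdata($reader->value);
                break;
            case XMLReader::COMMENT:
                $writer->writeComment($reader->value);
                break;
            case XMLReader::PI:
                $writer->writePi($reader->name, $reader->value);
                break;
        }
        $more = $reader->read();
    }
    $writer->endDocument();
    $writer->flush();
    $reader->close();
}
```

A call such as `removeElements('big.xml', 'big.filtered.xml', 'drop')` writes the filtered copy next to the original; once it finishes without error, `rename()` can put the copy in place of the original file.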

Or you could use PHP's SAX-style XML Parser extension. Explanation:

Simple API for XML (SAX) is a lexical, event-driven interface in which a document is read serially and its contents are reported as callbacks to various methods on a handler object of the user's design. SAX is fast and efficient to implement, but difficult to use for extracting information at random from the XML, since it tends to burden the application author with keeping track of what part of the document is being processed.
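The same filter can be sketched in the event-driven style with PHP's `xml_*` parser functions. Here the application has to track its own position, which this sketch does with a depth counter. The function name `saxRemoveElements` is hypothetical, and for brevity the sketch drops comments, processing instructions, and the XML declaration, and writes empty elements as an open/close pair.

```php
<?php
// SAX-style filter: feed the file to the parser in chunks and re-emit
// every event that is not inside an element named $skipName.
function saxRemoveElements(string $src, string $dst, string $skipName): void
{
    $in = fopen($src, 'rb');
    $out = fopen($dst, 'wb');
    if ($in === false || $out === false) {
        throw new RuntimeException('Cannot open input or output file');
    }
    $skipDepth = 0; // > 0 while we are inside a subtree being removed

    $parser = xml_parser_create('UTF-8');
    // Keep element names as written instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        function ($p, string $name, array $attrs) use ($out, $skipName, &$skipDepth) {
            if ($skipDepth > 0 || $name === $skipName) {
                $skipDepth++;
                return;
            }
            $tag = '<' . $name;
            foreach ($attrs as $key => $value) {
                $tag .= ' ' . $key . '="'
                    . htmlspecialchars($value, ENT_QUOTES | ENT_XML1, 'UTF-8') . '"';
            }
            fwrite($out, $tag . '>');
        },
        function ($p, string $name) use ($out, &$skipDepth) {
            if ($skipDepth > 0) {
                $skipDepth--;
                return;
            }
            fwrite($out, '</' . $name . '>');
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, string $data) use ($out, &$skipDepth) {
            if ($skipDepth === 0) {
                fwrite($out, htmlspecialchars($data, ENT_NOQUOTES | ENT_XML1, 'UTF-8'));
            }
        }
    );

    while (!feof($in)) {
        $chunk = fread($in, 65536);
        if (!xml_parse($parser, $chunk, feof($in))) {
            throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
        }
    }
    fclose($in);
    fclose($out);
}
```

The depth counter is the bookkeeping the explanation above refers to: the parser only reports events, so the handlers themselves must remember whether they are currently inside a removed subtree.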

A lot of people prefer the pull method, but either meets your requirement. Keep in mind that "large" is relative. If the document fits in memory, it will almost always be easier to use the DOM. For really, really large documents, though, that simply might not be an option.
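For comparison, when the document does fit in memory, the DOM version is only a few lines. This is a minimal sketch with a hypothetical function name, `domRemoveElements`, that takes the XML as a string.

```php
<?php
// In-memory variant: load the whole document, detach matching elements,
// and serialize the result. Suitable only when the document fits in RAM.
function domRemoveElements(string $xml, string $skipName): string
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        throw new RuntimeException('Invalid XML');
    }
    // Copy the node list to an array first: the list is live and would
    // shrink underneath the loop as nodes are removed.
    $nodes = iterator_to_array($doc->getElementsByTagName($skipName));
    foreach ($nodes as $node) {
        if ($node->parentNode !== null) {
            $node->parentNode->removeChild($node);
        }
    }
    return $doc->saveXML();
}
```

The trade-off is the one described above: the code is simpler, but memory use grows with the size of the document.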
This answer was selected by the asker as the best answer.

Tags: php, xml. Asked 2012-08-11 21:27.
