dongmaijie5200 2012-08-11 21:27
Answer accepted

How do I delete XML elements/nodes from an XML file that is larger than available RAM?

I'm trying to figure out how to delete an element (and its children) from a very large XML file in PHP (latest version).

I know I can use DOM and SimpleXML, but both require the whole document to be loaded into memory.

I have been looking at the XMLWriter/XMLReader/XML Parser functions and googling, but there seems to be nothing on the subject (all answers recommend using DOM or SimpleXML). That cannot be correct. Am I missing something?

The closest thing I've found is this (C#):

You can use an XmlReader to sequentially read your xml (ReadOuterXml might be useful in your case to read a whole node at a time). Then use an XmlWriter to write out all the nodes you want to keep. ( Deleting nodes from large XML files )

Really? Is that the approach? I have to copy the entire huge file?

Is there really no other way?

One approach

As suggested, I could read the data using PHP's XMLReader or XML Parser, possibly buffer it, and write (or dump and append) it to a new file.

But is this approach really practical?

I have experience with splitting huge XML files into smaller pieces, basically using the suggested method, and the process took a very long time to finish.

My dataset isn't currently big enough to give me an idea of how this would work out. I can only assume that the result will be the same (a very slow process).

Does anybody have experience of applying this in practice?


1 answer

  • doulangdang9986 2012-08-11 22:02

    There are a couple of ways to process large documents incrementally, so that you do not need to load the entire structure into memory at once. In either case, yes, you will need to write back out the elements that you wish to keep and omit those you want to remove.

    1. PHP has an XMLReader implementation of a pull parser. An explanation:

      A pull parser creates an iterator that sequentially visits the various elements, attributes, and data in an XML document. Code which uses this iterator can test the current item (to tell, for example, whether it is a start or end element, or text), and inspect its attributes (local name, namespace, values of XML attributes, value of text, etc.), and can also move the iterator to the next item. The code can thus extract information from the document as it traverses it.

    2. Or you could use the SAX XML Parser. Explanation:

      Simple API for XML (SAX) is a lexical, event-driven interface in which a document is read serially and its contents are reported as callbacks to various methods on a handler object of the user's design. SAX is fast and efficient to implement, but difficult to use for extracting information at random from the XML, since it tends to burden the application author with keeping track of what part of the document is being processed.
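
As a rough illustration of the pull approach in option 1, the sketch below pairs XMLReader with XMLWriter: it walks the input node by node, copies everything to a new file, and skips any element with a given name together with its subtree. The function name `copyWithoutElement` and its parameters are hypothetical, chosen only for this example. It is a minimal sketch, not a complete copier: DOCTYPE declarations and unexpanded entity references are not carried over, and namespace prefixes are copied as plain qualified names.

```php
<?php
/**
 * Stream-copy $src to $dst, omitting every element whose local name is
 * $dropName (including its children). Hypothetical helper for illustration.
 */
function copyWithoutElement(string $src, string $dst, string $dropName): void
{
    $reader = new XMLReader();
    if (!$reader->open($src)) {
        throw new RuntimeException("Cannot open input: $src");
    }
    $writer = new XMLWriter();
    if (!$writer->openUri($dst)) {
        throw new RuntimeException("Cannot open output: $dst");
    }
    $writer->startDocument('1.0', 'UTF-8');

    $nodes = 0;
    $more = $reader->read();
    while ($more) {
        // next() jumps past the whole subtree of the current element,
        // so the unwanted node is never written and never held in memory.
        if ($reader->nodeType === XMLReader::ELEMENT
            && $reader->localName === $dropName) {
            $more = $reader->next();
            continue;
        }

        switch ($reader->nodeType) {
            case XMLReader::ELEMENT:
                // Must be read before moving onto the attributes.
                $isEmpty = $reader->isEmptyElement;
                $writer->startElement($reader->name);
                while ($reader->moveToNextAttribute()) {
                    $writer->writeAttribute($reader->name, $reader->value);
                }
                $reader->moveToElement();
                if ($isEmpty) {
                    $writer->endElement();
                }
                break;
            case XMLReader::END_ELEMENT:
                $writer->endElement();
                break;
            case XMLReader::TEXT:
            case XMLReader::WHITESPACE:
            case XMLReader::SIGNIFICANT_WHITESPACE:
                $writer->text($reader->value);
                break;
            case XMLReader::CDATA:
                $writer->writeCdata($reader->value);
                break;
            case XMLReader::COMMENT:
                $writer->writeComment($reader->value);
                break;
            case XMLReader::PI:
                $writer->writePi($reader->name, $reader->value);
                break;
        }

        // Push buffered output to disk from time to time.
        if (++$nodes % 1000 === 0) {
            $writer->flush();
        }
        $more = $reader->read();
    }

    $writer->endDocument();
    $writer->flush();
    $reader->close();
}
```

A call might look like `copyWithoutElement('huge.xml', 'huge.filtered.xml', 'item');` (file names assumed). The original file is left untouched, so it can be replaced with the filtered copy once the run has finished successfully.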

    A lot of people prefer the pull method, but either meets your requirement. Keep in mind that large is relative. If the document fits in memory, then it will almost always be easier to use the DOM. But for really, really large documents that simply might not be an option.
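
For comparison, the SAX route from option 2 could be sketched with PHP's XML Parser extension. The input is fed to the parser in chunks, and the handlers re-emit every start tag, end tag, and text node unless they fall inside an element that should be dropped. The function name `saxCopyWithoutElement`, the chunk size, and the `$skipDepth` counter are assumptions made for this example. Comments and processing instructions are not copied in this sketch, and empty elements come out as `<a></a>` rather than `<a/>`.

```php
<?php
/**
 * SAX-style filter: copy $src to $dst without any element named $dropName.
 * Hypothetical helper for illustration.
 */
function saxCopyWithoutElement(string $src, string $dst, string $dropName): void
{
    $in  = fopen($src, 'rb');
    $out = fopen($dst, 'wb');
    if ($in === false || $out === false) {
        throw new RuntimeException('Cannot open input or output file');
    }

    // > 0 while the parser is inside an element that is being dropped.
    $skipDepth = 0;

    $parser = xml_parser_create('UTF-8');
    // Case folding is on by default and would upper-case all names.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use ($out, $dropName, &$skipDepth) {
            if ($skipDepth > 0 || $name === $dropName) {
                $skipDepth++;
                return;
            }
            $tag = '<' . $name;
            foreach ($attrs as $key => $value) {
                $tag .= ' ' . $key . '="'
                    . htmlspecialchars($value, ENT_QUOTES | ENT_XML1, 'UTF-8')
                    . '"';
            }
            fwrite($out, $tag . '>');
        },
        function ($p, $name) use ($out, &$skipDepth) {
            if ($skipDepth > 0) {
                $skipDepth--;
                return;
            }
            fwrite($out, '</' . $name . '>');
        }
    );

    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use ($out, &$skipDepth) {
            if ($skipDepth === 0) {
                fwrite($out, htmlspecialchars($data, ENT_NOQUOTES | ENT_XML1, 'UTF-8'));
            }
        }
    );

    fwrite($out, '<?xml version="1.0" encoding="UTF-8"?>' . "\n");

    // Only one 64 KiB chunk of input is held in memory at a time.
    while (!feof($in)) {
        $chunk = (string) fread($in, 65536);
        if (xml_parse($parser, $chunk, feof($in)) !== 1) {
            $message = xml_error_string(xml_get_error_code($parser));
            $line = xml_get_current_line_number($parser);
            fclose($in);
            fclose($out);
            throw new RuntimeException("XML error at line $line: $message");
        }
    }

    fclose($in);
    fclose($out);
}
```

The depth counter is the state-tracking burden the SAX description mentions: the handlers have no notion of "current subtree", so the code has to remember for itself whether it is inside a dropped element.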

    This answer was selected by the asker as the best answer.
