PHP and XML: How to remove all whitespace outside "terminal elements"

First let's define "terminal element" (for the particular purpose of this question).

By "terminal element" I mean an element that contains no other elements.

Element reference: http://www.w3schools.com/xml/xml_elements.asp

How can I remove from an XML document/node all whitespace (line feeds, carriage returns, tabs and spaces) that is outside "terminal elements" with PHP?

Rules: Only PHP native XML parsers (no regex).

dqu92800 2015-03-06 18:19
All whitespace outside "terminal elements" (leaf element nodes) is within text-nodes (as all text is within text-nodes). So if you get all text-nodes that are outside of terminal elements, you can remove all whitespace characters from those. That is already the whole answer.

Let's start lightly by just removing whitespace from one text-node in an XML Document.

Because PHP's XML parsers use UTF-8 as their character encoding (I use DOMDocument in this example), preg_replace is handy here: with the u modifier it handles UTF-8 and knows what whitespace characters are:

    /** @var DOMText $text */
    $text->nodeValue = preg_replace('~\s+~u', '', $text->textContent);

This removes all whitespace from a text-node. Here is a demonstration of that:

    $doc = new DOMDocument();
    $doc->loadXML('<root> Very Simple Demo </root>');
    $text = $doc->documentElement->childNodes->item(0);

    /** @var DOMText $text */
    $text->nodeValue = preg_replace('~\s+~u', '', $text->textContent);

    $doc->save('php://output');

Output:

    <?xml version="1.0"?>
    <root>VerySimpleDemo</root>

As you can see the space characters are removed from the one and only text-node that is part of that document.
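The question only lists line feeds, carriage returns, tabs and spaces. As far as I know, \s combined with the u modifier can also match further Unicode whitespace such as the no-break space, depending on the PCRE build. If you want to be strict about it, here is a minimal sketch with an explicit character class. The helper name stripXmlWhitespace is made up for this example and is not part of any library:

```php
<?php
// Hypothetical helper (name chosen for this example): strips only the four
// XML whitespace characters: space, tab, carriage return and line feed.
function stripXmlWhitespace(string $value): string
{
    // Single-quoted pattern: PCRE itself interprets \t, \r and \n.
    return preg_replace('~[ \t\r\n]+~', '', $value);
}

$doc = new DOMDocument();
$doc->loadXML("<root> Very\tSimple\nDemo </root>");

/** @var DOMText $text */
$text = $doc->documentElement->firstChild;
$text->nodeValue = stripXmlWhitespace($text->textContent);

// Serialize only the root element (no XML declaration).
echo $doc->saveXML($doc->documentElement), "\n"; // <root>VerySimpleDemo</root>
```

Everything else in this answer works the same way with either pattern, so pick the one that matches your definition of whitespace.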

With a larger document and your "terminal elements", this is naturally more interesting, but it works pretty much the same. The only difference is that you need to get all text-nodes that are not part of leaf element nodes. This is best done with an xpath query:

//*[*]/text()

This reads: all text-nodes that are children of elements that contain other elements. Let's use the following XML (file content.xml) as an example:

    <?xml version="1.0"?>
    <content>
        <parent>
            <child id="1">
                <title>child 1</title>
                <child id="1">
                    <title> child 1.1 with whitespace </title>
                </child>
            </child>
        </parent>
    </content>

It contains both leaf elements and elements that have child elements. It also shows nicely how whitespace is used for element indentation.
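To get a feeling for what the query picks up, here is a small sketch that counts the text-nodes on both sides. The inline XML string and the second query (//*[not(*)]/text(), the text inside leaf elements) are only there for illustration and are not needed for the solution:

```php
<?php
// Illustrative input: three whitespace text-nodes outside the leaf <title>,
// one text-node inside it.
$xml = '<content><parent> <child><title> keep me </title> </child> </parent></content>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xp = new DOMXPath($doc);

// Text-nodes whose parent element has at least one element child.
$outside = $xp->query('//*[*]/text()');
// Text-nodes whose parent element has no element children (leaf elements).
$inside = $xp->query('//*[not(*)]/text()');

printf("outside leaves: %d, inside leaves: %d\n", $outside->length, $inside->length);
// outside leaves: 3, inside leaves: 1
```

Only the first group is touched by the whitespace removal, the text inside title stays as it is.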

After loading it:

    $file = __DIR__ . '/content.xml';
    $doc = new DOMDocument();
    $doc->load($file);

A DOMXPath is necessary to execute the xpath-query:

    $xp = new DOMXPath($doc);
    $texts = $xp->query('//*[*]/text()');

What's left is to iterate over all those text-nodes and apply the whitespace removal as above:

    foreach ($texts as $text) {
        /** @var DOMText $text */
        $text->nodeValue = preg_replace('~\s+~u', '', $text->textContent);
    }

The result then is:

    <?xml version="1.0"?>
    <content><parent><child id="1"><title>child 1</title><child id="1"><title> child 1.1 with whitespace </title></child></child></parent></content>
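If you prefer to have the steps above in one place, they could be wrapped roughly like this. It is only a sketch: the function name stripWhitespaceOutsideLeaves and the inline sample string are made up for this example, and it takes an XML string instead of a file:

```php
<?php
/**
 * Hypothetical helper (name chosen for this example): removes all whitespace
 * from text-nodes that are not inside leaf ("terminal") elements and returns
 * the serialized root element.
 */
function stripWhitespaceOutsideLeaves(string $xml): string
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $xp = new DOMXPath($doc);
    foreach ($xp->query('//*[*]/text()') as $text) {
        /** @var DOMText $text */
        $text->nodeValue = preg_replace('~\s+~u', '', $text->textContent);
    }

    // Serialize only the root element (no XML declaration).
    return $doc->saveXML($doc->documentElement);
}

$xml = "<content>\n  <parent>\n    <title> child 1 </title>\n  </parent>\n</content>";
echo stripWhitespaceOutsideLeaves($xml), "\n";
// <content><parent><title> child 1 </title></parent></content>
```

Keep in mind that in mixed content such as <p>Hello <b>world</b></p> the space after "Hello" is removed as well, because that text-node sits outside a leaf element. That is what the question asks for, but it may surprise you with document-style XML.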

This should answer the question. But it wouldn't be XML if there weren't a little more verbosity or some kind of "but...".

Note that "text()" in xpath matches all kinds of text-nodes, including CDATA sections. If a CDATA section consists of whitespace only, the code above renders an empty CDATA section ("<![CDATA[]]>") into the output. One way to deal with that is to remove the emptied nodes from the document:

    /** @var DOMText $text */
    $text->nodeValue = preg_replace('~\s+~u', '', $text->textContent);
    if (!$text->length) {
        $text->parentNode->removeChild($text);
    }

This removes all emptied text-nodes from the document and keeps the document tree tidy. Hope this helps.
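For completeness, here is a small self-contained sketch of the CDATA case. The inline XML string is made up for this example, everything else is the code from above:

```php
<?php
$doc = new DOMDocument();
// Illustrative input: a whitespace-only CDATA section next to a leaf element.
$doc->loadXML('<root> <![CDATA[   ]]> <leaf> keep </leaf> </root>');

$xp = new DOMXPath($doc);
foreach ($xp->query('//*[*]/text()') as $text) {
    /** @var DOMText $text */
    $text->nodeValue = preg_replace('~\s+~u', '', $text->textContent);
    if (!$text->length) {
        // Drop the emptied node so that no "<![CDATA[]]>" is serialized.
        $text->parentNode->removeChild($text);
    }
}

// Serialize only the root element (no XML declaration).
echo $doc->saveXML($doc->documentElement), "\n"; // <root><leaf> keep </leaf></root>
```

Removing nodes inside the loop is fine here, since the xpath query result is a fixed list of nodes and not a live view of the tree.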