Alternative to regex for getting the content of XML tags

I'm processing an XML file and I need to get all the content inside the <section> tags.

Right now I'm using this regex:

<?php preg_match_all('/<section[^>]*>(.*?)<\/section>/i', $myXmlString, $results);?>

The markup inside the <section> tags is pretty complex. It includes math equations and similar content. On my local machine the regex works perfectly. It runs PHP 5.3.10 on Apache 2.2.22 (Ubuntu).

BUT on my staging server it doesn't work. That server runs PHP 5.3.3 on Apache 2.2.15 (Red Hat).

I have two questions:

Is there any known issue with preg_match_all in PHP 5.3.3?

Is there a better way to write the regex?

--EDIT: VARIATIONS OF THE REGEX TRIED UNSUCCESSFULLY--

<?php preg_match_all('/<section[^>]*>(.*?)<\/section>/is', $myXmlString, $results);?>
<?php preg_match_all('/<section[^>]*>(.*?)<\/section>/ims', $myXmlString, $results);?>
<?php preg_match_all('#<section[^>]*>(.*?)<\/section>#ims', $myXmlString, $results);?>
<?php preg_match_all('#<section[^>]*>([^\00]*?)<\/section>#ims', $myXmlString, $results);?>
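When the same pattern works on one machine and silently returns nothing on another, it may help to ask PCRE why before rewriting the regex. The sketch below checks preg_last_error() after the call. The sample string is hypothetical. One possible cause, which is an assumption and not something confirmed in this thread, is the backtrack limit: as far as I know the default pcre.backtrack_limit was 100000 before PHP 5.3.7 and 1000000 from 5.3.7 on, so a long <section> body could fail on 5.3.3 while passing on 5.3.10.

```php
<?php
// Hypothetical sample input; the real $myXmlString comes from the exam file.
$myXmlString = '<exam><section id="a">first</section>' . "\n"
    . '<section id="b">second' . "\n" . 'line</section></exam>';

// Same pattern as in the question, with /s so "." also matches newlines.
$count = preg_match_all('/<section[^>]*>(.*?)<\/section>/is', $myXmlString, $results);
$error = preg_last_error();

if ($count === false || $error !== PREG_NO_ERROR) {
    if ($error === PREG_BACKTRACK_LIMIT_ERROR) {
        // Assumption: this is the failure mode on the staging server.
        echo "Backtrack limit hit; pcre.backtrack_limit is ",
            ini_get('pcre.backtrack_limit'), "\n";
    } else {
        echo "preg_match_all failed with error code ", $error, "\n";
    }
} else {
    echo "Matched ", $count, " sections\n";
}
```

If the error code does turn out to be PREG_BACKTRACK_LIMIT_ERROR, raising the limit with ini_set('pcre.backtrack_limit', ...) is one option, though a parser-based approach avoids the limit altogether.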

--EDIT: Why haven't I used a parser?

The XML consists of two <section> elements. Each section groups n questions for an exam.

Each question can include math equations represented by its own XML. An equation may be something like this:

<inlineequation><m:math baseline="-16.5" display="inline" overflow="scroll"><m:mrow><m:mtable columnalign="left"><m:mtr><m:mtd><m:mrow><m:mo stretchy="true">[</m:mo><m:mrow><m:mtable columnalign="right"><m:mtr><m:mtd><m:mn>4</m:mn></m:mtd><m:mtd columnalign="right"><m:mrow><m:mo>-</m:mo><m:mn>9</m:mn></m:mrow></m:mtd><m:mtd columnalign="right"><m:mrow><m:mn>54</m:mn></m:mrow></m:mtd></m:mtr><m:mtr><m:mtd columnalign="right"><m:mrow><m:mo>&minus;</m:mo><m:mn>28</m:mn></m:mrow></m:mtd><m:mtd columnalign="right"><m:mo>&minus;</m:mo><m:mn>1</m:mn></m:mtd><m:mtd columnalign="right"><m:mo>&minus;</m:mo><m:mn>14</m:mn></m:mtd></m:mtr></m:mtable></m:mrow><m:mo stretchy="true">]</m:mo></m:mrow></m:mtd></m:mtr></m:mtable></m:mrow></m:math></inlineequation>

I need that markup to remain an XML string (not an array) because I will pass it as is to a jQuery plugin which will render the equation (it will look like a LaTeX equation).

If I parse the XML, it will be really difficult to rebuild the string for the equation and put it back in the right place inside the question's statement.
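Parsing does not have to mean losing the original markup. As a minimal sketch, DOMDocument can locate each <section> and saveXML() can serialize its child nodes back to an XML string, so the equation markup stays in place inside the question text. The helper name sectionInnerXml and the sample document are hypothetical. The sample also assumes that the m: prefix is declared and that no undefined entities such as &minus; appear, because libxml would otherwise report parse errors.

```php
<?php
// Hypothetical sample; the real document would need the m: namespace declared
// and entities such as &minus; defined or replaced for loadXML() to accept it.
$myXmlString = '<exam xmlns:m="http://www.w3.org/1998/Math/MathML">'
    . '<section><question>Solve <inlineequation><m:math><m:mn>4</m:mn></m:math>'
    . '</inlineequation></question></section>'
    . '<section><question>Second</question></section>'
    . '</exam>';

// Returns the inner XML of every <section> as a string, or false on parse failure.
function sectionInnerXml($xml)
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        return false;
    }

    $sections = array();
    foreach ($doc->getElementsByTagName('section') as $section) {
        $inner = '';
        // Serialize each child node back to markup instead of flattening it.
        foreach ($section->childNodes as $child) {
            $inner .= $doc->saveXML($child);
        }
        $sections[] = $inner;
    }
    return $sections;
}

$sections = sectionInnerXml($myXmlString);
```

Each entry of $sections is a plain string that can be handed to the rendering plugin unchanged.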

doushi5752 2014-01-22 03:00
Regex can be resource intensive.

Perhaps consider using xml_parse_into_struct:

<?php $xmlp = xml_parser_create(); xml_parse_into_struct($xmlp, $myXmlString, $vals, $index); xml_parser_free($xmlp); print_r($vals); ?>
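To make the print_r() output above easier to use, here is a small sketch of how the $vals array could be walked to pick out the <section> entries. The sample string and the id attributes are hypothetical. Note that the parser upper-cases tag and attribute names by default (case folding), and that this structure is a flat list of entries, not the original markup.

```php
<?php
// Hypothetical sample input with two sections.
$myXmlString = '<exam><section id="a"><q>1</q></section>'
    . '<section id="b"><q>2</q></section></exam>';

$xmlp = xml_parser_create();
xml_parse_into_struct($xmlp, $myXmlString, $vals, $index);
xml_parser_free($xmlp);

// Collect the id of every opening <section> entry.
// Tag and attribute names are upper-cased by default (XML_OPTION_CASE_FOLDING).
$openSections = array();
foreach ($vals as $entry) {
    if ($entry['tag'] === 'SECTION' && $entry['type'] === 'open') {
        $openSections[] = $entry['attributes']['ID'];
    }
}

print_r($openSections);
```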