Parsing markup into an abstract syntax tree using regular expressions

This question is supplementary to: Recursive processing of markup using Regular Expression and DOMDocument

The code supplied by the selected answer has been a great help in understanding how to build a basic syntax tree. However, I am now having trouble tightening the regular expressions: at the moment an open tag is { followed by any character (other than a second {). Ideally, it would match only my syntax, which is:

{<anchor>}
{!image!}
{*strong*}
{/emphasis/}
{|code|}
{-strikethrough-}
{>small<}

Two tags, a (anchor) and small, also require an end tag that differs from the start tag. I have tried modifying $re_closetag from the original code sample to reflect this, but too much is still matched as text.

For example:

http://www.google.com/>} bang 
smäll<} boom

My test string is:

tëstïng {{ 汉字/漢字 }} testing {<http://www.google.com/>} bang {>smäll<} boom {* strông{/ ëmphäsïs {- strïkë {| côdë |} -} /} *} {*wôw*} 1, 2, 3

dongwen9975 2013-04-10 20:01
You can either control this in the RE itself or after a match.

In the RE, to control which tags may be "open", modify this part of $re_next:

(?:\{(?P<opentag>[^{\s]))  # match an open tag,
                           # which is "{" followed by anything other than whitespace or another "{"

Currently it looks for any character which is not { or whitespace. Simply change to this:

(?:\{(?P<opentag>[<!*/|>-]))

Now it looks for only your specific open tags.
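
To see the difference in isolation, here is a minimal sketch that runs only the open-tag fragment (not the full $re_next) against a shortened version of the test string. The variable names and the trailing {@nope@} are assumptions added for illustration; they are not part of the original code.

```php
<?php
// Hypothetical standalone check of the open-tag fragment only.
// "/" is escaped because it is also the pattern delimiter; "-" is last so it stays literal.
$re_loose = '/\{(?P<opentag>[^{\s])/u';        // "{" + anything except "{" or whitespace
$re_tight = '/\{(?P<opentag>[<!*\/|>-])/u';    // "{" + one of the seven allowed characters

// Shortened test string; "{@nope@}" is an invented tag that should not be accepted.
$s = 'tëstïng {{ 汉字/漢字 }} testing {<http://www.google.com/>} bang {>smäll<} boom {*wôw*} {@nope@}';

preg_match_all($re_loose, $s, $m);
$found_loose = $m['opentag'];   // includes "@"

preg_match_all($re_tight, $s, $m);
$found_tight = $m['opentag'];   // "@" is no longer treated as an open tag

print_r($found_loose);
print_r($found_tight);
```

The loose pattern accepts {@ as an open tag, while the tightened one leaves it to be consumed as ordinary text.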

The close tag portion only matches a single character at a time, depending on which tag is open in the current context. (This is what the $opentag argument is for.) So to handle tags whose close character differs from the open character, simply change the character passed as $opentag in the recursive call. E.g.:

if (isset($m['opentag']) && $m['opentag'][1] !== -1) {
    list($newopen, $_) = $m['opentag'];
    // change the close character to look for in the new context
    if ($newopen==='>') $newopen = '<';
    else if ($newopen==='<') $newopen = '>';
    list($subast, $offset) = str_to_ast($s, $offset, array(), $newopen);
    $ast[] = array($newopen, $subast);
} else if (isset($m['text']) && $m['text'][1] !== -1) {
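
Since the full str_to_ast from the linked question is not reproduced here, the following is a self-contained sketch of the same idea, not the original code. The names sketch_str_to_ast and CLOSE_FOR are made up for this example, and it differs in two deliberate ways: text is consumed one character at a time for simplicity, and each node is labelled with its open character rather than the swapped one.

```php
<?php
// Hypothetical sketch; not the str_to_ast from the linked question.
// Open characters whose close character differs; every other tag closes with itself.
const CLOSE_FOR = ['<' => '>', '>' => '<'];

/**
 * Parse $s from byte offset $offset until the close tag of the current
 * context ($close followed by "}") or the end of the string.
 * Returns array($ast, $newOffset).
 */
function sketch_str_to_ast(string $s, int $offset = 0, ?string $close = null): array
{
    // The close-tag alternative only exists while a tag is open.
    $re_close = $close === null
        ? ''
        : '(?P<closetag>' . preg_quote($close, '/') . '\})|';
    $re = '/\G(?:' . $re_close
        . '\{(?P<opentag>[<!*\/|>-])'   // one of the seven allowed open tags
        . '|(?P<text>.))/su';           // otherwise a single character of text

    $ast = [];
    $text = '';
    // Turn any buffered characters into one text node.
    $flush = function () use (&$ast, &$text) {
        if ($text !== '') {
            $ast[] = ['#text', $text];
            $text = '';
        }
    };

    // \G anchors each match at $offset; the loop ends when nothing is left to match.
    while (preg_match($re, $s, $m, PREG_UNMATCHED_AS_NULL, $offset)) {
        $offset += strlen($m[0]);
        if (isset($m['closetag'])) {
            break;                      // current context is finished
        }
        if (isset($m['opentag'])) {
            $flush();
            $open = $m['opentag'];
            // Look for the paired close character in the nested context.
            [$subast, $offset] = sketch_str_to_ast($s, $offset, CLOSE_FOR[$open] ?? $open);
            $ast[] = [$open, $subast];
        } else {
            $text .= $m['text'];
        }
    }
    $flush();
    return [$ast, $offset];
}

[$ast] = sketch_str_to_ast('a {<url>} b {>sm<} c');
echo json_encode($ast, JSON_UNESCAPED_UNICODE), "\n";
```

An unclosed tag simply runs to the end of the input in this sketch; a real parser would probably want to report that instead.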

Alternatively, you can keep the RE as-is and decide what to do with the match after the fact. For example, if you match an @ character but {@ is not an allowed open tag, you can either raise a parse error or simply treat it as a text node (attaching array('#text', '{@') to the AST), or anything in between.
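
A rough sketch of that post-match check might look like the following. ALLOWED_OPEN, classify_open and the $strict switch are hypothetical names introduced for this example, and the 'open' marker is just a placeholder for "recurse as usual".

```php
<?php
// Hypothetical helper; none of these names come from the original code.
const ALLOWED_OPEN = ['<', '!', '*', '/', '|', '>', '-'];

/**
 * Decide what to do with a character matched by the loose open-tag pattern.
 * Returns array('open', $char) for an allowed tag, or a '#text' node otherwise.
 */
function classify_open(string $open, bool $strict = false): array
{
    if (in_array($open, ALLOWED_OPEN, true)) {
        return ['open', $open];
    }
    if ($strict) {
        // Strict mode: treat an unknown tag as a parse error.
        throw new UnexpectedValueException('Unknown open tag: {' . $open);
    }
    // Lenient mode: keep the two characters as literal text.
    return ['#text', '{' . $open];
}

// Loose pattern from above: "{" followed by anything except whitespace or "{".
preg_match_all('/\{(?P<opentag>[^{\s])/u', 'x {@y@} {*z*}', $m);
$nodes = array_map('classify_open', $m['opentag']);
print_r($nodes);
```

Here {@ ends up as a text node and {* is passed through as a real open tag; calling classify_open('@', true) would throw instead.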
This answer was selected by the asker as the accepted answer.
