doumeng1143 2013-04-10 19:05

Parsing markup into an abstract syntax tree using regular expressions

This question is supplementary to: Recursive processing of markup using Regular Expression and DOMDocument

The code supplied by the selected answer has been a great help in understanding how to build a basic syntax tree. However, I am now having trouble tightening the regular expressions so that they match only my syntax, rather than any { followed by a character other than a second {. Ideally it would match only my syntax, which is:

{<anchor>}
{!image!}
{*strong*}
{/emphasis/}
{|code|}
{-strikethrough-}
{>small<}

Two tags, a (anchor) and small, also require a closing character that differs from the opening one. I have tried modifying $re_closetag from the original code sample to reflect this, but the text pattern still matches too much and swallows the closing delimiter.

For example:

http://www.google.com/>} bang 
smäll<} boom 

My test string is:

tëstïng {{ 汉字/漢字 }} testing {<http://www.google.com/>} bang {>smäll<} boom {* strông{/ ëmphäsïs {- strïkë {| côdë |} -} /} *} {*wôw*} 1, 2, 3

1 answer

  • dongwen9975 2013-04-10 20:01

    You can either control this in the RE itself or after a match.

    In the RE, to control which characters may open a tag, modify this part of $re_next:

    (?:\{(?P<opentag>[^{\s]))  # match an open tag
          #which is "{" followed by anything other than whitespace or another "{"
    

    Currently it accepts any character that is not { or whitespace. Simply change it to this:

    (?:\{(?P<opentag>[<!*/|>-]))
    

    Now it looks for only your specific open tags.
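
    As a quick sanity check, the restricted character class can be tried on its own against the test string from the question. This is only a sketch: $re_open below is a hypothetical stand-alone pattern containing just the open-tag fragment, not the full $re_next, and the ~ delimiter is an assumption chosen so that / needs no escaping.

    ```php
    <?php
    // Hypothetical stand-alone pattern: only the open-tag fragment of $re_next.
    // "-" is last in the character class, so it is a literal hyphen, not a range.
    $re_open = '~\{(?P<opentag>[<!*/|>-])~u';

    // Test string taken from the question.
    $s = 'tëstïng {{ 汉字/漢字 }} testing {<http://www.google.com/>} bang '
       . '{>smäll<} boom {* strông{/ ëmphäsïs {- strïkë {| côdë |} -} /} *} '
       . '{*wôw*} 1, 2, 3';

    // Collect every open-tag character the pattern accepts.
    preg_match_all($re_open, $s, $m);
    print_r($m['opentag']);
    ```

    With this string the captured characters are <, >, *, /, -, | and *, in that order. Neither brace of the leading {{ is accepted as an open tag, because the character after each one ({ and then a space) is not in the class.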

    The close tag portion only matches a single character at a time, and which character it is depends on the tag that is open in the current context. (This is what the $opentag argument is for.) So to support a tag whose closing character differs from its opening one, simply change the character passed as $opentag in the recursive call. E.g.:

            if (isset($m['opentag']) && $m['opentag'][1] !== -1) {
                list($newopen, $_) = $m['opentag'];
    
                // close character to look for in the new context;
                // only "<" and ">" close with a different character
                $newclose = $newopen;
                if ($newopen==='>') $newclose = '<';
                else if ($newopen==='<') $newclose = '>';
    
                list($subast, $offset) = str_to_ast($s, $offset, array(), $newclose);
                // label the node with the character that opened it
                $ast[] = array($newopen, $subast);
            } else if (isset($m['text']) && $m['text'][1] !== -1) {
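
    If more asymmetric tags are added later, the open/close mapping could be moved into a small lookup helper instead of a chain of if statements. The sketch below is an assumption, not part of the original code: close_char_for() and close_tag_pattern() are hypothetical names, and the close-tag fragment is assumed to be the close character followed by }.

    ```php
    <?php
    // Hypothetical helper (not part of the original code): maps an open-tag
    // character to the character that closes it.
    function close_char_for($open)
    {
        // Only "<" and ">" are asymmetric; every other tag closes with itself.
        static $pairs = array('<' => '>', '>' => '<');
        return isset($pairs[$open]) ? $pairs[$open] : $open;
    }

    // Assumed shape of a close-tag fragment: the close character followed by "}".
    // preg_quote() escapes characters such as "*", "|" and "-" so they stay literal.
    function close_tag_pattern($open)
    {
        return preg_quote(close_char_for($open), '~') . '\}';
    }

    // Inside an anchor tag (opened with "<"), the close tag to look for is ">}".
    $re = '~' . close_tag_pattern('<') . '~u';
    preg_match($re, 'http://www.google.com/>} bang', $m, PREG_OFFSET_CAPTURE);
    ```

    Here $m[0] holds the matched text >} together with its byte offset, which marks where the text node for the anchor should end.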
    

    Alternatively, you can keep the RE as-is and decide what to do with the match after the fact. For example, if you match an @ character but {@ is not an allowed open tag, you can raise a parse error, treat it as a text node (appending array('#text', '{@') to the AST), or anything in between.
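
    A minimal sketch of the text-node variant, assuming the permissive [^{\s] class is kept in the RE. classify_open() and the $allowed list are hypothetical names introduced for illustration; the real check would sit inside str_to_ast where the opentag match is handled.

    ```php
    <?php
    // Hypothetical whitelist of characters that may follow "{" to open a tag.
    $allowed = array('<', '!', '*', '/', '|', '-', '>');

    // Hypothetical post-match check: decide what a matched open character means.
    function classify_open($char, array $allowed)
    {
        if (in_array($char, $allowed, true)) {
            // A real tag: the caller would recurse into a new context here.
            return array('open', $char);
        }
        // Not an allowed tag: keep both characters as literal text.
        return array('#text', '{' . $char);
    }

    $node = classify_open('@', $allowed);
    ```

    For {@ this yields array('#text', '{@'), which can be appended to the AST like any other text node. Throwing an exception in the second branch instead would give the strict parse-error behaviour.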

    This answer was accepted by the asker.
