Proper parsing strategy using PLY.yacc

I am writing a PHP parser in PLY in order to teach myself the concepts of lexing/parsing.

I have the lexer tokens created for a very simple PHP code snippet but I am stuck on the proper way to parse.

Here is the code snippet I am trying to lex/parse:

  <?php if (isset($_REQUEST['name'])){
        $name = $_REQUEST['name'];
        $msg = "Hello, " . $name . "!";
        $encoded = htmlspecialchars($msg);
  }
  ?>

My goal is to trace the user input to determine that it has indeed reached the htmlspecialchars() function. My current parsing strategy gets me as far as line 2:

$name = $_REQUEST['name'];

but I have no idea how to properly parse line 3:

$msg = "Hello, " . $name . "!";

The complication is that I can never be certain how many concatenations will be applied to my user input, and it feels wrong to "hard code" a rule just to parse the example code. For this line, what interests me is that the $msg variable includes my user-supplied data (from the $name variable).

I have tried parsing this line in probably the worst possible way, just to test whether I could reach it, but when I run my script it says WARNING: Symbol 'wrong' is unreachable

def p_wrong(p):
    '''wrong : VARIABLE EQUALS QUOTED_ENCAPSED_STRING DOT VARIABLE DOT QUOTED_ENCAPSED_STRING SEMICOLON'''
    print "wrong"

So I am hoping for guidance on how to parse line #3 in such a way that it won't matter how many concatenations or other operations take place on the variables I am tracing. I have a feeling this is where a lesson on BNF grammar, or the wonderfully painful complexities of parsing, will begin. I want to learn; I just don't know where to start.
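
To make the question concrete, here is a rough sketch of the behaviour I am after, not a PLY solution. It assumes a concatenation of any length can be described by a recursive rule such as `concat : concat DOT operand | operand`, and it walks a hand-written token list the way that rule would. The function name `variables_in_concat` and the `(type, value)` token pairs are hypothetical, made up for illustration; only the token names come from my lexer.

```python
# Grammar idea (BNF-style), not tied to any parser generator:
#   concat  : concat DOT operand | operand
#   operand : VARIABLE | QUOTED_ENCAPSED_STRING | CONSTANT_ENCAPSED_STRING

OPERANDS = {"VARIABLE", "QUOTED_ENCAPSED_STRING", "CONSTANT_ENCAPSED_STRING"}


def variables_in_concat(tokens):
    """Return the VARIABLE values found in `operand (DOT operand)*`.

    `tokens` is a list of (type, value) pairs (a hypothetical format).
    Raises SyntaxError if the tokens do not match that shape.
    """
    if not tokens:
        raise SyntaxError("empty expression")
    found = []
    expect_operand = True  # operands and DOTs must alternate
    for tok_type, value in tokens:
        if expect_operand:
            if tok_type not in OPERANDS:
                raise SyntaxError("expected an operand, got %s" % tok_type)
            if tok_type == "VARIABLE":
                found.append(value)
        elif tok_type != "DOT":
            raise SyntaxError("expected DOT, got %s" % tok_type)
        expect_operand = not expect_operand
    if expect_operand:
        raise SyntaxError("expression ends with DOT")
    return found


# Right-hand side of: $msg = "Hello, " . $name . "!";
rhs = [
    ("QUOTED_ENCAPSED_STRING", '"Hello, "'),
    ("DOT", "."),
    ("VARIABLE", "$name"),
    ("DOT", "."),
    ("QUOTED_ENCAPSED_STRING", '"!"'),
]
print(variables_in_concat(rhs))  # ['$name']
```

The loop accepts one operand or fifty, which is the property I would like the real grammar rule to have instead of the fixed-length `p_wrong` rule.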

Here is my complete code at this point:

import ply.lex as lex
import ply.yacc as yacc

string = """<?php if (isset($_REQUEST['name'])){
               $name = $_REQUEST['name'];
               $msg = "Hello, " . $name . "!";
               $encoded = htmlspecialchars($msg);
}
?>"""

delimeters = ('LPAREN', 'RPAREN', 'LBRACKET', 'RBRACKET')

tokens = delimeters + (
    "CHAR",
    "NUM",
    "OPEN_TAG",
    "CLOSE_TAG",
    "VARIABLE",
    "CONSTANT_ENCAPSED_STRING",
    "ENCAPSED_AND_WHITESPACE",
    "QUOTED_ENCAPSED_STRING",
    "LCURLYBRACKET",
    "RCURLYBRACKET",
    "EQUALS",
    "SEMICOLON",
    "QUOTE",
    "DOT",
    "IF"
)

t_ignore         = " \t"
t_CHAR           = r"[a-z]"
t_LPAREN         = r'\('
t_RPAREN         = r'\)'
t_RBRACKET       = r'\]'
t_LBRACKET       = r'\['
t_RCURLYBRACKET  = r'\}'
t_LCURLYBRACKET  = r'\{'
t_EQUALS         = r'='
t_SEMICOLON      = r';'
t_DOT            = r'\.'


def t_newline(t):
    r'\n+'
    t.lexer.lineno += t.value.count("\n")

def t_CONSTANT_ENCAPSED_STRING(t):
    r"'([^\\']|\\(.|\n))*'"
    t.lexer.lineno += t.value.count("\n")
    return t

def t_QUOTED_ENCAPSED_STRING(t):
    r"""\"([^\\"]|\\(.|\n))*\""""
    t.lexer.lineno += t.value.count("\n")
    return t

def t_OPEN_TAG(t):
    r'<[?%]((php[ \t\r\n]?)|=)?'
    if '=' in t.value: t.type = 'OPEN_TAG_WITH_ECHO'
    t.lexer.lineno += t.value.count("\n")
    return t

def t_CLOSE_TAG(t):
    r'[?%]>\r?\n?'
    t.lexer.lineno += t.value.count("\n")
    #t.lexer.begin('INITIAL')
    return t

def t_VARIABLE(t):
    r'\$[A-Za-z_][\w_]*'
    return t

def t_NUM(t):
    r"\d+"
    t.value = int(t.value)
    return t

def t_error(t):
    print t.lexer.current_state
    print dir(t.lexer)
    raise TypeError("unknown char '%s'"%(t.value))

lexer = lex.lex()

lex.input(string)
for tok in iter(lex.token, None):
    print repr(tok.type), repr(tok.value)


##now for the parsing

"""
$name = $_REQUEST['name'];
$msg = "Hello, " . $name . "!";    
"""

def p_assign(p):
    '''assign : VARIABLE EQUALS input'''
    print "assign rule"
    print p[1],p[2],p[3]
    p[0] = p[1]

def p_input(p):
    '''input : VARIABLE LBRACKET CONSTANT_ENCAPSED_STRING RBRACKET SEMICOLON
             | VARIABLE LBRACKET QUOTED_ENCAPSED_STRING RBRACKET SEMICOLON'''
    print "input rule"
    value =  p[1]+p[2]+p[3]+p[4]+p[5]
    p[0] = value

def p_wrong(p):
    '''wrong : VARIABLE EQUALS QUOTED_ENCAPSED_STRING DOT VARIABLE DOT QUOTED_ENCAPSED_STRING SEMICOLON'''
    print "wrong"    


yacc.yacc()
yacc.parse(string)

And the results:

...
WARNING: There is 1 unused rule
WARNING: Symbol 'wrong' is unreachable
Generating LALR tables
yacc: Syntax error at line 6, token=OPEN_TAG
input rule
assign rule
$name = $_REQUEST['name'];
yacc: Syntax error at line 8, token=VARIABLE

My (incorrect) attempt at parsing line 3 (with the format hard-coded in the parser rule p_wrong) doesn't even get hit. But I would just like some guidance on how to proceed to parse this simple code block.
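
A possible explanation for the warning, sketched in plain Python rather than PLY: unless a start symbol is given explicitly, PLY uses the first rule defined (here `assign`) as the start symbol, and `wrong` is never referenced from it. The `grammar` dictionary below is a hand-copied version of my three rules, and `reachable_nonterminals` is a hypothetical helper, not part of PLY.

```python
# Rules copied by hand from the p_* docstrings: nonterminal -> productions.
grammar = {
    "assign": [["VARIABLE", "EQUALS", "input"]],
    "input": [
        ["VARIABLE", "LBRACKET", "CONSTANT_ENCAPSED_STRING", "RBRACKET", "SEMICOLON"],
        ["VARIABLE", "LBRACKET", "QUOTED_ENCAPSED_STRING", "RBRACKET", "SEMICOLON"],
    ],
    "wrong": [["VARIABLE", "EQUALS", "QUOTED_ENCAPSED_STRING", "DOT",
               "VARIABLE", "DOT", "QUOTED_ENCAPSED_STRING", "SEMICOLON"]],
}


def reachable_nonterminals(grammar, start):
    """Return every nonterminal that can be reached from `start`."""
    seen = set()
    stack = [start]
    while stack:
        symbol = stack.pop()
        # Skip terminals (not keys of the grammar) and already visited symbols.
        if symbol in seen or symbol not in grammar:
            continue
        seen.add(symbol)
        for production in grammar[symbol]:
            stack.extend(production)
    return seen


start = "assign"  # the first rule defined acts as the start symbol
reachable = reachable_nonterminals(grammar, start)
print(sorted(set(grammar) - reachable))  # ['wrong']
```

If that reading is right, any rule for line 3 would have to be referenced, directly or indirectly, from the start symbol before the parser could ever use it.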

Desired output

Ideally I will have results that allow me to trace the user input something like this:

user-input -> $name -> $msg -> htmlspecialchars($msg)
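
A minimal sketch of how that trace could be produced once the parser reports which variables each statement uses. Everything here is an assumption for illustration: the `(kind, target, used_variables)` statement format, the `trace_taint` function, and treating `$_REQUEST` as the only user-input source. It also simplifies line 4 to a bare call and ignores the assignment to `$encoded`.

```python
def trace_taint(sources, statements):
    """Follow tainted variables through a list of statements.

    `sources` is a set of variable names treated as user input.
    `statements` is a list of (kind, target, used_variables) tuples,
    where kind is "assign" or "call" (a hypothetical format).
    Returns the chain of names the user input reached, in order.
    """
    tainted = set(sources)
    chain = ["user-input"]
    for kind, target, used in statements:
        if not tainted.intersection(used):
            continue  # statement does not touch tainted data
        if kind == "assign":
            tainted.add(target)
            chain.append(target)
        elif kind == "call":
            chain.append("%s(%s)" % (target, ", ".join(used)))
    return chain


# Hand-written summary of the three statements inside the if block.
statements = [
    ("assign", "$name", ["$_REQUEST"]),        # $name = $_REQUEST['name'];
    ("assign", "$msg", ["$name"]),             # $msg = "Hello, " . $name . "!";
    ("call", "htmlspecialchars", ["$msg"]),    # htmlspecialchars($msg)
]
chain = trace_taint({"$_REQUEST"}, statements)
print(" -> ".join(chain))
# user-input -> $name -> $msg -> htmlspecialchars($msg)
```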
