duan35557593 2016-10-22 00:37
浏览 48

分流场算法扩展和实现问题

Let me introduce you to my current project (that obviously yields the problem I face hence I post here). I am writing a so-called "compiler" for a simplistic language. I have already built a VM to run the produced bytecode, the associated Lexer (all this project is an optional assignment). My problem lies in the Expression parsing bit. I use the Shunting Yard Algorithm by Dijkstra to convert my postfix expressions into the corresponding AST structure, and I am incapable of adjusting the algorithm correctly to generate the correct AST if -and only if- I try to implement function calls and array subscript.

Here a sample of the algorithm implementation (everything is pretty self-explanatory I believe)

func (p *parser) shuntingyard(input token.TQueue) *ast.Node {
    var operands ast.NStack
    var operators *token.TStack

    operands = make(ast.NStack, 0)
    operators = token.TokenStack()

    for tok := input.Dequeue(); tok.Sym != "EOF"; tok = input.Dequeue() {
        switch tok.Kind {
        case "LParen":
            operators.Push(tok)
        case "RParen":
            for {
                // pop item ("(" or operator) from stack
                if operators.Empty() {
                    p.errorf("Unmatched parenthesis on line %d, expected '(' to match closing parenthesis in expression", p.lno)
                }

                op := operators.Pop()
                if op.Sym == "(" {
                    break // discard "("
                }

                if isUnary(op.Sym) {
                    node := ast.MakeNode(*op)
                    node.AddChild(operands.Pop())
                    operands.Push(node)
                    break
                }

                RHS := operands.Pop()
                LHS := operands.Pop()
                operands.Push(ast.MakeParentNode(*op, RHS, LHS))
            }
        default:
            if o1, isOp := prOps[tok.Sym]; isOp {
                // token is an operator
                for !operators.Empty() {
                    // consider top item on stack
                    op := operators.PeekTop()
                    if o2, isOp := prOps[op.Sym]; !isOp || o1.prec > o2.prec || o1.prec == o2.prec && o1.rAssoc {
                        break
                    }

                    // top item is an operator that needs to come off
                    op = operators.Pop()
                    if isUnary(op.Sym) {
                        node := ast.MakeNode(*op)
                        node.AddChild(operands.Pop())
                        operands.Push(node)
                        break
                    }
                    RHS := operands.Pop()
                    LHS := operands.Pop()
                    operands.Push(ast.MakeParentNode(*op, RHS, LHS))
                }
                // push operator (the new one) to stack
                operators.Push(tok)
            } else {
                operands.Push(ast.MakeNode(*tok))
            }
        }
    }

    // drain stack to result
    for !operators.Empty() {
        if operators.PeekTop().Sym == "(" {
            p.errorf("Unmatched parenthesis on line %d, expected ')' to match previous parenthesis in expression", p.lno)
        }

        RHS := operands.Pop()
        LHS := operands.Pop()
        operands.Push(ast.MakeParentNode(*operators.Pop(), RHS, LHS))
    }

    result := operands.Pop()
    for !operands.Empty() {
        result.AddSibling(operands.Pop())
    }

    return result
}

The idea is pretty straightforward when you encounter a binary operator from the operator stack, you pop off of the operand stack two nodes that you set as children of the operator encountered (or one if the operator is unary). And you push the resulting node on the operands stack.

For instance, this input:

c = 0;
a = 1 << 3 + 2;

results in the (valid) AST:

┣━ Assign           =
┃   ┣━ Identifier       'a'
┃   ┗━ Lshift           <<
┃       ┣━ Number           1
┃       ┗━ Plus             +
┃           ┣━ Number           3
┃           ┗━ Number           2
┗━ Assign           =
    ┣━ Identifier       'c'
    ┗━ Number           0

However, the output is being wrong whenever I try to nest function calls:

 foo(bar(0))

the result is (obviously) not correct:

┣━ Number           0
┣━ Function         bar
┗━ Function         foo

when I should have had:

┗━ Function         foo
    ┗━ Function         bar
        ┗━ Number        0

My first question is: What modification do I have to bring to my implementation to support AST generation for function calls ? Since all I have found on the Internet regarding the SY Algorithm is always generating an RPN string ...

Another thing is that an input such as:

-i++;

generates the valid output:

┗━ UnaryMinus       -u
    ┗━ PlusPlus         ++
        ┗━ Identifier       'i'

but (-i++); reports and unbalanced parenthesised expression ?

The map used to work with operators is:

var prOps = map[string]struct {
    prec   int
    rAssoc bool
}{
    "++" : {50, false}, "--" : {50, false},

    "."  : {40, false}, "["  : {40, false},

    "!"  : {30, true}, "~"   : {30, true},
    "-u" : {29, true}, "--u" : {29, true}, "++u": {29, true},
    "**" : {28, true},

    "*"  : {27, false}, "/"  : {27, false}, "%" : {27, false},
    "+"  : {26, false}, "-"  : {26, false},
    ">>" : {25, false}, "<<" : {25, false},
    ">"  : {24, false}, ">=" : {24, false}, "<" : {24, false}, "<=" : {24, false},
    "==" : {23, false}, "!=" : {23, false},
    "&"  : {22, false},
    "^"  : {21, false},
    "|"  : {20, false},
    "&&" : {19, false},
    "||" : {18, false},

    "="  : {10, true}, "+="  : {10, true}, "-=" : {10, true}, "*="  : {10, true},
    "/=" : {10, true}, "**=" : {10, true}, "^=" : {10, true}, "~="  : {10, true},
    "|=" : {10, true}, "&="  : {10, true}, "%=" : {10, true}, "<<=" : {10, true},
    ">>=": {10, true},
    ","  : {9, false},
}

Where -u, ++u, --u are the instances of the meant operator, but unarily.

On the other note, is the Shunting Yard Algorithm what I truly need ? I mean, if parsing an object-oriented expression with this algorithm is feasable, is that my best choice ?

I also read (a lot) on parsers in a larger sense. I believe, from what I read, that writing a Recursive Descent parser is my best chance to achieve my goal within reasonable amount of time. I read some source codes, but parsers tend not to be reader-friendly. What is a guideline I should keep for this project regarding the parser and AST generation ?

Please tell me if I did anything wrong with this thread :) And do ask if I missed to link/give any ressources needed !

In the hope of having some helpful answers.

  • 写回答

0条回答 默认 最新

    报告相同问题?

    悬赏问题

    • ¥20 测距传感器数据手册i2c
    • ¥15 RPA正常跑,cmd输入cookies跑不出来
    • ¥15 求帮我调试一下freefem代码
    • ¥15 matlab代码解决,怎么运行
    • ¥15 R语言Rstudio突然无法启动
    • ¥15 关于#matlab#的问题:提取2个图像的变量作为另外一个图像像元的移动量,计算新的位置创建新的图像并提取第二个图像的变量到新的图像
    • ¥15 改算法,照着压缩包里边,参考其他代码封装的格式 写到main函数里
    • ¥15 用windows做服务的同志有吗
    • ¥60 求一个简单的网页(标签-安全|关键词-上传)
    • ¥35 lstm时间序列共享单车预测,loss值优化,参数优化算法