How do I implement a BNF grammar tree to parse input in Go?

The grammar for the type language is as follows:

TYPE ::= TYPEVAR | PRIMITIVE_TYPE | FUNCTYPE | LISTTYPE;
PRIMITIVE_TYPE ::= 'int' | 'float' | 'long' | 'string';
TYPEVAR ::= '`' VARNAME; // Note, the character is a backwards apostrophe!
VARNAME ::= [a-zA-Z][a-zA-Z0-9]*; // Initial letter, then can have numbers
FUNCTYPE ::= '(' ARGLIST ')' -> TYPE | '(' ')' -> TYPE;
ARGLIST ::= TYPE ',' ARGLIST | TYPE;
LISTTYPE ::= '[' TYPE ']';

My input is a single TYPE.

For example, (int,int)->float is valid, whereas ( [int] , int) is not a valid type.

I need to parse input from the keyboard and decide whether it is valid under this grammar (for later type inference). However, I don't know how to build this grammar in Go or how to parse the input byte by byte. Are there any hints or similar implementations? That would be really helpful.

douhoulei4706 2014-12-04 05:50
For your purposes, the grammar of types looks simple enough that you should be able to write a recursive descent parser that roughly matches the shape of your grammar.

As a concrete example, let's say that we're recognizing a similar language.

TYPE ::= PRIMITIVETYPE | TUPLETYPE
PRIMITIVETYPE ::= 'int'
TUPLETYPE ::= '(' ARGLIST ')'
ARGLIST ::= TYPE ARGLIST | TYPE

It's not exactly the same as your original problem, but you should be able to see the similarities.

A recursive descent parser consists of one function per production rule.

func ParseType(???) error { ??? }
func ParsePrimitiveType(???) error { ??? }
func ParseTupleType(???) error { ??? }
func ParseArgList(???) error { ??? }

where ??? marks anything we don't yet know how to fill in. For now, we will at least say that each function returns an error if it can't parse.

The input into each of the functions is some stream of tokens. In our case, those tokens consist of sequences of:

"int" "(" ")"

and we can imagine a Stream might be something that satisfies:

type Stream interface {
	Peek() string // peek at next token, stay where we are
	Next() string // pick next token, move forward
}

to let us walk sequentially through the token stream.

A lexer is responsible for taking something like a string or io.Reader and producing this stream of string tokens. Lexers are fairly easy to write: you can imagine just using regexps or something similar to break a string into tokens.
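As a rough illustration of the regexp approach, here is a minimal lexer sketch for the tokens of the grammar in the question. The function name Tokenize and the exact token pattern are assumptions made for this example; they are not part of the original answer.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// tokenPattern matches one token at the start of the input: the arrow,
// a type variable such as `a, a name such as int, or a single punctuation
// character. (Assumed token set, based on the grammar in the question.)
var tokenPattern = regexp.MustCompile("^(->|`[a-zA-Z][a-zA-Z0-9]*|[a-zA-Z][a-zA-Z0-9]*|[()\\[\\],])")

// Tokenize splits input into tokens, skipping whitespace between them.
// It returns an error at the first character that cannot start a token.
func Tokenize(input string) ([]string, error) {
	var tokens []string
	rest := strings.TrimSpace(input)
	for rest != "" {
		tok := tokenPattern.FindString(rest)
		if tok == "" {
			return nil, fmt.Errorf("unexpected character %q", rest[0])
		}
		tokens = append(tokens, tok)
		rest = strings.TrimSpace(rest[len(tok):])
	}
	return tokens, nil
}

func main() {
	tokens, err := Tokenize("(int, [`a]) -> float")
	fmt.Println(tokens, err)
}
```

The resulting slice can be wrapped in any type that satisfies Stream. Note that this sketch does not distinguish primitive type names from other names; that check is left to the parser.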

Assuming we have a token stream, a parser just needs to deal with that stream and a very limited set of possibilities. As mentioned before, each production rule corresponds to a parsing function. Within a production rule, each alternative is a conditional branch. If the grammar is particularly simple (as yours is!), we can figure out which branch to take just by peeking at the next token.

For example, let's look at TYPE and its corresponding ParseType function:

TYPE ::= PRIMITIVETYPE | TUPLETYPE
PRIMITIVETYPE ::= 'int'
TUPLETYPE ::= '(' ARGLIST ')'

How might this correspond to the definition of ParseType?

The production says that there are two possibilities: it can either be (1) primitive, or (2) tuple. We can peek at the token stream: if we see "int", then we know it's primitive. If we see a "(", then since the only possibility is that it's tuple type, we can call the tupletype parser function and let it do the dirty work.

It's important to note: if we see neither a "(" nor an "int", then something has gone horribly wrong! We know this just from looking at the grammar: every type must FIRST start with one of those two tokens.

Ok, let's write the code.

func ParseType(s Stream) error {
	peeked := s.Peek()
	if peeked == "int" {
		return ParsePrimitiveType(s)
	}
	if peeked == "(" {
		return ParseTupleType(s)
	}
	return fmt.Errorf("ParseType on %#v", peeked)
}

Parsing PRIMITIVETYPE and TUPLETYPE is equally direct.

func ParsePrimitiveType(s Stream) error {
	next := s.Next()
	if next == "int" {
		return nil
	}
	return fmt.Errorf("ParsePrimitiveType on %#v", next)
}

func ParseTupleType(s Stream) error {
	lparen := s.Next()
	if lparen != "(" {
		return fmt.Errorf("ParseTupleType on %#v", lparen)
	}

	err := ParseArgList(s)
	if err != nil {
		return err
	}

	rparen := s.Next()
	if rparen != ")" {
		return fmt.Errorf("ParseTupleType on %#v", rparen)
	}
	return nil
}

The only one that might cause some issues is the parser for argument lists. Let's look at the rule.

ARGLIST ::= TYPE ARGLIST | TYPE

If we try to write the function ParseArgList, we might get stuck because we don't yet know which choice to make. Do we go for the first, or the second choice?

Well, let's at least parse out the part that's common to both alternatives: the TYPE part.

func ParseArgList(s Stream) error {
	err := ParseType(s)
	if err != nil {
		return err
	}
	// ... FILL ME IN. Do we call ParseArgList() again, or stop?
}

So we've parsed the prefix. If it was the second case, we're done. But what if it were the first case? Then we'd still have to read additional lists of types.

Ah, but if we are continuing to read additional types, then the stream must FIRST start with another type. And we know that all types FIRST start with either "int" or "(". So we can peek at the stream. Whether we pick the first or the second alternative hinges on just this!

func ParseArgList(s Stream) error {
	err := ParseType(s)
	if err != nil {
		return err
	}

	peeked := s.Peek()
	if peeked == "int" || peeked == "(" {
		// alternative 1
		return ParseArgList(s)
	}
	// alternative 2
	return nil
}
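The same peek-based decision carries over to the comma-separated list in the question's grammar (ARGLIST ::= TYPE ',' ARGLIST | TYPE): after one TYPE has been parsed, a "," as the next token is what selects the first alternative. Below is a minimal sketch. The tokens type and the stub parseType, which accepts only "int", are assumptions made to keep the example self-contained.

```go
package main

import "fmt"

// tokens is a hypothetical slice-backed token stream for this sketch.
type tokens []string

// peek returns the next token without consuming it, or "" at end of input.
func (t *tokens) peek() string {
	if len(*t) == 0 {
		return ""
	}
	return (*t)[0]
}

// next consumes and returns the next token, or "" at end of input.
func (t *tokens) next() string {
	tok := t.peek()
	if tok != "" {
		*t = (*t)[1:]
	}
	return tok
}

// parseType is a stand-in for the full TYPE parser; it accepts only "int".
func parseType(t *tokens) error {
	if tok := t.next(); tok != "int" {
		return fmt.Errorf("parseType on %#v", tok)
	}
	return nil
}

// parseCommaArgList recognizes ARGLIST ::= TYPE ',' ARGLIST | TYPE.
func parseCommaArgList(t *tokens) error {
	if err := parseType(t); err != nil {
		return err
	}
	if t.peek() == "," { // alternative 1: a comma means more types follow
		t.next()
		return parseCommaArgList(t)
	}
	return nil // alternative 2: the list ends here
}

func main() {
	fmt.Println(parseCommaArgList(&tokens{"int", ",", "int"})) // valid list
	fmt.Println(parseCommaArgList(&tokens{"int", ","}))        // trailing comma
}
```

Because the separator is an explicit token, the choice here does not depend on knowing which tokens can start a TYPE.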

Believe it or not, that's pretty much all we need. Here is working code.

package main

import "fmt"

// Stream is a sequence of tokens that can be walked one at a time.
type Stream interface {
	Peek() string // peek at next token, stay where we are
	Next() string // pick next token, move forward
}

// TokenSlice is a Stream backed by a slice of tokens.
// Note: Peek and Next panic if the slice is empty.
type TokenSlice []string

func (s *TokenSlice) Peek() string {
	return (*s)[0]
}

func (s *TokenSlice) Next() string {
	result := (*s)[0]
	*s = (*s)[1:]
	return result
}

// TYPE ::= PRIMITIVETYPE | TUPLETYPE
func ParseType(s Stream) error {
	peeked := s.Peek()
	if peeked == "int" {
		return ParsePrimitiveType(s)
	}
	if peeked == "(" {
		return ParseTupleType(s)
	}
	return fmt.Errorf("ParseType on %#v", peeked)
}

// PRIMITIVETYPE ::= 'int'
func ParsePrimitiveType(s Stream) error {
	next := s.Next()
	if next == "int" {
		return nil
	}
	return fmt.Errorf("ParsePrimitiveType on %#v", next)
}

// TUPLETYPE ::= '(' ARGLIST ')'
func ParseTupleType(s Stream) error {
	lparen := s.Next()
	if lparen != "(" {
		return fmt.Errorf("ParseTupleType on %#v", lparen)
	}

	err := ParseArgList(s)
	if err != nil {
		return err
	}

	rparen := s.Next()
	if rparen != ")" {
		return fmt.Errorf("ParseTupleType on %#v", rparen)
	}
	return nil
}

// ARGLIST ::= TYPE ARGLIST | TYPE
func ParseArgList(s Stream) error {
	err := ParseType(s)
	if err != nil {
		return err
	}

	peeked := s.Peek()
	if peeked == "int" || peeked == "(" {
		// alternative 1
		return ParseArgList(s)
	}
	// alternative 2
	return nil
}

func main() {
	fmt.Println(ParseType(&TokenSlice{"int"}))
	fmt.Println(ParseType(&TokenSlice{"(", "int", ")"}))
	fmt.Println(ParseType(&TokenSlice{"(", "int", "int", ")"}))
	fmt.Println(ParseType(&TokenSlice{"(", "(", "int", ")", "(", "int", ")", ")"}))

	// Should show error:
	fmt.Println(ParseType(&TokenSlice{"(", ")"}))
}

This is a toy parser, of course: it does not handle certain kinds of errors very well (like premature end of input), and tokens should carry not only their textual content but also their source location for good error reporting. For your own purposes, you'll also want to expand the parsers so that they return not just an error, but also some kind of useful result from the parse.
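One possible shape for such an extension is sketched below for the same toy grammar. The Node type, the parser struct, and the Parse function are hypothetical names introduced for this example. The sketch returns a small parse tree, reports premature end of input as an error instead of panicking, and uses a loop in place of the recursive ARGLIST rule, which is a design choice of this sketch rather than something the answer prescribes.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical parse-tree node: Kind is "int" or "tuple",
// and Children holds the element types of a tuple.
type Node struct {
	Kind     string
	Children []*Node
}

// String renders the node back into source-like text.
func (n *Node) String() string {
	if n.Kind == "int" {
		return "int"
	}
	parts := make([]string, len(n.Children))
	for i, c := range n.Children {
		parts[i] = c.String()
	}
	return "(" + strings.Join(parts, " ") + ")"
}

// parser walks a token slice; peek and next return "" at end of input.
type parser struct {
	toks []string
	pos  int
}

func (p *parser) peek() string {
	if p.pos >= len(p.toks) {
		return ""
	}
	return p.toks[p.pos]
}

func (p *parser) next() string {
	tok := p.peek()
	if tok != "" {
		p.pos++
	}
	return tok
}

// parseType recognizes TYPE ::= 'int' | '(' ARGLIST ')' and returns its tree.
func (p *parser) parseType() (*Node, error) {
	switch tok := p.next(); tok {
	case "int":
		return &Node{Kind: "int"}, nil
	case "(":
		tuple := &Node{Kind: "tuple"}
		for { // ARGLIST: one TYPE, then more while another TYPE can start
			child, err := p.parseType()
			if err != nil {
				return nil, err
			}
			tuple.Children = append(tuple.Children, child)
			if nxt := p.peek(); nxt != "int" && nxt != "(" {
				break
			}
		}
		if closing := p.next(); closing != ")" {
			return nil, fmt.Errorf("expected \")\", got %q", closing)
		}
		return tuple, nil
	case "":
		return nil, fmt.Errorf("unexpected end of input")
	default:
		return nil, fmt.Errorf("unexpected token %q", tok)
	}
}

// Parse parses a complete token slice and rejects trailing tokens.
func Parse(toks []string) (*Node, error) {
	p := &parser{toks: toks}
	n, err := p.parseType()
	if err != nil {
		return nil, err
	}
	if p.peek() != "" {
		return nil, fmt.Errorf("trailing token %q", p.peek())
	}
	return n, nil
}

func main() {
	inputs := [][]string{{"(", "int", "(", "int", ")", ")"}, {"(", "int"}}
	for _, in := range inputs {
		if tree, err := Parse(in); err != nil {
			fmt.Println("error:", err)
		} else {
			fmt.Println("ok:", tree)
		}
	}
}
```

Source locations are still missing here; tokens would need to become a struct with position fields to support that.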

This answer is just a sketch of how recursive descent parsers work. But you should really read a good compiler book to get the details, because you need them. The Dragon Book, for example, devotes at least a chapter to writing recursive descent parsers, with plenty of technical detail. In particular, you want to know about the concept of FIRST sets (which I hinted at), because you'll need to understand them to choose the appropriate alternative when writing each of your parser functions.
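To make the idea of FIRST sets a little more concrete, here is a small sketch that computes them for the toy grammar above by iterating until nothing changes. It assumes a grammar with no empty productions, so only the first symbol of each alternative matters; the names grammar and FirstSets are made up for this example, and a textbook treatment covers the general case.

```go
package main

import (
	"fmt"
	"sort"
)

// grammar maps each nonterminal to its alternatives; each alternative is a
// sequence of symbols. Symbols that are not keys of the map are terminals.
// This is the toy grammar from the answer, which has no empty productions.
var grammar = map[string][][]string{
	"TYPE":          {{"PRIMITIVETYPE"}, {"TUPLETYPE"}},
	"PRIMITIVETYPE": {{"int"}},
	"TUPLETYPE":     {{"(", "ARGLIST", ")"}},
	"ARGLIST":       {{"TYPE", "ARGLIST"}, {"TYPE"}},
}

// FirstSets computes FIRST for every nonterminal by iterating to a fixed
// point. It assumes that no alternative is empty.
func FirstSets(g map[string][][]string) map[string]map[string]bool {
	first := make(map[string]map[string]bool)
	for nt := range g {
		first[nt] = make(map[string]bool)
	}
	for changed := true; changed; {
		changed = false
		for nt, alts := range g {
			for _, alt := range alts {
				head := alt[0]
				heads := map[string]bool{head: true} // a terminal starts with itself
				if _, isNonterminal := g[head]; isNonterminal {
					heads = first[head] // a nonterminal contributes its FIRST set
				}
				for tok := range heads {
					if !first[nt][tok] {
						first[nt][tok] = true
						changed = true
					}
				}
			}
		}
	}
	return first
}

func main() {
	first := FirstSets(grammar)
	for _, nt := range []string{"TYPE", "PRIMITIVETYPE", "TUPLETYPE", "ARGLIST"} {
		var toks []string
		for tok := range first[nt] {
			toks = append(toks, tok)
		}
		sort.Strings(toks)
		fmt.Println(nt, toks)
	}
}
```

For this grammar, FIRST of both TYPE and ARGLIST comes out as the set containing "int" and "(", which is exactly the pair of tokens that ParseType and ParseArgList peek for.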
This answer was selected by the asker as the best answer.
