dongpa9277 2019-01-10 17:11

Match any printable letter-like characters in ANTLR4 with Go as target

This is freaking me out, I just can't find a solution to it. I have a grammar for search queries and would like to match any search term in a query that is composed of printable characters, except for the special characters "(" and ")". Strings enclosed in quotes are handled separately and work.

Here is a somewhat working grammar:

    /* ANTLR Grammar for Minidb Query Language */

grammar Mdb;

start
    : searchclause EOF
    ;

searchclause
    : table expr
    ;

expr
    : fieldsearch
    | searchop fieldsearch
    | unop expr
    | expr relop expr
    | lparen expr relop expr rparen
    ;

lparen
    : '('
    ;

rparen
    : ')'
    ;

unop
    : NOT
    ;

relop
    : AND
    | OR
    ;

searchop
    : NO
    | EVERY
    ;

fieldsearch
    : field EQ searchterm
    ;

field
    : ID
    ;

table
    : ID
    ;

searchterm
    : 
    | STRING
    | ID+
    | DIGIT+
    | DIGIT+ ID+ 
    ;

STRING
    : '"' ~('\n'|'"')* ('"' )
    ;

AND
    : 'and'
    ;

OR
    : 'or'
    ;

NOT
    : 'not'
    ;
NO
    : 'no'
    ;

EVERY
    : 'every'
    ;

EQ
    : '='
    ;

fragment VALID_ID_START
    : ('a' .. 'z') | ('A' .. 'Z') | '_'
    ;

fragment VALID_ID_CHAR
    : VALID_ID_START | ('0' .. '9')
    ;

ID
    : VALID_ID_START VALID_ID_CHAR*
    ;

DIGIT
    : ('0' .. '9')
    ;

/*
NOT_SPECIAL
    : ~(' ' | '\t' | '\n' | '\r' | '\'' | '"' | ';' | '.' | '=' | '(' | ')' )
    ; */

WS
   : [ \r\n\t] + -> skip
;

The problem is that searchterm is too restrictive. It should match any character accepted by the commented-out NOT_SPECIAL rule, i.e., anything other than whitespace and the special characters listed there. Valid queries would be:

Person Name=%
Person Address=^%Street%%%$^&*@^

But whenever I try to put NOT_SPECIAL into the definition of searchterm in any way, it doesn't work. I have also tried putting it literally into the rule (commenting out NOT_SPECIAL) and many other things, but it just doesn't work. In most of my attempts the parser just complained about extraneous input after "=" and said it was expecting EOF. But I also cannot put EOF into NOT_SPECIAL.

Is there any way I can simply parse every text after "=" in rule fieldsearch until there is a whitespace or ")", "("?

N.B. The STRING rule works fine, but the user ought not be required to use quotes every time, because this is a command line tool and they'd need to be escaped.

Target language is Go.

1 answer

  • dql123000 2019-01-11 09:24

    You could solve that by introducing a lexical mode that you'll enter whenever you match an EQ token. Once in that lexical mode, you either match a (, ) or a whitespace (in which case you pop out of the lexical mode), or you keep matching your NOT_SPECIAL chars.

    When using lexical modes, you must define your lexer and parser rules in separate files. Be sure to use lexer grammar ... and parser grammar ... instead of the grammar ... you use in a combined .g4 file.

    A quick demo:

    lexer grammar MdbLexer;
    
    STRING
     : '"' ~[\r\n"]* '"'
     ;
    
    OPAR
     : '('
     ;
    
    CPAR
     : ')'
     ;
    
    AND
     : 'and'
     ;
    
    OR
     : 'or'
     ;
    
    NOT
     : 'not'
     ;
    
    NO
     : 'no'
     ;
    
    EVERY
     : 'every'
     ;
    
    EQ
     : '=' -> pushMode(NOT_SPECIAL_MODE)
     ;
    
    ID
     : VALID_ID_START VALID_ID_CHAR*
     ;
    
    DIGIT
     : [0-9]
     ;
    
    WS
     : [ \r\n\t]+ -> skip
     ;
    
    fragment VALID_ID_START
     : [a-zA-Z_]
     ;
    
    fragment VALID_ID_CHAR
     : [a-zA-Z_0-9]
     ;
    
    mode NOT_SPECIAL_MODE;
    
      OPAR2
       : '(' -> type(OPAR), popMode
       ;
    
      CPAR2
       : ')' -> type(CPAR), popMode
       ;
    
      WS2
       : [ \t\r\n] -> skip, popMode
       ;
    
      NOT_SPECIAL
       : ~[ \t\r\n()]+
       ;
    

    Your parser grammar would start like this:

    parser grammar MdbParser;
    
    options {
        tokenVocab=MdbLexer;
    }
    
    start
     : searchclause EOF
     ;
    
    // your other parser rules
    

    My Go is a bit rusty, but a small Java test:

    String source = "Person Address=^%Street%%%$^&*@^()";
    
    MdbLexer lexer = new MdbLexer(CharStreams.fromString(source));
    
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    tokens.fill();
    
    for (Token t : tokens.getTokens()) {
      System.out.printf("%-15s %s\n", MdbLexer.VOCABULARY.getSymbolicName(t.getType()), t.getText());
    }
    

    prints the following:

    ID              Person
    ID              Address
    EQ              =
    NOT_SPECIAL     ^%Street%%%$^&*@^
    OPAR            (
    CPAR            )
    EOF             <EOF>
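
Since the target language is Go, the effect of the mode switch can also be traced without generating a lexer. What follows is a minimal hand-written sketch, not the ANTLR Go runtime: `Token`, `Lex`, and the `inTerm` flag are names made up for this illustration. A single boolean stands in for ANTLR's mode stack, which is enough here only because one mode is ever pushed. The token rules are assumed to be those of the lexer grammar in the answer.

```go
package main

import (
	"fmt"
	"strings"
)

// Token is a (type, text) pair, mirroring the two columns of the demo output.
type Token struct {
	Type string
	Text string
}

// keywords maps reserved words to their token types (default mode only).
var keywords = map[string]string{
	"and": "AND", "or": "OR", "not": "NOT", "no": "NO", "every": "EVERY",
}

func isSpace(c byte) bool   { return c == ' ' || c == '\t' || c == '\r' || c == '\n' }
func isIDStart(c byte) bool { return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') }
func isIDChar(c byte) bool  { return isIDStart(c) || (c >= '0' && c <= '9') }

// Lex tokenizes src using two modes: the default mode and a "term" mode
// that is entered after '=' and left on whitespace or a parenthesis.
func Lex(src string) ([]Token, error) {
	var toks []Token
	inTerm := false // true while in the hypothetical NOT_SPECIAL_MODE
	for i := 0; i < len(src); {
		c := src[i]
		start := i
		switch {
		case isSpace(c): // WS / WS2: skipped; in term mode this also pops the mode
			inTerm = false
			i++
		case c == '(' || c == ')': // OPAR / CPAR; in term mode these also pop the mode
			typ := "OPAR"
			if c == ')' {
				typ = "CPAR"
			}
			toks = append(toks, Token{typ, string(c)})
			inTerm = false
			i++
		case inTerm: // NOT_SPECIAL: a run of anything except whitespace and parentheses
			for i < len(src) && !isSpace(src[i]) && src[i] != '(' && src[i] != ')' {
				i++
			}
			toks = append(toks, Token{"NOT_SPECIAL", src[start:i]})
		case c == '=': // EQ pushes the term mode
			toks = append(toks, Token{"EQ", "="})
			inTerm = true
			i++
		case c == '"': // STRING: a closing quote must come before any line break
			end := strings.IndexAny(src[i+1:], "\"\r\n")
			if end < 0 || src[i+1+end] != '"' {
				return nil, fmt.Errorf("unterminated string at offset %d", i)
			}
			i += end + 2
			toks = append(toks, Token{"STRING", src[start:i]})
		case isIDStart(c): // keyword or ID, decided after taking the longest match
			for i < len(src) && isIDChar(src[i]) {
				i++
			}
			word := src[start:i]
			typ, ok := keywords[word]
			if !ok {
				typ = "ID"
			}
			toks = append(toks, Token{typ, word})
		case c >= '0' && c <= '9': // DIGIT is a single character
			toks = append(toks, Token{"DIGIT", src[i : i+1]})
			i++
		default:
			return nil, fmt.Errorf("unexpected character %q at offset %d", c, i)
		}
	}
	return append(toks, Token{"EOF", "<EOF>"}), nil
}

func main() {
	toks, err := Lex("Person Address=^%Street%%%$^&*@^()")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, t := range toks {
		fmt.Printf("%-15s %s\n", t.Type, t.Text)
	}
}
```

For the sample query this yields the same sequence of token types and texts as the Java test. Note that '=' only switches modes in the default mode; inside the term mode it is just another NOT_SPECIAL character.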
    