dongpa9277
2019-01-10 17:11

Matching any printable letter-like characters in ANTLR4 with Go as the target

This is freaking me out; I just can't find a solution. I have a grammar for search queries and would like to match any search term in a query that is composed of printable characters other than the special characters "(" and ")". Strings enclosed in quotes are handled separately, and that part works.

Here is a somewhat working grammar:

    /* ANTLR Grammar for Minidb Query Language */

grammar Mdb;

start
    : searchclause EOF
    ;

searchclause
    : table expr
    ;

expr
    : fieldsearch
    | searchop fieldsearch
    | unop expr
    | expr relop expr
    | lparen expr relop expr rparen
    ;

lparen
    : '('
    ;

rparen
    : ')'
    ;

unop
    : NOT
    ;

relop
    : AND
    | OR
    ;

searchop
    : NO
    | EVERY
    ;

fieldsearch
    : field EQ searchterm
    ;

field
    : ID
    ;

table
    : ID
    ;

searchterm
    : 
    | STRING
    | ID+
    | DIGIT+
    | DIGIT+ ID+ 
    ;

STRING
    : '"' ~('\n'|'"')* ('"' )
    ;

AND
    : 'and'
    ;

OR
    : 'or'
    ;

NOT
    : 'not'
    ;
NO
    : 'no'
    ;

EVERY
    : 'every'
    ;

EQ
    : '='
    ;

fragment VALID_ID_START
    : ('a' .. 'z') | ('A' .. 'Z') | '_'
    ;

fragment VALID_ID_CHAR
    : VALID_ID_START | ('0' .. '9')
    ;

ID
    : VALID_ID_START VALID_ID_CHAR*
    ;

DIGIT
    : ('0' .. '9')
    ;

/*
NOT_SPECIAL
    : ~(' ' | '\t' | '\n' | '\r' | '\'' | '"' | ';' | '.' | '=' | '(' | ')' )
    ; */

WS
   : [ \r\n\t] + -> skip
;

The problem is that searchterm is too restrictive. It should match any character allowed by the commented-out NOT_SPECIAL rule; i.e., valid queries would be:

Person Name=%
Person Address=^%Street%%%$^&*@^

But whenever I try to put NOT_SPECIAL into the definition of searchterm in any way, it doesn't work. I have also tried putting it literally into the rule (commenting out NOT_SPECIAL) and many other things, but it just doesn't work. In most of my attempts the parser complained about extraneous input after "=" and said it was expecting EOF. But I also cannot put EOF into NOT_SPECIAL.

Is there any way I can simply parse every text after "=" in rule fieldsearch until there is a whitespace or ")", "("?

N.B. The STRING rule works fine, but the user ought not be required to use quotes every time, because this is a command line tool and they'd need to be escaped.

Target language is Go.

1 answer

  • dql123000 2019-01-11 09:24 (accepted answer)

    You could solve that by introducing a lexical mode that you'll enter whenever you match an EQ token. Once in that lexical mode, you either match a (, ) or a whitespace (in which case you pop out of the lexical mode), or you keep matching your NOT_SPECIAL chars.

    When you use lexical modes, you must define your lexer rules and parser rules in separate files. Be sure to use lexer grammar ... and parser grammar ... instead of the grammar ... you use in a combined .g4 file.

    A quick demo:

    lexer grammar MdbLexer;
    
    STRING
     : '"' ~[\r\n"]* '"'
     ;
    
    OPAR
     : '('
     ;
    
    CPAR
     : ')'
     ;
    
    AND
     : 'and'
     ;
    
    OR
     : 'or'
     ;
    
    NOT
     : 'not'
     ;
    
    NO
     : 'no'
     ;
    
    EVERY
     : 'every'
     ;
    
    EQ
     : '=' -> pushMode(NOT_SPECIAL_MODE)
     ;
    
    ID
     : VALID_ID_START VALID_ID_CHAR*
     ;
    
    DIGIT
     : [0-9]
     ;
    
    WS
     : [ \r\n\t]+ -> skip
     ;
    
    fragment VALID_ID_START
     : [a-zA-Z_]
     ;
    
    fragment VALID_ID_CHAR
     : [a-zA-Z_0-9]
     ;
    
    mode NOT_SPECIAL_MODE;
    
      OPAR2
       : '(' -> type(OPAR), popMode
       ;
    
      CPAR2
       : ')' -> type(CPAR), popMode
       ;
    
      WS2
       : [ \t\r\n] -> skip, popMode
       ;
    
      NOT_SPECIAL
       : ~[ \t\r\n()]+
       ;
    

    Your parser grammar would start like this:

    parser grammar MdbParser;
    
    options {
        tokenVocab=MdbLexer;
    }
    
    start
     : searchclause EOF
     ;
    
    // your other parser rules
    

    My Go is a bit rusty, but a small Java test:

    String source = "Person Address=^%Street%%%$^&*@^()";
    
    MdbLexer lexer = new MdbLexer(CharStreams.fromString(source));
    
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    tokens.fill();
    
    for (Token t : tokens.getTokens()) {
      System.out.printf("%-15s %s\n", MdbLexer.VOCABULARY.getSymbolicName(t.getType()), t.getText());
    }
    

    prints the following:

    ID              Person
    ID              Address
    EQ              =
    NOT_SPECIAL     ^%Street%%%$^&*@^
    OPAR            (
    CPAR            )
    EOF             <EOF>
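
To make the mode switching easier to follow, here is a minimal hand-written Go sketch that imitates what the two-mode lexer does. It is not ANTLR-generated code and does not use the ANTLR Go runtime. The names `Token`, `Lex`, and the `UNKNOWN` token type are hypothetical, the STRING rule is left out for brevity, and unrecognized characters in the default mode are simply reported as `UNKNOWN`.

```go
package main

import "fmt"

// Token pairs a token type name with the matched text.
type Token struct {
	Type, Text string
}

// keywords maps reserved words to token types; any other identifier is an ID.
var keywords = map[string]string{
	"and": "AND", "or": "OR", "not": "NOT", "no": "NO", "every": "EVERY",
}

func isSpace(r rune) bool { return r == ' ' || r == '\t' || r == '\r' || r == '\n' }
func isParen(r rune) bool { return r == '(' || r == ')' }
func isDigit(r rune) bool { return r >= '0' && r <= '9' }
func isIDStart(r rune) bool {
	return r == '_' || (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z')
}

func parenType(r rune) string {
	if r == '(' {
		return "OPAR"
	}
	return "CPAR"
}

// Lex is a hand-written imitation of the mode logic in the lexer grammar.
// Assumption: STRING is omitted and unknown characters become UNKNOWN tokens.
func Lex(src string) []Token {
	rs := []rune(src)
	var out []Token
	modes := []string{"DEFAULT_MODE"} // mode stack; the last entry is active
	for i := 0; i < len(rs); {
		r := rs[i]
		special := modes[len(modes)-1] == "NOT_SPECIAL_MODE"
		switch {
		case special && isSpace(r): // WS2: skip one character, then popMode
			modes = modes[:len(modes)-1]
			i++
		case special && isParen(r): // OPAR2/CPAR2: emit as OPAR/CPAR, then popMode
			out = append(out, Token{parenType(r), string(r)})
			modes = modes[:len(modes)-1]
			i++
		case special: // NOT_SPECIAL: longest run without whitespace or parentheses
			j := i
			for j < len(rs) && !isSpace(rs[j]) && !isParen(rs[j]) {
				j++
			}
			out = append(out, Token{"NOT_SPECIAL", string(rs[i:j])})
			i = j
		case isSpace(r): // WS: skipped in the default mode
			i++
		case isParen(r):
			out = append(out, Token{parenType(r), string(r)})
			i++
		case r == '=': // EQ: pushMode(NOT_SPECIAL_MODE)
			out = append(out, Token{"EQ", "="})
			modes = append(modes, "NOT_SPECIAL_MODE")
			i++
		case isIDStart(r): // keyword or ID, longest match
			j := i + 1
			for j < len(rs) && (isIDStart(rs[j]) || isDigit(rs[j])) {
				j++
			}
			text := string(rs[i:j])
			typ, ok := keywords[text]
			if !ok {
				typ = "ID"
			}
			out = append(out, Token{typ, text})
			i = j
		case isDigit(r):
			out = append(out, Token{"DIGIT", string(r)})
			i++
		default: // simplification of this sketch, not ANTLR behavior
			out = append(out, Token{"UNKNOWN", string(r)})
			i++
		}
	}
	return append(out, Token{"EOF", "<EOF>"})
}

func main() {
	for _, t := range Lex("Person Address=^%Street%%%$^&*@^()") {
		fmt.Printf("%-15s %s\n", t.Type, t.Text)
	}
}
```

The key point is that everything after "=" is read under different rules until whitespace or a parenthesis pops the mode, which is why characters such as "%" or "^" no longer need a rule in the default mode.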
    