duandaotuo5542
2016-04-19 17:16

Golang regex replace excluding quoted strings

I'm trying to port the removeComments function to Golang from this Javascript implementation (https://github.com/elgs/JSONx-js/blob/master/jsonx.js). I want to remove all comments from the text. For example:

/* this is comments, and should be removed */

However, "/* this is quoted, so it should not be removed*/"

In the Javascript implementation, quoted matches are not captured in groups, so I can easily filter them out. In Golang, however, it does not seem easy to tell whether the matched part was captured in a group. How can I implement the same removeComments logic in Golang as in the Javascript version?


6 answers

  • dongmu5106 2016-04-22 22:35
    Accepted answer

    These two versions do not preserve formatting (the line break that ends a // comment is removed along with it).


    Preferred way (group 1 is left unmatched for comments, so "$1" expands to an empty string)
    works in the Go playground:

         # https://play.golang.org/p/yKtPk5QCQV
         # fmt.Println(reg.ReplaceAllString(txt, "$1"))
         # (?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//[^\n]*(?:\n|$))|("[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|[\S\s][^/"'\\]*)
    
         (?:                              # Comments 
              /\*                              # Start /* .. */ comment
              [^*]* \*+
              (?: [^/*] [^*]* \*+ )*
              /                                # End /* .. */ comment
           |  
              //  [^\n]*                       # Start // comment
              (?: \n | $ )                     # End // comment
         )
      |  
         (                                # (1 start), Non - comments 
              "
              [^"\\]*                          # Double quoted text
              (?: \\ [\S\s] [^"\\]* )*
              "
           |  
              '
              [^'\\]*                          # Single quoted text
              (?: \\ [\S\s] [^'\\]* )*
              ' 
           |  [\S\s]                           # Any other char
              [^/"'\\]*                        # Chars which don't start a comment, string, escape, or line continuation (escape + newline)
         )                                # (1 end)
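
A minimal sketch of how the single-line pattern above might be wrapped in a Go function. The names `commentOrText` and `removeComments` and the sample text are assumptions made for illustration; only the pattern and the `"$1"` replacement come from the answer.

```go
package main

import (
	"fmt"
	"regexp"
)

// commentOrText matches either a comment (outside any capture group) or a
// run of non-comment text (capture group 1). The pattern is copied from the
// answer; the variable name is hypothetical.
var commentOrText = regexp.MustCompile(
	`(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//[^\n]*(?:\n|$))` +
		`|("[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|[\S\s][^/"'\\]*)`)

// removeComments keeps only what group 1 captured. For a comment match,
// group 1 did not participate, so "$1" expands to the empty string.
func removeComments(src string) string {
	return commentOrText.ReplaceAllString(src, "$1")
}

func main() {
	txt := "a /* gone */ b \"/* kept */\" c // gone too\nd"
	fmt.Printf("%q\n", removeComments(txt))
	// prints "a  b \"/* kept */\" c d"
}
```

The quoted text survives because the string alternatives are tried at a position where no comment starts, and the newline after the `//` comment disappears with the comment, which is the formatting loss mentioned above.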
    

    Alternative way (group 1 is always matched, but could be empty)
    works in the Go playground:

     # https://play.golang.org/p/7FDGZSmMtP
     # fmt.Println(reg.ReplaceAllString(txt, "$1"))
     # (?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//[^\n]*(?:\n|$))?((?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|[\S\s][^/"'\\]*)?)
    
     (?:                              # Comments 
          /\*                              # Start /* .. */ comment
          [^*]* \*+
          (?: [^/*] [^*]* \*+ )*
          /                                # End /* .. */ comment
       |  
          //  [^\n]*                       # Start // comment
          (?: \n | $ )                     # End // comment
     )?
     (                                # (1 start), Non - comments 
          (?:
               "
               [^"\\]*                          # Double quoted text
               (?: \\ [\S\s] [^"\\]* )*
               "
            |  
               '
               [^'\\]*                          # Single quoted text
               (?: \\ [\S\s] [^'\\]* )*
               ' 
            |  [\S\s]                           # Any other char
               [^/"'\\]*                        # Chars which don't start a comment, string, escape, or line continuation (escape + newline)
          )?
     )                                # (1 end)
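
For the original question of telling whether a group took part in a match, one option in Go is `FindAllStringSubmatchIndex`, which reports -1 for the bounds of a group that did not participate. The sketch below contrasts the two patterns from this answer. The names `preferred`, `alternative` and `unmatchedGroups` and the sample text are assumptions for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// Both patterns are copied from the answer; the variable names are hypothetical.
var (
	// Group 1 is skipped whenever the comment branch matches.
	preferred = regexp.MustCompile(
		`(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//[^\n]*(?:\n|$))` +
			`|("[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|[\S\s][^/"'\\]*)`)
	// Group 1 always participates, but may capture the empty string.
	alternative = regexp.MustCompile(
		`(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//[^\n]*(?:\n|$))?` +
			`((?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|[\S\s][^/"'\\]*)?)`)
)

// unmatchedGroups counts the matches in which capture group 1 did not
// participate. loc[2] and loc[3] are the bounds of group 1; both are -1
// when the group was not part of the match.
func unmatchedGroups(re *regexp.Regexp, src string) int {
	n := 0
	for _, loc := range re.FindAllStringSubmatchIndex(src, -1) {
		if loc[2] < 0 {
			n++
		}
	}
	return n
}

func main() {
	txt := `x /* c */ "/* s */" // d`
	fmt.Println(unmatchedGroups(preferred, txt))   // comment matches leave group 1 unset
	fmt.Println(unmatchedGroups(alternative, txt)) // group 1 is set in every match
}
```

With this sample text the first count is 2 (one per comment) and the second is 0, while `ReplaceAllString(txt, "$1")` gives the same output for both patterns.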
    

    The Cadillac - Preserves Formatting

    (Unfortunately, this can't be done in Golang because Golang's regexp package does not support lookahead assertions.)
    Posted in case you move to a different regex engine.

         # raw:   ((?:(?:^[ \t]*)?(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/(?:[ \t]*\r?\n(?=[ \t]*(?:\r?\n|/\*|//)))?|//(?:[^\\]|\\(?:\r?\n)?)*?(?:\r?\n(?=[ \t]*(?:\r?\n|/\*|//))|(?=\r?\n))))+)|("[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|(?:\r?\n|[\S\s])[^/"'\\\s]*)
         # delimited:  /((?:(?:^[ \t]*)?(?:\/\*[^*]*\*+(?:[^\/*][^*]*\*+)*\/(?:[ \t]*\r?\n(?=[ \t]*(?:\r?\n|\/\*|\/\/)))?|\/\/(?:[^\\]|\\(?:\r?\n)?)*?(?:\r?\n(?=[ \t]*(?:\r?\n|\/\*|\/\/))|(?=\r?\n))))+)|("[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|(?:\r?\n|[\S\s])[^\/"'\\\s]*)/
    
         (                                # (1 start), Comments 
              (?:
                   (?: ^ [ \t]* )?                  # <- To preserve formatting
                   (?:
                        /\*                              # Start /* .. */ comment
                        [^*]* \*+
                        (?: [^/*] [^*]* \*+ )*
                        /                                # End /* .. */ comment
                        (?:                              # <- To preserve formatting 
                             [ \t]* \r? \n
                             (?=
                                  [ \t]*
                                  (?: \r? \n | /\* | // )
                             )
                        )?
                     |  
                        //                               # Start // comment
                        (?:                              # Possible line-continuation
                             [^\\] 
                          |  \\ 
                             (?: \r? \n )?
                        )*?
                        (?:                              # End // comment
                             \r? \n
                             (?=                              # <- To preserve formatting
                                  [ \t]*
                                  (?: \r? \n | /\* | // )
                             )
                          |  (?= \r? \n )
                        )
                   )
              )+                               # Grab multiple comment blocks if need be
         )                                # (1 end)
    
      |                                 ## OR
    
         (                                # (2 start), Non - comments 
              "
              [^"\\]*                          # Double quoted text
              (?: \\ [\S\s] [^"\\]* )*
              "
           |  
              '
              [^'\\]*                          # Single quoted text
              (?: \\ [\S\s] [^'\\]* )*
              ' 
           |  
              (?: \r? \n | [\S\s] )            # Linebreak or Any other char
              [^/"'\\\s]*                      # Chars which don't start a comment, string, escape,
                                               # or line continuation (escape + newline)
         )                                # (2 end)
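
The limitation noted for this version can be checked directly: Go's `regexp.Compile` returns an error for a pattern containing a lookahead such as `(?=...)`. A small sketch, where `supportsPattern` is a hypothetical helper and the two test patterns are simplified fragments rather than the full regex above:

```go
package main

import (
	"fmt"
	"regexp"
)

// supportsPattern reports whether Go's regexp package accepts the pattern.
// Hypothetical helper, used only to illustrate the limitation.
func supportsPattern(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	// A line break followed by a lookahead for another comment: rejected.
	fmt.Println(supportsPattern(`\r?\n(?=[ \t]*//)`)) // false
	// The same fragment without the assertion: accepted, but it consumes
	// the text that the lookahead would only have inspected.
	fmt.Println(supportsPattern(`\r?\n[ \t]*//`)) // true
}
```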
    