duandaotuo5542 2016-04-19 17:16

Golang regex replace excluding quoted strings

I'm trying to port the removeComments function from this JavaScript implementation to Go. The goal is to remove all comments from the text. For example:

/* this is comments, and should be removed */

However, "/* this is quoted, so it should not be removed*/"

In the JavaScript implementation, quoted matches are not captured in groups, so I can easily filter them out. In Go, however, it does not seem easy to tell whether the matched part was captured in a group. How can I implement the same removeComments logic in Go as in the JavaScript version?


  • dongmu5106 2016-04-22 22:35

    The first two patterns do not preserve formatting.


    Preferred way (group 1 is NULL when it does not take part in the match).
    Works in the Go playground:

         # https://play.golang.org/p/yKtPk5QCQV
         # fmt.Println(reg.ReplaceAllString(txt, "$1"))
         # (?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//[^\n]*(?:\n|$))|("[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|[\S\s][^/"'\\]*)
    
         (?:                              # Comments 
              /\*                              # Start /* .. */ comment
              [^*]* \*+
              (?: [^/*] [^*]* \*+ )*
              /                                # End /* .. */ comment
           |  
              //  [^\n]*                       # Start // comment
              (?: \n | $ )                     # End // comment
         )
      |  
         (                                # (1 start), Non - comments 
              "
              [^"\\]*                          # Double quoted text
              (?: \\ [\S\s] [^"\\]* )*
              "
           |  
              '
              [^'\\]*                          # Single quoted text
              (?: \\ [\S\s] [^'\\]* )*
              ' 
           |  [\S\s]                           # Any other char
              [^/"'\\]*                        # Chars which don't start a comment, string, escape, or line continuation (escape + newline)
         )                                # (1 end)
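
    For reference, a minimal sketch of how the preferred pattern might be wrapped in Go. The function name `removeComments` is taken from the question; the variable name `commentRe` and the sample input are assumptions for illustration. It relies on the fact that `ReplaceAllString` expands `$1` to an empty string when group 1 did not take part in the match, so comments simply vanish.

```go
package main

import (
	"fmt"
	"regexp"
)

// commentRe is the "preferred" pattern: comments are matched outside any
// capture group, everything else (strings and other text) lands in group 1.
var commentRe = regexp.MustCompile(
	`(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//[^\n]*(?:\n|$))` +
		`|("[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|[\S\s][^/"'\\]*)`)

// removeComments replaces every match with the contents of group 1.
// For a comment match, group 1 is unset and "$1" expands to "".
func removeComments(src string) string {
	return commentRe.ReplaceAllString(src, "$1")
}

func main() {
	src := "x /* this is removed */ y \"/* this is quoted */\""
	fmt.Println(removeComments(src))
}
```

    As the answer notes, whitespace around a removed comment is left as-is, and a `//` comment takes its trailing newline with it.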
    

    Alternative way (group 1 always matches, but may be empty).
    Works in the Go playground:

     # https://play.golang.org/p/7FDGZSmMtP
     # fmt.Println(reg.ReplaceAllString(txt, "$1"))
     # (?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//[^\n]*(?:\n|$))?((?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|[\S\s][^/"'\\]*)?)
    
     (?:                              # Comments 
          /\*                              # Start /* .. */ comment
          [^*]* \*+
          (?: [^/*] [^*]* \*+ )*
          /                                # End /* .. */ comment
       |  
          //  [^\n]*                       # Start // comment
          (?: \n | $ )                     # End // comment
     )?
     (                                # (1 start), Non - comments 
          (?:
               "
               [^"\\]*                          # Double quoted text
               (?: \\ [\S\s] [^"\\]* )*
               "
            |  
               '
               [^'\\]*                          # Single quoted text
               (?: \\ [\S\s] [^'\\]* )*
               ' 
            |  [\S\s]                           # Any other char
               [^/"'\\]*                        # Chars which don't start a comment, string, escape, or line continuation (escape + newline)
          )?
     )                                # (1 end)
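
    The difference between the two variants (group 1 absent versus present but empty) can be observed in Go through the index-returning match functions, which report a group that did not take part as the offsets -1, -1. The sketch below is only an illustration; `group1Span` and the variable names are hypothetical helpers, not part of the original playground code.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Preferred variant: comment | (non-comment)
	preferred = regexp.MustCompile(`(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//[^\n]*(?:\n|$))|("[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|[\S\s][^/"'\\]*)`)
	// Alternative variant: (comment)? followed by an always-present group 1
	alternative = regexp.MustCompile(`(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//[^\n]*(?:\n|$))?((?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|[\S\s][^/"'\\]*)?)`)
)

// group1Span returns the byte offsets of group 1 in the first match of re
// in src. It returns (-1, -1) when there is no match at all or when
// group 1 did not take part in the match.
func group1Span(re *regexp.Regexp, src string) (int, int) {
	loc := re.FindStringSubmatchIndex(src)
	if loc == nil {
		return -1, -1
	}
	// loc[0:2] is the whole match, loc[2:4] is group 1.
	return loc[2], loc[3]
}

func main() {
	s1, e1 := group1Span(preferred, "/* c */")
	fmt.Println(s1, e1) // -1 -1: group 1 did not take part
	s2, e2 := group1Span(alternative, "/* c */")
	fmt.Println(s2, e2) // 7 7: group 1 matched, but is empty
}
```

    Checking for a negative offset is one way to tell "captured" from "not captured" in Go when `$1` expansion alone is not enough.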
    

    The Cadillac - Preserves Formatting

    (Unfortunately, this can't be done in Go, because Go's regexp package does not support lookahead assertions.)
    Posted in case you move to a different regex engine.

         # raw:   ((?:(?:^[ \t]*)?(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/(?:[ \t]*\r?\n(?=[ \t]*(?:\r?\n|/\*|//)))?|//(?:[^\\]|\\(?:\r?\n)?)*?(?:\r?\n(?=[ \t]*(?:\r?\n|/\*|//))|(?=\r?\n))))+)|("[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|(?:\r?\n|[\S\s])[^/"'\\\s]*)
         # delimited:  /((?:(?:^[ \t]*)?(?:\/\*[^*]*\*+(?:[^\/*][^*]*\*+)*\/(?:[ \t]*\r?\n(?=[ \t]*(?:\r?\n|\/\*|\/\/)))?|\/\/(?:[^\\]|\\(?:\r?\n)?)*?(?:\r?\n(?=[ \t]*(?:\r?\n|\/\*|\/\/))|(?=\r?\n))))+)|("[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|(?:\r?\n|[\S\s])[^\/"'\\\s]*)/
    
         (                                # (1 start), Comments 
              (?:
                   (?: ^ [ \t]* )?                  # <- To preserve formatting
                   (?:
                        /\*                              # Start /* .. */ comment
                        [^*]* \*+
                        (?: [^/*] [^*]* \*+ )*
                        /                                # End /* .. */ comment
                        (?:                              # <- To preserve formatting 
                             [ \t]* \r? \n
                             (?=
                                  [ \t]*
                                  (?: \r? \n | /\* | // )
                             )
                        )?
                     |  
                        //                               # Start // comment
                        (?:                              # Possible line-continuation
                             [^\\] 
                          |  \\ 
                             (?: \r? \n )?
                        )*?
                        (?:                              # End // comment
                             \r? \n
                             (?=                              # <- To preserve formatting
                                  [ \t]*
                                  (?: \r? \n | /\* | // )
                             )
                          |  (?= \r? \n )
                        )
                   )
              )+                               # Grab multiple comment blocks if need be
         )                                # (1 end)
    
      |                                 ## OR
    
         (                                # (2 start), Non - comments 
              "
              [^"\\]*                          # Double quoted text
              (?: \\ [\S\s] [^"\\]* )*
              "
           |  
              '
              [^'\\]*                          # Single quoted text
              (?: \\ [\S\s] [^'\\]* )*
              ' 
           |  
              (?: \r? \n | [\S\s] )            # Linebreak or Any other char
              [^/"'\\\s]*                      # Chars which don't start a comment, string, escape,
                                               # or line continuation (escape + newline)
         )                                # (2 end)
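
    The limitation mentioned above is easy to confirm: Go's `regexp.Compile` returns an error for any pattern containing a lookahead such as `(?=...)`. The small check below is a sketch; `compiles` is a hypothetical helper and the two patterns are shortened fragments chosen for illustration, not the full expression.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's regexp package accepts the pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	// Fragment with a lookahead, as used in the formatting-preserving regex.
	fmt.Println(compiles(`\r?\n(?=[ \t]*//)`)) // false
	// The same fragment without the assertion is accepted.
	fmt.Println(compiles(`\r?\n[ \t]*//`)) // true
}
```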
    
    This answer was selected by the asker as the best answer.
