doupeng8494
2017-09-05 15:16

Get a named list of subgroups in a Golang regex

I'm looking for a function that returns a map[string]interface{}, where each interface{} value can be a slice, a map[string]interface{}, or a plain value.

My use case is to parse WKT geometry like the following and retrieve the point values. Example for a donut polygon:

POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0),(3 3, 3 7, 7 7, 7 3, 3 3))

The regex (for readability I deliberately use \d, which matches only a single digit):

(POLYGON \(
    (?P<polygons>\(
        (?P<points>(?P<point>(\d \d), ){3,})
        (?P<last_point>\d \d )\),)*
    (?P<last_polygon>\(
        (?P<points>(?P<point>(\d \d), ){3,})
        (?P<last_point>\d \d)\))\)
)

I have a function (copied from SO) that retrieves some of the information, but it does not handle nested groups or lists of groups well:

// getRegexMatchParams maps each group name of reg to the text that group captured in url.
func getRegexMatchParams(reg *regexp.Regexp, url string) (paramsMap map[string]string) {
    match := reg.FindStringSubmatch(url)
    paramsMap = make(map[string]string)
    for i, name := range reg.SubexpNames() {
        // Skip index 0 (the whole match) and stay inside match, which is nil when nothing matched.
        if i > 0 && i < len(match) {
            paramsMap[name] = match[i]
        }
    }
    return paramsMap
}

It seems that the point group captures only one point (example on playground).

[EDIT] The result I want is something like this:

map[string]interface{}{
    "polygons": map[string]interface{}{
        "points": []interface{}{
            map[string]string{"point": "0 0"},
            map[string]string{"point": "0 10"},
            map[string]string{"point": "10 10"},
            map[string]string{"point": "10 0"},
        },
        "last_point": "0 0",
    },
    "last_polygon": map[string]interface{}{
        "points": []interface{}{
            map[string]string{"point": "3 3"},
            map[string]string{"point": "3 7"},
            map[string]string{"point": "7 7"},
            map[string]string{"point": "7 3"},
        },
        "last_point": "3 3",
    },
}

That way I can use it for other purposes, such as querying databases and validating that last_point = points[0] for each polygon.


  • doukanwen4114 2017-09-05 15:37
    Accepted

    Try allowing some whitespace in the regex.

    Also note that this engine does not retain every value captured by a group that sits
    inside a quantified outer group. In (a|b|c)+, for example, the group holds only the last a, b or c it matched.

    Your regex can also be reduced to this:
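A minimal Go sketch of that behaviour, using the standard regexp package. The helper name lastCapture is made up for this illustration and is not part of the question or the answer.

```go
package main

import (
	"fmt"
	"regexp"
)

// lastCapture (hypothetical helper) returns what group 1 of pattern
// holds after matching input, or "" when there is no match.
func lastCapture(pattern, input string) string {
	m := regexp.MustCompile(pattern).FindStringSubmatch(input)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	// The group is entered three times, but only the last capture is kept.
	fmt.Println(lastCapture(`(a|b|c)+`, "abc")) // prints "c"
}
```

This is why the point group in the question reports a single point even though the {3,} quantifier repeats it several times.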

    (POLYGON\s*\((?P<polygons>\(\s*(?P<points>(?P<point>\s*(\d+\s+\d+)\s*,){3,})\s*(?P<last_point>\d+\s+\d+)\s*\)(?:\s*,\s*|\s*\)))+)

    https://play.golang.org/p/rLaaEa_7GX


    The original, with whitespace allowed:

    (POLYGON\s*\((?P<polygons>\(\s*(?P<points>(?P<point>\s*(\d+\s+\d+)\s*,){3,})\s*(?P<last_point>\d+\s+\d+)\s*\),)*(?P<last_polygon>\(\s*(?P<points>(?P<point>\s*(\d+\s+\d+)\s*,){3,})\s*(?P<last_point>\d+\s+\d+)\s*\))\s*\))

    https://play.golang.org/p/rZgJYPDMzl

    See below for what the groups contain.

     (                             # (1 start)
          POLYGON \s* \(
          (?P<polygons>                 # (2 start)
               \( \s* 
               (?P<points>                   # (3 start)
                    (?P<point>                    # (4 start)
                         \s* 
                         ( \d+ \s+ \d+ )               # (5)
                         \s* 
                         , 
                    ){3,}                         # (4 end)
               )                             # (3 end)
               \s*            
               (?P<last_point> \d+ \s+ \d+ )  # (6)
               \s* \),
          )*                            # (2 end)
          (?P<last_polygon>             # (7 start)
               \( \s* 
               (?P<points>                   # (8 start)
                    (?P<point>                    # (9 start)
                         \s* 
                         ( \d+ \s+ \d+ )               # (10)
                         \s* 
                         , 
                    ){3,}                         # (9 end)
               )                             # (8 end)
               \s* 
               (?P<last_point> \d+ \s+ \d+ )  # (11)
               \s* \)
          )                             # (7 end)
          \s* \)
     )                             # (1 end)
    

    Input

    POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0),(3 3, 3 7, 7 7, 7 3, 3 3))
    

    Output

     **  Grp 0                -  ( pos 0 , len 65 ) 
    POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0),(3 3, 3 7, 7 7, 7 3, 3 3))  
     **  Grp 1                -  ( pos 0 , len 65 ) 
    POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0),(3 3, 3 7, 7 7, 7 3, 3 3))  
     **  Grp 2 [polygons]     -  ( pos 9 , len 30 ) 
    (0 0, 0 10, 10 10, 10 0, 0 0),  
     **  Grp 3 [points]       -  ( pos 10 , len 23 ) 
    0 0, 0 10, 10 10, 10 0,  
     **  Grp 4 [point]        -  ( pos 27 , len 6 ) 
     10 0,  
     **  Grp 5                -  ( pos 28 , len 4 ) 
    10 0  
     **  Grp 6 [last_point]   -  ( pos 34 , len 3 ) 
    0 0  
     **  Grp 7 [last_polygon] -  ( pos 39 , len 25 ) 
    (3 3, 3 7, 7 7, 7 3, 3 3)  
     **  Grp 8 [points]       -  ( pos 40 , len 19 ) 
    3 3, 3 7, 7 7, 7 3,  
     **  Grp 9 [point]        -  ( pos 54 , len 5 ) 
     7 3,  
     **  Grp 10                -  ( pos 55 , len 3 ) 
    7 3  
     **  Grp 11 [last_point]   -  ( pos 60 , len 3 ) 
    3 3  
    

    Possible Solution

    It's not impossible. It just takes a few extra steps.
    (As an aside, isn't there a WKT library that can parse this for you?)

    I don't know what your language offers, so this is just a general approach.

    1. Validate the form you're parsing.
    This validates the input and returns all polygon sets as a single string in the All_Polygons group.

    Target POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0),(3 3, 3 7, 7 7, 7 3, 3 3))

    POLYGON\s*\((?P<All_Polygons>(?:\(\s*\d+\s+\d+(?:\s*,\s*\d+\s+\d+){2,}\s*\))(?:\s*,\(\s*\d+\s+\d+(?:\s*,\s*\d+\s+\d+){2,}\s*\))*)\s*\)

     **  Grp 1 [All_Polygons] -  ( pos 9 , len 55 ) 
    (0 0, 0 10, 10 10, 10 0, 0 0),(3 3, 3 7, 7 7, 7 3, 3 3)
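Step 1 could look roughly like this in Go. It is a sketch, not a definitive implementation: the pattern is the one given above, while the names validateRe and allPolygons are assumptions made for this example. It also assumes Go 1.15 or later for SubexpIndex.

```go
package main

import (
	"fmt"
	"regexp"
)

// validateRe is the step 1 pattern, copied unchanged from the answer.
var validateRe = regexp.MustCompile(`POLYGON\s*\((?P<All_Polygons>(?:\(\s*\d+\s+\d+(?:\s*,\s*\d+\s+\d+){2,}\s*\))(?:\s*,\(\s*\d+\s+\d+(?:\s*,\s*\d+\s+\d+){2,}\s*\))*)\s*\)`)

// allPolygons (hypothetical helper) validates wkt and returns the text of
// the All_Polygons group. The boolean is false when wkt does not match.
func allPolygons(wkt string) (string, bool) {
	m := validateRe.FindStringSubmatch(wkt)
	if m == nil {
		return "", false
	}
	return m[validateRe.SubexpIndex("All_Polygons")], true
}

func main() {
	all, ok := allPolygons("POLYGON ((0 0, 0 10, 10 10, 10 0, 0 0),(3 3, 3 7, 7 7, 7 3, 3 3))")
	fmt.Println(ok, len(all)) // prints "true 55"
	fmt.Println(all)
}
```

The pattern is not anchored, so it also accepts a polygon embedded in a longer string. Wrapping it in ^ and $ is an option if the whole input must be a single polygon.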
    

    2. If 1 was successful, set up a loop match using the output of All_Polygons string.

    Target (0 0, 0 10, 10 10, 10 0, 0 0),(3 3, 3 7, 7 7, 7 3, 3 3)

    (?:\(\s*(?P<Single_Poly_All_Pts>\d+\s+\d+(?:\s*,\s*\d+\s+\d+){2,})\s*\))

    This step is the equivalent of a find-all match. Each successive match holds all the points of a single polygon, returned as a string in the Single_Poly_All_Pts group.

    This will give you these 2 separate matches, which can be put into a temp array having 2 value strings:

     **  Grp 1 [Single_Poly_All_Pts] -  ( pos 1 , len 27 ) 
    0 0, 0 10, 10 10, 10 0, 0 0  
    
     **  Grp 1 [Single_Poly_All_Pts] -  ( pos 31 , len 23 ) 
    3 3, 3 7, 7 7, 7 3, 3 3  
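In Go, the find-all loop of step 2 can be sketched with FindAllStringSubmatch. The pattern is the one from the answer; singlePolyRe and splitPolygons are names invented for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// singlePolyRe is the step 2 pattern, copied unchanged from the answer.
var singlePolyRe = regexp.MustCompile(`(?:\(\s*(?P<Single_Poly_All_Pts>\d+\s+\d+(?:\s*,\s*\d+\s+\d+){2,})\s*\))`)

// splitPolygons (hypothetical helper) returns one string per polygon,
// each holding that polygon's comma-separated points.
func splitPolygons(all string) []string {
	var out []string
	// n = -1 asks for every non-overlapping match.
	for _, m := range singlePolyRe.FindAllStringSubmatch(all, -1) {
		out = append(out, m[1]) // group 1 is Single_Poly_All_Pts
	}
	return out
}

func main() {
	polys := splitPolygons("(0 0, 0 10, 10 10, 10 0, 0 0),(3 3, 3 7, 7 7, 7 3, 3 3)")
	for i, p := range polys {
		fmt.Printf("%d: %q\n", i, p)
	}
}
```

The returned slice plays the role of the temp array described above.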
    

    3. If 2 was successful, set up a loop match using the temp array output of step 2.
    This will give you the individual points of each polygon.

    (?P<Single_Point>\d+\s+\d+)

    Again this is a loop match (or a find all type of match). For each array element
    (Polygon), this will produce the individual points.

    Target[element 1] 0 0, 0 10, 10 10, 10 0, 0 0

     **  Grp 1 [Single_Point] -  ( pos 0 , len 3 ) 
    0 0  
     **  Grp 1 [Single_Point] -  ( pos 5 , len 4 ) 
    0 10  
     **  Grp 1 [Single_Point] -  ( pos 11 , len 5 ) 
    10 10  
     **  Grp 1 [Single_Point] -  ( pos 18 , len 4 ) 
    10 0  
     **  Grp 1 [Single_Point] -  ( pos 24 , len 3 ) 
    0 0  
    

    And,

    Target[element 2] 3 3, 3 7, 7 7, 7 3, 3 3

     **  Grp 1 [Single_Point] -  ( pos 0 , len 3 ) 
    3 3  
     **  Grp 1 [Single_Point] -  ( pos 5 , len 3 ) 
    3 7  
     **  Grp 1 [Single_Point] -  ( pos 10 , len 3 ) 
    7 7  
     **  Grp 1 [Single_Point] -  ( pos 15 , len 3 ) 
    7 3  
     **  Grp 1 [Single_Point] -  ( pos 20 , len 3 ) 
    3 3  
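A Go sketch of step 3, extended with the last_point = points[0] check mentioned in the question. The ring type and the helpers splitPoints and toRing are assumptions made for this illustration. The closing check compares the raw text of the points, so "0 0" and "0  0" would count as different.

```go
package main

import (
	"fmt"
	"regexp"
)

// pointRe is the step 3 pattern, copied unchanged from the answer.
var pointRe = regexp.MustCompile(`(?P<Single_Point>\d+\s+\d+)`)

// ring (hypothetical type) holds one polygon's points, with the closing
// point kept apart as in the question's desired result.
type ring struct {
	Points    []string
	LastPoint string
}

// splitPoints returns every "x y" pair found in poly, in order.
func splitPoints(poly string) []string {
	return pointRe.FindAllString(poly, -1)
}

// toRing splits poly into points and checks that the ring is closed,
// that is, that the last point repeats the first one.
func toRing(poly string) (ring, error) {
	pts := splitPoints(poly)
	if len(pts) < 2 {
		return ring{}, fmt.Errorf("need at least 2 points, got %d", len(pts))
	}
	r := ring{Points: pts[:len(pts)-1], LastPoint: pts[len(pts)-1]}
	if r.LastPoint != r.Points[0] {
		return ring{}, fmt.Errorf("ring not closed: first %q, last %q", r.Points[0], r.LastPoint)
	}
	return r, nil
}

func main() {
	r, err := toRing("0 0, 0 10, 10 10, 10 0, 0 0")
	fmt.Println(r.Points, r.LastPoint, err)
}
```

Calling toRing on each string produced by step 2 gives one ring per polygon, which can then be assembled into whatever nested structure is needed.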
    