douzhang3898 2014-01-05 05:55

Golang regular expression to extract quantity/unit pairs

I have a set of human readable strings expressing a duration of time. Here are four examples:

1 days 40 hrs 23 min 50 sec

3 hrs 1 min 30 sec

10 days 23 min 11 sec

52 sec

I am trying to convert these strings into number of seconds. The math to do this is quite simple once the string is broken down into its components - it's just multiplication and addition. I am having some issues however with writing the regular expression to parse the string into [<quantity>, <unit>] pairs. As an example, the output I would like for the string:

1 days 40 hrs 23 min 50 sec

is an array (or slice) like:

[[1, "days"], [40, "hrs"], [23, "min"], [50, "sec"]].

Below is the code I have tried so far, with its output (executable at http://play.golang.org/p/iR-xfc8MVQ). segs was my first attempt: it breaks the string into four components, but each component is a single string like 1 days rather than a two-element array like [1, days]. segs2 was my second attempt, which does something stranger: each component appears twice.

// time unit tokenizer
package main

import "fmt"
import "regexp"

func main() {
    s := "1 days 40 hrs 23 min 50 sec"
    re := regexp.MustCompile("(?P<quant>\\d+) (?P<unit>\\w+)+")

    segs := re.FindAllString(s, -1)
    fmt.Println("segs:", segs)
    fmt.Println(segs[0], ",", segs[1], ",", segs[2], ",", segs[3])
    fmt.Println("length segs:", len(segs))

    segs2 := re.FindAllStringSubmatch(s, -1)
    fmt.Println("segs2:", segs2)
    fmt.Println(segs2[0], ",", segs2[1], ",", segs2[2], ",", segs2[3])
    fmt.Println("length segs2:", len(segs2))
}

Output:

segs: [1 days 40 hrs 23 min 50 sec]
1 days , 40 hrs , 23 min , 50 sec
length segs: 4
segs2: [[1 days 1 days] [40 hrs 40 hrs] [23 min 23 min] [50 sec 50 sec]]
[1 days 1 days] , [40 hrs 40 hrs] , [23 min 23 min] , [50 sec 50 sec]
length segs2: 4

I've written a similar regex in Python which works OK, so I'm not sure whether I am getting Go's regular expression syntax wrong or calling the wrong method on the re object.

1 answer

  • drwf69817 2014-01-05 06:03

    Regexp.FindAllStringSubmatch returns a [][]string, but its contents differ slightly from the return value of the Python function re.findall (I assume you used re.findall in Python). For each match i:

    • return_value[i][0] contains the whole matched string.
    • return_value[i][1] contains captured group 1.
    • return_value[i][2] contains captured group 2, and so on for any further groups.

    Printing return_value[i] therefore prints every item in it (return_value[i][0], return_value[i][1], return_value[i][2], ...), which is why each component appears to be repeated: the whole match is printed first, followed by the two groups.
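The layout described above can be made visible by printing each index separately. This is a minimal sketch, not the asker's original program: the helper name submatches is hypothetical, and the pattern is a simplified form of the one in the question with the names and the trailing + dropped.

```go
package main

import (
	"fmt"
	"regexp"
)

// submatches is a hypothetical helper: it returns, for every match,
// the slice [whole match, group 1, group 2].
func submatches(s string) [][]string {
	re := regexp.MustCompile(`(\d+) (\w+)`)
	return re.FindAllStringSubmatch(s, -1)
}

func main() {
	for i, m := range submatches("1 days 40 hrs") {
		// m[0] is the whole match; m[1] and m[2] are the capture groups.
		fmt.Printf("match %d: whole=%q group1=%q group2=%q\n", i, m[0], m[1], m[2])
	}
}
```

Quoting each element with %q shows where one string ends and the next begins, which plain Println output hides.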


    You can get what you expected by printing only the captured groups (skipping index [0]), as follows:

    segs2 := re.FindAllStringSubmatch(s, -1)
    for i := 0; i < len(segs2); i++ {
        fmt.Println(segs2[i][1], ",", segs2[i][2])
    }
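
Building on the loop above, the conversion to seconds that the question is ultimately after could look something like the following. This is only a sketch under stated assumptions: the names pairRe, unitSeconds and toSeconds are hypothetical, the only units handled are the four that appear in the question's examples (days, hrs, min, sec), and any other unit is treated as an error.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// pairRe captures a quantity (group 1) and its unit (group 2).
var pairRe = regexp.MustCompile(`(\d+) (\w+)`)

// unitSeconds maps the unit spellings seen in the question to seconds.
// Assumption: no other spellings (e.g. "day", "hours") need to be accepted.
var unitSeconds = map[string]int{
	"days": 86400,
	"hrs":  3600,
	"min":  60,
	"sec":  1,
}

// toSeconds sums quantity * unit length over every pair found in s.
func toSeconds(s string) (int, error) {
	total := 0
	for _, m := range pairRe.FindAllStringSubmatch(s, -1) {
		n, err := strconv.Atoi(m[1]) // m[1] is the quantity
		if err != nil {
			return 0, err
		}
		mult, ok := unitSeconds[m[2]] // m[2] is the unit
		if !ok {
			return 0, fmt.Errorf("unknown unit %q", m[2])
		}
		total += n * mult
	}
	return total, nil
}

func main() {
	secs, err := toSeconds("1 days 40 hrs 23 min 50 sec")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(secs) // 86400 + 144000 + 1380 + 50 = 231830
}
```

Keeping the unit table in a map means that supporting another spelling is a one-line change rather than a change to the regular expression.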
    



    Side Note

    The following interpreted string literal:

    "(?P<quant>\\d+) (?P<unit>\\w+)+"
    

    can be written as a raw string literal, which needs no escaping of the backslashes:

    `(?P<quant>\d+) (?P<unit>\w+)+`
    

    See the "String literals" section of the Go language specification.
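
Since the pattern names its groups (quant and unit), those names can also be used to look up the captures instead of relying on the numeric indices. The sketch below is illustrative only: the helper namedPairs is hypothetical, and the redundant trailing + after the unit group has been dropped. It also checks that the interpreted and raw literals denote the same string.

```go
package main

import (
	"fmt"
	"regexp"
)

// The same pattern written as an interpreted and as a raw string literal.
const interpreted = "(?P<quant>\\d+) (?P<unit>\\w+)"
const raw = `(?P<quant>\d+) (?P<unit>\w+)`

// namedPairs is a hypothetical helper that returns one map per match,
// keyed by the group names declared in the pattern.
func namedPairs(s string) []map[string]string {
	re := regexp.MustCompile(raw)
	names := re.SubexpNames() // names[0] is "" (the whole match)
	var out []map[string]string
	for _, m := range re.FindAllStringSubmatch(s, -1) {
		pair := make(map[string]string)
		for i, name := range names {
			if name != "" {
				pair[name] = m[i]
			}
		}
		out = append(out, pair)
	}
	return out
}

func main() {
	fmt.Println("literals equal:", interpreted == raw)
	for _, p := range namedPairs("1 days 40 hrs 23 min 50 sec") {
		fmt.Println(p["quant"], ",", p["unit"])
	}
}
```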

    This answer was accepted by the asker.
