duanli0162 2019-08-19 22:58

Extracting information from a string

Given a string of the form https://website-name.some-domain.some-sub-domain.com/resourceId (type 1) or https://website-name.some-sub-domain.com/resourceId?randomContent (type 2), I need to extract exactly two substrings: the website-name in one string and the resourceId in another.

I have extracted the website name using the following code:

s := "https://website-name.some-domain.some-sub-domain.com/resourceId?randomContent"
w := regexp.MustCompile("https://(.*?)\\.")
website := w.FindStringSubmatch(s)
fmt.Println(website[1])

I have a second regex to get the resourceId:

s := "https://website-name.some-domain.some-sub-domain.com/resourceId?randomContent"
r := regexp.MustCompile("com/(.*?)\\?")
resource := r.FindStringSubmatch(s)
fmt.Println(resource[1])

This works for any string in which the resourceId is followed by ? or ?randomContent (type 2). But some of my strings have no trailing ? (type 1), and the pattern does not match those.

I tried "(com/(.*?)\\?)|(com/(.*?).*)" to get the resourceId, but it did not help.

I am not able to find an elegant way to extract these two sub-strings.

Note: randomContent is an arbitrarily long substring, and so is resourceId. However, resourceId never contains a ?, so the first ? marks the end of the resourceId.

Also, website-name can differ, but the pattern is the same: an arbitrary sub-domain and a .com will be present in the string.
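For reference, the rule in the note above (the resourceId ends at the first ?) can be written as a negated character class, which matches whether or not a ? is present. The following is only a sketch under the assumptions stated in the question: the scheme is https, at least one sub-domain sits between website-name and .com, and the path starts right after .com/. The names urlPattern and extract are hypothetical and do not come from the original code.

```go
package main

import (
	"fmt"
	"regexp"
)

// urlPattern captures the first host label (group 1) and everything after
// ".com/" up to, but not including, the first "?" (group 2).
// Assumption: at least one label sits between website-name and ".com".
var urlPattern = regexp.MustCompile(`^https://([^./]+)\.[^/]*\.com/([^?]*)`)

// extract is a hypothetical helper; ok is false when the input does not
// have the shape described in the question.
func extract(s string) (website, resource string, ok bool) {
	m := urlPattern.FindStringSubmatch(s)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	inputs := []string{
		"https://website-name.some-domain.some-sub-domain.com/resourceId",
		"https://website-name.some-sub-domain.com/resourceId?randomContent",
	}
	for _, s := range inputs {
		website, resource, ok := extract(s)
		fmt.Println(website, resource, ok)
	}
}
```

Because [^?]* also matches at the end of the string, type 1 and type 2 inputs go through the same pattern and no alternation is needed.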

Here is what I have tried: https://play.golang.org/p/MGQIT5XRuuh

3 answers

  • douhu8851 2019-08-20 01:32

    The sample strings you show are ordinary HTTPS URLs, so you can parse them with the net/url package. With u as the parsed URL, the website-name is the first dot-separated part of u.Hostname(), and the resourceId is u.Path without its leading /.

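    // Needs the fmt, net/url and strings packages.
    u, err := url.Parse(s)
    if err != nil {
        panic(err)
    }
    // Hostname() returns the host without any port number.
    host := u.Hostname()
    first := strings.SplitN(host, ".", 2)[0]
    fmt.Printf("website-name: %s\n", first)
    // TrimPrefix, unlike u.Path[1:], does not panic on an empty path.
    fmt.Printf("resourceId: %s\n", strings.TrimPrefix(u.Path, "/"))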
    

    https://play.golang.org/p/fnF2RTBuFxR has a complete example, including the two URL strings from the question. This works even if the hostname part of the URL doesn't end with .com, or the path part includes that string, or there is a port number or hash fragment, or other variations.
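In case the playground link is unavailable, the approach can be sketched as a self-contained program. This is a minimal sketch and not necessarily identical to the linked example; the helper name extractParts is hypothetical, and it assumes the resourceId is the whole path after the leading /.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// extractParts is a hypothetical helper: it returns the first host label
// and the path without its leading slash.
func extractParts(raw string) (website, resource string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", err
	}
	// Hostname() drops any port; the first label is the website-name.
	website = strings.SplitN(u.Hostname(), ".", 2)[0]
	// The query string and fragment are parsed into separate fields,
	// so Path holds only the resource part.
	resource = strings.TrimPrefix(u.Path, "/")
	return website, resource, nil
}

func main() {
	urls := []string{
		"https://website-name.some-domain.some-sub-domain.com/resourceId",
		"https://website-name.some-sub-domain.com/resourceId?randomContent",
	}
	for _, s := range urls {
		website, resource, err := extractParts(s)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("website-name: %s, resourceId: %s\n", website, resource)
	}
}
```

Returning the error instead of panicking leaves it to the caller to decide how to handle malformed input.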

    This answer was selected by the asker as the best answer.