duanli0162 2019-08-19 22:58
Views: 121
Accepted

Extracting information from a string

Given a string of the form https://website-name.some-domain.some-sub-domain.com/resourceId (type 1) or https://website-name.some-sub-domain.com/resourceId?randomContent (type 2), I need to extract exactly two sub-strings: the website-name in one string and the resourceId in another.

I have extracted the website name using the following code:

s := "https://website-name.some-domain.some-sub-domain.com/resourceId?randomContent"
w := regexp.MustCompile("https://(.*?)\\.")
website := w.FindStringSubmatch(s)
fmt.Println(website[1])

I have another regex to get the resourceId:

s := "https://website-name.some-domain.some-sub-domain.com/resourceId?randomContent"
r := regexp.MustCompile("com/(.*?)\\?")
resource := r.FindStringSubmatch(s)
fmt.Println(resource[1])

This works for any string that ends with ? or ?randomContent. But some of my strings have no trailing ? (type 1), and the regex does not match those.

I tried "(com/(.*?)\\?)|(com/(.*?).*)" to get the resourceId, but it did not help.

I am not able to find an elegant way to extract these two sub-strings.

Note: The randomContent is an arbitrarily long substring, and so is the resourceId. The resourceId never contains a ?, so the first ? marks the end of the resourceId.

The website-name can also differ, but the pattern is the same: an arbitrary sub-domain and a .com will be present in the string.

Here is what I have tried: https://play.golang.org/p/MGQIT5XRuuh

3 answers

  • douhu8851 2019-08-20 01:32

    The sample strings you show are ordinary HTTPS URLs, so you can parse them with the net/url package. With u as the parsed URL, the website-name is the first dot-separated label of u.Hostname(), and the resourceId is u.Path without its leading /.

    u, err := url.Parse(s)
    if err != nil {
        panic(err)
    }
    host := u.Hostname()
    first := strings.SplitN(host, ".", 2)[0]
    fmt.Printf("website-name: %s\n", first)
    fmt.Printf("resourceId: %s\n", u.Path[1:])
    

    https://play.golang.org/p/fnF2RTBuFxR has a complete example, including the two URL strings from the question. This works even if the hostname part of the URL doesn't end with .com, or the path part includes that string, or there is a port number or hash fragment, or other variations.
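
For readers who cannot open the playground link, here is a minimal self-contained sketch of the same approach. The helper name extractParts is hypothetical and is not taken from the linked example. It also swaps u.Path[1:] for strings.TrimPrefix, on the assumption that some inputs may have an empty path, in which case slicing with [1:] would panic.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// extractParts is a hypothetical helper: it returns the first host label
// (the website-name) and the path without its leading slash (the resourceId).
func extractParts(raw string) (string, string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", err
	}
	// Hostname() drops any port; keep only the first dot-separated label.
	name := strings.SplitN(u.Hostname(), ".", 2)[0]
	// u.Path excludes the query string and fragment.
	// TrimPrefix is safe even when the path is empty.
	id := strings.TrimPrefix(u.Path, "/")
	return name, id, nil
}

func main() {
	// The two URL shapes from the question (type 1 and type 2).
	inputs := []string{
		"https://website-name.some-domain.some-sub-domain.com/resourceId",
		"https://website-name.some-sub-domain.com/resourceId?randomContent",
	}
	for _, s := range inputs {
		name, id, err := extractParts(s)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("website-name: %s, resourceId: %s\n", name, id)
	}
}
```

Because the query string is parsed separately from the path, both URL types go through the same code with no special case for a missing ?.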

    This answer was selected as the best answer by the asker.