Given a string of the form https://website-name.some-domain.some-sub-domain.com/resourceId (type 1) or https://website-name.some-sub-domain.com/resourceId?randomContent (type 2), I need to extract exactly two substrings: the website-name in one string and the resourceId in another.

I have extracted the website name using the following code:

s := "https://website-name.some-domain.some-sub-domain.com/resourceId?randomContent"
w := regexp.MustCompile("https://(.*?)\\.")
website := w.FindStringSubmatch(s)

I have another regex to get the resourceId:

s := "https://website-name.some-domain.some-sub-domain.com/resourceId?randomContent"
r := regexp.MustCompile("com/(.*?)\\?")
resource := r.FindStringSubmatch(s)

This works for any string in which the resourceId is followed by ? or ?randomContent. But I also have strings with no trailing ?, and I cannot handle those cases (type 1).

I tried "(com/(.*?)\\?)|(com/(.*?).*)" to get the resourceId, but it did not help.

I am not able to find an elegant way to extract these two sub-strings.

Note: randomContent is an arbitrarily long substring, and so is resourceId. However, resourceId never contains a ?, so the first ? marks the end of the resourceId.

Also, website-name can differ, but the pattern is the same: an arbitrary sub-domain and a .com will be present in the string.

Here is what I have tried: https://play.golang.org/p/MGQIT5XRuuh

  • douhu8851 2019-08-20 01:32

    The sample strings you show are ordinary HTTPS URLs, so you can use the net/url package to parse them. The website-name is the first part of the parsedUrl.Hostname(), and the resourceId is the parsedUrl.Path less a leading /.

    u, err := url.Parse(s)
    if err != nil {
        panic(err) // or handle the error as appropriate
    }
    host := u.Hostname()
    first := strings.SplitN(host, ".", 2)[0]
    fmt.Printf("website-name: %s\n", first)
    fmt.Printf("resourceId: %s\n", u.Path[1:])
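
As a self-contained sketch of the same idea, the snippet above could be wrapped in a small function and run against both URL types from the question. The name splitURL is hypothetical, not part of net/url, and strings.TrimPrefix is swapped in for u.Path[1:] on the assumption that an empty path should not cause a panic.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// splitURL is a hypothetical helper. It returns the first label of the
// hostname and the path without its leading slash.
func splitURL(s string) (website, resource string, err error) {
	u, err := url.Parse(s)
	if err != nil {
		return "", "", err
	}
	// Hostname() strips any port; SplitN keeps only the first label.
	website = strings.SplitN(u.Hostname(), ".", 2)[0]
	// TrimPrefix does nothing if the path is empty, so it cannot panic.
	resource = strings.TrimPrefix(u.Path, "/")
	return website, resource, nil
}

func main() {
	inputs := []string{
		"https://website-name.some-domain.some-sub-domain.com/resourceId",
		"https://website-name.some-sub-domain.com/resourceId?randomContent",
	}
	for _, s := range inputs {
		website, resource, err := splitURL(s)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("website-name: %s, resourceId: %s\n", website, resource)
	}
}
```

The query string never reaches the result because url.Parse stores it separately from the path.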

    https://play.golang.org/p/fnF2RTBuFxR has a complete example, including the two URL strings from the question. This works even if the hostname part of the URL doesn't end with .com, or the path part includes that string, or there is a port number or hash fragment, or other variations.
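
For comparison, if a regular expression is still preferred, a single pattern could cover both URL types from the question. This is only a sketch: the names urlParts and extract are hypothetical, and the pattern assumes an https:// prefix, at least one dot in the hostname, and no hash fragment. It uses negated character classes instead of matching on "com/" and a literal "?", so it does not depend on a trailing ?.

```go
package main

import (
	"fmt"
	"regexp"
)

// urlParts is a hypothetical name. Group 1 is the first hostname label;
// group 2 is everything after the first "/" up to a "?" or the end of input.
var urlParts = regexp.MustCompile(`^https://([^./]+)\.[^/]*/([^?]*)`)

// extract returns ok == false when the input does not match the pattern.
func extract(s string) (website, resource string, ok bool) {
	m := urlParts.FindStringSubmatch(s)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	inputs := []string{
		"https://website-name.some-domain.some-sub-domain.com/resourceId",
		"https://website-name.some-sub-domain.com/resourceId?randomContent",
	}
	for _, s := range inputs {
		website, resource, ok := extract(s)
		fmt.Println(website, resource, ok)
	}
}
```

Unlike the net/url approach, this pattern would include a hash fragment in the resourceId, so it fits only inputs shaped like the two examples.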

