duanli0162 2019-08-19 22:58

Extracting information from a string

When given a string of the form https://website-name.some-domain.some-sub-domain.com/resourceId (type 1) or https://website-name.some-sub-domain.com/resourceId?randomContent (type 2), I need to extract exactly two substrings: the website-name in one string and the resourceId in another.

I have extracted the website name using the following code:

s := "https://website-name.some-domain.some-sub-domain.com/resourceId?randomContent"
w := regexp.MustCompile("https://(.*?)\\.")
website := w.FindStringSubmatch(s)
fmt.Println(website[1])

I use a second regex to get the resourceId:

s := "https://website-name.some-domain.some-sub-domain.com/resourceId?randomContent"
r := regexp.MustCompile("com/(.*?)\\?")
resource := r.FindStringSubmatch(s)
fmt.Println(resource[1])

This works for strings in which the resourceId is followed by ? or ?randomContent (type 2). But some of my strings contain no ? at all (type 1), and this regex does not match them.
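A small sketch of why type 1 fails: the pattern requires a literal ?, so FindStringSubmatch finds no match and returns nil, and indexing that nil slice (as resource[1] does above) panics. The regex is the one from the question, rewritten as a raw string literal so \? needs no double escaping; the nil guard is an addition for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// The resourceId regex from the question, as a raw string literal.
var r = regexp.MustCompile(`com/(.*?)\?`)

func main() {
	type1 := "https://website-name.some-domain.some-sub-domain.com/resourceId"

	// The pattern demands a literal "?", so a type 1 string does not match
	// and FindStringSubmatch returns nil.
	resource := r.FindStringSubmatch(type1)
	if resource == nil {
		fmt.Println("no match") // indexing resource[1] here would panic
		return
	}
	fmt.Println(resource[1])
}
```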

I also tried "(com/(.*?)\\?)|(com/(.*?).*)" to get the resourceId, but it did not help.
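A quick experiment suggests why the alternation does not help. Go's regexp uses leftmost-first matching, so for type 2 the first alternative wins and the id lands in group 2, while for type 1 only the second alternative matches: its lazy (.*?) settles on the empty string and the trailing greedy .* consumes the rest. This sketch only reproduces the pattern from the question (as a raw string literal); `alt` is an illustrative name.

```go
package main

import (
	"fmt"
	"regexp"
)

// The alternation from the question, written as a raw string literal.
var alt = regexp.MustCompile(`(com/(.*?)\?)|(com/(.*?).*)`)

func main() {
	type1 := "https://website-name.some-domain.some-sub-domain.com/resourceId"
	type2 := "https://website-name.some-sub-domain.com/resourceId?randomContent"

	// Type 2: the first alternative matches, so the id is in group 2.
	m2 := alt.FindStringSubmatch(type2)
	fmt.Printf("%q\n", m2[2]) // "resourceId"

	// Type 1: only the second alternative matches. Group 2 does not take
	// part (empty string), group 3 is the whole tail, and the lazy group 4
	// is empty.
	m1 := alt.FindStringSubmatch(type1)
	fmt.Printf("%q %q %q\n", m1[2], m1[3], m1[4]) // "" "com/resourceId" ""
}
```

So the id is either in a different group or not captured at all, depending on the input.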

I cannot find an elegant way to extract these two substrings.

Note: randomContent is an arbitrarily long substring, and so is resourceId. However, resourceId never contains ?, so the first ? marks its end.
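Since the first ? always ends resourceId, a negated character class can replace the lazy group: `[^?]*` stops at the first ? when there is one and otherwise runs to the end of the string, so one pattern covers both types. A minimal sketch, assuming every input contains `com/` directly before resourceId; `resourceRe` and `resourceID` are illustrative names, and the nil check avoids the panic on non-matching input.

```go
package main

import (
	"fmt"
	"regexp"
)

// resourceRe captures everything after "com/" up to, but not including,
// the first "?", or up to the end of the string if there is no "?".
var resourceRe = regexp.MustCompile(`com/([^?]*)`)

// resourceID returns the captured resourceId and whether the pattern matched.
func resourceID(s string) (string, bool) {
	m := resourceRe.FindStringSubmatch(s)
	if m == nil { // no match: FindStringSubmatch returns nil
		return "", false
	}
	return m[1], true
}

func main() {
	for _, s := range []string{
		"https://website-name.some-domain.some-sub-domain.com/resourceId",
		"https://website-name.some-sub-domain.com/resourceId?randomContent",
	} {
		id, ok := resourceID(s)
		fmt.Println(id, ok)
	}
}
```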

Also, website-name varies, but the pattern is always the same: the string always contains an arbitrary subdomain followed by .com.
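Given that fixed shape, both parts can in principle come from a single regex with two capture groups. This is a sketch under the stated assumptions (https scheme, website-name is the first host label, the host ends in .com, resourceId ends at the first ?); `urlRe` is an illustrative name, and the pattern will not handle ports or hosts that do not end in .com.

```go
package main

import (
	"fmt"
	"regexp"
)

// urlRe assumes inputs of the form
// https://<website-name>.<anything>.com/<resourceId>[?<randomContent>].
// Group 1: website-name, the first host label (no "." or "/").
// Group 2: resourceId, everything after ".com/" up to the first "?".
var urlRe = regexp.MustCompile(`^https://([^./]+)[^/]*\.com/([^?]*)`)

func main() {
	for _, s := range []string{
		"https://website-name.some-domain.some-sub-domain.com/resourceId",
		"https://website-name.some-sub-domain.com/resourceId?randomContent",
	} {
		m := urlRe.FindStringSubmatch(s)
		if m == nil {
			fmt.Println("no match:", s)
			continue
		}
		fmt.Printf("website-name=%s resourceId=%s\n", m[1], m[2])
	}
}
```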

Here is what I have tried: https://play.golang.org/p/MGQIT5XRuuh


3 answers

  • douhu8851 2019-08-20 01:32

    The sample strings you show are ordinary HTTPS URLs, so you can parse them with the net/url package. The website-name is the first dot-separated part of u.Hostname(), and the resourceId is u.Path without its leading /.

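    package main

    import (
        "fmt"
        "net/url"
        "strings"
    )

    func main() {
        s := "https://website-name.some-domain.some-sub-domain.com/resourceId?randomContent"
        u, err := url.Parse(s)
        if err != nil {
            panic(err)
        }
        // website-name is the first dot-separated label of the hostname.
        first := strings.SplitN(u.Hostname(), ".", 2)[0]
        fmt.Printf("website-name: %s\n", first)
        // resourceId is the path without its leading "/"; TrimPrefix avoids a panic on an empty path.
        fmt.Printf("resourceId: %s\n", strings.TrimPrefix(u.Path, "/"))
    }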
    

    https://play.golang.org/p/fnF2RTBuFxR has a complete example that includes both URL strings from the question. This approach works even if the hostname does not end in .com, if the path contains .com, or if the URL has a port number or a #fragment.
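To check those claims locally, here is a minimal sketch that wraps the same net/url calls in a hypothetical helper, `parts`, and runs it on the question's two URLs plus two made-up variants (a port with a fragment on a non-.com host, and .com inside the path). It strips the leading / with strings.TrimPrefix rather than slicing off the first byte, so an empty path does not cause a panic.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parts is a hypothetical helper around the net/url approach.
func parts(s string) (website, resourceID string, err error) {
	u, err := url.Parse(s)
	if err != nil {
		return "", "", err
	}
	// Hostname() drops any ":port"; the first label is website-name.
	website = strings.SplitN(u.Hostname(), ".", 2)[0]
	// Path excludes the query (?...) and the fragment (#...).
	resourceID = strings.TrimPrefix(u.Path, "/")
	return website, resourceID, nil
}

func main() {
	for _, s := range []string{
		"https://website-name.some-domain.some-sub-domain.com/resourceId",
		"https://website-name.some-sub-domain.com/resourceId?randomContent",
		"https://website-name.example.org:8443/resourceId#section", // port, fragment, no .com
		"https://website-name.example.org/resource.com",            // .com in the path
	} {
		w, id, err := parts(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(w, id)
	}
}
```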

    Accepted by the asker as the best answer.
