douan7601 2018-01-03 16:08

Parsing a formatted string in Go

The Problem

I have a slice of string values in which each value is formatted according to a template. In my particular case, I am trying to parse Markdown links such as the ones below:

- [What did I just commit?](#what-did-i-just-commit)
- [I wrote the wrong thing in a commit message](#i-wrote-the-wrong-thing-in-a-commit-message)
- [I committed with the wrong name and email configured](#i-committed-with-the-wrong-name-and-email-configured)
- [I want to remove a file from the previous commit](#i-want-to-remove-a-file-from-the-previous-commit)
- [I want to delete or remove my last commit](#i-want-to-delete-or-remove-my-last-commit)
- [Delete/remove arbitrary commit](#deleteremove-arbitrary-commit)
- [I tried to push my amended commit to a remote, but I got an error message](#i-tried-to-push-my-amended-commit-to-a-remote-but-i-got-an-error-message)
- [I accidentally did a hard reset, and I want my changes back](#i-accidentally-did-a-hard-reset-and-i-want-my-changes-back)

What do I want to do?

I am looking for ways to parse this into a value of type:

type Entity struct {
    Statement string
    URL string
}

What have I tried?

As you can see, all the items follow the pattern: - [{{ .Statement }}]({{ .URL }}). I tried using the fmt.Sscanf function to scan each string as:

var statement, url string
fmt.Sscanf(s, "[%s](%s)", &statement, &url)

This results in:

statement = "I"
url = ""

The %s verb only reads up to the next space, which explains why statement holds just the first word. What I do not understand is why the URL field is not populated at all.
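
For reference, a minimal sketch of how the call above can be inspected, assuming the leading "- " has already been stripped from the input. fmt.Sscanf also returns the number of items it filled and an error; the helper name scanLink is hypothetical and only wraps the original call so those two values become visible.

```go
package main

import "fmt"

// scanLink is a hypothetical helper that wraps the Sscanf call from the
// question and also returns the item count and error that Sscanf reports.
func scanLink(s string) (statement, url string, n int, err error) {
	n, err = fmt.Sscanf(s, "[%s](%s)", &statement, &url)
	return statement, url, n, err
}

func main() {
	statement, url, n, err := scanLink("[I wrote the wrong thing in a commit message](#i-wrote-the-wrong-thing-in-a-commit-message)")
	// %s stops at the first space, so the literal "]" in the format no longer
	// matches the input and scanning ends with an error before url is reached.
	fmt.Printf("statement=%q url=%q n=%d err=%v\n", statement, url, n, err)
}
```

With this input only one item is filled and err is non-nil, so the second %s is never attempted. That is why url stays empty.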

How can I get the Markdown values as mentioned above?

EDIT: As suggested by Marc, I will add a couple of clarifying points:

  1. This is a general-purpose question about parsing strings based on a format. In my particular case a Markdown parser might help me, but my intention is to learn how to handle such cases in general, where a library might not exist.
  2. I have read the official documentation before posting here.

2 answers

  • doujunchi1238 2018-01-03 16:25

    Note: The following solution only works for simple, unescaped Markdown links. If that suits your needs, go ahead and use it. For full Markdown compatibility you should use a proper Markdown parser such as gopkg.in/russross/blackfriday.v2.


    You could use the regexp package to get the link text and the URL out of a Markdown link.

    So the general input text is in the form of:

    [some text](somelink)
    

    A regular expression that models this:

    \[([^\]]+)\]\(([^)]+)\)
    

    Where:

    • \[ is the literal [
    • ([^\]]+) captures the "some text": one or more characters other than a closing square bracket
    • \] is the literal ]
    • \( is the literal (
    • ([^)]+) captures the "somelink": one or more characters other than a closing parenthesis
    • \) is the literal )

    Example:

    package main

    import (
        "fmt"
        "regexp"
    )

    func main() {
        // Group 1 captures the link text, group 2 the URL.
        r := regexp.MustCompile(`\[([^\]]+)\]\(([^)]+)\)`)

        inputs := []string{
            "[Some text](#some/link)",
            "[What did I just commit?](#what-did-i-just-commit)",
            "invalid",
        }

        for _, input := range inputs {
            fmt.Println("Parsing:", input)

            // FindStringSubmatch returns nil when there is no match;
            // otherwise parts[0] is the whole match, followed by the groups.
            parts := r.FindStringSubmatch(input)
            if parts == nil {
                fmt.Println("   No match!")
                continue
            }
            fmt.Println("   Text:", parts[1])
            fmt.Println("   URL: ", parts[2])
        }
    }
    

    Output (try it on the Go Playground):

    Parsing: [Some text](#some/link)
       Text: Some text
       URL:  #some/link
    Parsing: [What did I just commit?](#what-did-i-just-commit)
       Text: What did I just commit?
       URL:  #what-did-i-just-commit
    Parsing: invalid
       No match!
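
To tie this back to the Entity type from the question, here is a minimal sketch that applies the same pattern to whole list lines. The helper name parseEntity and the package-level variable linkRe are hypothetical. Because the pattern is not anchored, the leading "- " of each list item is simply skipped.

```go
package main

import (
	"fmt"
	"regexp"
)

// Entity mirrors the struct from the question.
type Entity struct {
	Statement string
	URL       string
}

// linkRe is the pattern from this answer, compiled once at package level.
var linkRe = regexp.MustCompile(`\[([^\]]+)\]\(([^)]+)\)`)

// parseEntity is a hypothetical helper: it returns the first Markdown link
// found in line as an Entity, and false if the line contains no link.
func parseEntity(line string) (Entity, bool) {
	m := linkRe.FindStringSubmatch(line)
	if m == nil {
		return Entity{}, false
	}
	return Entity{Statement: m[1], URL: m[2]}, true
}

func main() {
	lines := []string{
		"- [What did I just commit?](#what-did-i-just-commit)",
		"- [Delete/remove arbitrary commit](#deleteremove-arbitrary-commit)",
		"invalid",
	}
	for _, line := range lines {
		if e, ok := parseEntity(line); ok {
			fmt.Printf("%+v\n", e)
		} else {
			fmt.Printf("no link in %q\n", line)
		}
	}
}
```

Returning a boolean alongside the Entity lets the caller decide whether a line without a link is an error or can be skipped.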
    
    This answer was accepted by the asker.