dtrvzd1171 2015-07-09 23:17
Views: 173

Regular expression for all markdown image URLs in golang

I am trying to work through a markdown file, replacing all the image URLs. The format of a markdown image URL is ![alternative name](imageurl.png).

My regex search finds the first match and returns its location, and I replace it. I then cycle through the document until the regex search finds nothing, i.e. the array of match indexes it returns is empty.

The problem is that, for some reason, it keeps matching on something I can't identify: the length of the array returned from the regex search is never 0.

location := split[:locationSplit]

bodyRe := regexp.MustCompile(`!\[(.*)\]\((.*)\)`)
indexes := bodyRe.FindStringIndex(body)
fmt.Println("location: ", absoluteFileLocation)
fmt.Println("length: ", indexes)

for len(indexes) != 0 {
    fmt.Println("length: ", len(indexes))
    imageLocation := body[indexes[0]:indexes[1]]
    body = body[:indexes[0]] + imageLocation + body[indexes[1]:]
    indexes = indexes[:0]
    fmt.Println("length: ", len(indexes))
    indexes = bodyRe.FindStringIndex(body)
}

This prints the following, endlessly:

length:  2
length:  0
length:  2
length:  0
length:  2
length:  0
length:  2
length:  0
length:  2

The 2s come from the line indexes = bodyRe.FindStringIndex(body) inside the loop, since I reset indexes to length 0 just before it.

Help appreciated

EDIT: Added an example on request. The method above is clearly flawed; the following method works for the first image, but not for the ones after it.

So I attempted this technique:

(sample markdown file)

some markdown

![image](anImage.png)

more markdown

![image2](anImage2.png)

more markdown & end of document

and the revised code:

...
...
    bodyRe := regexp.MustCompile(`!\[(.*)\]\((.*)\)`)
    indexes := bodyRe.FindAllStringSubmatchIndex(body, -1)

    for _, j := range indexes { // j is one match: a []int of start/end index pairs
        imageLocation := body[j[4]:j[5]]
        body = body[:j[4]] + "/App/Image/?image=" + location + "/" + imageLocation + body[j[5]:]
    }
    return body

(required output markdown)

some markdown

![image](/App/Image/?image=[location]/anImage.png)

more markdown

![image2](/App/Image/?image=[location]/anImage2.png)

more markdown
end of document

That works for the first image, but not for the second. The problem, I think, is that once the loop has replaced the first URL, the positions in body (i.e. body[j[4]:j[5]]) have shifted, so the second replacement lands in the wrong place.
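One way to sidestep the shifting positions is to apply the replacements from the last match to the first, so that an edit never moves text that still has to be processed. The following is only a sketch of that idea, not the code from the question: the helper name prefixImagesBackward and the prefix value are made up for illustration, and the pattern uses negated character classes instead of (.*) so that two images on one line are matched separately.

```go
package main

import (
	"fmt"
	"regexp"
)

// imageRe matches ![alt](url). The negated classes [^\]]* and [^)]*
// keep each match inside a single image tag.
var imageRe = regexp.MustCompile(`!\[([^\]]*)\]\(([^)]*)\)`)

// prefixImagesBackward (hypothetical helper) inserts prefix in front of
// every image URL, working from the last match to the first so that the
// indexes of the matches still to be processed stay valid.
func prefixImagesBackward(body, prefix string) string {
	matches := imageRe.FindAllStringSubmatchIndex(body, -1)
	for i := len(matches) - 1; i >= 0; i-- {
		m := matches[i]
		// m[4] is the start of the second capture group (the image URL).
		body = body[:m[4]] + prefix + body[m[4]:]
	}
	return body
}

func main() {
	body := "some markdown\n\n![image](anImage.png)\n\nmore markdown\n\n![image2](anImage2.png)\n"
	// "blog" stands in for the location value, which is an assumption here.
	fmt.Print(prefixImagesBackward(body, "/App/Image/?image=blog/"))
}
```

Because every edit happens to the right of the matches that remain, no index bookkeeping is needed.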

I need to do this so that when the markdown is eventually rendered the image urls point to places where they can be served from.

EDIT: Fixed

Thanks guys. Since people struggled to understand what I wanted to do, I suspect I am going about the problem in a strange way. I have got it working, and the snippet below is for anyone else looking into this.

First I will explain why I had the problem. I wanted to separate the writing of blog posts for a site from the maintenance of the site itself. 'Blog writers' were therefore told to write posts in markdown, with all image tags in the format ![alternative name](imageurl.png), where all images must be in the same directory as the markdown file itself. Because this directory is not part of the website's code base, the image URLs needed replacing with absolute URLs so they could be served. I didn't want this to be something the blog writers needed to worry about.

Everything worked fine for the first image. But the replacement absolute URL changed the length of the blog contents, and with it the position of every character after it, so the indexes that the regex had found no longer lined up. I had to add the accumulated change in length to the match indexes.

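adjustment := 0 // net change in the length of body caused by earlier replacements
for _, j := range indexes {
    // j[4]:j[5] is the second capture group (the image URL) in the original body;
    // shift both ends by adjustment to locate it in the edited body.
    start, end := j[4]+adjustment, j[5]+adjustment
    imageLocation := body[start:end]

    // url here is the standard library package net/url.
    replacement := "?imageurl=" + url.QueryEscape(location) + "/" + imageLocation
    body = body[:start] + replacement + body[end:]
    adjustment += len(replacement) - len(imageLocation)
}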
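
For anyone who wants to run the adjustment approach end to end, here is a self-contained sketch of it. The function name rewriteImageURLs and the sample location "blog/post1" are assumptions added for illustration. The pattern also differs from the one in the question: it uses non-greedy groups (.*?) so that two images on the same line are not swallowed by a single match.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// imageRe matches ![alt](url); the non-greedy groups stop at the first
// closing bracket or parenthesis.
var imageRe = regexp.MustCompile(`!\[(.*?)\]\((.*?)\)`)

// rewriteImageURLs (hypothetical name) replaces every image URL with
// "?imageurl=<escaped location>/<original URL>", tracking how much the
// body has grown so that later match indexes can be shifted to fit.
func rewriteImageURLs(body, location string) string {
	matches := imageRe.FindAllStringSubmatchIndex(body, -1)
	adjustment := 0
	for _, m := range matches {
		// m[4]:m[5] spans the URL in the original body.
		start, end := m[4]+adjustment, m[5]+adjustment
		imageLocation := body[start:end]
		replacement := "?imageurl=" + url.QueryEscape(location) + "/" + imageLocation
		body = body[:start] + replacement + body[end:]
		adjustment += len(replacement) - len(imageLocation)
	}
	return body
}

func main() {
	body := "![image](anImage.png)\n\nmore markdown\n\n![image2](anImage2.png)\n"
	fmt.Print(rewriteImageURLs(body, "blog/post1"))
}
```

Note that url.QueryEscape also escapes any "/" inside location, which may or may not be what the serving endpoint expects.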

1 answer

  • douqin3245 2015-07-10 09:02

    After this line:

    imageLocation := body[indexes[0]:indexes[1]]
    

    imageLocation will contain a string like ![image](anImage.png).

    body = body[:indexes[0]] + imageLocation + body[indexes[1]:]
    

    After that line, body will be the same as it was before. You're basically reconstructing it out of 3 segments.

    This is equivalent to doing the following:

    package main
    
    import "fmt"
    
    func main() {
        s := "Hello, playground"
        t := s[2:4]
        s = s[:2] + t + s[4:]
        fmt.Println(s) // prints "Hello, playground"
    }
    

    In the next iteration, the same left-most match will be found again, ad infinitum.

    Have you read the documentation for FindStringIndex?

    If you edit your question to say what you're trying to do I can provide you with a working code snippet.
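
If the goal is to put a fixed prefix in front of every image URL, one possible approach is to let the regexp package do the looping with ReplaceAllString and capture-group references. This is a sketch under that assumption, not necessarily what the asker needs: prefixImageURLs and the "/App/Image/?image=blog/" prefix are illustrative, and the pattern uses non-greedy groups rather than the (.*) from the question.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// imageRe matches ![alt](url) with non-greedy capture groups.
var imageRe = regexp.MustCompile(`!\[(.*?)\]\((.*?)\)`)

// prefixImageURLs (hypothetical helper) rewrites every image tag in one
// call. In the replacement template, ${1} is the alt text and ${2} the
// original URL.
func prefixImageURLs(body, prefix string) string {
	// A literal "$" in the template must be written as "$$".
	escaped := strings.ReplaceAll(prefix, "$", "$$")
	return imageRe.ReplaceAllString(body, "![${1}]("+escaped+"${2})")
}

func main() {
	body := "some markdown\n\n![image](anImage.png)\n\nmore markdown\n\n![image2](anImage2.png)\n"
	fmt.Print(prefixImageURLs(body, "/App/Image/?image=blog/"))
}
```

Since the library builds the result in a single pass, there are no indexes to keep in sync and the loop cannot revisit the same match.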
