Regex to match all markdown image URLs in Golang

I am trying to work through a markdown file, replacing all the image URLs. The format of a markdown image URL is ![alternative name](imageurl.png).

My regex search finds the first one and returns its location, and I replace it. I then cycle through the document until the regex search doesn't find any more, i.e. its array of match indexes is empty.

The problem is that, for some reason, it keeps matching on something I can't identify, i.e. the length of the array returned from the regex search is never 0.

location := split[:locationSplit]

bodyRe := regexp.MustCompile(`!\[(.*)\]\((.*)\)`)
indexes := bodyRe.FindStringIndex(body)
fmt.Println("location: ", absoluteFileLocation)
fmt.Println("length: ", indexes)

for len(indexes) != 0 {
    fmt.Println("length: ", len(indexes))
    imageLocation := body[indexes[0]:indexes[1]]
    body = body[:indexes[0]] + imageLocation + body[indexes[1]:]
    indexes = indexes[:0]
    fmt.Println("length: ", len(indexes))
    indexes = bodyRe.FindStringIndex(body)
}

This prints the following, endlessly:

length:  2
length:  0
length:  2
length:  0
length:  2
length:  0
length:  2
length:  0
length:  2

The 2s come from the line indexes = bodyRe.FindStringIndex(body) inside the loop, since I truncate indexes to length 0 just before it.

Help appreciated

EDIT: Example added on request. The method above is clearly flawed; the following method works for the first image, but not for the subsequent ones.

So I attempted this technique:

(sample markdown file)

some markdown

![image](anImage.png)

more markdown

![image2](anImage2.png)

more markdown & end of document

and the revised code:

...
...
    bodyRe := regexp.MustCompile(`!\[(.*)\]\((.*)\)`)
    indexes := bodyRe.FindAllStringSubmatchIndex(body, -1)

    for _, j := range indexes { // j is one match: a []int of start/end offsets; j[4]:j[5] is the second capture group
        imageLocation := body[j[4]:j[5]]
        body = body[:j[4]] + "/App/Image/?image=" + location + "/" + imageLocation + body[j[5]:]
    }
    return body

(required output markdown)

some markdown

![image](/App/Image/?image=[location]/anImage.png)

more markdown

![image2](/App/Image/?image=[location]/anImage2.png)

more markdown
end of document

And that works for the first image, but not the second one. The problem (I think) is that when the loop replaces the first one, the positions in body (i.e. body[j[4]:j[5]]) change, so it replaces the second one in the wrong place.
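One way to sidestep the shifting offsets, sketched here as an illustration and not as the fix eventually used below, is to apply the replacements from the last match back to the first, so that an edit never moves text that still has to be processed. The function name rewriteBackToFront, the prefix value, and the tightened pattern ([^\]]* and [^)]* in place of .*) are assumptions made for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// rewriteBackToFront inserts prefix in front of every image URL in body.
// It walks the matches from last to first, so an edit never shifts the
// offsets of the matches that are still waiting to be processed.
func rewriteBackToFront(body, prefix string) string {
	// [^\]]* and [^)]* keep each match inside a single image tag.
	re := regexp.MustCompile(`!\[([^\]]*)\]\(([^)]*)\)`)
	matches := re.FindAllStringSubmatchIndex(body, -1)
	for i := len(matches) - 1; i >= 0; i-- {
		m := matches[i] // m[4]:m[5] spans the second capture group (the URL)
		body = body[:m[4]] + prefix + body[m[4]:]
	}
	return body
}

func main() {
	body := "some markdown\n\n![image](anImage.png)\n\nmore markdown\n\n![image2](anImage2.png)\n"
	fmt.Print(rewriteBackToFront(body, "/App/Image/?image=loc/"))
}
```

Because only text at or after the current match is touched, the offsets recorded for earlier matches stay valid without any bookkeeping.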

I need to do this so that when the markdown is eventually rendered the image urls point to places where they can be served from.

EDIT: Fixed

Thanks guys. Since people struggled to understand what I wanted to do, I suspect I am going about the problem in a strange way. I have got it working, and below is the code snippet that works, for anyone else looking into this.

First, I will explain why I had the problem. I wanted to separate the writing of blogs for a site from the maintenance of the site itself. Therefore, 'blog writers' were told to write blogs in markdown, with all image tags in the format ![alternative name](imageurl.png), where all images must be in the same directory as the markdown file itself. Because this directory is not part of the code base of the website, the image URLs needed replacing with absolute URLs so they could be served. I didn't want this to be something the blog writers needed to worry about.

Everything worked fine for the first image. But because the replacement absolute URL changed the length of the blog contents, and therefore the positions of all the characters after it, the indexes that the regex had found no longer lined up. So I had to add the accumulated change in length to the indexes of the later matches.

adjustment := 0
for _, j := range(indexes) {
    imageLocation := body[j[4]+adjustment:j[5]+adjustment]

    replacement := "?imageurl=" + url.QueryEscape(location) + "/" + imageLocation
    body = body[:j[4] + adjustment] + replacement + body[j[5] + adjustment:]
    adjustment += len(replacement) - len(imageLocation)
}
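For comparison, the same rewrite can be done without tracking offsets at all by letting the regexp package perform the substitution. The following is only a sketch under assumptions: the function name rewriteImages and the package-level imageRe are made up for this example, and the pattern uses [^\]]* and [^)]* instead of .* so that two image tags on one line are not swallowed by a single greedy match.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// imageRe matches ![alt](url); each group stops at its closing bracket.
var imageRe = regexp.MustCompile(`!\[([^\]]*)\]\(([^)]*)\)`)

// rewriteImages points every image URL at "?imageurl=<location>/<original>".
func rewriteImages(body, location string) string {
	prefix := "?imageurl=" + url.QueryEscape(location) + "/"
	return imageRe.ReplaceAllStringFunc(body, func(tag string) string {
		m := imageRe.FindStringSubmatch(tag) // m[1] = alt text, m[2] = URL
		return "![" + m[1] + "](" + prefix + m[2] + ")"
	})
}

func main() {
	body := "![image](anImage.png)\n\nmore markdown\n\n![image2](anImage2.png)\n"
	fmt.Print(rewriteImages(body, "posts"))
}
```

ReplaceAllStringFunc builds the result in a single pass, so the length of one replacement cannot disturb the position of the next match.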

1 answer

douqin3245 2015-07-10 09:02
After this line:

imageLocation := body[indexes[0]:indexes[1]]

imageLocation will contain a string like ![image](anImage.png).

body = body[:indexes[0]] + imageLocation + body[indexes[1]:]

After that line, body will be the same as it was before. You're basically reconstructing it out of 3 segments.

This is equivalent to doing the following:

package main

import "fmt"

func main() {
    s := "Hello, playground"
    t := s[2:4]
    s = s[:2] + t + s[4:]
    fmt.Println(s) // prints "Hello, playground"
}

In the next iteration, the same left-most match will be found again, ad perpetuum.

Have you read the documentation for FindStringIndex?

If you edit your question to say what you're trying to do I can provide you with a working code snippet.