doudou130216 2016-05-01 11:18

Regular expression to find images in HTML (Go)

I'm parsing XML RSS feeds from a couple of different sources, and I want to find the images in the HTML they contain.

I did some research and found a regex that I think might work:

/<img[^>]+src="?([^"\s]+)"?\s*\/>/g

but I have trouble using it in Go. I get errors because I don't know how to write that expression so that Go will search with it.

I tried writing it as a string, but it doesn't escape properly with either single or double quotes. I also tried using it bare, exactly as written above, and that gives me an error too.

Any ideas?

2 answers

  • douya6606 2016-05-01 12:31

    Using a proper HTML parser is always better for parsing HTML. However, a cheap, hackish regex can also work fine. Here's an example:

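    import "regexp"

    // Go has no /regex/g literal syntax: compile the pattern with the regexp
    // package and write it as a raw string literal (backticks) so that quotes
    // and backslashes need no escaping.
    var imgRE = regexp.MustCompile(`<img[^>]+\bsrc=["']([^"']+)["']`)

    // If your img tags are well formed with double quotes, use this instead; it's more efficient.
    // var imgRE = regexp.MustCompile(`<img[^>]+\bsrc="([^"]+)"`)

    // findImages returns the src value of every img tag matched in htm.
    func findImages(htm string) []string {
        // -1 returns all matches, like the /g flag in the original pattern.
        imgs := imgRE.FindAllStringSubmatch(htm, -1)
        out := make([]string, len(imgs))
        for i := range out {
            out[i] = imgs[i][1] // index 1 is the first capture group: the src value
        }
        return out
    }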
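
For comparison, here is a rough sketch of how the pattern from the question itself could be carried over to Go. The surrounding `/.../g` is JavaScript-style literal syntax and is dropped; the pattern goes into a raw string literal, and the "global" behaviour comes from passing -1 to FindAllStringSubmatch. The names `questionRE` and `findSelfClosedImages` and the sample markup are made up for illustration. Note that this pattern only matches self-closing tags ending in `/>`, which is one reason the looser pattern above may suit feed content better.

```go
package main

import (
	"fmt"
	"regexp"
)

// questionRE is the question's pattern without the JavaScript /.../g wrapper.
// Inside backticks neither the double quotes nor the backslashes need escaping.
// (Hypothetical name, used only for this sketch.)
var questionRE = regexp.MustCompile(`<img[^>]+src="?([^"\s]+)"?\s*/>`)

// findSelfClosedImages returns the src value of every self-closing img tag
// matched in htm. Passing -1 asks for all matches, like the /g flag.
func findSelfClosedImages(htm string) []string {
	matches := questionRE.FindAllStringSubmatch(htm, -1)
	out := make([]string, 0, len(matches))
	for _, m := range matches {
		out = append(out, m[1]) // m[0] is the whole tag, m[1] the captured src
	}
	return out
}

func main() {
	// Made-up fragment: the first tag is self-closing, the second is not.
	sample := `<p><img class="x" src="a.png" /><img src="b.jpg"></p>`
	for _, src := range findSelfClosedImages(sample) {
		fmt.Println(src)
	}
}
```

With this sample only the first tag is reported, because the second one does not end in `/>`.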
    

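As for the parser route mentioned at the top of this answer, the following is only a minimal sketch that stays inside the standard library: it uses the encoding/xml decoder in its lenient mode (Strict off, with HTMLAutoClose and HTMLEntity), which that package's documentation suggests for typical HTML. It is a stand-in, not a full HTML5 parser; a dedicated parser such as the golang.org/x/net/html package is the more usual choice. The function name `imageSources` and the sample markup are assumptions for illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// imageSources walks the tokens of htm and collects the src attribute of
// every img element. It stops at the first token error (including io.EOF)
// and returns whatever was collected up to that point.
func imageSources(htm string) []string {
	d := xml.NewDecoder(strings.NewReader(htm))
	// Lenient settings for HTML-like input: tolerate unclosed void elements
	// such as <img> and resolve HTML entity names.
	d.Strict = false
	d.AutoClose = xml.HTMLAutoClose
	d.Entity = xml.HTMLEntity

	var out []string
	for {
		tok, err := d.Token()
		if err != nil {
			return out
		}
		start, ok := tok.(xml.StartElement)
		if !ok || !strings.EqualFold(start.Name.Local, "img") {
			continue
		}
		for _, attr := range start.Attr {
			if strings.EqualFold(attr.Name.Local, "src") {
				out = append(out, attr.Value)
			}
		}
	}
}

func main() {
	// Made-up fragment mixing an unclosed img and a self-closing one.
	sample := `<div><img alt="first" src="a.png"><p>text</p><img src='b.jpg'/></div>`
	fmt.Println(imageSources(sample))
}
```

Unlike the regex, this does not care about attribute order or quote style, but badly broken markup can still make the decoder stop early, in which case only the images seen so far are returned.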

    This answer was selected as the best answer by the asker.