My task is to find image URLs inside an HTML page.
The problem
The HTML parser golang.org/x/net/html, as well as github.com/PuerkitoBio/goquery, ignores the biggest image on the page http://www.ozon.ru/context/detail/id/34498204/
The question
- What is wrong in my code?
- Why is the required img tag with src="" ignored?
- Is there a way to get all images from HTML with Go?
Notes:
When I used a parser written in Swift, this image was found on the page:
//static2.ozone.ru/multimedia/spare_covers/1013531536.jpg
This image tag is found when I use a regex search.
This image tag is found when I use the third-party service saveallimages.com.
I tried to use gokogiri but had no success compiling it on my Mac: go get succeeds, but go build hangs forever.
Parsed HTML page source
This is the HTML returned by resp, _ := http.Get(url).
Code:
package main

import (
	"log"
	"net/http"

	"golang.org/x/net/html"
)

func main() {
	url := "http://www.ozon.ru/context/detail/id/34498204/"
	if resp, err := http.Get(url); err == nil {
		defer resp.Body.Close()
		log.Println("Load page complete")
		if resp != nil {
			log.Println("Page response is NOT nil")
			// Parse the whole response body into a node tree.
			if document, err := html.Parse(resp.Body); err == nil {
				// parser walks the tree recursively and logs every <img> element.
				var parser func(*html.Node)
				parser = func(n *html.Node) {
					if n.Type == html.ElementNode && n.Data == "img" {
						var imgSrcUrl, imgDataOriginal string
						for _, element := range n.Attr {
							if element.Key == "src" {
								imgSrcUrl = element.Val
							}
							if element.Key == "data-original" {
								imgDataOriginal = element.Val
							}
						}
						log.Println(imgSrcUrl, imgDataOriginal)
					}
					// Visit all children of the current node.
					for c := n.FirstChild; c != nil; c = c.NextSibling {
						parser(c)
					}
				}
				parser(document)
			} else {
				log.Panicln("Parse html error", err)
			}
		} else {
			log.Println("Page response IS nil")
		}
	}
}