drjv5597 2017-12-17 04:39

Golang scrape: how to define a matcher

I am trying to use this Go package to scrape images from a website.

This is the HTML node that I need to scrape:

<ul class="list clearfix">
 <li> 
     <div>
          <a href=www.example.com/asda">
                     <img src="..sadsada./ssa/3.jpg">
         </a>
      </div>
   </li>
 <li> 
     <div>
          <a href=www.example.comsdsds/sds">
                     <img srr="..sadsada./ssa/2.jpg">
         </a>
      </div>
   </li>
 <li> 
     <div>
          <a href=www.example.com/sdds">
                     <img src="..sadsada./ssa/1.jpg">
         </a>
      </div>
   </li>
  .......
</ul>

How do I get the image src?

Here is the matcher I tried:

matcher := func(n *html.Node) bool {

        if n.DataAtom == atom.A && n.Parent != nil && n.Parent.Parent != nil && n.Parent.Parent.Parent != nil && n.Parent.Parent.Parent.Parent != nil {

            return scrape.Attr(n.Parent.Parent.Parent.Parent, "class") == "list clearfix"
        }
        return false
    }

    images := scrape.FindAll(root, matcher)

But it doesn't work.


1 answer

  • drryyiuib43562604 2017-12-17 07:15

    Fixed code:

    matcher := func(n *html.Node) bool {
        if n.Data == "img" && // Is img tag
            n.Parent != nil && // Parent exists
            n.Parent.DataAtom == atom.A && // Parent is <a>
            n.Parent.Parent != nil && // Parent's Parent exists (div)
            n.Parent.Parent.Parent != nil && // Parent's Parent's Parent exists (li)
            n.Parent.Parent.Parent.Parent != nil { // Parent's Parent's Parent's Parent exists (ul)
            return scrape.Attr(n.Parent.Parent.Parent.Parent, "class") == "list clearfix"
        }
        return false
    }
    
    images := scrape.FindAll(root, matcher)
    for i, img := range images {
        src := scrape.Attr(img, "src")
        fmt.Printf("Image %d src=%s\n", i, src)
    }
    

    I just modified your matcher func to fix the issues you had.

    Also note that the HTML in your question is invalid: the href attributes are missing their opening double quotes, and one src attribute is misspelled as srr.
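
    The matcher above hard-codes four Parent hops, so it stops matching if the markup gains or loses a wrapper element. A possible alternative is to walk up the Parent chain until a matching <ul> is found. The sketch below only illustrates that loop: node is a hypothetical, simplified stand-in for html.Node (tag name, attribute map, parent pointer) so that it runs with the standard library alone, and hasClass and insideList are illustrative names, not part of the scrape package. It also compares class tokens individually rather than the whole "list clearfix" string, which is an assumption about what you want to match.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical, simplified stand-in for html.Node.
// It keeps only the tag name, the attributes, and the parent pointer.
type node struct {
	Tag    string
	Attrs  map[string]string
	Parent *node
}

// hasClass reports whether n's class attribute contains the given token.
func hasClass(n *node, class string) bool {
	for _, c := range strings.Fields(n.Attrs["class"]) {
		if c == class {
			return true
		}
	}
	return false
}

// insideList reports whether n is an <img> that has a <ul> ancestor
// carrying the "list" class, however many levels up that ancestor is.
func insideList(n *node) bool {
	if n.Tag != "img" {
		return false
	}
	for p := n.Parent; p != nil; p = p.Parent {
		if p.Tag == "ul" && hasClass(p, "list") {
			return true
		}
	}
	return false
}

func main() {
	// Build ul > li > div > a > img by hand, mirroring the question's markup.
	ul := &node{Tag: "ul", Attrs: map[string]string{"class": "list clearfix"}}
	li := &node{Tag: "li", Parent: ul}
	div := &node{Tag: "div", Parent: li}
	a := &node{Tag: "a", Parent: div}
	img := &node{Tag: "img", Attrs: map[string]string{"src": "..sadsada./ssa/3.jpg"}, Parent: a}

	fmt.Println(insideList(img), img.Attrs["src"])
	fmt.Println(insideList(a))
}
```

    With a real *html.Node the same loop would follow n.Parent and read the tag from n.DataAtom or n.Data, with the class read through scrape.Attr as in the matcher above.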

    This answer was accepted by the asker as the best answer.
