dsfew215211 2017-01-10 21:29

goquery: stop parsing when another element is reached

Suppose I have this HTML page. I want to parse it using Go and goquery:

<html>
    <head><!--Page header stuff--></head>
    <body>
         <h1 class="h1-class">Heading 1</h1>
             <div class="div-class">Stuff1</div>
             <div class="div-class">Stuff2</div>
         <h1 class="h1-class">Heading 2</h1>
             <div class="div-class">Stuff3</div>
             <div class="div-class">Stuff4</div>
    </body>
</html>

As it happens, I'd like to get only the DIVs before Heading 2 and skip the rest. This code works well for getting all of the DIVs:

 doc := GetGoQueryDocument(url) // Defined elsewhere
 doc.Find("div.div-class").Each(func(_ int, theDiv *goquery.Selection) {
     // Do stuff with each theDiv.
     // The problem is that it also finds the div.div-class elements below Heading 2.
     // I want to skip those.
 })

Is there any way to tell goquery to skip the elements that come after a certain tag and class name (here, the second h1.h1-class)? Thanks for any tips!


1 answer

  • duan36000 2017-01-10 21:35

    Yes, actually pretty easy:

    doc.Find(".h1-class").First().NextUntil(".h1-class")
    

    I would recommend you read through the godoc: https://godoc.org/github.com/PuerkitoBio/goquery

    It explains all of the different ways you can manipulate the selection.

    This answer was selected by the asker as the best answer.
