douzhi19900102 2014-10-01 02:50
Views: 233
Accepted

HTML: find all child tags within a given tag

Assume I have an HTML page that contains something like:

<ul class ="good">
    <li>1</li>
    <li>2</li>
    <li>3</li>
</ul>

<ul class ="bad">
    <li>a</li>
    <li>b</li>
    <li>c</li>
</ul>

I want to grab the <li> elements inside the first <ul>. From here I have basically copied the following (note: code edited per @twotwotwo's comment):

page, _ := html.Parse(httpBody)
var f func(*html.Node)
f = func(n *html.Node) {
    //fmt.Println("Inside f")
    if n.Type == html.ElementNode && n.Data == "ul" {
        fmt.Println("ul found ->  ", n)
        for c := n.FirstChild; c != nil; c = c.NextSibling {
            f(c)
        }
    } else {
        fmt.Println(n.Data, "is not the correct one")
        for c := n.FirstChild; c != nil; c = c.NextSibling {
            f(c)
        }
    }
}
f(page)

But the only output I obtain is

 is not the correct one
html is not the correct one
head is not the correct one
body is not the correct one

I wonder why the recursion stops at body. I have tried it with motherfuckingwebsite.com, which has tags inside the body.

P.S. I have also tried

page := html.NewTokenizer(httpBody)

for {
    tokenType := page.Next()
    if tokenType == html.ErrorToken {
        return links
    }
    token := page.Token()

but this seems to show all the tokens, without regard for the tree structure.

EDIT:

1 answer

  • doukuang1897 2014-10-01 04:33

    I have, in the past, used this package: https://github.com/PuerkitoBio/goquery

    It provides a "jQuery-like" interface for querying HTML documents. With that library, it's as simple as this:

    package main

    import (
        "bytes"
        "fmt"
        "log"
    
        "github.com/PuerkitoBio/goquery"
    )
    
    var httpBody string = `
        <ul class ="good">
            <li>1</li>
            <li>2</li>
            <li>3</li>
        </ul>
    
        <ul class ="bad">
            <li>a</li>
            <li>b</li>
            <li>c</li>
        </ul>
    `
    
    func main() {
        b := bytes.NewBufferString(httpBody)
        doc, err := goquery.NewDocumentFromReader(b)
        if err != nil {
            log.Fatal(err)
        }
    
        doc.Find("ul.good").Each(func(i int, ul *goquery.Selection) {
            ul.Find("li").Each(func(i int, li *goquery.Selection) {
                fmt.Println(li.Text())
            })
        })
    }
    

    Which prints:

    1
    2
    3
    