douzhi19900102 2014-10-01 02:50
Views: 231
Accepted

HTML: find all child tags inside a given tag

Assume I have an HTML page that contains something like:

<ul class ="good">
    <li>1</li>
    <li>2</li>
    <li>3</li>
</ul>

<ul class ="bad">
    <li>a</li>
    <li>b</li>
    <li>c</li>
</ul>

I want to grab the <li> elements inside the first <ul>. From here I have basically copied the following (note: code edited per @twotwotwo's comment):

page, err := html.Parse(httpBody)
if err != nil {
    log.Fatal(err) // do not silently ignore a parse failure
}

var f func(*html.Node)
f = func(n *html.Node) {
    if n.Type == html.ElementNode && n.Data == "ul" {
        fmt.Println("ul found ->  ", n)
    } else {
        fmt.Println(n.Data, "is not the correct one")
    }
    // Visit the children in both cases.
    for c := n.FirstChild; c != nil; c = c.NextSibling {
        f(c)
    }
}
f(page)

But the only output I get is:

 is not the correct one
html is not the correct one
head is not the correct one
body is not the correct one

I wonder why the recursion stops at body. I have tried this with motherfuckingwebsite.com, which has tags inside the body.

P.S. I have also tried

page := html.NewTokenizer(httpBody)

for {
    tokenType := page.Next()
    if tokenType == html.ErrorToken {
        return links
    }
    token := page.Token()
    // ...
}

but this seems to show all the tokens, ignoring the tree structure.


1 answer

  • doukuang1897 2014-10-01 04:33

    I have, in the past, used this package: https://github.com/PuerkitoBio/goquery

    It provides a "jQuery-like" interface for querying HTML documents. With that library, it's as simple as this:

    package main

    import (
        "bytes"
        "fmt"
        "log"
    
        "github.com/PuerkitoBio/goquery"
    )
    
    var httpBody string = `
        <ul class ="good">
            <li>1</li>
            <li>2</li>
            <li>3</li>
        </ul>
    
        <ul class ="bad">
            <li>a</li>
            <li>b</li>
            <li>c</li>
        </ul>
    `
    
    func main() {
        b := bytes.NewBufferString(httpBody)
        doc, err := goquery.NewDocumentFromReader(b)
        if err != nil {
            log.Fatal(err)
        }
    
        doc.Find("ul.good").Each(func(i int, ul *goquery.Selection) {
            ul.Find("li").Each(func(i int, li *goquery.Selection) {
                fmt.Println(li.Text())
            })
        })
    }
    

    Which prints:

    1
    2
    3
    
    This answer was selected by the asker as the best answer.