douzhi19900102 2014-10-01 02:50

HTML: find all child tags within a given tag

Assume I have an HTML page that contains something like:

<ul class ="good">
    <li>1</li>
    <li>2</li>
    <li>3</li>
</ul>

<ul class ="bad">
    <li>a</li>
    <li>b</li>
    <li>c</li>
</ul>

I want to grab the <li> elements inside the first <ul>. I have basically copied the following from here (note: code edited per @twotwotwo's comment):

page, _ := html.Parse(httpBody)
var f func(*html.Node)
f = func(n *html.Node) {
    //fmt.Println("Inside f")
    if n.Type == html.ElementNode && n.Data == "ul" {
        fmt.Println("ul found ->  ", n)
        for c := n.FirstChild; c != nil; c = c.NextSibling {
            f(c)
        }
    } else {
        fmt.Println(n.Data, "is not the correct one")
        for c := n.FirstChild; c != nil; c = c.NextSibling {
            f(c)
        }
    }
}
f(page)

But the only output I get is:

 is not the correct one
html is not the correct one
head is not the correct one
body is not the correct one

I wonder why the recursion stops at body. I have tried it with motherfuckingwebsite.com, which has tags inside the body.

P.S. I have also tried

page := html.NewTokenizer(httpBody)

for {
    tokenType := page.Next()
    if tokenType == html.ErrorToken {
        return links
    }
    token := page.Token()

but this seems to show all the tokens, regardless of the tree structure.

EDIT:

1 answer

  • doukuang1897 2014-10-01 04:33

    I have, in the past, used this package: https://github.com/PuerkitoBio/goquery

    It provides a "jQuery-like" interface for querying HTML documents. With that library, it's as simple as this:

    package main
    
    import (
        "bytes"
        "fmt"
        "log"
    
        "github.com/PuerkitoBio/goquery"
    )
    
    var httpBody string = `
        <ul class ="good">
            <li>1</li>
            <li>2</li>
            <li>3</li>
        </ul>
    
        <ul class ="bad">
            <li>a</li>
            <li>b</li>
            <li>c</li>
        </ul>
    `
    
    func main() {
        b := bytes.NewBufferString(httpBody)
        doc, err := goquery.NewDocumentFromReader(b)
        if err != nil {
            log.Fatal(err)
        }
    
        doc.Find("ul.good").Each(func(i int, ul *goquery.Selection) {
            ul.Find("li").Each(func(i int, li *goquery.Selection) {
                fmt.Println(li.Text())
            })
        })
    }
    

    Which prints:

    1
    2
    3
    
    This answer was accepted as the best answer by the asker.