doubi4491 2017-01-05 08:49 Acceptance rate: 0%
Views: 127
Accepted

How do I grab the title of an h1 tag using golang?

Suppose this is an h1 tag

<h1>FindMe</h1>

in a huge webpage with many other h1 tags, but this is the first h1 tag. I am using the net/html package and searching for the first StartTagToken. After my program has found the token, how do I get what is written inside the heading, i.e. FindMe in this case?

This is the code I have right now

z := html.NewTokenizer(body)

for {
    tt := z.Next()

    if tt == html.ErrorToken {
        // end of the document (or a read error)
        return
    } else if tt == html.StartTagToken {
        tag := z.Token()

        if tag.Data == "h1" {
            fmt.Println("We found the title")
            // some code to find what is stored in the heading
        }
    }
}

How do I go about doing that?

EDIT: More specifically, which property of the variable tag would give me the text inside it? I may be wrong with the conceptual terms here, so please bear with me.


1 answer

  • dpu66046 2017-01-05 09:16

    What you got is the StartTagToken; the part you're interested in sits between it and the corresponding EndTagToken, as a TextToken. So you need to read the next token, and its Data should be the value you're after, something like

    ...
    if tag.Data == "h1" {
        // the token right after <h1> should be the heading's text
        if tt = z.Next(); tt == html.TextToken {
            fmt.Println(z.Token().Data)
        }
    }
    
    This answer was selected by the asker as the best answer.