doubi4491 2017-01-05 16:49 Acceptance rate: 0%

How do I scrape the title in an h1 tag using Golang?

Suppose this is an h1 tag

<h1>FindMe</h1>

in a huge webpage with many other h1 tags, but this is the first h1 tag. I am using the net/html package and searching for the first StartTagToken. Once my program has found the token, how do I get what is written inside the heading, i.e. FindMe in this case?

This is the code I have right now

z := html.NewTokenizer(body)

for {
    tt := z.Next()

    if tt == html.ErrorToken {
        // End of input or a parse error: stop scanning.
        return
    } else if tt == html.StartTagToken {
        tag := z.Token()

        if tag.Data == "h1" {
            fmt.Println("We found the title")
            // some code to find what is stored in the heading
        }
    }
}

How do I go about doing that?

EDIT: More specifically, which property of the variable tag would give me the text inside it? I may be using the wrong terms here, so please bear with me.


1 answer

  • dpu66046 2017-01-05 17:16

    What you have is the StartTagToken. The part you're interested in sits between it and the corresponding EndTagToken, as a TextToken. So you need to read the next token, and its Data should be the value you're after, something like

    ...
    if tag.Data == "h1" {
        if tt = z.Next(); tt == html.TextToken {
            fmt.Println(z.Token().Data)
        }
    }
    
    This answer was accepted by the asker as the best answer.
