dtfbj24048 2014-09-30 23:45

Unexpected HTML token from html.NewTokenizer.Token()

I am trying to list all the tokens found in a web page. The core logic is in this function:

import (
    "fmt"
    "io"

    "golang.org/x/net/html"
)

func find_links(httpBody io.Reader) []string {
    links := make([]string, 0) // not appended to yet
    page := html.NewTokenizer(httpBody)
    for {
        tokenType := page.Next()
        if tokenType == html.ErrorToken {
            // io.EOF (or a read error) ends the token stream
            return links
        }
        token := page.Token()
        fmt.Println("Now token is ", token)
    }
}

When I run it, I get output like this:

Now token is  <body>
Now token is

Now token is  <header>

I don't understand what the second token is, or why an extra blank line is printed.

The full code of a working example is here, although it can't run on the Playground because of the missing http package.


1 answer

  • doushuhuai7247 2014-09-30 23:49

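    The second token is a TextToken containing a newline: the whitespace between <body> and <header> in the page source. Printing it writes that newline, and Println then adds its own, which produces the blank line.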

    Change the print to

       fmt.Printf("Now token is %v %q\n", token.Type, token.String())

    to see the type of each token; %q also shows the newline as \n instead of printing it.

    This answer was accepted by the asker as the best answer.