Getting the current position in the stream from the net/html tokenizer

Is there a way to get the current character position of a tag when using the golang.org/x/net/html tokenizer?

Simplified code looks like:

func LookForForm(body string) {
    reader := strings.NewReader(body)
    tokenizer := html.NewTokenizer(reader)
    idx := 0
    lastIdx := 0
    for {
        token := tokenizer.Next()
        lastIdx = idx
        idx = int(reader.Size()) - int(reader.Len())
        switch token {
        case html.ErrorToken:
            return
        case html.StartTagToken:
            t := tokenizer.Token()
            tagName := strings.ToLower(t.Data)
            if tagName == "form" {
                fmt.Printf("found form at %d\n", lastIdx)
                return
            }
        }
    }
}

This doesn't work (I think) because the tokenizer pulls data from the reader in chunks rather than character by character, so my Size - Len calculation does not reflect where the tokenizer actually is. The tokenizer maintains two private span structs ( https://github.com/golang/net/blob/master/html/token.go line 147), but I don't know of a way to access them.
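To make the chunking problem concrete, here is a minimal sketch using only the standard library. It does not involve the tokenizer at all: `consumedAfterOneRead` is a hypothetical helper, and the 4096-byte buffer is an assumed size chosen for illustration, not a documented property of the tokenizer. It only shows that a single Read into a large buffer drains a short `strings.Reader` completely, so Size - Len jumps straight to the end of the input.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// consumedAfterOneRead is a hypothetical helper for illustration. It performs
// a single Read into a buffer of bufSize bytes and reports how many bytes the
// strings.Reader has handed out, computed as Size - Len.
func consumedAfterOneRead(body string, bufSize int) int {
	r := strings.NewReader(body)
	buf := make([]byte, bufSize)
	if _, err := r.Read(buf); err != nil && err != io.EOF {
		return -1
	}
	return int(r.Size()) - r.Len()
}

func main() {
	body := "<p>hi</p><form></form>" // 22 bytes

	// Assumed large buffer: the whole input is consumed by one Read.
	fmt.Println(consumedAfterOneRead(body, 4096)) // prints 22

	// One-byte buffer: the reader advances a single byte per Read.
	fmt.Println(consumedAfterOneRead(body, 1)) // prints 1
}
```

With the large buffer, the computed index is already 22 after the first Read, no matter which token is being processed at that moment.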

One possible solution that just occurred to me is to write a "reader" that returns only a single character per read, so that my Size and Len calculations are always correct. That seems like a hack, though, so any suggestions would be appreciated.

2 answers

dounuo9921 2017-11-17 20:34

A non-buffering reader ended up working ok for me. The implementation of the reader looks something like:

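package rule

import "io"

// Reader is a non-buffering string reader: every Read call returns at most
// one byte, so Pos always reflects how far the consumer has actually read.
type Reader struct {
    s string // the input being read
    i int64  // current reading index
    z int64  // total size of s in bytes
}

// NewReader returns a Reader positioned at the start of s.
func NewReader(s string) *Reader {
    return &Reader{s: s, z: int64(len(s))}
}

// String returns the full underlying input.
func (r *Reader) String() string {
    return r.s
}

// Len returns the number of unread bytes.
func (r *Reader) Len() int {
    if r.i >= r.z {
        return 0
    }
    return int(r.z - r.i)
}

// Size returns the total length of the input in bytes.
func (r *Reader) Size() int64 {
    return r.z
}

// Pos returns the byte offset of the next byte to be read.
func (r *Reader) Pos() int64 {
    return r.i
}

// Read copies at most one byte into b, regardless of len(b).
func (r *Reader) Read(b []byte) (int, error) {
    if r.i >= r.z {
        return 0, io.EOF
    }
    if len(b) == 0 {
        return 0, nil
    }
    b[0] = r.s[r.i]
    r.i++
    return 1, nil
}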

With that reader, the position is fairly easy to track in the tokenizer loop:

    reader := NewReader(body)
    tokenizer := html.NewTokenizer(reader)
    lastIdx := 0 // reader position recorded after the previous token
tokenLoop:
    for {
        token := tokenizer.Next()
        switch token {
        case html.ErrorToken:
            break tokenLoop
        case html.StartTagToken:
            t := tokenizer.Token()
            tagName := strings.ToLower(t.Data)
            if tagName == "form" {
                fmt.Printf("found form at %d\n", lastIdx)
                return
            }
        }
        // Remember how far the reader has advanced before fetching the next token.
        lastIdx = int(reader.Pos())
    }
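The same idea can be applied to input that is not a string. Below is a minimal sketch, using only the standard library, of a wrapper that limits any io.Reader to one byte per Read and counts the bytes delivered. `countingReader` and `positions` are hypothetical names introduced for illustration and are not part of golang.org/x/net/html. Note that a consumer may read a byte or two past a token boundary before deciding where a token ends, so positions obtained this way are best treated as approximate.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// countingReader is a hypothetical helper (not part of any library). It hands
// out at most one byte per Read call and counts the bytes delivered so far.
type countingReader struct {
	r   io.Reader
	pos int64
}

// Read fills at most one byte of p, whatever the size of p.
func (c *countingReader) Read(p []byte) (int, error) {
	if len(p) == 0 {
		return 0, nil
	}
	n, err := c.r.Read(p[:1])
	c.pos += int64(n)
	return n, err
}

// Pos returns the number of bytes delivered to the caller so far.
func (c *countingReader) Pos() int64 { return c.pos }

// positions reads body through a countingReader with a buffer of bufSize
// bytes and returns the position observed after every successful Read.
func positions(body string, bufSize int) []int64 {
	if bufSize < 1 {
		bufSize = 1 // a zero-length buffer would never make progress
	}
	cr := &countingReader{r: strings.NewReader(body)}
	buf := make([]byte, bufSize)
	var out []int64
	for {
		n, err := cr.Read(buf)
		if n > 0 {
			out = append(out, cr.Pos())
		}
		if err != nil {
			return out
		}
	}
}

func main() {
	// Even with a large caller buffer, the position advances one byte at a time.
	fmt.Println(positions("<form>", 4096)) // prints [1 2 3 4 5 6]
}
```

Reading one byte per call trades throughput for a position that stays in step with the consumer, which is probably acceptable for small documents but may be slow on large ones.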

This answer was selected as the best answer by the asker.
