以通用方式使用html.ParseFragment

Using the experimental code.google.com/p/go.net/html package, we can use ParseFragment to parse some sub-section of an HTML document.

Like this:

var s = `
    <option id="foo">first</option>
    <option Class="tester">second</option>
    <option>third</option>
`
doc, err := html.ParseFragment(strings.NewReader(s), &html.Node{
    Type: html.ElementNode,
    Data: "body",
    DataAtom: atom.Body,
})

This works fine for most elements. But it doesn't seem to work when certain elements are at the root position of the HTML, like tbody, tr, and td (and perhaps others, not sure). It simply ignores the tags and only gives the text content.

This can be remedied by providing the semantically correct parent instead of atom.Body, but that requires that we know in advance what the HTML will be.

I'd hoped there was a generic root like atom.DocumentFragment, but I don't see that. So is there some way to use this in such a manner that it'll work with any arbitrary HTML fragment?

写回答
好问题 0 提建议
追加酬金
关注问题
分享
邀请回答
编辑收藏删除结题
收藏举报

1条回答默认最新

关注

码龄粉丝数原力等级 --

被采纳

被点赞

采纳率
dongyuandou2521 2014-01-30 16:34
关注
ParseFragment is always context-sensitive because it follows the HTML5 fragment-parsing algorithm. That algorithm is designed for implementing the DOM innerHTML property, and the correct tree to generate from a given innerHTML string depends on the surrounding context (especially whether the context is in a table or not).

So the html package has no way to parse an HTML fragment independently of its context.

If you need more information about how the parsing depends on the context, see http://www.whatwg.org/specs/web-apps/current-work/multipage/parsing.html#reset-the-insertion-mode-appropriately

本回答被题主选为最佳回答 , 对您是否有帮助呢?

解决无用
评论打赏
分享
举报

评论

按下Enter换行，Ctrl+Enter发表内容

报告相同问题？

关注问题

悬赏问题

¥15 基于卷积神经网络的声纹识别
¥15 Python中的request，如何使用ssr节点，通过代理requests网页。本人在泰国，需要用大陆ip才能玩网页游戏，合法合规。
¥100 为什么这个恒流源电路不能恒流？
¥15 有偿求跨组件数据流路径图
¥15 写一个方法checkPerson，入参实体类Person，出参布尔值
¥15 我想咨询一下路面纹理三维点云数据处理的一些问题，上传的坐标文件里是怎么对无序点进行编号的，以及xy坐标在处理的时候是进行整体模型分片处理的吗
¥15 CSAPPattacklab
¥15 一直显示正在等待HID—ISP
¥15 Python turtle 画图
¥15 stm32开发clion时遇到的编译问题

以通用方式使用html.ParseFragment

1条回答 默认 最新

悬赏问题

1条回答默认最新