Using the experimental code.google.com/p/go.net/html package, we can use ParseFragment to parse a sub-section of an HTML document, like this:
var s = `
<option id="foo">first</option>
<option Class="tester">second</option>
<option>third</option>
`
doc, err := html.ParseFragment(strings.NewReader(s), &html.Node{
	Type:     html.ElementNode,
	Data:     "body",
	DataAtom: atom.Body,
})
This works fine for most elements. But it doesn't seem to work when certain elements are at the root position of the HTML, like tbody, tr, and td (and perhaps others, I'm not sure). The parser simply ignores the tags and returns only the text content.
This can be remedied by providing the semantically correct parent instead of atom.Body, but that requires that we know in advance what the HTML will be.
I'd hoped there was a generic root like atom.DocumentFragment, but I don't see one. So is there some way to use ParseFragment so that it works with any arbitrary HTML fragment?