duanjianlu0506 2016-04-08 14:27

Convert an XPath node back to HTML markup in Go

import (
    "fmt"
    "log"

    "gopkg.in/xmlpath.v2"
)

...

// Select every <div> whose id attribute is "23".
path := xmlpath.MustCompile("//div[@id='23']")
// reader is an io.Reader holding the HTML document.
tree, err := xmlpath.ParseHTML(reader)
if err != nil {
    log.Fatalf("HTML parsing error, maybe not well-formed: %v", err)
}

iter := path.Iter(tree)
for iter.Next() {
    // String() returns only the concatenated text content, not the markup.
    fmt.Println(iter.Node().String())
}

...

Is there a way to convert iter.Node() back to HTML markup such as <div>...</div>? iter.Node().String() returns only the values of all inner text nodes, and as far as I can see, the documentation of the xmlpath package does not offer such a function.


2 answers

  • dpda53918 2016-04-08 20:35

    You are right: the gopkg.in/xmlpath.v2 functions are limited to reading the content of nodes, and there are not many alternatives in Go for working with a DOM.

    Among native Go libraries I can only mention goquery. It works only with HTML and does not support XPath, but it does support CSS selectors. That may be enough in your case.

    If you really need to work with both HTML and XML via XPath, there is a libxml wrapper for Go called gokogiri. It supports all the features of libxml, so you can get nodes, inner/outer HTML, attributes and more. I used it to extract text content in a service that is currently in production, and it is a bit faster than PHP's DOMDocument. The only limitation is that I'm not sure whether it supports Go versions newer than 1.4.x. Also, installation on Windows is a bit tricky.

    This answer was accepted by the asker as the best answer.
