douju2474 2014-08-19 15:01

Golang Gokogiri: unexpected results from a recursive XPath search

I was trying to run XPath queries on an HTML document, specifically a two-level (nested) query. The HTML document, "index.html", is as follows:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Document</title>
</head>
<body>
    <div class="head">
        <div class="area">
            <div class="value">10</div>
        </div>
        <div class="area">
            <div class="value">20</div>
        </div>
        <div class="area">
            <div class="value">30</div>
        </div>
    </div>
</body>
</html>

I wanted to first select all divs with class="area", and then, within each of those, select the divs with class="value", using Gokogiri in Go.

My Go code is as follows:

package main

import (
    "fmt"
    "io/ioutil"

    "github.com/moovweb/gokogiri"
    "github.com/moovweb/gokogiri/xpath"
)

func main() {
    content, _ := ioutil.ReadFile("index.html")

    doc, _ := gokogiri.ParseHtml(content)
    defer doc.Free()

    xps := xpath.Compile("//div[@class='head']/div[@class='area']")
    xpw := xpath.Compile("//div[@class='value']")
    ss, _ := doc.Root().Search(xps)
    for _, s := range ss {
        ww, _ := s.Search(xpw)
        for _, w := range ww {
            fmt.Println(w.InnerHtml())
        }
    }
}

However, the output I get is odd:

10
20
30
10
20
30
10
20
30

The output I expected is:

10
20
30

I want to search for XPath patterns recursively, and I think something is wrong with my second-level XPath pattern: it appears to search the whole document again instead of only the individual divs with class="area". How should I write nested XPath searches? I'd appreciate any help.

2 answers

  • douluolan9101 2014-08-19 15:44

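    An XPath search started from any node can still search the entire tree: an expression that begins with // is evaluated from the document root, regardless of which node Search is called on.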

    If you want to search only the subtree, start the expression with a . (assuming you still want descendant-or-self); otherwise use an exact path relative to the current node.

    xps := xpath.Compile("//div[@class='head']/div[@class='area']")
    xpw := xpath.Compile(".//div[@class='value']")
    
    // this works in your example case
    // xpw := xpath.Compile("div[@class='value']")
    // as does this
    // xpw := xpath.Compile("./div[@class='value']")
    
    ss, _ := doc.Root().Search(xps)
    for _, s := range ss {
        ww, _ := s.Search(xpw)
        for _, w := range ww {
            fmt.Println(w.InnerHtml())
        }
    }
    

    Prints:

    10
    20
    30
    
    This answer was accepted as the best answer by the asker.