duankua3620 2015-03-28 15:05

Parsing list items from HTML with Go

I want to extract all list items (the content of each <li></li>) with Go. Should I use a regexp to get the <li> items, or is there a library for this?

My goal is to get a list (slice) in Go that contains all the list items from a specific HTML web page. How should I do that?


2 answers

  • du0923 2015-03-28 15:53

    You probably want the golang.org/x/net/html package. It isn't part of the Go standard library; it lives in the Go sub-repositories. (The sub-repositories are part of the Go Project but outside the main Go tree, and they are developed under looser compatibility requirements than the Go core.)

    The package documentation includes an example that walks the parse tree, which may be close to what you want.

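    If you need to stick with the Go standard library for some reason, encoding/xml can read "typical" HTML once you relax its XML rules: set the decoder's Strict field to false, AutoClose to xml.HTMLAutoClose, and Entity to xml.HTMLEntity.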

    Both packages take an io.Reader as input. If your HTML is already in a string or []byte variable, wrap it with strings.NewReader or bytes.NewReader (a bytes.Buffer from bytes.NewBuffer also works) to get an io.Reader.

    With HTML you're more likely to be reading from an http.Response body (make sure to close it when you're done). Perhaps something like:

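        import (
            "fmt"
            "net/http"

            "golang.org/x/net/html"
        )

        // printLinks fetches someURL and prints the href of every <a> element.
        // To collect list items instead, match n.Data == "li" and gather its text.
        func printLinks(someURL string) error {
            resp, err := http.Get(someURL)
            if err != nil {
                return err
            }
            defer resp.Body.Close() // release the connection when done

            doc, err := html.Parse(resp.Body)
            if err != nil {
                return err
            }
            // Recursively visit nodes in the parse tree
            var f func(*html.Node)
            f = func(n *html.Node) {
                if n.Type == html.ElementNode && n.Data == "a" {
                    for _, a := range n.Attr {
                        if a.Key == "href" {
                            fmt.Println(a.Val)
                            break
                        }
                    }
                }
                for c := n.FirstChild; c != nil; c = c.NextSibling {
                    f(c)
                }
            }
            f(doc)
            return nil
        }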
    

    Of course, this only sees the HTML the server sends: list items that a page adds or changes with client-side JavaScript won't appear in the parsed tree.

    This answer was accepted by the asker as the best answer.