duansengcha9114 2016-04-07 05:10

net/html: Parse returns a nil *html.Node no matter what

I'm trying to process an HTML document. The problem is that Parse from golang.org/x/net/html returns a *html.Node that is nil, and err is also nil. That seems strange: if Parse can't process the input correctly, I would expect to get an error!

This is my code:

package main

import (
    "bytes"
    "golang.org/x/net/html"
    "io/ioutil"
    "log"
)

func main() {
    html, err := ioutil.ReadFile("html/simple_01.html")
    if e != nil {
        fmt.Fatal(e)
    }
    doc, err := html.Parse(bytes.NewReader(html))
    if err != nil {
        log.Fatal(err)
    }
    // locate <body>
    var body *html.Node
    for s := doc.NextSibling; s != nil; s = s.NextSibling {
        if s.Data == "body" {
            body = s
            break
        }
    }
    log.Println(body)
}

log.Println(body) prints nil. Printing doc also prints nil, which is weird.

Here is the HTML document I'm testing against:

<!DOCTYPE html>
<html>

<head>
    <meta charset='utf-8'>
    <title>Sample page - 01</title>
</head>

<body>
    <p>Aspernatur vel molestiae eius sed sunt doloremque. Ipsa sed voluptate expedita tempore id. Ab nobis delectus magnam.</p>
    <p>Beatae id mollitia nesciunt nesciunt qui explicabo cum. Aspernatur est molestiae laudantium assumenda consequuntur. Odit mollitia non inventore iusto. Id nihil voluptatem vitae. Fugit odio dolores atque sed.</p>
    <p>Qui dolorem ipsum fugit vitae consequuntur suscipit debitis iste. Dignissimos impedit nobis quas facilis. Quia dignissimos perspiciatis quia debitis. Rerum beatae repellat architecto nostrum nulla facere rerum.</p>
    <p>Quas natus ad qui excepturi dolorem. Quas dolorum dolores voluptatem distinctio quisquam culpa et. Ipsam voluptatem suscipit earum reprehenderit. Quos laudantium occaecati quis similique. Numquam rerum sunt rerum et necessitatibus. Laboriosam modi iure praesentium voluptates atque adipisci et.</p>
    <p>Blanditiis dolores nemo quos voluptatem quo quia modi. Quia et alias nesciunt sint voluptatum omnis. Nihil minima ipsa magnam qui amet ea. Blanditiis laborum nihil tempora aliquam.</p>
    <p>Ullam molestiae omnis magni ratione exercitationem minima. Sed sequi fugiat laborum omnis voluptas. Debitis sit expedita optio et at qui.</p>
    <p>Fuga iusto quo eum sequi eum sint pariatur ipsam. Alias nisi maiores illum est ab culpa voluptas quidem. Veritatis eum qui deserunt aspernatur quo officia et ipsam.</p>
    <p>Aliquam id autem earum autem eaque. Dolores veniam animi voluptatem. Et est nam culpa consequatur et ex distinctio. Quis iure sequi maiores quibusdam vel nostrum architecto et. Quisquam unde qui pariatur doloremque rerum.</p>
    <p>Dicta est est fugit et architecto. Quia culpa vel error deleniti. Voluptatem fuga omnis eius ea et voluptatum dolor.</p>
    <p>Eaque esse sint voluptatem praesentium ut sit. Fugiat ratione enim doloremque dolor asperiores. Tempora eveniet et aut.</p>
</body>

</html>

What am I doing wrong?

1 answer

  • dongxu2398 2016-04-07 06:02

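    There are several typos in your code example: the error check uses an undefined variable e, fmt.Fatal does not exist (and fmt is not imported), and the variable html shadows the imported html package. The main problem, though, is that you are trying to get the next sibling of the root node. html.Parse returns the document root (a node of type html.DocumentNode), which has no siblings, so doc.NextSibling is nil and your loop body never runs. You first need to get to the html tag, and from there go down to its first child and loop through its siblings: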

    package main
    
    import (
        "bytes"
        "golang.org/x/net/html"
        "io/ioutil"
        "log"
    )
    
    func main() {
        htmlfile, err := ioutil.ReadFile("html/simple_01.html")
        if err != nil {
            log.Fatal(err)
        }
    
        doc, err := html.Parse(bytes.NewReader(htmlfile))
        if err != nil {
            log.Fatal(err)
        }
    
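        // doc is the document root. Its first child is the <!DOCTYPE html> node,
        // so the <html> element is the next sibling (this relies on the doctype being present).
        var htmlTag = doc.FirstChild.NextSibling
        var body *html.Node
        // <html>'s children are <head> and <body>, possibly with whitespace text nodes between them.
        for s := htmlTag.FirstChild; s != nil; s = s.NextSibling {
            if s.Type == html.ElementNode && s.Data == "body" {
                body = s
                break
            }
        }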
        log.Println(body)
    }
    
    This answer was accepted by the asker as the best answer.