dth8312 2013-09-03 03:45
Views: 9
Accepted

Equivalent of Python's HTML parsing function/module in Go?

I'm learning Go on my own and am stuck on fetching and parsing HTML/XML. In Python, I usually write the following code when I do web scraping:

from urllib.request import urlopen, Request
url = "http://stackoverflow.com/"
req = Request(url)
html = urlopen(req).read()

After that I have the raw HTML/XML as either a string or bytes and can go on to work with it. How can I do the same in Go? What I want is the raw HTML data stored in either a string or a []byte (the two are easy to convert between, so I don't mind which one I get). I'm considering the gokogiri package for web scraping in Go (though I'm not sure I'll end up using it), but it looks like it needs the raw HTML text before it can do any work with it...

So how can I get such an object?

Or is there a better way to do web scraping in Go?

Thanks.


1 answer

  • drvjlec1767 2013-09-03 03:50

    From the Go http.Get Example:

    package main
    
    import (
        "fmt"
        "io/ioutil"
        "log"
        "net/http"
    )
    
    func main() {
        res, err := http.Get("http://www.google.com/robots.txt")
        if err != nil {
            log.Fatal(err)
        }
        robots, err := ioutil.ReadAll(res.Body)
        res.Body.Close()
        if err != nil {
            log.Fatal(err)
        }
        fmt.Printf("%s", robots)
    }
    

    This reads the contents of http://www.google.com/robots.txt into the variable robots. Note that robots is a []byte, not a string; use string(robots) if you need a string.

    For XML parsing, look into Go's encoding/xml package.

    Accepted by the asker as the best answer.
