dongmo20030416 2014-03-20 19:50

Go: Getting the text of a single, specific HTML element from a document with a known structure

In a little script I'm writing, I make a POST to a web service and receive an HTML document in response. This document is largely irrelevant to my needs, with the exception of the contents of a single textarea. This textarea is the only textarea in the page and it has a particular name that I know ahead of time. I want to grab that text without worrying about anything else in the document. Currently I'm using regex to get the correct line and then to delete the tags, but I feel like there's probably a better way.

Here's what the document looks like:

<html><body>
<form name="query" action="http://www.example.net/action.php" method="post">
    <textarea type="text" name="nameiknow"/>The text I want</textarea>
    <div id="button">
        <input type="submit" value="Submit" />
    </div>
</form>
</body></html>

And here's how I'm currently getting the text:

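s := string(body)

// Match the line containing the textarea named "nameiknow".
// MustCompile panics on an invalid pattern instead of silently returning an error.
r := regexp.MustCompile(`<textarea.*name=("|')nameiknow("|').*textarea>`)
s = r.FindString(s)

// Strip the tags, leaving only the text between them.
r = regexp.MustCompile(`<[^>]*>`)
s = r.ReplaceAllString(s, "")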

I think using a full HTML parser might be a bit too much in this case, which is why I went in this direction, though for all I know there's something much better out there.

I appreciate any advice you may have.

2 answers

  • duandang6111 2014-03-20 21:02

    Take a look at the goquery package: https://github.com/PuerkitoBio/goquery. It's like jQuery, but for Go, and it lets you write things like:

    text := doc.Find("strong").Text()
    

    Full working example:

    package main

    import (
        "fmt"
        "log"
        "strings"

        "github.com/PuerkitoBio/goquery"
    )

    var s = `<html><body>
    <form name="query" action="http://www.example.net/action.php" method="post">
        <textarea type="text" name="nameiknow">The text I want</textarea>
        <div id="button">
            <input type="submit" value="Submit" />
        </div>
    </form>
    </body></html>`

    func main() {
        // Parse the HTML; strings.NewReader avoids the []byte round trip.
        doc, err := goquery.NewDocumentFromReader(strings.NewReader(s))
        if err != nil {
            log.Fatal(err)
        }
        // "textarea" matches the page's only textarea; the selector
        // `textarea[name="nameiknow"]` would pick it out by name instead.
        text := doc.Find("textarea").Text()
        fmt.Println(text)
    }
    

    Prints: "The text I want".

    Accepted by the asker as the best answer.