In a little script I'm writing, I make a POST to a web service and receive an HTML document in response. This document is largely irrelevant to my needs, with the exception of the contents of a single textarea. This textarea is the only textarea in the page, and it has a particular name that I know ahead of time. I want to grab that text without worrying about anything else in the document. Currently I'm using a regex to get the correct line and then another to delete the tags, but I feel like there's probably a better way.
Here's what the document looks like:
<html><body>
<form name="query" action="http://www.example.net/action.php" method="post">
<textarea type="text" name="nameiknow"/>The text I want</textarea>
<div id="button">
<input type="submit" value="Submit" />
</div>
</form>
</body></html>
And here's how I'm currently getting the text:
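s := string(body)
// Gets the line I want: the whole textarea element, tags included.
// Both patterns are constants, so MustCompile is used instead of discarding the error.
r := regexp.MustCompile(`<textarea.*name=("|')nameiknow("|').*textarea>`)
s = r.FindString(s)
// Deletes the tags, leaving only the text between them
r = regexp.MustCompile(`<[^>]*>`)
s = r.ReplaceAllString(s, "")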
I think using a full HTML parser might be a bit too much in this case, which is why I went in this direction, though for all I know there's something much better out there.
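For what it's worth, the two regex passes above could probably be folded into one by using a capture group, so the tags never need to be stripped afterwards. This is only a rough sketch using the standard library: it assumes the name attribute is quoted and that the textarea content does not itself contain a closing textarea tag. The function name extractTextarea is made up for illustration.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
)

// extractTextarea (hypothetical helper) returns the text inside the first
// <textarea> whose name attribute equals name, and reports whether it was found.
func extractTextarea(doc, name string) (string, bool) {
	// (?s) lets . match newlines, so multi-line content is captured.
	// (?i:...) makes only the tag and attribute names case-insensitive.
	// Group 1 lazily captures everything up to the first closing tag.
	pattern := `(?s)<(?i:textarea)\b[^>]*\b(?i:name)\s*=\s*["']` +
		regexp.QuoteMeta(name) +
		`["'][^>]*>(.*?)</(?i:textarea)\s*>`
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", false
	}
	m := re.FindStringSubmatch(doc)
	if m == nil {
		return "", false
	}
	// Decode entities such as &amp; that the service may have escaped.
	return html.UnescapeString(m[1]), true
}

func main() {
	doc := `<form name="query" method="post">
<textarea type="text" name="nameiknow"/>The text I want</textarea>
</form>`
	text, ok := extractTextarea(doc, "nameiknow")
	fmt.Println(text, ok) // The text I want true
}
```

This still treats the HTML as a flat string, so it would break if the service changed its markup (for example, an unquoted attribute value), which is the trade-off against a real parser.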
I appreciate any advice you may have.