Extracting text from an HTML page in Go

I'm looking for a simple way to get the text of a web page, preferably without having to resort to a bunch of regular expressions.

Just thought I'd check first in case this kind of thing is already built in, or at least easier to do in Go.


dttl3933 2014-11-18 09:57
You could use goquery. This library offers a jQuery-like API for selecting elements and extracting their text from an HTML document.

This example is taken from the project's GitHub page:

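    package main

    import (
    	"fmt"
    	"log"

    	"github.com/PuerkitoBio/goquery"
    )

    func ExampleScrape() {
    	// Fetch and parse the page. Newer goquery releases deprecate
    	// NewDocument in favor of NewDocumentFromReader.
    	doc, err := goquery.NewDocument("http://metalsucks.net")
    	if err != nil {
    		log.Fatal(err)
    	}

    	// Select each review block with a CSS selector, then read the
    	// text of the band (h3) and title (i) elements inside it.
    	doc.Find(".reviews-wrap article .review-rhs").Each(func(i int, s *goquery.Selection) {
    		band := s.Find("h3").Text()
    		title := s.Find("i").Text()
    		fmt.Printf("Review %d: %s - %s\n", i, band, title)
    	})
    }

    func main() {
    	ExampleScrape()
    }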
This answer was accepted by the asker as the best answer.

