dqnhfbc3738 2016-07-19 00:03
Accepted

What is the most efficient and scalable way to grep for a string on a website using curl in Golang?

Background

user@host curl -s http://stackoverflow.com | grep -m 1 stackoverflow.com

returns immediately if the string is found:

<meta name="twitter:domain" content="stackoverflow.com"/>

Aim

Find a string on a website using Golang.

Method

Based on sources from Go by Example and Schier's Blog the following code was created:

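package main

import (
    "fmt"
    "io/ioutil"
    "log"
    "net/http"
    "regexp"
)

func main() {
    url := "http://stackoverflow.com"
    resp, err := http.Get(url)
    if err != nil {
        log.Fatalf("failed to fetch %s: %v", url, err)
    }
    defer resp.Body.Close()

    // Read the whole response body into memory.
    body, err := ioutil.ReadAll(resp.Body)
    if err != nil {
        log.Fatalf("failed to read body: %v", err)
    }

    // Escape the dot so it matches a literal "." rather than any character.
    r := regexp.MustCompile(`stackoverflow\.com`)
    fmt.Println(r.FindString(string(body)))
}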

Results

Running the code results in:

stackoverflow.com

Discussion & Conclusions

  1. More code is required to achieve the same aim in Golang. Is there a shorter solution?
  2. Both options seem to return at the same time. Is static code faster than dynamic code in this case as well?
  3. I am concerned that this code consumes too much memory. It will eventually be used to monitor hundreds of different websites.

1 answer

  • dongqiangse6623 2016-07-19 04:19

    This code implements grep, stopping at the first line that contains the given string. It avoids reading the entire web page into memory at once by using a bufio.Scanner, which, apart from bounding memory use, may also speed up the program when the string is found near the start of a huge page. It uses scan.Bytes() rather than scan.Text() for the comparison so that each line is not converted to a string, which would cause significant memory churn.

    package main
    
    import (
        "bufio"
        "bytes"
        "fmt"
        "log"
        "net/http"
    )
    
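    func main() {
        resp, err := http.Get("http://stackoverflow.com")
        if err != nil {
            log.Fatalf("failed to open url: %v", err)
        }
        defer resp.Body.Close()
        scan := bufio.NewScanner(resp.Body)
        toFind := []byte("stackoverflow.com")
        for scan.Scan() {
            // Compare bytes directly; only the matching line becomes a string.
            if bytes.Contains(scan.Bytes(), toFind) {
                fmt.Println(scan.Text())
                return
            }
        }
        // Report read errors, including a line longer than the scanner's buffer.
        if err := scan.Err(); err != nil {
            log.Fatalf("failed to read response: %v", err)
        }
    }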
    
    This answer was selected by the asker as the best answer.
