drtwqc3744 2016-06-02 17:31
45 views

Best approach to do 1000 http.Get requests per second

I am currently hitting an API to gather data for my own processing. Right now I am doing 100 http.Get requests per second and am wondering what the best methodology is to do around 1000 concurrent http.Get requests per second.

Here is what I have right now:

waitTime := time.Second
var lastID uint64 = 1234567890
for {
    now := time.Now()
    for i := 0; i < 100; i++ {
        tmpID := lastID
        lastID++
        go func(ID uint64) {
            err := scrape(ID) // this does the http.Get and saves the
                              // resulting json into postgresql
            if err != nil {
                errStr := strings.TrimSpace(err.Error())
                if strings.HasSuffix(errStr, "Too Many request to server") {
                    log.Println("hit a real 429")
                    panic(err)
                }
            }
        }(tmpID)
    }
    time.Sleep(waitTime - time.Since(now)) // this is here to ensure
                                           // I don't go over the limit
}

The API I am hitting is rate limited to 1000 req/s.

The reason for my go func(ID) is so I can incrementally increase my ID without having to worry about using a lock to decide what the next ID is. I just feel like I am doing this wrong; I am pretty new to Go in general as well.

I also assume I have to raise my ulimit on my Ubuntu server to something over 1000 to handle all these open connections.

Any tips or suggestions are greatly appreciated!


1 answer

  • douhuang5331 2016-06-03 01:15

    Does your http client cache connections? The default one does.

    By default, Transport caches connections for future re-use. This may leave many open connections when accessing many hosts. This behavior can be managed using Transport's CloseIdleConnections method and the MaxIdleConnsPerHost and DisableKeepAlives fields.

    Why do you spawn goroutines in a loop instead of spawning a fixed number of goroutines, each with a loop inside? Then, if a worker hits the limit, it can back off for a bit.

    Primitive example (I did not test it; it may contain typos):

    numWorkers := uint64(1000)
    delay := 10 * time.Millisecond     // initial backoff
    maxDelay := 100 * time.Millisecond // give up beyond this
    quit := make(chan struct{})

    for i := uint64(0); i < numWorkers; i++ {
        go func(ID, shift uint64) {
            var iter uint64
            curDelay := delay

            for {
                select {
                case <-quit:
                    return

                default:
                    // 0th worker:   lastID + 0 + 0,   lastID + 1000 + 0,   lastID + 2000 + 0, ...
                    // 1st worker:   lastID + 0 + 1,   lastID + 1000 + 1,   lastID + 2000 + 1, ...
                    // ...
                    // 999th worker: lastID + 0 + 999, lastID + 1000 + 999, lastID + 2000 + 999, ...
                    curID := ID + iter*numWorkers + shift
                    err := scrape(curID) // this does the http.Get and saves the
                                         // resulting json into postgresql
                    if err != nil {
                        errStr := strings.TrimSpace(err.Error())
                        if strings.HasSuffix(errStr, "Too Many request to server") {
                            log.Println("hit a real 429")
                            if curDelay > maxDelay {
                                return // or panic, whatever you want
                            }
                            time.Sleep(curDelay)
                            curDelay *= 2 // exponential backoff: 10ms, 20ms, 40ms, 80ms, then return/panic
                            continue      // no increment on iter
                        }
                    }
                    // increment on success
                    iter++
                    time.Sleep(time.Second) // 1000 workers, each makes a request and
                                            // sleeps for 1 s: roughly 1000 req/s
                }
            }
        }(lastID, i)
    }
    

    The IDs never overlap, but there will probably be holes (when a worker gives up, the rest of its ID sequence is skipped). You can't avoid that without synchronization (a mutex is fine); that is probably still workable at 1000 req/s, but performance will suffer with a bigger number of workers.
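    If hole-free IDs matter more than the per-worker arithmetic, a shared atomic counter lets any number of workers claim consecutive IDs without a mutex. This is a hypothetical sketch (the names `assignIDs`, `base`, and the small counts are mine, not from the original answer):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// assignIDs has `workers` goroutines claim n consecutive IDs starting
// at base, using an atomic counter: no overlaps, no holes, no mutex.
func assignIDs(base uint64, n, workers int) []uint64 {
	ids := make([]uint64, n)
	var next uint64 // next unclaimed slot, advanced atomically

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				slot := atomic.AddUint64(&next, 1) - 1 // claim one slot
				if slot >= uint64(n) {
					return // all IDs handed out
				}
				ids[slot] = base + slot // here this would be: scrape(base + slot)
			}
		}()
	}
	wg.Wait()
	return ids
}

func main() {
	ids := assignIDs(1234567890, 10, 5)
	fmt.Println(ids[0], ids[9]) // 1234567890 1234567899
}
```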

    Call close(quit) when you want to stop.

