drtwqc3744 2016-06-02 17:31
45 views

Best approach to do 1000 http.Get requests per second

I am currently hitting an API to gather data for my own processing. Right now I am doing 100 http.Get requests per second and am wondering what the best methodology is to do around 1000 concurrent http.Get requests per second.

Here is what I have right now:

waitTime := time.Second
var lastID uint64 = 1234567890
for {
    now := time.Now()
    for i := 0; i < 100; i++ {
        tmpID := lastID
        lastID++
        go func(ID uint64) {
            err := scrape(ID) // this does the http.Get and saves the
                              // resulting json into postgresql
            if err != nil {
                errStr := strings.TrimSpace(err.Error())
                if strings.HasSuffix(errStr, "Too Many request to server") {
                    log.Println("hit a real 429")
                    panic(err)
                }
            }
        }(tmpID)
    }
    time.Sleep(waitTime - time.Since(now)) // this is here to ensure
                                           // I don't go over the limit
}

The API I am hitting is rate limited to 1000 req/s.

The reason for my go func(ID) is so I can incrementally increase my ID without having to worry about using a lock to decide what the next ID is. I just feel like I am doing this wrong; I am pretty new to Go in general as well.

I also assume I have to raise my ulimit on my Ubuntu server to something over 1000 to handle all these open connections.

Any tips or suggestions are greatly appreciated!


1 answer

  • douhuang5331 2016-06-03 01:15

    Does your http client cache connections? The default one does.

    By default, Transport caches connections for future re-use. This may leave many open connections when accessing many hosts. This behavior can be managed using Transport's CloseIdleConnections method and the MaxIdleConnsPerHost and DisableKeepAlives fields.

    Why do you spawn goroutines in a loop instead of spawning a fixed number of goroutines, each with a loop inside? Then, if a worker hits the limit, it can back off for a bit.

    Primitive example (I did not test it; it may contain typos):

    numWorkers := uint64(1000)
    delay := 10 * time.Millisecond     // initial backoff
    maxDelay := 100 * time.Millisecond // give up beyond this
    quit := make(chan struct{})

    for i := uint64(0); i < numWorkers; i++ {
        go func(ID, shift uint64) {
            var iter uint64
            curDelay := delay

            for {
                select {
                case <-quit:
                    return

                default:
                    // 0th worker:   lastID + 0 + 0,   lastID + 1000 + 0,   lastID + 2000 + 0, ...
                    // 1st worker:   lastID + 0 + 1,   lastID + 1000 + 1,   lastID + 2000 + 1, ...
                    // ...
                    // 999th worker: lastID + 0 + 999, lastID + 1000 + 999, lastID + 2000 + 999, ...
                    curID := ID + iter*numWorkers + shift
                    err := scrape(curID) // this does the http.Get and saves the
                                         // resulting json into postgresql
                    if err != nil {
                        errStr := strings.TrimSpace(err.Error())
                        if strings.HasSuffix(errStr, "Too Many request to server") {
                            log.Println("hit a real 429")
                            if curDelay > maxDelay {
                                return // or panic, whatever you want
                            }
                            time.Sleep(curDelay)
                            curDelay *= 2 // exponential backoff: 10ms, 20ms, 40ms, 80ms, then return/panic
                            continue      // no increment on iter
                        }
                    }
                    // increment on success
                    iter++
                    time.Sleep(time.Second) // 1000 workers, each makes a request and
                                            // sleeps for 1 s: roughly 1000 req/s
                }
            }
        }(lastID, i)
    }
    

    The IDs never overlap, but there will probably be holes (when a worker gives up, the rest of its ID sequence is skipped). You can't avoid that without synchronization (a mutex is fine); that is probably still workable at 1000 req/s, but performance will suffer with a bigger number of workers.
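    If hole-free IDs matter more than the per-worker arithmetic, a shared atomic counter lets any number of workers claim consecutive IDs without a mutex. This is a hypothetical sketch (the names `assignIDs`, `base`, and the small counts are mine, not from the original answer):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// assignIDs has `workers` goroutines claim n consecutive IDs starting
// at base, using an atomic counter: no overlaps, no holes, no mutex.
func assignIDs(base uint64, n, workers int) []uint64 {
	ids := make([]uint64, n)
	var next uint64 // next unclaimed slot, advanced atomically

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				slot := atomic.AddUint64(&next, 1) - 1 // claim one slot
				if slot >= uint64(n) {
					return // all IDs handed out
				}
				ids[slot] = base + slot // here this would be: scrape(base + slot)
			}
		}()
	}
	wg.Wait()
	return ids
}

func main() {
	ids := assignIDs(1234567890, 10, 5)
	fmt.Println(ids[0], ids[9]) // 1234567890 1234567899
}
```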

    Call close(quit) when you want to stop.

