douchuoliu4422 2017-08-02 23:03
Views: 295
Accepted

Downloading files chunk by chunk in multiple threads in Golang

I need to download files chunk by chunk in multiple threads. For example, I have 1k files, each ~100 MB to 1 GB, and I can only download them in chunks of 4096 bytes (each HTTP GET request gives me only 4 KB).

It might take too long to download them in one thread, so I want to download them in, let's say, 20 threads (one thread per file), and I also need to download a few chunks simultaneously in each of these threads.

Is there any example that shows such logic?


1 answer

  • douxiong2999 2017-08-03 21:15

    This is an example of how to set up a concurrent downloader. Things to be aware of are bandwidth, memory, and disk space. You can saturate your bandwidth by trying to do too much at once, and the same goes for memory. You're downloading pretty big files, so memory can be an issue. Another thing to note is that by using goroutines you lose request order. If the order of the returned bytes matters, you have to know where each chunk belongs in order to assemble the file at the end. That means either downloading one chunk at a time or implementing a way to keep track of the order (maybe some kind of shared map[int][]byte keyed by chunk index, with a mutex to prevent race conditions). An alternative that doesn't involve Go (assuming you have a Unix machine, for ease) is to use curl; see http://osxdaily.com/2014/02/13/download-with-curl/
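Before the downloader itself, a note on the ordering point: the map-plus-mutex idea could be sketched roughly as below. The `chunkStore` type and its `put` and `assemble` methods are hypothetical names made up for this illustration, and the "chunks" are short strings standing in for downloaded bytes.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// chunkStore collects chunks that may arrive in any order.
// The type and its methods are illustrative, not from the original answer.
type chunkStore struct {
	mu     sync.Mutex
	chunks map[int][]byte // chunk index -> chunk bytes
}

func newChunkStore() *chunkStore {
	return &chunkStore{chunks: make(map[int][]byte)}
}

// put records one chunk; it is safe to call from many goroutines.
func (s *chunkStore) put(index int, data []byte) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.chunks[index] = data
}

// assemble joins chunks 0..n-1 in index order and fails if one is missing.
func (s *chunkStore) assemble(n int) ([]byte, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	var out bytes.Buffer
	for i := 0; i < n; i++ {
		c, ok := s.chunks[i]
		if !ok {
			return nil, fmt.Errorf("chunk %d is missing", i)
		}
		out.Write(c)
	}
	return out.Bytes(), nil
}

func main() {
	parts := []string{"he", "ll", "o!"} // stand-ins for downloaded chunks
	store := newChunkStore()
	var wg sync.WaitGroup
	for i, p := range parts {
		wg.Add(1)
		go func(i int, p string) { // goroutines may finish in any order
			defer wg.Done()
			store.put(i, []byte(p))
		}(i, p)
	}
	wg.Wait()

	data, err := store.assemble(len(parts))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(data)) // prints "hello!"
}
```

This keeps every chunk in memory until the file is assembled, so it only suits files that comfortably fit in RAM. The concurrent downloader example follows.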

    package main
    
    import (
        "fmt"
        "io"
        "log"
        "net/http"
        "os"
        "sync"
        "time"
    )
    
    // Be careful: you can run out of memory or bandwidth by downloading
    // too many files at once. This example is meant to be adapted.
    func downloader(wg *sync.WaitGroup, sema chan struct{}, client *http.Client, fileNum int, URL string) {
        defer wg.Done()
        // Take a semaphore slot and give it back when this download ends.
        sema <- struct{}{}
        defer func() { <-sema }()
    
        res, err := client.Get(URL)
        if err != nil {
            // Log and return; log.Fatal would kill the other downloads too.
            log.Printf("file %d: %v", fileNum, err)
            return
        }
        defer res.Body.Close()
        if res.StatusCode != http.StatusOK {
            log.Printf("file %d: unexpected status %s", fileNum, res.Status)
            return
        }
    
        out, err := os.Create(fmt.Sprintf("file%d.txt", fileNum))
        if err != nil {
            log.Printf("file %d: %v", fileNum, err)
            return
        }
        defer out.Close()
        // Stream the body straight to disk instead of buffering it in
        // memory first; with files this large that saves a lot of memory.
        if _, err := io.Copy(out, res.Body); err != nil {
            log.Printf("file %d: %v", fileNum, err)
        }
    }
    
    func main() {
        links := []string{
            "url1",
            "url2", // etc...
        }
        // Timeout is a time.Duration covering the whole request, body
        // included, so it has to be generous for large files.
        client := &http.Client{Timeout: 10 * time.Minute}
        var wg sync.WaitGroup
        // limit to four downloads at a time; this buffered channel is used as a semaphore
        limiter := make(chan struct{}, 4)
        for i, link := range links {
            wg.Add(1)
            go downloader(&wg, limiter, client, i, link)
        }
        wg.Wait()
    }
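The example above fetches each file with a single request. For the chunk-by-chunk part of the question, one possible approach is sketched below. It assumes the server honours the HTTP `Range` header and that the file size is known up front (for instance from `Content-Length`); neither is stated in the question. `byteRange`, `chunkRanges`, `fetchChunk`, and `downloadFile` are hypothetical names for this sketch. Each chunk is written at its own offset with `WriteAt`, so the order in which goroutines finish does not matter.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"sync"
)

// byteRange is an inclusive byte span, as used by the HTTP Range header.
type byteRange struct{ start, end int64 }

// chunkRanges splits size bytes into consecutive ranges of at most chunkSize bytes.
func chunkRanges(size, chunkSize int64) []byteRange {
	var out []byteRange
	for start := int64(0); start < size; start += chunkSize {
		end := start + chunkSize - 1
		if end >= size {
			end = size - 1
		}
		out = append(out, byteRange{start, end})
	}
	return out
}

// fetchChunk requests one range and writes it at its own offset in f.
// Assumes the server answers range requests with 206 Partial Content.
func fetchChunk(client *http.Client, url string, r byteRange, f *os.File) error {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", r.start, r.end))
	res, err := client.Do(req)
	if err != nil {
		return err
	}
	defer res.Body.Close()
	if res.StatusCode != http.StatusPartialContent {
		return fmt.Errorf("range %d-%d: unexpected status %s", r.start, r.end, res.Status)
	}
	data, err := io.ReadAll(res.Body)
	if err != nil {
		return err
	}
	_, err = f.WriteAt(data, r.start)
	return err
}

// downloadFile fetches url into path with at most workers range requests in flight.
func downloadFile(client *http.Client, url, path string, size, chunkSize int64, workers int) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	sema := make(chan struct{}, workers)
	var wg sync.WaitGroup
	var mu sync.Mutex
	var firstErr error
	for _, r := range chunkRanges(size, chunkSize) {
		sema <- struct{}{} // blocks while the worker limit is reached
		wg.Add(1)
		go func(r byteRange) {
			defer func() { <-sema; wg.Done() }()
			if err := fetchChunk(client, url, r, f); err != nil {
				mu.Lock()
				if firstErr == nil {
					firstErr = err
				}
				mu.Unlock()
			}
		}(r)
	}
	wg.Wait()
	return firstErr
}

func main() {
	// Show the ranges a 10000-byte file would be split into with 4096-byte chunks.
	for _, r := range chunkRanges(10000, 4096) {
		fmt.Printf("bytes=%d-%d\n", r.start, r.end)
	}
	// prints:
	// bytes=0-4095
	// bytes=4096-8191
	// bytes=8192-9999
}
```

A call such as `downloadFile(client, link, "file0.bin", size, 4096, 8)` could replace the single `client.Get` inside each per-file goroutine, giving one goroutine per file plus a few concurrent chunk requests within it. With 4 KB chunks a 1 GB file needs roughly 260,000 requests, so keep-alive connections and a modest worker count matter.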
    