douchuoliu4422 2017-08-02 23:03
295 views

Accepted

Downloading files chunk by chunk in multiple threads in Golang

I need to download files, chunk by chunk, in multiple threads. For example, I have 1k files, each ~100 MB-1 GB, and I can download these files only in 4096-byte chunks (each HTTP GET request gives me only 4 KB).

It might take too long to download them in one thread, so I want to download them in, say, 20 threads (one thread per file), and I also need to download a few chunks in each of these threads simultaneously.

Is there any example that shows such logic?


1 answer

  • douxiong2999 2017-08-03 21:15

    This is an example of how to set up a concurrent downloader. Things to be aware of are bandwidth, memory, and disk space. You can kill your bandwidth by trying to do too much at once, and the same goes for memory: you're downloading pretty big files, so memory can be an issue. Another thing to note is that by using goroutines you lose request order. So if the order of the returned bytes matters, this approach alone will not work, because you would have to know the byte order to assemble the file at the end. That would mean downloading one chunk at a time is best, unless you implement a way to keep track of the order (maybe some kind of global map[order int][]byte protected by a mutex to prevent race conditions; see the sketch after the example below). An alternative that doesn't involve Go (assuming you have a unix machine, for ease) is to use curl; see http://osxdaily.com/2014/02/13/download-with-curl/

    package main
    
    import (
        "bytes"
        "fmt"
        "io"
        "io/ioutil"
        "log"
        "net/http"
        "sync"
        "time"
    )
    
    // Be careful: you can potentially run out of memory downloading too many
    // files at once. Here is an example that can be modified to suit.
    func downloader(wg *sync.WaitGroup, sema chan struct{}, fileNum int, URL string) {
        sema <- struct{}{}
        defer func() {
            <-sema
            wg.Done()
        }()
    
        // Timeout is a time.Duration; a bare 10 would mean 10 nanoseconds
        client := &http.Client{Timeout: 10 * time.Second}
        res, err := client.Get(URL)
        if err != nil {
            // log and skip this download; log.Fatal here would kill the
            // whole program and bypass the deferred wg.Done()
            log.Println(err)
            return
        }
        defer res.Body.Close()
        var buf bytes.Buffer
        // Copy the body into a buffer before writing it to a file.
        // You could also io.Copy directly into the file and save
        // memory by streaming to disk instead of buffering.
        if _, err := io.Copy(&buf, res.Body); err != nil {
            log.Println(err)
            return
        }
        // write the bytes to a file
        if err := ioutil.WriteFile(fmt.Sprintf("file%d.txt", fileNum), buf.Bytes(), 0644); err != nil {
            log.Println(err)
        }
    }
    
    func main() {
        links := []string{
            "url1",
            "url2", // etc...
        }
        var wg sync.WaitGroup
        // limit to four downloads at a time; this channel acts as a semaphore
        limiter := make(chan struct{}, 4)
        for i, link := range links {
            wg.Add(1)
            go downloader(&wg, limiter, i, link)
        }
        wg.Wait()
    }
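
    If the order of the returned bytes matters (for example, pulling one big file down as many 4 KB chunks), here is a minimal sketch of the map-with-mutex idea mentioned above. It assumes the server honors HTTP Range requests; the URL, file size, and output file name are placeholders, and chunkSize matches the 4 KB-per-request limit from the question.

    package main
    
    import (
        "fmt"
        "io/ioutil"
        "log"
        "net/http"
        "os"
        "sync"
    )
    
    const chunkSize = 4096 // bytes per request, matching the 4 KB limit
    
    // fetchChunk downloads bytes [start, end] of the file with an HTTP Range
    // request and stores the result in the shared map under its chunk index.
    func fetchChunk(wg *sync.WaitGroup, mu *sync.Mutex, chunks map[int][]byte, idx int, url string, start, end int64) {
        defer wg.Done()
        req, err := http.NewRequest("GET", url, nil)
        if err != nil {
            log.Println(err)
            return
        }
        req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", start, end))
        res, err := http.DefaultClient.Do(req)
        if err != nil {
            log.Println(err)
            return
        }
        defer res.Body.Close()
        body, err := ioutil.ReadAll(res.Body)
        if err != nil {
            log.Println(err)
            return
        }
        mu.Lock()
        chunks[idx] = body // the mutex prevents concurrent map writes
        mu.Unlock()
    }
    
    func main() {
        url := "http://example.com/bigfile" // placeholder URL
        var size int64 = 1 << 20            // placeholder; real code would read Content-Length first
    
        chunks := make(map[int][]byte)
        var mu sync.Mutex
        var wg sync.WaitGroup
    
        for idx, start := 0, int64(0); start < size; idx, start = idx+1, start+chunkSize {
            end := start + chunkSize - 1
            if end >= size {
                end = size - 1
            }
            wg.Add(1)
            go fetchChunk(&wg, &mu, chunks, idx, url, start, end)
        }
        wg.Wait()
    
        // reassemble the chunks in index order
        // (a real implementation would verify every chunk actually arrived)
        out, err := os.Create("bigfile")
        if err != nil {
            log.Fatal(err)
        }
        defer out.Close()
        for i := 0; i < len(chunks); i++ {
            if _, err := out.Write(chunks[i]); err != nil {
                log.Fatal(err)
            }
        }
    }

    Note that this keeps the whole file in memory inside the map; for really big files you could instead open the output file once and have each goroutine write its chunk at its own offset with File.WriteAt, which needs no map or mutex at all. In practice you would also cap the number of in-flight requests with a semaphore channel exactly like the limiter above.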
    
    Accepted as the best answer by the asker.
