douzhuo2002 2018-12-04 03:47

Recombining a chunked zip download in Go

I am downloading a large .zip file in parallel using Accept-Ranges and goroutines. The application sends multiple requests, each fetching a 10MB chunk of a zip file from a URL via the Range header.

The requests are split into different byte ranges, each handled by a separate goroutine, and the data received is written to temp files named 1, 2, 3, and so on.
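(For reference, a minimal sketch of the preliminary check this relies on: a HEAD request confirming that the server advertises byte-range support and reporting the file size. The helper name is made up for illustration and it assumes net/http is imported as in the full program below.)

func supportsRangeRequests(url string) (bool, int64, error) {
    // Send a HEAD request so only the headers are fetched, not the body.
    res, err := http.Head(url)
    if err != nil {
        return false, 0, err
    }
    defer res.Body.Close()

    // Header.Get is case-insensitive, so "accept-ranges: bytes" also matches.
    // ContentLength is the size the Range arithmetic is based on.
    return res.Header.Get("Accept-Ranges") == "bytes", res.ContentLength, nil
}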

package main

import (
    "bufio"
    "fmt"
    "io"
    "io/ioutil"
    "log"
    "net/http"
    "os"
    "strconv"
    "sync"
)

var wg sync.WaitGroup

func main() {
    url := "https://path/to/large/zip/file/zipfile.zip"
    res, _ := http.Head(url)
    maps := res.Header
    length, _ := strconv.Atoi(maps["Content-Length"][0]) // Get the content length from the header request
    chunks := (length / (1024 * 1024 * 10)) + 1

    // startByte and endByte determines the positions of the chunk that should be downloaded
    var startByte = 0
    var endByte = (1024 * 1024 * 10) - 1
    //body := make([][]byte, chunks)
    body := make([]io.ReadCloser, chunks)

    for i := 0; i < chunks; i++ {
        wg.Add(1)

        go func(min int, max int, i int) {
            client := &http.Client {}
            req, _ := http.NewRequest("GET", url, nil)
            rangeHeader := "bytes=" + strconv.Itoa(min) +"-" + strconv.Itoa(max)
            fmt.Println(rangeHeader)
            req.Header.Add("Range", rangeHeader)

            resp,_ := client.Do(req)
            defer resp.Body.Close()

            reader, _ := ioutil.ReadAll(resp.Body)
            body[i] = resp.Body
            ioutil.WriteFile(strconv.Itoa(i), reader, 777) // Write to the file i as a byte array

            wg.Done()
        }(startByte, endByte, i)

        startByte = endByte + 1
        endByte += 1024 * 1024 * 10
    }
    wg.Wait()

    filepath := "zipfile.zip"
    // Create the file
    _, err := os.Create(filepath)
    if err != nil {
        return
    }
    file, _ := os.OpenFile(filepath, os.O_APPEND|os.O_WRONLY, os.ModeAppend)
    if err != nil {
        log.Fatal(err)
    }


    for j := 0; j < chunks; j++ {
        newFileChunk, err := os.Open(strconv.Itoa(j))
        if err != nil {
            log.Fatal(err)
        }
        defer newFileChunk.Close()

        chunkInfo, err := newFileChunk.Stat()
        if err != nil {
            log.Fatal(err)
        }
        var chunkSize int64 = chunkInfo.Size()
        chunkBufferBytes := make([]byte, chunkSize)

        // read into chunkBufferBytes
        reader := bufio.NewReader(newFileChunk)
        _, err = reader.Read(chunkBufferBytes)
        file.Write(chunkBufferBytes)
        file.Sync() //flush to disk
        chunkBufferBytes = nil // reset or empty our buffer
    }

    //Verify file size
    filestats, err := file.Stat()
    if err != nil {
        log.Fatal(err)
        return
    }
    actualFilesize := filestats.Size()
    if actualFilesize != int64(length) {
        log.Fatal("Actual Size: ", actualFilesize, " Expected: ", length)
        return
    }

    file.Close()
}

After all the files are downloaded, I try to recombine them into one .zip file. However, when the files are put together, I can't unzip the final file, as it appears to be corrupted.

I would like to know what I am doing wrong, or if there's a better approach to this. Thanks in advance.

EDIT: Below is what gets logged to the console

bytes=0-10485759
bytes=10485760-20971519
2018/12/04 11:21:28 Actual Size: 16877828 Expected: 16877827

1 answer

  • douzhan5058 2018-12-04 10:49

    The problem is with your range requests.

    The lines

       resp,_ := client.Do(req)
       defer resp.Body.Close()
    

    are reported by go vet because the error isn't checked. If you check the response code for the last chunk, it is 416 (Range Not Satisfiable), i.e. the requested range is invalid. Change it to this:

    resp, err := client.Do(req)
    if err != nil {
        panic(err)
    }
    if resp.StatusCode == 416 {
        fmt.Println("incorrect range")
    }
    defer resp.Body.Close()
    

    I also altered the loop condition to for i := 0; i < chunks-1; i++ { and changed the section after the goroutine launch to this:

    startByte = endByte + 1
    endByte += 1024 * 1024 * 10
    if startByte >= length {
        break
    }
    for endByte >= length {
        endByte = endByte - 1
    }
    

    and altered the j loop condition in a similar way.

    These changes seemed to work for me, but I don't have suitable test data to check them thoroughly.
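    To put the boundary arithmetic in one place, here is a rough, untested sketch of the clamping idea described above, written as a standalone helper rather than as the exact loops from the question (the helper name is made up; the chunk size and file length in the usage note come from the question):

    // chunkRanges returns the inclusive [start, end] byte ranges for a file of
    // `length` bytes split into chunks of at most `chunkSize` bytes each.
    func chunkRanges(length, chunkSize int) [][2]int {
        var ranges [][2]int
        for start := 0; start < length; start += chunkSize {
            end := start + chunkSize - 1
            if end > length-1 {
                end = length - 1 // clamp so the end never points past the last byte (index length-1)
            }
            ranges = append(ranges, [2]int{start, end})
        }
        return ranges
    }

    For the 16877827-byte file in the question and a 10MB chunk size this yields bytes=0-10485759 and bytes=10485760-16877826, so the last request stays inside the file and the chunk sizes add up to exactly 16877827 bytes.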

    This answer was accepted by the asker.
