dongliang1873
2019-09-04 20:23

Downloading the same file multiple times concurrently

Accepted

I am concurrently downloading files (with a WaitGroup) from a slice of config objects (where each config object contains the URL that needs to be downloaded), but when I use concurrency, I get the same exact data written with every execution.

I believe I included everything below for a minimal reproducible example.

Here are my imports:

package main

import (
    "encoding/json"
    "fmt"
    "io"
    "io/ioutil"
    "log"
    "net/http"
    "os"
    "path"
    "path/filepath"
    "strconv"
    "strings"
    "sync"
)

The method that loops through my objects and starts a goroutine to download each file is here:

func downloadAllFiles(configs []Config) {
    var wg sync.WaitGroup
    for _, config := range configs {
        wg.Add(1)
        go config.downloadFile(&wg)
    }
    wg.Wait()
}

Basically, my function is downloading a file from a URL into a directory stored on NFS.

Here is the download function:

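func (config *Config) downloadFile(wg *sync.WaitGroup) {
    defer wg.Done() // signal completion on every return path

    resp, err := http.Get(config.ArtifactPathOrUrl)
    if err != nil {
        // resp is nil on error, so stop before touching resp.Body
        log.Printf("download %s: %v", config.ArtifactPathOrUrl, err)
        return
    }
    defer resp.Body.Close()
    fmt.Println("Downloading file: " + config.ArtifactPathOrUrl)
    fmt.Println(" to location: " + config.getNfsFullFileSystemPath())

    nfsDirectoryPath := config.getBaseNFSFileSystemPath()
    if err := os.MkdirAll(nfsDirectoryPath, os.ModePerm); err != nil {
        panic(err)
    }
    fullFilePath := config.getNfsFullFileSystemPath()
    out, err := os.Create(fullFilePath)
    if err != nil {
        panic(err)
    }
    defer out.Close()

    // Stream the response body to the file and report a failed copy.
    if _, err := io.Copy(out, resp.Body); err != nil {
        log.Printf("write %s: %v", fullFilePath, err)
    }
}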

Here's a minimal part of the Config struct:

type Config struct {
    Namespace                 string                      `json:"namespace,omitempty"`
    Tenant                    string                      `json:"tenant,omitempty"`
    Name                      string                      `json:"name,omitempty"`
    ArtifactPathOrUrl         string                      `json:"artifactPathOrUrl,omitempty"`
}

Here are the instance/helper functions:

func (config *Config) getDefaultNfsURLBase() string {
    return "http://example.domain.nfs.location.com/"
}

func (config *Config) getDefaultNfsFilesystemBase() string {
    return "/data/nfs/location/"
}

func (config *Config) getBaseNFSFileSystemPath() string {
    basePath := filepath.Dir(config.getNfsFullFileSystemPath())
    return basePath
}

func (config *Config) getNfsFullFileSystemPath() string {
    // basePath is like: /data/nfs/location/
    trimmedBasePath := strings.TrimSuffix(config.getDefaultNfsFilesystemBase(), "/")
    fileName := config.getBaseFileName()
    return trimmedBasePath + "/" + config.Tenant + "/" + config.Namespace + "/" + config.Name + "/" + fileName
}

Here is how I'm getting the configs and unmarshalling them:

func getConfigs() string {
    b, err := ioutil.ReadFile("pulsarDeploy_example.json")
    if err != nil {
        fmt.Print(err)
    }
    str := string(b) // convert content to a 'string'
    return str
}

func deserializeJSON(configJson string) []Config {
    jsonAsBytes := []byte(configJson)
    configs := make([]Config, 0)
    err := json.Unmarshal(jsonAsBytes, &configs)
    if err != nil {
        panic(err)
    }
    return configs
}

For a minimal example, I think this data for the pulsarDeploy_example.json file should work:

[
    {
        "artifactPathOrUrl": "http://www.java2s.com/Code/JarDownload/sample/sample.jar.zip",
        "namespace": "exampleNamespace1",
        "name": "exampleName1",
        "tenant": "exampleTenant1"
    },
    {
        "artifactPathOrUrl": "http://www.java2s.com/Code/JarDownload/sample-calculator/sample-calculator-bundle-2.0.jar.zip",
        "namespace": "exampleNamespace1",
        "name": "exampleName2",
        "tenant": "exampleTenant1"
    },
    {
        "artifactPathOrUrl": "http://www.java2s.com/Code/JarDownload/helloworld/helloworld.jar.zip",
        "namespace": "exampleNamespace1",
        "name": "exampleName3",
        "tenant": "exampleTenant1"
    },
    {
        "artifactPathOrUrl": "http://www.java2s.com/Code/JarDownload/fabric-activemq/fabric-activemq-demo-7.0.2.fuse-097.jar.zip",
        "namespace": "exampleNamespace1",
        "name": "exampleName4",
        "tenant": "exampleTenant1"
    }
]

(Note that the example file URLs were just random Jars I grabbed online.)

When I run my code, it repeatedly downloads the same file instead of downloading each one. The information it prints to the console (from the Downloading file: and to location: lines) is also exactly the same for every object, rather than unique to each object, which points to a concurrency issue.

This issue reminds me of what happens if you try to run a for loop with a closure and end up locking a single object instance into your loop and executing repeatedly on the same object.

What is causing this behavior, and how do I resolve it?


1 answer

  • doujuanxun7167 2 years ago

    I'm pretty sure that your guess

    This issue reminds me of what happens if you try to run a for loop with a closure and end up locking a single object instance into your loop and executing repeatedly on the same object.

    is correct. The simple fix is to assign the loop variable to a local variable, like this:

    for _, config := range configs {
        wg.Add(1)
        cur := config
        go cur.downloadFile(&wg)
    }
    

    However, I don't like APIs that take a WaitGroup as a parameter, so I suggest:

    for _, config := range configs {
        wg.Add(1)
        go func(cur Config) {
           defer wg.Done()
           cur.downloadFile()
        }(config)
    }
    

    Then change the downloadFile signature to func (config *Config) downloadFile() and drop the wg usage in it.
