Goroutines consume the same line more than once

Currently I have a scenario where I have a huge file (say, 500k lines of text), and the idea is to use workers (threads) to process it, 100 lines per thread. After running my code, I still wonder why the goroutines consume the same line more than once. I'm guessing they are racing to get the job done.

Here's my code:

package main

import (
     "log"
     "bufio"
     "fmt"
     "encoding/csv"
     "encoding/json"
     "io"
     "os"
     "sync"
)

type IMDBDataModel struct {
     Color                  string `json:"color"`
     DirectorName           string `json:"director_name"`
     NumCriticForReviews    string `json:"num_critic_for_reviews"`
     Duration               string `json:"duration"`
     DirectorFacebookLikes  string `json:"director_facebook_likes"`
     Actor3FacebookLikes    string `json:"actor_3_facebook_likes"`
     Actor2Name             string `json:"actor_2_name"`
     Actor1FacebookLikes    string `json:"actor_1_facebook_likes"`
     Gross                  string `json:"gross"`
     Genre                  string `json:"genres"`
     Actor1Name             string `json:"actor_1_name"`
     MovieTitle             string `json:"movie_title"`
     NumVotedUser           string `json:"num_voted_users"`
     CastTotalFacebookLikes string `json:"cast_total_facebook_likes"`
     Actor3Name             string `json:"actor_3_name"`
     FaceNumberInPoster     string `json:"facenumber_in_poster"`
     PlotKeywords           string `json:"plot_keywords"`
     MovieIMDBLink          string `json:"movie_imdb_link"`
     NumUserForReviews      string `json:"num_user_for_reviews"`
     Language               string `json:"language"`
     Country                string `json:"country"`
     ContentRating          string `json:"content_rating"`
     Budget                 string `json:"budget"`
     TitleYear              string `json:"title_year"`
     Actor2FacebookLikes    string `json:"actor_2_facebook_likes"`
     IMDBScore              string `json:"imdb_score"`
     AspectRatio            string `json:"aspect_ratio"`
     MovieFacebookLikes     string `json:"movie_facebook_likes"`
}

var iterated int64
var out []*IMDBDataModel

func populateString(input []IMDBDataModel, out []*IMDBDataModel, wg *sync.WaitGroup) {
     for _ , data := range input {          
          out = append(out, &data)
     }     
     wg.Done()
}

func consumeData(input <-chan *IMDBDataModel, wg *sync.WaitGroup){
     defer wg.Done()
     for data := range input {          
          iterated++          
          fmt.Printf("%d : %s\n", iterated, data.MovieTitle)
          out = append(out, data)
     }
     fmt.Println("output size : ", len(out))

}

func processCSV(path string) (imdbList []IMDBDataModel){
     csvFile, _ := os.Open(path)
     reader := csv.NewReader(bufio.NewReader(csvFile))

     for {          
          line, error := reader.Read()
          if error == io.EOF {
               break
          } else if error != nil {
               log.Fatal(error)
          }
          imdbList = append(imdbList, 
               IMDBDataModel{
                    Color: line[0],
                    DirectorName: line[1],
                    NumCriticForReviews : line[2],
                    Duration: line[3],
                    DirectorFacebookLikes: line[4],
                    Actor3FacebookLikes: line[5],
                    Actor2Name: line[6],
                    Actor1FacebookLikes: line[7],
                    Gross: line[8],
                    Genre: line[9],
                    Actor1Name: line[10],
                    MovieTitle: line[11],
                    NumVotedUser: line[12],
                    CastTotalFacebookLikes: line[13],
                    Actor3Name: line[14],
                    FaceNumberInPoster: line[15],
                    PlotKeywords: line[16],
                    MovieIMDBLink: line[17],
                    NumUserForReviews: line[18],
                    Language: line[19],
                    Country: line[20],
                    ContentRating: line[21],
                    Budget: line[22],
                    TitleYear: line[23],
                    Actor2FacebookLikes: line[24],
                    IMDBScore: line[25],
                    AspectRatio: line[26],
                    MovieFacebookLikes: line[27],
               },
          )          
     }
     imdbJson, err := json.Marshal(imdbList)
     if err != nil {
          log.Println(imdbJson)
     }

     return 
}

func main() {     
     imdbList := processCSV("movie_metadata.csv")     
     imdbChannel  := make(chan *IMDBDataModel, 100) // buffer

     var wg sync.WaitGroup
     for i := 0; i < 5;i++ {
          wg.Add(1)
          go consumeData(imdbChannel,&wg)     
     }

     for _ ,task := range imdbList {          
          imdbChannel <- &task               
     }

     close(imdbChannel)     
     wg.Wait()

     // for _, item := range out {
     //      fmt.Println(item.MovieTitle)
     // }

     fmt.Println("Total Channel :", len(imdbChannel)) 
     fmt.Println("Total IMDB :", len(imdbList))
     fmt.Println("Total Data: ", len(out))
     fmt.Println("Iterated : ", iterated)
     fmt.Println("Goroutines finished..")


}

EDITED: After a few suggestions to add a mutex and another channel, this is the modified consume function:

func consumeData(input <-chan *IMDBDataModel, output chan *IMDBDataModel, wg *sync.WaitGroup) {
    defer wg.Done()
    for data := range input {
        iterated++
        // outLock.Lock()
        // out = append(out, data)
        // outLock.Unlock()
        output <- data
    }
}

However, it is still consuming the same line more than once (a race occurred).

....
My Date with Drew 
My Date with Drew 
My Date with Drew 
My Date with Drew 
My Date with Drew 
Total Channel : 0
Total IMDB : 5044
Total Data:  4944
Iterated :  5000
Goroutines finished..

1 answer

dongliang2058 2017-08-18 12:52

Your issue is with:

var out []*IMDBDataModel

func consumeData(input <-chan *IMDBDataModel, wg *sync.WaitGroup){
     defer wg.Done()
     for data := range input {          
          iterated++          
          fmt.Printf("%d : %s\n", iterated, data.MovieTitle)
          out = append(out, data)
     }
     fmt.Println("output size : ", len(out))

}

You are appending to "out" from multiple goroutines.

Try adding a lock around the places where you write to "out", like this:

var out []*IMDBDataModel
var outLock sync.Mutex

func consumeData(input <-chan *IMDBDataModel, wg *sync.WaitGroup){
     defer wg.Done()
     for data := range input {          
          iterated++          
          fmt.Printf("%d : %s\n", iterated, data.MovieTitle)
          outLock.Lock()
          out = append(out, data) // data is already a *IMDBDataModel, so no & here
          outLock.Unlock()
     }
     outLock.Lock()
     fmt.Println("output size : ", len(out))
     outLock.Unlock()

}
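
The lock only protects "out". Two other things in the question's code are worth checking. First, "iterated++" is also shared between goroutines without synchronization. Second, in Go versions before 1.22 the loop "for _, task := range imdbList" reuses a single "task" variable, so "&task" sends the same address on every iteration; that would explain the same title being printed many times. Below is a minimal sketch of one way to address all three, not a drop-in replacement: "Movie" and "collect" are hypothetical names standing in for IMDBDataModel and the producer/consumer code in main.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"sync/atomic"
)

// Movie is a hypothetical, trimmed-down stand-in for IMDBDataModel.
type Movie struct {
	Title string
}

// collect fans the movies out to `workers` goroutines and returns every
// pointer the workers received, plus how many items were processed.
func collect(movies []Movie, workers int) ([]*Movie, int64) {
	var (
		out      []*Movie
		outLock  sync.Mutex
		iterated int64
		wg       sync.WaitGroup
	)
	jobs := make(chan *Movie, 100)

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for m := range jobs {
				atomic.AddInt64(&iterated, 1) // shared counter, updated atomically
				outLock.Lock()
				out = append(out, m) // m is already a pointer
				outLock.Unlock()
			}
		}()
	}

	// Index into the slice so each pointer refers to a distinct element,
	// rather than to a single reused loop variable (Go < 1.22).
	for i := range movies {
		jobs <- &movies[i]
	}
	close(jobs)
	wg.Wait()

	return out, atomic.LoadInt64(&iterated)
}

func main() {
	movies := []Movie{{"Avatar"}, {"Spectre"}, {"My Date with Drew"}}
	out, n := collect(movies, 5)

	titles := make([]string, 0, len(out))
	for _, m := range out {
		titles = append(titles, m.Title)
	}
	sort.Strings(titles) // worker order is nondeterministic
	fmt.Println(n, titles)
}
```

Running the original program with "go run -race" should report the unsynchronized accesses, which is a quick way to confirm whether a change like this removes them.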
