Creating a parallel word counter in Go

I am trying to write a word counter that returns, for each word in a text file, the number of times it appears. I have also been asked to parallelize the program.

My initial attempt at this task was as follows:

Implementation 1

func WordCount(words []string, startWord int, endWord int, waitGroup *sync.WaitGroup, freqsChannel chan<- map[string]int) {
    freqs := make(map[string]int)
    for i := startWord; i < endWord; i++ {
        word := words[i]
        freqs[word]++
    }
    freqsChannel <- freqs
    waitGroup.Done()
}

func ParallelWordCount(text string) map[string]int {
    // Split text into string array of the words in text.
    text = strings.ToLower(text)
    text = strings.ReplaceAll(text, ",", "")
    text = strings.ReplaceAll(text, ".", "")
    words := strings.Fields(text)
    length := len(words)
    threads := 28
    freqsChannel := make(chan map[string]int, threads)

    var waitGroup sync.WaitGroup
    waitGroup.Add(threads)
    defer waitGroup.Wait()

    wordsPerThread := length / threads // integer division rounds down
    for i := 0; i < threads; i++ {
        // Each goroutine gets a contiguous, non-overlapping slice of words.
        startWord := i * wordsPerThread
        endWord := startWord + wordsPerThread
        if i == threads-1 {
            // The last goroutine also takes the remainder. Shifting its start
            // by the larger last-chunk size would skip words whenever length
            // is not a multiple of threads.
            endWord = length
        }
        go WordCount(words, startWord, endWord, &waitGroup, freqsChannel)
    }
    freqs := <-freqsChannel
    for i := 1; i < threads; i++ {
        subFreqs := <-freqsChannel
        for word, count := range subFreqs {
            freqs[word] += count
        }
    }
    return freqs
}

According to my teaching assistant, this was not a good solution as the pre-processing of the text file carried out by

text = strings.ToLower(text)
text = strings.ReplaceAll(text, ",", "")
text = strings.ReplaceAll(text, ".", "")
words := strings.Fields(text)

in ParallelWordCount goes against the idea of parallel processing.

To fix this, I moved the work of turning the text into an array of words into the WordCount function, which runs on separate goroutines for different parts of the text file. Below is the code for my second implementation.

Implementation 2

func WordCount(text string, waitGroup *sync.WaitGroup, freqsChannel chan<- map[string]int) {
    freqs := make(map[string]int)
    text = strings.ToLower(text)
    text = strings.ReplaceAll(text, ",", "")
    text = strings.ReplaceAll(text, ".", "")
    words := strings.Fields(text)
    for _, value := range words {
        freqs[value]++
    }
    freqsChannel <- freqs
    waitGroup.Done()
}

func splitCount(str string, subStrings int, waitGroup *sync.WaitGroup, freqsChannel chan<- map[string]int) {
    if subStrings <= 1 {
        go WordCount(str, waitGroup, freqsChannel)
        return
    }
    // Advance from the nominal split point to the next space so that no word
    // is cut in half. The bounds check avoids an index-out-of-range panic
    // when no space remains (short or empty input).
    end := len(str) / subStrings
    for end < len(str) && str[end] != ' ' {
        end++
    }
    go WordCount(str[:end], waitGroup, freqsChannel)
    if end < len(str) {
        end++ // skip the space itself
    }
    splitCount(str[end:], subStrings-1, waitGroup, freqsChannel)
}

func ParallelWordCount(text string) map[string]int {
    threads := 28
    freqsChannel := make(chan map[string]int, threads)

    var waitGroup sync.WaitGroup
    waitGroup.Add(threads)
    defer waitGroup.Wait()

    splitCount(text, threads, &waitGroup, freqsChannel)

    // Collect and return frequencies
    freqs := <-freqsChannel
    for i := 1; i < threads; i++ {
        subFreqs := <-freqsChannel
        for word, count := range subFreqs {
            freqs[word] += count
        }
    }
    return freqs
}

The average runtime of this implementation is 3 ms, compared to the old average of 5 ms. Have I fully addressed the issue raised by my teaching assistant, or does the second implementation also fail to take full advantage of parallel processing to count the words of a text file efficiently?
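Before comparing the 3 ms and 5 ms figures, it may be worth confirming that both versions return the same counts as a plain sequential pass, since a chunking mistake can make a parallel version look faster simply by doing less work. Below is a minimal sketch of such a baseline. The names sequentialWordCount and sameCounts are hypothetical helpers, not part of the code above; the normalization mirrors the ToLower/ReplaceAll/Fields steps used in both implementations.

```go
package main

import (
	"fmt"
	"strings"
)

// sequentialWordCount is a hypothetical single-goroutine baseline. It applies
// the same normalization as the code in the question: lower-case the text,
// drop commas and periods, then split on whitespace.
func sequentialWordCount(text string) map[string]int {
	text = strings.ToLower(text)
	text = strings.ReplaceAll(text, ",", "")
	text = strings.ReplaceAll(text, ".", "")
	freqs := make(map[string]int)
	for _, word := range strings.Fields(text) {
		freqs[word]++
	}
	return freqs
}

// sameCounts reports whether two frequency maps hold identical entries.
func sameCounts(a, b map[string]int) bool {
	if len(a) != len(b) {
		return false
	}
	for word, count := range a {
		other, ok := b[word]
		if !ok || other != count {
			return false
		}
	}
	return true
}

func main() {
	want := sequentialWordCount("The cat. The dog, the end.")
	fmt.Println(want)
	// In a real check, the second argument would be ParallelWordCount(text).
	fmt.Println(sameCounts(want, sequentialWordCount("the cat the dog the end")))
}
```

Running the comparison on an input whose word count is not a multiple of the goroutine count is a reasonable way to exercise the chunk boundaries.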

2 answers

dongye1143 2019-04-19 12:38
Two things that I see:

1. The second example is better, as you have split the text parsing and word counting across several goroutines. One thing you can try is to not count words in the WordCount function, but just push them to the channel and increment the counts in the main counter. You can check whether that is any faster; I'm not sure. Also, check the fan-in pattern for more details.

2. Parallel processing might still not be fully utilized, because I don't believe you have 28 CPU cores available :). The number of cores determines how many WordCount goroutines run in parallel; the rest are scheduled concurrently based on available resources (available CPU cores). Here is a great article explaining this.
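The two suggestions above can be sketched together as follows. This is only an illustration under stated assumptions: fanInWordCount is a hypothetical name, the input is assumed to be already split into chunks on word boundaries, and the worker count comes from runtime.NumCPU() instead of a fixed 28. Workers only normalize and split text; the calling goroutine is the single consumer that updates the map.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
	"sync"
)

// fanInWordCount is a hypothetical sketch of the fan-in idea: each worker
// normalizes and splits chunks, sends individual words on a shared channel,
// and only the calling goroutine touches the frequency map.
func fanInWordCount(chunks []string, workers int) map[string]int {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	words := make(chan string, 1024)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for chunk := range jobs {
				chunk = strings.ToLower(chunk)
				chunk = strings.ReplaceAll(chunk, ",", "")
				chunk = strings.ReplaceAll(chunk, ".", "")
				for _, w := range strings.Fields(chunk) {
					words <- w
				}
			}
		}()
	}

	// Feed the chunks, then close the words channel once all workers finish.
	go func() {
		for _, c := range chunks {
			jobs <- c
		}
		close(jobs)
		wg.Wait()
		close(words)
	}()

	freqs := make(map[string]int)
	for w := range words { // fan-in: one consumer merges every worker's output
		freqs[w]++
	}
	return freqs
}

func main() {
	chunks := []string{"The quick fox.", "the lazy dog,", "The end"}
	fmt.Println(fanInWordCount(chunks, runtime.NumCPU()))
}
```

Each per-word channel send carries synchronization overhead, so this variant may well turn out slower than merging per-goroutine maps; as the answer says, it is something to measure and not assume.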