doubipeng1336 2014-11-30 19:46
浏览 199
已采纳

在Golang中同时读取文件

The reading part isn't concurrent but the processing is. I phrased the title this way because I'm most likely to search for this problem again using that phrase. :)

I'm getting a deadlock after trying to go beyond the examples so this is a learning experience for me. My goals are these:

  1. Read a file line by line (eventually use a buffer to do groups of lines).
  2. Pass off the text to a func() that does some regex work.
  3. Send the results somewhere but avoid mutexes or shared variables. I'm sending ints (always the number 1) to a channel. It's sort of silly but if it's not causing problems I'd like to leave it like this unless you folks have a neater option.
  4. Use a worker pool to do this. I'm not sure how I tell the workers to requeue themselves?

Here is the playground link. I tried to write helpful comments, hopefully this makes sense. My design could be completely wrong so don't hesitate to refactor.

package main

import (
  "bufio"
  "fmt"
  "regexp"
  "strings"
  "sync"
)

func telephoneNumbersInFile(path string) int {
  file := strings.NewReader(path)

  var telephone = regexp.MustCompile(`\(\d+\)\s\d+-\d+`)

  // do I need buffered channels here?
  jobs := make(chan string)
  results := make(chan int)

  // I think we need a wait group, not sure.
  wg := new(sync.WaitGroup)

  // start up some workers that will block and wait?
  for w := 1; w <= 3; w++ {
    wg.Add(1)
    go matchTelephoneNumbers(jobs, results, wg, telephone)
  }

  // go over a file line by line and queue up a ton of work
  scanner := bufio.NewScanner(file)
  for scanner.Scan() {
    // Later I want to create a buffer of lines, not just line-by-line here ...
    jobs <- scanner.Text()
  }

  close(jobs)
  wg.Wait()

  // Add up the results from the results channel.
  // The rest of this isn't even working so ignore for now.
  counts := 0
  // for v := range results {
  //   counts += v
  // }

  return counts
}

func matchTelephoneNumbers(jobs <-chan string, results chan<- int, wg *sync.WaitGroup, telephone *regexp.Regexp) {
  // Decreasing internal counter for wait-group as soon as goroutine finishes
  defer wg.Done()

  // eventually I want to have a []string channel to work on a chunk of lines not just one line of text
  for j := range jobs {
    if telephone.MatchString(j) {
      results <- 1
    }
  }
}

func main() {
  // An artificial input source.  Normally this is a file passed on the command line.
  const input = "Foo
(555) 123-3456
Bar
Baz"
  numberOfTelephoneNumbers := telephoneNumbersInFile(input)
  fmt.Println(numberOfTelephoneNumbers)
}
  • 写回答

2条回答 默认 最新

  • duanniling0018 2014-11-30 20:37
    关注

    You're almost there, just need a little bit of work on goroutines' synchronisation. Your problem is that you're trying to feed the parser and collect the results in the same routine, but that can't be done.

    I propose the following:

    1. Run scanner in a separate routine, close input channel once everything is read.
    2. Run separate routine waiting for the parsers to finish their job, than close the output channel.
    3. Collect all the results in you main routine.

    The relevant changes could look like this:

    // Go over a file line by line and queue up a ton of work
    go func() {
        scanner := bufio.NewScanner(file)
        for scanner.Scan() {
            jobs <- scanner.Text()
        }
        close(jobs)
    }()
    
    // Collect all the results...
    // First, make sure we close the result channel when everything was processed
    go func() {
        wg.Wait()
        close(results)
    }()
    
    // Now, add up the results from the results channel until closed
    counts := 0
    for v := range results {
        counts += v
    }
    

    Fully working example on the playground: http://play.golang.org/p/coja1_w-fY

    Worth adding you don't necessarily need the WaitGroup to achieve the same, all you need to know is when to stop receiving results. This could be achieved for example by scanner advertising (on a channel) how many lines were read and then the collector reading only specified number of results (you would need to send zeros as well though).

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
查看更多回答(1条)

报告相同问题?

悬赏问题

  • ¥15 安卓adb backup备份应用数据失败
  • ¥15 eclipse运行项目时遇到的问题
  • ¥15 关于#c##的问题:最近需要用CAT工具Trados进行一些开发
  • ¥15 南大pa1 小游戏没有界面,并且报了如下错误,尝试过换显卡驱动,但是好像不行
  • ¥15 没有证书,nginx怎么反向代理到只能接受https的公网网站
  • ¥50 成都蓉城足球俱乐部小程序抢票
  • ¥15 yolov7训练自己的数据集
  • ¥15 esp8266与51单片机连接问题(标签-单片机|关键词-串口)(相关搜索:51单片机|单片机|测试代码)
  • ¥15 电力市场出清matlab yalmip kkt 双层优化问题
  • ¥30 ros小车路径规划实现不了,如何解决?(操作系统-ubuntu)