doubi9999 2016-08-12 17:11

Goroutines sharing slices: trying to understand a data race

I am trying to write a Go program that finds genes in very large files of DNA sequences. I already have a Perl program that does this, but I would like to take advantage of goroutines to run the search in parallel ;)

Because the files are huge, my idea is to read 100 sequences at a time, hand them to a goroutine for analysis, then read the next 100 sequences, and so on.

I would like to thank the members of this site for their really helpful explanations of slices and goroutines.

Thank you very much for your comments!

I have made the suggested change, so the goroutines now work on a copy of the slice. However, running with -race still detects one data race, at the copy() call:

    ==================
    WARNING: DATA RACE
    Read by goroutine 6:
      runtime.slicecopy()
          /usr/lib/go-1.6/src/runtime/slice.go:113 +0x0
      main.main.func1()
          test_chan006.go:71 +0xd8

    Previous write by main goroutine:
      main.main()
          test_chan006.go:63 +0x3b7

    Goroutine 6 (running) created at:
      main.main()
          test_chan006.go:73 +0x4c9
    ==================
    [>5HSAA098909 BA098909 ...]
    Found 1 data race(s)
    exit status 66

    line 71 is: copy(bufCopy, buf_Seq)
    line 63 is: buf_Seq = append(buf_Seq, line)
    line 73 is: }(genes, buf_Seq)




    package main

    import (
        "bufio"
        "fmt"
        "os"
        "github.com/mathpl/golang-pkg-pcre/src/pkg/pcre"
        "sync"
    )

    // read_genes reads a list of genes and returns a slice of gene names
    func read_genes(filename string) []string {
        var genes []string // slice of gene names
        // Open the file.
        f, _ := os.Open(filename)
        // Create a new Scanner for the file.
        scanner := bufio.NewScanner(f)
        // Loop over all lines in the file and collect them.
        for scanner.Scan() {
            line := scanner.Text()
            genes = append(genes, line)
        }
        return genes
    }

    // search_gene2 finds the sequences that match a gene in the genes slice
    func search_gene2(genes []string, seqs []string) []string {
        var res []string

        for r := 0; r <= len(seqs)-1; r++ {
            for i := 0; i <= len(genes)-1; i++ {

                match := pcre.MustCompile(genes[i], 0).MatcherString(seqs[r], 0)

                if match.Matches() == true {
                    res = append(res, seqs[r]) // if the gene matches, the sequence is appended to res
                    break
                }
            }
        }

        return res
    }
    //###########################################

    func main() {
        var slice []string
        var buf_Seq []string
        read_buff := 100 // the number of sequences analysed by one goroutine

        var wg sync.WaitGroup
        queue := make(chan []string, 100)

        filename := "fasta/sequences.tsv"
        f, _ := os.Open(filename)
        scanner := bufio.NewScanner(f)
        n := 0
        genes := read_genes("lists/genes.csv")

        for scanner.Scan() {
            line := scanner.Text()
            n += 1
            buf_Seq = append(buf_Seq, line) // store the sequences into buf_Seq
            if n == read_buff { // when the read buffer contains 100 sequences one goroutine analyses them

                wg.Add(1)

                go func(genes, buf_Seq []string) {
                    defer wg.Done()
                    bufCopy := make([]string, len(buf_Seq))
                    copy(bufCopy, buf_Seq)
                    queue <- search_gene2(genes, bufCopy)
                }(genes, buf_Seq)
                buf_Seq = buf_Seq[:0] // reset buf_Seq
                n = 0 // reset the sequences counter

            }
        }
        go func() {
            wg.Wait()
            close(queue)
        }()

        for t := range queue {
            slice = append(slice, t...)
        }

        fmt.Println(slice)
    }

  • dongtan8122 2016-08-12 17:37

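    The data race exists because a slice value in Go is only a small header (pointer, length, capacity) that refers to an underlying array. Slices are passed by value, but copying the header does not copy the array, so a change made to the elements through one slice value is visible through every other slice value that shares the array. Consider: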

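    package main

    import "fmt"

    // f modifies the first element through its own copy of the slice header.
    func f(xs []string) {
        xs[0] = "changed_in_f"
    }

    func main() {
        xs := []string{"set_in_main", "asd"}
        fmt.Println("Before call:", xs)
        f(xs)
        fmt.Println("After call:", xs)

        // ys is a second slice value over the same underlying array.
        var ys []string
        ys = xs
        ys[0] = "changed_through_ys"
        fmt.Println("After ys:", xs)
    }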
    

    This prints:

    Before call: [set_in_main asd]
    After call: [changed_in_f asd]
    After ys: [changed_through_ys asd]
    

    This happens because all three slice values (xs in main, the parameter xs in f, and ys) share the same underlying array in memory.

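    This is what is probably happening when you pass buf_Seq to the goroutine. Each goroutine receives its own slice value, but that value still refers to the same underlying array as buf_Seq in main. After starting the goroutine, main resets buf_Seq with buf_Seq[:0] and keeps appending to it. Because the capacity is unchanged, append writes into that same array while the goroutine may still be reading it in copy(). That concurrent read and write is the race the detector reports.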
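
    To make the aliasing concrete, here is a minimal single-goroutine sketch of what resetting a slice with [:0] and appending again does to a slice value that was handed out earlier. The names reuseDemo, handedOff and refilled are made up for this illustration and do not appear in the question.

```go
package main

import "fmt"

// reuseDemo (hypothetical helper) mimics the question's loop without any
// goroutines: it hands a slice value out, then resets and refills the buffer.
func reuseDemo() (handedOff, refilled []string) {
	buf := make([]string, 0, 2)
	buf = append(buf, "seq1", "seq2")

	// This is what a goroutine would receive as its argument:
	// a new slice header that points at the same backing array.
	handedOff = buf

	buf = buf[:0]             // length 0, capacity still 2, same array
	buf = append(buf, "seq3") // fits in the capacity, so it overwrites element 0

	return handedOff, buf
}

func main() {
	handedOff, refilled := reuseDemo()
	fmt.Println(handedOff) // [seq3 seq2]
	fmt.Println(refilled)  // [seq3]
}
```

    The slice that was handed out sees "seq3" in place of "seq1". With a goroutine reading that slice while main appends, the same overwrite happens concurrently, which is what -race flags.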

    To solve the problem, make the copy in main, before the goroutine is started, so that the goroutine never reads the array that main keeps writing to:

    // make a copy of buf_Seq in an entirely separate slice
    bufCopy := make([]string, len(buf_Seq))
    copy(bufCopy, buf_Seq)
    go func(genes, seqs []string) {
        defer wg.Done()
        queue <- search_gene2(genes, seqs)
    }(genes, bufCopy)
    
    This answer was selected by the asker as the best answer.