I'm trying out Go for some filesystem usage analysis, and I went for making the code as fast as possible by spawning almost everything as a goroutine and relying on the Go runtime (and GOMAXPROCS) to manage them. I watched this code run (pretty quickly) until it just stopped dead. I checked top and it listed my process as having 1500 threads.
I thought maybe I had hit some limit and the process was therefore deadlocked waiting on the OS. I checked my OS (FreeBSD) limits, and sure enough it was listed as 1500 threads max per process.
Surprised, I checked the Go docs: GOMAXPROCS only limits the number of threads actively executing Go code, while threads blocked in system calls don't count against it. Since my workload is mostly blocking filesystem calls, each blocked goroutine apparently pins its own OS thread.
So my questions:

- Is it fair to say I can't rely on the Go runtime as a global pool to prevent hitting OS limits like this?
- Is there an idiomatic way to handle this (be nice, it's only my second day using Go)?
- In particular, I haven't found a better way than a sync.WaitGroup to close a channel when I'm done with it. Is there one?
- I'd like to abstract away the boilerplate (parallel mapping with goroutines, closing the channel when done). Is there a type-safe way to do that without generics?
Here's my current code:
func AnalyzePaths(paths chan string) chan AnalyzedPath {
	analyzed := make(chan AnalyzedPath)
	go func() {
		group := sync.WaitGroup{}
		for path := range paths {
			group.Add(1)
			go func(path string) {
				defer group.Done()
				analyzed <- Analyze(path)
			}(path)
		}
		group.Wait()
		close(analyzed)
	}()
	return analyzed
}

func GetPaths(roots []string) chan string {
	globbed := make(chan string)
	go func() {
		group := sync.WaitGroup{}
		for _, root := range roots {
			group.Add(1)
			go func(root string) {
				defer group.Done()
				for _, path := range glob(root) {
					globbed <- path
				}
			}(root)
		}
		group.Wait()
		close(globbed)
	}()
	return globbed
}

func main() {
	paths := GetPaths(patterns)
	for analyzed := range AnalyzePaths(paths) {
		fmt.Println(analyzed)
	}
}