I'm trying to parallelize my quadtree lookup by using go-routines to share the recursive lookup.
My Quadtree struct looks like this:
type Quadtree struct {
Rectangle //Boundary of the quadtree
Points []Point
Ne *Quadtree
Se *Quadtree
Sw *Quadtree
Nw *Quadtree
Divided bool
Capacity int
Parent *Quadtree
}
I have a method called QueryForNearestPoints
which accepts a Point
and a searchRadius
and returns all the points within the searcharea
in the Quadtree. If no points are found within the given searchRadius
, then the searchRadius
is increased and tried again.
points
is the channel to which points in the search-radius are send to. I'm also using a sync.WaitGroup
wg
to wait for all go-routines to finish.
func (q *Quadtree) QueryForNearestPoints(p Point, searchRadius float64) []QueryResult {
searcharea = Circle{p, searchRadius}
var testResults []QueryResult
for {
go func() {
loop:
for {
select {
case item, _ := <-points:
testResults = append(testResults, item)
case <-done:
break loop
}
}
}()
wg.Add(1)
go q.query(searcharea)
wg.Wait()
done <- true
if len(testResults) > 0 {
break
} else {
// Proportionally increase the search radius if no points are found.
searcharea = Circle{p, searcharea.Radius + searcharea.Radius*50/100}
}
}
return testResults
}
The parallelized quadtree's query method:
func (q *Quadtree) query(area Circle) {
defer wg.Done()
if !q.overlapsCircle(area) {
return
}
if len(q.Points) > 0 {
for i, point := range q.Points {
if area.contains(point) {
points <- QueryResult{point, i, q}
}
}
}
if q.Divided {
wg.Add(4)
go q.Ne.parallelquery(area)
go q.Se.parallelquery(area)
go q.Sw.parallelquery(area)
go q.Nw.parallelquery(area)
}
}
(q *Quadtree) parallelquery
method:
func (q *Quadtree) parallelquery(area Circle) {
semaphore <- struct{}{}
defer func() {
<-semaphore
}()
q.query(area)
}
If I'm not wrong this is a cpu-bound workload so having 1000's of go-routines isn't going to help, so I'm using a buffered channel semaphore
as a counting semaphore to make sure that at any point only n number of go-routines are active.(Here I made n as 4 because my computer has 4 cpus.)
The only problem is that this approach is significantly slower than the normal non-concurrent sequential approach. What am I missing here? How can i make this faster?