douzong3599 2015-10-15 11:04
浏览 50
已采纳

优化mgo以进行寻路

I have implemented an A* algorithm in Go to find the path between two coordinates on a map. The map data is fetched with mgo from a MongoDB collection.

It is however very slow. It sits around 4 seconds for a 1000 meter route. I have timed the different parts of the algorithm and concluded that the bottle neck is in the fetching from the database. Or to be precise: in the conversion from binary data to a data structure that Go understands.

I try to do as few requests as possible, and multi-thread the requests to make it faster, and it helps to a certain extent. But it's not nearly as fast as I wish it was.

It seems like I'm doing something fundamentally wrong. Any suggestions at all would be helpful.

Data structure in mongoDB: (Nodes fetched from OSM)

{ "_id" : NumberLong(194637483),
  "lat" : 55.7079899, 
  "lon" : 13.3756414,
  "neighbours" : [ NumberLong(1566264689), NumberLong(1566264806) ] 
}

Data structure in Go

type Node struct {
    ID         int64   `bson:"_id" json:"id"`
    Lat        float64 `bson:"lat" json:"lat"`
    Lon        float64 `bson:"lon" json:"lon"`
    Neighbours []int64 `bson:"neighbours" json:"neighbours"`
}

Code for getting a piece of data:

func addToBufferLong(buffer *WriteLockMap, session *mgo.Session, at, goal geo.Node, lowerLat, higherLat, lowerLon, higherLon float64) {
    c := session.DB("collectionName").C("nodes")
    query := c.Find(bson.M{"lat": bson.M{"$gt": lowerLat, "$lt": higherLat}, "lon": bson.M{"$gt": lowerLon, "$lt": higherLon}})

    var nodes []geo.Node
    query.All(&nodes)
    for _, node := range nodes {
        tmp := PathNode{0, node.DistanceTo(goal), 0, node}
        buffer.Lock()
        buffer.m[node.ID] = tmp
        buffer.Unlock()
    }
}

Edit:

The multi-threading strategy is based on splitting the area I wish to query into 4 different squares, quadrants if you will, and doing them separately with addToBufferLong(...)

Most recent print-outs:

> time GOMAXPROCS=8 ./toystar
Starting A star
Thread A, count: 19657, db-fetch: 0.122792104s, parsing: 0.934650055s
Thread B, count: 19574, db-fetch: 0.274384302s, parsing: 1.196350664s
Thread C, count: 4197, db-fetch: 0.388197823s, parsing: 0.930109241s
Thread D, count: 9900, db-fetch: 0.540008325s, parsing: 0.93963728s

Total database fetch:  1.534268099 s
Total A star (with fetches):  1.854748244

real    0m1.907s

where db-fetch measures the time it takes to do the row that starts with query := c.Find(...) and parsing measures the time it takes to do the query.All(&nodes)

Edit 2:

I've managed to drop the execution time significantly, with the help of fellow stack overflow users. Current print-outs:

> time GOMAXPROCS=8 ./toystar

Starting A star
Thread A: 0.210783141s
Thread B: 0.380938949s
Thread C: 0.441447793s
Thread D: 0.507361847s

Total database fetch:  0.507543875 s
number of fetches:  1
Total A star:  0.826343287s

real    0m0.860s

The main difference being the multi-threading strategy and using *mgo.Iter instead of query.All(&nodes)

  • 写回答

1条回答 默认 最新

  • dongzong3053 2015-10-15 13:25
    关注

    From the information available, this is what I can deduce:

    1. Your machine seems to be slow, maybe an embedded device?

    The stage you called db-fetch doesn't actually access the database. The only thing c.Find(...) does is building a *mgo.Query value. The method is 6 lines long. This shouldn't take more than a millisecond. Unless there's contention on the database object's internal session state, which doesn't seem to be the case because you're using only 4 goroutines.

    1. Bottleneck seems to be network and/or reflection

    query.All(&nodes) is where the query is actually executed on your database. In addition, this phase allocates the slice of nodes it needs and then iteratively decodes bson into your struct definition via reflection.

    1. One thing you could try: *mgo.iter

    Instead of query.All(...) you could use query.Iter() to obtain a *mgo.Iter and iterate batch-wise over your data sets. This might improve performance by better distributing network load over time.

    1. Use geospacial indexing, Mongo supports it

    See the documentation. Maybe you're already doing this. If not, it might improve lookup time.

    1. Split up your quadrants into subquadrants, recursively.

    I think this one is obvious. Divide and conquer, right? :)

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥30 求一段fortran代码用IVF编译运行的结果
  • ¥15 深度学习根据CNN网络模型,搭建BP模型并训练MNIST数据集
  • ¥15 lammps拉伸应力应变曲线分析
  • ¥15 C++ 头文件/宏冲突问题解决
  • ¥15 用comsol模拟大气湍流通过底部加热(温度不同)的腔体
  • ¥50 安卓adb backup备份子用户应用数据失败
  • ¥20 有人能用聚类分析帮我分析一下文本内容嘛
  • ¥15 请问Lammps做复合材料拉伸模拟,应力应变曲线问题
  • ¥30 python代码,帮调试,帮帮忙吧
  • ¥15 #MATLAB仿真#车辆换道路径规划