dqzve68846 2018-05-21 22:57
浏览 102

快速方式按距离搜索数百万个坐标

I have a data set of about 20 million coordinates. I want to be able to pass in a latitude, longitude, and distance in miles and return all coordinates that are within the mile range of my given coordinates. I need the response time to ideally be sub 50ms.

I have tried loading all coordinates in memory in a golang service which, on every request, will loop through the data and using haversine filter all coordinates which are within the given miles distance of my given coordinate.

This method sees the results return in around 2 seconds. What approach would be good to increase the speed of the results? I am open to any suggestions.

I am toying around with the idea of grouping all coordinates by degree and only filtering by the nearest to the given coordinates. Haven't had any luck improving the response times yet though. My data set is only a test one too as the real data could potentially be in the hundreds of millions.

  • 写回答

3条回答 默认 最新

  • douran7929 2018-05-21 23:07
    关注

    Idea would be to have a "grid" that partitions coordinates, so that when you do need to do a lookup you can safely return all coordinates in particular cell, do not return any from the cells too far away from target, and only do per coordinate comparison for coordinates that are in the cells that contains some coordinates within distance and some outside the distance.

    Simplified to 1D:

    Coordinates are from 1 to 100

    you partition into 5 blocks of 20

    When somebody looks for all coordinates within distance 25 from 47 you return all coordinates in blocks [30,39], [40,49],[50,59],[60,69] and then after doing per coordinate analysis for blocks [20,29] and [70,79] you additionally return 22,23,24,25,26,27,28,29, 70,71,72.

    Unfortunately I have no realistic way to estimate speedup of this approach so you would need to implement it and benchmark it by yourself.

    评论

报告相同问题?

悬赏问题

  • ¥15 微信会员卡等级和折扣规则
  • ¥15 微信公众平台自制会员卡可以通过收款码收款码收款进行自动积分吗
  • ¥15 随身WiFi网络灯亮但是没有网络,如何解决?
  • ¥15 gdf格式的脑电数据如何处理matlab
  • ¥20 重新写的代码替换了之后运行hbuliderx就这样了
  • ¥100 监控抖音用户作品更新可以微信公众号提醒
  • ¥15 UE5 如何可以不渲染HDRIBackdrop背景
  • ¥70 2048小游戏毕设项目
  • ¥20 mysql架构,按照姓名分表
  • ¥15 MATLAB实现区间[a,b]上的Gauss-Legendre积分