dongmacuo1193 2011-03-08 18:53 采纳率: 0%
浏览 60
已采纳

PHP / MySQL中的地理搜索(距离)(性能)

I have a MySQL-table (MyISAM) containing about 200k entries of lat/long pairs that I select from, based on the pairs distance (great circle formula) from another lat/long pair. (e.g. all entries that are within a 10km radius around 50.281852, 2.504883)

My problem is that this query takes about 0,28 sec. to run just for those 200k entries (which continue to get more every day). While 0,28 sec. would be fine normally, this query runs very often as it powers the main feature of my web-app, and often times it's part of a larger query.

Is there any way to speed this up? Obviosly MySQL has to run through all 200k entries every time and perform the great circle formula for every entry. I read something about geohashing, R-Trees and the like here on stackoverflow but I don't think that's the way I want to go. Partly because I've never been a big fan of maths, but mostly because I think that this problem has already been solved by someone smarter than me in a library/extension/etc. that has been tested extensively and is being updated regularly.

MySQL seems to have a spatial extension but that one doesn't provide a distance function. Should I be looking at another database to put this coordinate-pairs in? PostgreSQL seems to have a fairly mature Spatial extension. Do you know anything about it? Or would PostgreSQL too simply just use the great circle formula to get all entries within a certain region?

Is there maybe a specialized stand-alone product or mysql-extension that already does what I'm looking for?

Or is there maybe A PHP library I could use to do the calculations? Using APC I could easily fit the lat-long pairs into memory (those 200k entries take about 5MB) and then run the query inside of PHP. The problem with this approach however is that then I'd have a MySQL query like SELECT .. FROM .. WHERE id in (id1, id2, ..) for all the results wich can be up to a few thousand. How well does MySQL handle Queries like these? And then (since this is a number-crunching task) would doing this in PHP be fast enough?

Any other Ideas what I should/shouldn't do?

For completenes, here is the sample query, stripped of any irrelevant parts (as I said, usually this is part of a bigger query where I join multiple tables):

SELECT id, 6371 * acos( sin( radians( 52.4042924 ) ) * sin( radians( lat ) ) + cos( radians( 50.281852 ) ) * cos( radians( lat ) ) * cos( radians( 2.504883 ) - radians( lon ) ) ) AS dst
FROM geoloc
HAVING dst <10
ORDER BY dst ASC

Thank You!

  • 写回答

4条回答 默认 最新

  • dso15221 2011-03-08 21:40
    关注

    Calculate a bounding box to select a subset of the rows in the WHERE clause of your SQL query, so that you're only executing the expensive distance calculation on that subset of rows rather than against the entire 200k records in your table. The method is described in this article on Movable Type (with PHP code examples). Then you can include the Haversine calculation in your query against that subset to calculate the actual distances, and factor in the HAVING clause at that point.

    It's the bounding box that helps your performance, because it means you're only doing the expensive distance calculation on a small subset of your data. This is effectively the same method that Patrick has suggested, but the Movable Type link has extensive explanations of the method, as well as PHP code that you can use to build the bounding box and your SQL query.

    EDIT

    If you don't think haversine is accurate enough, then there's also the Vincenty formula.

    //  Vincenty formula to calculate great circle distance between 2 locations expressed as Lat/Long in KM
    
    function VincentyDistance($lat1,$lat2,$lon1,$lon2){
        $a = 6378137 - 21 * sin($lat1);
        $b = 6356752.3142;
        $f = 1/298.257223563;
    
        $p1_lat = $lat1/57.29577951;
        $p2_lat = $lat2/57.29577951;
        $p1_lon = $lon1/57.29577951;
        $p2_lon = $lon2/57.29577951;
    
        $L = $p2_lon - $p1_lon;
    
        $U1 = atan((1-$f) * tan($p1_lat));
        $U2 = atan((1-$f) * tan($p2_lat));
    
        $sinU1 = sin($U1);
        $cosU1 = cos($U1);
        $sinU2 = sin($U2);
        $cosU2 = cos($U2);
    
        $lambda = $L;
        $lambdaP = 2*M_PI;
        $iterLimit = 20;
    
        while(abs($lambda-$lambdaP) > 1e-12 && $iterLimit>0) {
            $sinLambda = sin($lambda);
            $cosLambda = cos($lambda);
            $sinSigma = sqrt(($cosU2*$sinLambda) * ($cosU2*$sinLambda) + ($cosU1*$sinU2-$sinU1*$cosU2*$cosLambda) * ($cosU1*$sinU2-$sinU1*$cosU2*$cosLambda));
    
            //if ($sinSigma==0){return 0;}  // co-incident points
            $cosSigma = $sinU1*$sinU2 + $cosU1*$cosU2*$cosLambda;
            $sigma = atan2($sinSigma, $cosSigma);
            $alpha = asin($cosU1 * $cosU2 * $sinLambda / $sinSigma);
            $cosSqAlpha = cos($alpha) * cos($alpha);
            $cos2SigmaM = $cosSigma - 2*$sinU1*$sinU2/$cosSqAlpha;
            $C = $f/16*$cosSqAlpha*(4+$f*(4-3*$cosSqAlpha));
            $lambdaP = $lambda;
            $lambda = $L + (1-$C) * $f * sin($alpha) * ($sigma + $C*$sinSigma*($cos2SigmaM+$C*$cosSigma*(-1+2*$cos2SigmaM*$cos2SigmaM)));
        }
    
        $uSq = $cosSqAlpha*($a*$a-$b*$b)/($b*$b);
        $A = 1 + $uSq/16384*(4096+$uSq*(-768+$uSq*(320-175*$uSq)));
        $B = $uSq/1024 * (256+$uSq*(-128+$uSq*(74-47*$uSq)));
    
        $deltaSigma = $B*$sinSigma*($cos2SigmaM+$B/4*($cosSigma*(-1+2*$cos2SigmaM*$cos2SigmaM)- $B/6*$cos2SigmaM*(-3+4*$sinSigma*$sinSigma)*(-3+4*$cos2SigmaM*$cos2SigmaM)));
    
        $s = $b*$A*($sigma-$deltaSigma);
        return $s/1000;
    }
    
    
    echo VincentyDistance($lat1,$lat2,$lon1,$lon2);
    
    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
查看更多回答(3条)

报告相同问题?

悬赏问题

  • ¥15 MATLAB动图问题
  • ¥15 【提问】基于Invest的水源涵养
  • ¥20 微信网友居然可以通过vx号找到我绑的手机号
  • ¥15 寻一个支付宝扫码远程授权登录的软件助手app
  • ¥15 解riccati方程组
  • ¥15 display:none;样式在嵌套结构中的已设置了display样式的元素上不起作用?
  • ¥15 使用rabbitMQ 消息队列作为url源进行多线程爬取时,总有几个url没有处理的问题。
  • ¥15 Ubuntu在安装序列比对软件STAR时出现报错如何解决
  • ¥50 树莓派安卓APK系统签名
  • ¥65 汇编语言除法溢出问题