dtnmnw3697 2016-08-05 19:40
浏览 116

有没有办法在MySQL中优化复杂的数学查询?

I've been researching this forever and I am starting to think I can't make this any more efficient, but I wanted to ask if anyone had any tips.

I'm running a query on millions of records to find all the x,y,z coordinates (these are stars) along a linear column from system a to system b with a given radius. I'm running through PHP with a lot of other work being done on the result set. I get results from the script in about 16 seconds. The query delay is about 7 of those 16 seconds.

The basic query logic is:

SELECT name, coordinates, and distance from end point
FROM stars
WHERE all stars are in a column of given radius between start and end points
ORDER BY distance from end point DESC

The where clause requires two separate calculations, they are this:

Where calculation 1:

Calculate if the stars are within the space of the column using constants and x,y,z

Where calculation 2:

Limit the column radius to a given figure.
(This where clause also performs similar calculations with the same constants and x,y,z.)

The math formulas in the where clauses can't really be changed, they are the formula needed for columnar calculation in 3D space.

The order by at the end of the query is absolutely necessary because the result set is too large for my script to hold in memory. I have to work with it in the proper order in the script.

The query is easiest to read as defined prior to variable substitution:

SELECT
    name,
    x,
    y,
    z,
    SQRT(
        pow(`x`-" . $bx . ",2)+
        pow(`y`-" . $by . ",2)+
        pow(`z`-" . $bz . ",2)
    ) d
FROM
    stars
WHERE
    (((`x`*$cx+`y`*$cy+`z`*$cz)-($constant_1))/($constant_2)) between 0 and 1
AND
    SQRT(((($ax + ((((`x`*$cx+`y`*$cy+`z`*$cz)-($constant_1))/($constant_2)) * $cx))-`x`)*(($ax + ((((`x`*$cx+`y`*$cy+`z`*$cz)-($constant_1))/($constant_2)) * $cx))-`x`))+((($ay + ((((`x`*$cx+`y`*$cy+`z`*$cz)-($constant_1))/($constant_2)) * $cy))-`y`)*(($ay + ((((`x`*$cx+`y`*$cy+`z`*$cz)-($constant_1))/($constant_2)) * $cy))-`y`))+((($az + ((((`x`*$cx+`y`*$cy+`z`*$cz)-($constant_1))/($constant_2)) * $cz))-`z`)*(($az + ((((`x`*$cx+`y`*$cy+`z`*$cz)-($constant_1))/($constant_2)) * $cz))-`z`)))
        <=$radius
ORDER BY
    SQRT(
        pow(`x`-" . $bx . ",2)+
        pow(`y`-" . $by . ",2)+
        pow(`z`-" . $bz . ",2)
    ) DESC

The final query run on the database looks like this: (For simplicity, I'm using sample data where a lot of the constants are 0.)

SELECT
    name, 
    x, 
    y, 
    z, 
    SQRT( pow(`x`-25.21875,2)+ pow(`y`--20.90625,2)+ pow(`z`-25899.96875,2) ) d
FROM
    stars
WHERE
    (((`x`*25.21875+`y`*-20.90625+`z`*25899.96875)-(0))/(670809454.308)) 
    between 0 and 1
AND
    SQRT((((0 + ((((`x`*25.21875+`y`*-20.90625+`z`*25899.96875)-(0))/(670809454.308)) * 25.21875))-`x`)*((0 + ((((`x`*25.21875+`y`*-20.90625+`z`*25899.96875)-(0))/(670809454.308)) * 25.21875))-`x`))+(((0 + ((((`x`*25.21875+`y`*-20.90625+`z`*25899.96875)-(0))/(670809454.308)) * -20.90625))-`y`)*((0 + ((((`x`*25.21875+`y`*-20.90625+`z`*25899.96875)-(0))/(670809454.308)) * -20.90625))-`y`))+(((0 + ((((`x`*25.21875+`y`*-20.90625+`z`*25899.96875)-(0))/(670809454.308)) * 25899.96875))-`z`)*((0 + ((((`x`*25.21875+`y`*-20.90625+`z`*25899.96875)-(0))/(670809454.308)) * 25899.96875))-`z`)))
    <=600
ORDER BY
    SQRT( pow(`x`-25.21875,2)+ pow(`y`--20.90625,2)+ pow(`z`-25899.96875,2) ) DESC

My table definition looks like this:

CREATE TABLE IF NOT EXISTS `stars` (
    `localkey` bigint(20) NOT NULL AUTO_INCREMENT,
    `id` bigint(20) NOT NULL,
    `x` double NOT NULL,
    `y` double NOT NULL,
    `z` double NOT NULL,
    `name` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`localkey`),
UNIQUE KEY `id` (`id`),
KEY `x` (`x`),
KEY `y` (`y`),
KEY `z` (`z`),
KEY `xyz` (`x`,`y`,`z`),
KEY `name` (`name`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci 

The explain for the query results indicates no index usage and an extra of:

extra: Using where; Using filesort;

What I've tried so far:

  • Adjusting various data types to optimize memory usage and indexing (Even though my math makes it unlikely indexes will ever be used)
  • Using a PHP loop and multiple smaller queries instead of this one huge one (It took longer with multiple queries.)
  • Copying to a memory table before running query (Table is too large to fit in memory)
  • Copying only part of the table (localkey, x, y, z) to memory. (It fit, but left so little of max_heap_size for other processes it wasn't worth it.)

Are there other options I'm missing?

Thanks!

  • 写回答

4条回答 默认 最新

  • duanjiang7505 2016-08-05 19:45
    关注

    Assuming that only a smaller subset of your records will match, you can reduce the math load by doing a basic "rectangular" filtering first. e.g. there's no point in performing a full cartesian distance for EVERY record in the table, only to throw away most of them.

    A simple "box" boundary check is just a simple subtraction and comparison:

    SELECT ...
    FROM (
        SELECT ...
        WHERE (
            (abs($x_coord - x_coordinate) <= $max_distance)
         OR (abs($y_coord - y_coordinate) <= $max_distance)
        )
    ) AS square_filter
    WHERE ... full calculation here
    

    Of cousre, you're doing 3d positions, so it's a bit more complicated, but this should give you the basic idea.

    评论

报告相同问题?

悬赏问题

  • ¥15 关于#MATLAB#的问题,如何解决?(相关搜索:信噪比,系统容量)
  • ¥500 52810做蓝牙接受端
  • ¥15 基于PLC的三轴机械手程序
  • ¥15 多址通信方式的抗噪声性能和系统容量对比
  • ¥15 winform的chart曲线生成时有凸起
  • ¥15 msix packaging tool打包问题
  • ¥15 finalshell节点的搭建代码和那个端口代码教程
  • ¥15 Centos / PETSc / PETGEM
  • ¥15 centos7.9 IPv6端口telnet和端口监控问题
  • ¥20 完全没有学习过GAN,看了CSDN的一篇文章,里面有代码但是完全不知道如何操作