I've been researching this forever and I am starting to think I can't make this any more efficient, but I wanted to ask if anyone had any tips.
I'm running a query on millions of records to find all the x,y,z coordinates (these are stars) along a linear column from system a to system b with a given radius. I'm running through PHP with a lot of other work being done on the result set. I get results from the script in about 16 seconds. The query delay is about 7 of those 16 seconds.
The basic query logic is:
SELECT name, coordinates, and distance from end point
FROM stars
WHERE all stars are in a column of given radius between start and end points
ORDER BY distance from end point DESC
The where clause requires two separate calculations, they are this:
Where calculation 1:
Calculate if the stars are within the space of the column using constants and x,y,z
Where calculation 2:
Limit the column radius to a given figure.
(This where clause also performs similar calculations with the same constants and x,y,z.)
The math formulas in the where clauses can't really be changed, they are the formula needed for columnar calculation in 3D space.
The order by at the end of the query is absolutely necessary because the result set is too large for my script to hold in memory. I have to work with it in the proper order in the script.
The query is easiest to read as defined prior to variable substitution:
SELECT
name,
x,
y,
z,
SQRT(
pow(`x`-" . $bx . ",2)+
pow(`y`-" . $by . ",2)+
pow(`z`-" . $bz . ",2)
) d
FROM
stars
WHERE
(((`x`*$cx+`y`*$cy+`z`*$cz)-($constant_1))/($constant_2)) between 0 and 1
AND
SQRT(((($ax + ((((`x`*$cx+`y`*$cy+`z`*$cz)-($constant_1))/($constant_2)) * $cx))-`x`)*(($ax + ((((`x`*$cx+`y`*$cy+`z`*$cz)-($constant_1))/($constant_2)) * $cx))-`x`))+((($ay + ((((`x`*$cx+`y`*$cy+`z`*$cz)-($constant_1))/($constant_2)) * $cy))-`y`)*(($ay + ((((`x`*$cx+`y`*$cy+`z`*$cz)-($constant_1))/($constant_2)) * $cy))-`y`))+((($az + ((((`x`*$cx+`y`*$cy+`z`*$cz)-($constant_1))/($constant_2)) * $cz))-`z`)*(($az + ((((`x`*$cx+`y`*$cy+`z`*$cz)-($constant_1))/($constant_2)) * $cz))-`z`)))
<=$radius
ORDER BY
SQRT(
pow(`x`-" . $bx . ",2)+
pow(`y`-" . $by . ",2)+
pow(`z`-" . $bz . ",2)
) DESC
The final query run on the database looks like this: (For simplicity, I'm using sample data where a lot of the constants are 0.)
SELECT
name,
x,
y,
z,
SQRT( pow(`x`-25.21875,2)+ pow(`y`--20.90625,2)+ pow(`z`-25899.96875,2) ) d
FROM
stars
WHERE
(((`x`*25.21875+`y`*-20.90625+`z`*25899.96875)-(0))/(670809454.308))
between 0 and 1
AND
SQRT((((0 + ((((`x`*25.21875+`y`*-20.90625+`z`*25899.96875)-(0))/(670809454.308)) * 25.21875))-`x`)*((0 + ((((`x`*25.21875+`y`*-20.90625+`z`*25899.96875)-(0))/(670809454.308)) * 25.21875))-`x`))+(((0 + ((((`x`*25.21875+`y`*-20.90625+`z`*25899.96875)-(0))/(670809454.308)) * -20.90625))-`y`)*((0 + ((((`x`*25.21875+`y`*-20.90625+`z`*25899.96875)-(0))/(670809454.308)) * -20.90625))-`y`))+(((0 + ((((`x`*25.21875+`y`*-20.90625+`z`*25899.96875)-(0))/(670809454.308)) * 25899.96875))-`z`)*((0 + ((((`x`*25.21875+`y`*-20.90625+`z`*25899.96875)-(0))/(670809454.308)) * 25899.96875))-`z`)))
<=600
ORDER BY
SQRT( pow(`x`-25.21875,2)+ pow(`y`--20.90625,2)+ pow(`z`-25899.96875,2) ) DESC
My table definition looks like this:
CREATE TABLE IF NOT EXISTS `stars` (
`localkey` bigint(20) NOT NULL AUTO_INCREMENT,
`id` bigint(20) NOT NULL,
`x` double NOT NULL,
`y` double NOT NULL,
`z` double NOT NULL,
`name` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`localkey`),
UNIQUE KEY `id` (`id`),
KEY `x` (`x`),
KEY `y` (`y`),
KEY `z` (`z`),
KEY `xyz` (`x`,`y`,`z`),
KEY `name` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
The explain for the query results indicates no index usage and an extra of:
extra: Using where; Using filesort;
What I've tried so far:
- Adjusting various data types to optimize memory usage and indexing (Even though my math makes it unlikely indexes will ever be used)
- Using a PHP loop and multiple smaller queries instead of this one huge one (It took longer with multiple queries.)
- Copying to a memory table before running query (Table is too large to fit in memory)
- Copying only part of the table (localkey, x, y, z) to memory. (It fit, but left so little of max_heap_size for other processes it wasn't worth it.)
Are there other options I'm missing?
Thanks!