I'm trying to optimize quickly optimize the search functionality of some outdated forum software written in PHP. I've got my work down to a query that looks like this:
SELECT thread.threadid
FROM thread AS thread
INNER JOIN word AS word ON (word.title LIKE 'word1' OR word.title LIKE 'word2')
INNER JOIN postindex AS postindex ON (postindex.wordid = word.wordid)
INNER JOIN post AS postquery ON (postquery.postid = postindex.postid)
WHERE thread.threadid = postquery.threadid
GROUP BY thread.threadid
HAVING COUNT(DISTINCT word.wordid) = 2
LIMIT 25;
word1
and word2
are examples; there could be any number of words. The number at the very end of the query is the total number of words. The idea is that a thread most contain all words in the search query, spread out over any number of posts.
This query often exceeds 60 seconds with only two words, and times out. I'm stumped; I can't figure out how to further optimize this horrid search engine.
As far as I can tell, everything is indexed properly, and I've run ANALYZE
recently. Most of the database is running on InnoDB. Here's the output of EXPLAIN
:
+----+-------------+-----------+--------+----------------------------------------------------------------------------------------+---------+---------+------------------------------+------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+--------+----------------------------------------------------------------------------------------+---------+---------+------------------------------+------+-----------------------------------------------------------+
| 1 | SIMPLE | word | range | PRIMARY,title | title | 150 | NULL | 2 | Using where; Using index; Using temporary; Using filesort |
| 1 | SIMPLE | postindex | ref | wordid,temp_ix | temp_ix | 4 | database1.word.wordid | 3 | Using index condition |
| 1 | SIMPLE | postquery | eq_ref | PRIMARY,threadid,showthread | PRIMARY | 4 | database1.postindex.postid | 1 | NULL |
| 1 | SIMPLE | thread | eq_ref | PRIMARY,forumid,postuserid,pollid,title,lastpost,dateline,prefixid,tweeted,firstpostid | PRIMARY | 4 | database1.postquery.threadid | 1 | Using index |
+----+-------------+-----------+--------+----------------------------------------------------------------------------------------+---------+---------+------------------------------+------+-----------------------------------------------------------+
Update
LIMIT 25
doesn't seem to be helping much. It shaves off maybe second from a query that normally returns hundreds of results.
Clarification
The part that's slowing down MySQL is the GROUP BY ... HAVING ...
bit. With GROUP BY
, the LIMIT
is pretty much useless for improving performance. Without GROUP BY
, and as long as the LIMIT
remains, the queries are quite speedy.
SQL Info
Output of SHOW CREATE TABLE postindex;
:
CREATE TABLE `postindex` (
`wordid` int(10) unsigned NOT NULL DEFAULT '0',
`postid` int(10) unsigned NOT NULL DEFAULT '0',
`intitle` smallint(5) unsigned NOT NULL DEFAULT '0',
`score` smallint(5) unsigned NOT NULL DEFAULT '0',
UNIQUE KEY `wordid` (`wordid`,`postid`),
KEY `temp_ix` (`wordid`),
KEY `postid` (`postid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
I didn't make the table, so I have no idea why there's a duplicate index on wordid; however, I'm not willing to delete it, since this is ancient, fickle software.