duangutang3940 2013-09-03 12:50
浏览 142
已采纳

使用GROUP BY优化MySQL查询时遇到问题... HAVING

I'm trying to optimize quickly optimize the search functionality of some outdated forum software written in PHP. I've got my work down to a query that looks like this:

SELECT thread.threadid
FROM thread AS thread
INNER JOIN word AS word ON (word.title LIKE 'word1' OR word.title LIKE 'word2')
INNER JOIN postindex AS postindex ON (postindex.wordid = word.wordid)
INNER JOIN post AS postquery ON (postquery.postid = postindex.postid)
WHERE thread.threadid = postquery.threadid
GROUP BY thread.threadid
HAVING COUNT(DISTINCT word.wordid) = 2
LIMIT 25;

word1 and word2 are examples; there could be any number of words. The number at the very end of the query is the total number of words. The idea is that a thread most contain all words in the search query, spread out over any number of posts.

This query often exceeds 60 seconds with only two words, and times out. I'm stumped; I can't figure out how to further optimize this horrid search engine.

As far as I can tell, everything is indexed properly, and I've run ANALYZE recently. Most of the database is running on InnoDB. Here's the output of EXPLAIN:

+----+-------------+-----------+--------+----------------------------------------------------------------------------------------+---------+---------+------------------------------+------+-----------------------------------------------------------+
| id | select_type | table     | type   | possible_keys                                                                          | key     | key_len | ref                          | rows | Extra                                                     |
+----+-------------+-----------+--------+----------------------------------------------------------------------------------------+---------+---------+------------------------------+------+-----------------------------------------------------------+
|  1 | SIMPLE      | word      | range  | PRIMARY,title                                                                          | title   | 150     | NULL                         |    2 | Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | postindex | ref    | wordid,temp_ix                                                                         | temp_ix | 4       | database1.word.wordid        |    3 | Using index condition                                     |
|  1 | SIMPLE      | postquery | eq_ref | PRIMARY,threadid,showthread                                                            | PRIMARY | 4       | database1.postindex.postid   |    1 | NULL                                                      |
|  1 | SIMPLE      | thread    | eq_ref | PRIMARY,forumid,postuserid,pollid,title,lastpost,dateline,prefixid,tweeted,firstpostid | PRIMARY | 4       | database1.postquery.threadid |    1 | Using index                                               |
+----+-------------+-----------+--------+----------------------------------------------------------------------------------------+---------+---------+------------------------------+------+-----------------------------------------------------------+

Update

LIMIT 25 doesn't seem to be helping much. It shaves off maybe second from a query that normally returns hundreds of results.

Clarification

The part that's slowing down MySQL is the GROUP BY ... HAVING ... bit. With GROUP BY, the LIMIT is pretty much useless for improving performance. Without GROUP BY, and as long as the LIMIT remains, the queries are quite speedy.

SQL Info

Output of SHOW CREATE TABLE postindex;:

CREATE TABLE `postindex` (
  `wordid` int(10) unsigned NOT NULL DEFAULT '0',
  `postid` int(10) unsigned NOT NULL DEFAULT '0',
  `intitle` smallint(5) unsigned NOT NULL DEFAULT '0',
  `score` smallint(5) unsigned NOT NULL DEFAULT '0',
  UNIQUE KEY `wordid` (`wordid`,`postid`),
  KEY `temp_ix` (`wordid`),
  KEY `postid` (`postid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

I didn't make the table, so I have no idea why there's a duplicate index on wordid; however, I'm not willing to delete it, since this is ancient, fickle software.

  • 写回答

2条回答 默认 最新

  • dongying7847 2013-09-03 13:26
    关注

    You can try several rewrites and compare execution plan and times.

    Using 2 EXISTS subqueries (one for each word to be checked):

    SELECT t.threadid
    FROM thread AS t
    WHERE EXISTS
          ( SELECT 1
            FROM post AS p
              JOIN postindex AS pi
                ON pi.postid = p.postid
              JOIN word AS w
                ON pi.wordid = w.wordid
            WHERE w.title = 'word1'
              AND t.threadid = p.threadid
          )
      AND EXISTS
          ( SELECT 1
            FROM post AS p
              JOIN postindex AS pi
                ON pi.postid = p.postid
              JOIN word AS w
                ON pi.wordid = w.wordid
            WHERE w.title = 'word2'
              AND t.threadid = p.threadid
          ) ;
    

    Using one EXISTS subquery:

    SELECT t.threadid
    FROM thread AS t
    WHERE EXISTS
          ( SELECT 1
            FROM post AS p1
              JOIN postindex AS pi1
                ON  pi1.postid = p1.postid
              JOIN word AS w1
                ON  w1.wordid = pi1.wordid
                AND w1.title = 'word1'
    
              JOIN post AS p2
                ON  p2.threadid = p1.threadid
              JOIN postindex AS pi2
                ON  pi2.postid = p2.postid
              JOIN word AS w2
                ON  w2.wordid = pi2.wordid
                AND w2.title = 'word2'
    
            WHERE t.threadid = p1.threadid
              AND t.threadid = p2.threadid
          ) ;
    

    A single query with many joins and GROUP BY only to remove the duplicate threadid:

    SELECT t.threadid
    FROM thread AS t
    
      JOIN post AS p1
        ON  p1.threadid = t.threadid
      JOIN postindex AS pi1
        ON  pi1.postid = p1.postid
      JOIN word AS w1
        ON  w1.wordid = pi1.wordid
        AND w1.title = 'word1'
    
      JOIN post AS p2
        ON  p1.threadid = t.threadid
      JOIN postindex AS pi2
        ON  pi2.postid = p2.postid
      JOIN word AS w2
        ON  w2.wordid = pi2.wordid
        AND w2.title = 'word2'
    
    WHERE p1.threadid = p2.threadid        -- this line is redundant
    GROUP BY t.threadid ;
    
    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
查看更多回答(1条)

报告相同问题?

悬赏问题

  • ¥15 做个有关计算的小程序
  • ¥15 MPI读取tif文件无法正常给各进程分配路径
  • ¥15 如何用MATLAB实现以下三个公式(有相互嵌套)
  • ¥30 关于#算法#的问题:运用EViews第九版本进行一系列计量经济学的时间数列数据回归分析预测问题 求各位帮我解答一下
  • ¥15 setInterval 页面闪烁,怎么解决
  • ¥15 如何让企业微信机器人实现消息汇总整合
  • ¥50 关于#ui#的问题:做yolov8的ui界面出现的问题
  • ¥15 如何用Python爬取各高校教师公开的教育和工作经历
  • ¥15 TLE9879QXA40 电机驱动
  • ¥20 对于工程问题的非线性数学模型进行线性化