duangutang3940 2013-09-03 12:50

Accepted

Trouble optimizing a MySQL query with GROUP BY ... HAVING

I'm trying to quickly optimize the search functionality of some outdated forum software written in PHP. I've got my work down to a query that looks like this:

SELECT thread.threadid
FROM thread AS thread
INNER JOIN word AS word ON (word.title LIKE 'word1' OR word.title LIKE 'word2')
INNER JOIN postindex AS postindex ON (postindex.wordid = word.wordid)
INNER JOIN post AS postquery ON (postquery.postid = postindex.postid)
WHERE thread.threadid = postquery.threadid
GROUP BY thread.threadid
HAVING COUNT(DISTINCT word.wordid) = 2
LIMIT 25;

word1 and word2 are examples; there could be any number of words. The number in the HAVING clause (2 here) is the total number of words. The idea is that a thread must contain every word in the search query, spread out over any number of posts.
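Because the number of words varies, the query has to be assembled at run time, with the HAVING value tied to the word list. A minimal sketch, assuming Python's built-in sqlite3 module with an in-memory SQLite database standing in for MySQL, and exact-match words with no LIKE wildcards (so `IN (...)` replaces the chain of `LIKE ... OR`). The names `make_demo_db` and `build_all_words_query` and the demo rows are hypothetical. With a MySQL driver such as PyMySQL or mysqlclient, the placeholders would be `%s` rather than `?`.

```python
import sqlite3


def make_demo_db():
    """Tiny in-memory stand-in for the forum tables (hypothetical rows)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE thread (threadid INTEGER PRIMARY KEY);
        CREATE TABLE post (postid INTEGER PRIMARY KEY, threadid INTEGER NOT NULL);
        CREATE TABLE word (wordid INTEGER PRIMARY KEY, title TEXT NOT NULL UNIQUE);
        CREATE TABLE postindex (wordid INTEGER NOT NULL, postid INTEGER NOT NULL,
                                UNIQUE (wordid, postid));
        -- thread 1: word1 and word2 in different posts
        -- thread 2: word1 and word3 in one post
        -- thread 3: word1 and word2 in the same post
        INSERT INTO thread VALUES (1), (2), (3);
        INSERT INTO post VALUES (10, 1), (11, 1), (20, 2), (30, 3);
        INSERT INTO word VALUES (1, 'word1'), (2, 'word2'), (3, 'word3');
        INSERT INTO postindex VALUES (1, 10), (2, 11), (1, 20), (3, 20), (1, 30), (2, 30);
    """)
    return db


def build_all_words_query(words, limit=25):
    """Return (sql, params) selecting threads that contain every search word."""
    # Deduplicate so the HAVING count equals the number of distinct words.
    unique_words = sorted(set(words))
    if not unique_words:
        raise ValueError("need at least one search word")
    placeholders = ", ".join("?" for _ in unique_words)
    # IN (...) replaces the LIKE ... OR chain; ORDER BY only makes the demo deterministic.
    sql = f"""
        SELECT thread.threadid
        FROM word
        JOIN postindex ON postindex.wordid = word.wordid
        JOIN post ON post.postid = postindex.postid
        JOIN thread ON thread.threadid = post.threadid
        WHERE word.title IN ({placeholders})
        GROUP BY thread.threadid
        HAVING COUNT(DISTINCT word.wordid) = ?
        ORDER BY thread.threadid
        LIMIT ?"""
    return sql, [*unique_words, len(unique_words), limit]


db = make_demo_db()
sql, params = build_all_words_query(["word1", "word2"])
print([row[0] for row in db.execute(sql, params)])  # [1, 3]
```

Only the word titles and the two counts are bound as parameters; the SQL text itself contains nothing but `?` markers, so search input is never spliced into the query.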

This query often exceeds 60 seconds with only two words, and times out. I'm stumped; I can't figure out how to further optimize this horrid search engine.

As far as I can tell, everything is indexed properly, and I've run ANALYZE recently. Most of the database is running on InnoDB. Here's the output of EXPLAIN:

+----+-------------+-----------+--------+----------------------------------------------------------------------------------------+---------+---------+------------------------------+------+-----------------------------------------------------------+
| id | select_type | table     | type   | possible_keys                                                                          | key     | key_len | ref                          | rows | Extra                                                     |
+----+-------------+-----------+--------+----------------------------------------------------------------------------------------+---------+---------+------------------------------+------+-----------------------------------------------------------+
|  1 | SIMPLE      | word      | range  | PRIMARY,title                                                                          | title   | 150     | NULL                         |    2 | Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | postindex | ref    | wordid,temp_ix                                                                         | temp_ix | 4       | database1.word.wordid        |    3 | Using index condition                                     |
|  1 | SIMPLE      | postquery | eq_ref | PRIMARY,threadid,showthread                                                            | PRIMARY | 4       | database1.postindex.postid   |    1 | NULL                                                      |
|  1 | SIMPLE      | thread    | eq_ref | PRIMARY,forumid,postuserid,pollid,title,lastpost,dateline,prefixid,tweeted,firstpostid | PRIMARY | 4       | database1.postquery.threadid |    1 | Using index                                               |
+----+-------------+-----------+--------+----------------------------------------------------------------------------------------+---------+---------+------------------------------+------+-----------------------------------------------------------+

Update

LIMIT 25 doesn't seem to help much. It shaves off maybe a second from a query that normally returns hundreds of results.

Clarification

The part that's slowing down MySQL is the GROUP BY ... HAVING ... bit. With GROUP BY, the LIMIT is pretty much useless for improving performance. Without GROUP BY, and as long as the LIMIT remains, the queries are quite speedy.
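Why LIMIT stops helping once GROUP BY ... HAVING is added can be shown with a toy model in Python. This is not MySQL's actual executor, and the row counts are hypothetical: an ungrouped query can hand rows back as they are produced and stop after 25, whereas `HAVING COUNT(DISTINCT ...)` cannot accept or reject any thread until every joined row has been read, which is consistent with the "Using temporary; Using filesort" in the EXPLAIN output above. Note that the ungrouped version is not an equivalent query: it returns duplicate thread IDs and never checks that all words are present.

```python
from collections import defaultdict
from itertools import islice


class CountingRows:
    """Iterable of (threadid, wordid) join rows that records how many were read."""

    def __init__(self, rows):
        self.rows = rows
        self.read = 0

    def __iter__(self):
        for row in self.rows:
            self.read += 1
            yield row


def streaming_limit(rows, limit):
    # No GROUP BY: rows can be returned as they are produced, so reading
    # stops as soon as `limit` rows have been emitted.
    return [threadid for threadid, _ in islice(rows, limit)]


def grouped_limit(rows, n_words, limit):
    # GROUP BY ... HAVING: a thread's distinct-word count is final only after
    # every row has been read, so the whole input is consumed before LIMIT.
    words_per_thread = defaultdict(set)
    for threadid, wordid in rows:
        words_per_thread[threadid].add(wordid)
    matches = [t for t in sorted(words_per_thread) if len(words_per_thread[t]) == n_words]
    return matches[:limit]


# Hypothetical join output: 1,000 threads, each matching both search words.
join_rows = [(t, w) for t in range(1000) for w in (1, 2)]

streamed = CountingRows(join_rows)
grouped = CountingRows(join_rows)
streaming_limit(streamed, 25)
grouped_limit(grouped, n_words=2, limit=25)
print(streamed.read, grouped.read)  # 25 2000
```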

SQL Info

Output of SHOW CREATE TABLE postindex;:

CREATE TABLE `postindex` (
  `wordid` int(10) unsigned NOT NULL DEFAULT '0',
  `postid` int(10) unsigned NOT NULL DEFAULT '0',
  `intitle` smallint(5) unsigned NOT NULL DEFAULT '0',
  `score` smallint(5) unsigned NOT NULL DEFAULT '0',
  UNIQUE KEY `wordid` (`wordid`,`postid`),
  KEY `temp_ix` (`wordid`),
  KEY `postid` (`postid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

I didn't make the table, so I have no idea why there's a duplicate index on wordid; however, I'm not willing to delete it, since this is ancient, fickle software.

2 answers

dongying7847 2013-09-03 13:26

You can try several rewrites and compare their execution plans and run times.

Using 2 EXISTS subqueries (one for each word to be checked):

SELECT t.threadid
FROM thread AS t
WHERE EXISTS
      ( SELECT 1
        FROM post AS p
          JOIN postindex AS pi
            ON pi.postid = p.postid
          JOIN word AS w
            ON pi.wordid = w.wordid
        WHERE w.title = 'word1'
          AND t.threadid = p.threadid
      )
  AND EXISTS
      ( SELECT 1
        FROM post AS p
          JOIN postindex AS pi
            ON pi.postid = p.postid
          JOIN word AS w
            ON pi.wordid = w.wordid
        WHERE w.title = 'word2'
          AND t.threadid = p.threadid
      ) ;
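The two-EXISTS pattern extends to any number of words by emitting one correlated EXISTS per word, with no GROUP BY or HAVING at all. A sketch, assuming Python's built-in sqlite3 module with an in-memory SQLite database standing in for MySQL; `make_demo_db`, `build_exists_query` and the demo rows are hypothetical, and MySQL drivers such as PyMySQL would use `%s` placeholders instead of `?`.

```python
import sqlite3


def make_demo_db():
    """Tiny in-memory stand-in for the forum tables (hypothetical rows)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE thread (threadid INTEGER PRIMARY KEY);
        CREATE TABLE post (postid INTEGER PRIMARY KEY, threadid INTEGER NOT NULL);
        CREATE TABLE word (wordid INTEGER PRIMARY KEY, title TEXT NOT NULL UNIQUE);
        CREATE TABLE postindex (wordid INTEGER NOT NULL, postid INTEGER NOT NULL,
                                UNIQUE (wordid, postid));
        -- thread 1: word1 and word2 in different posts
        -- thread 2: word1 and word3 in one post
        -- thread 3: word1 and word2 in the same post
        INSERT INTO thread VALUES (1), (2), (3);
        INSERT INTO post VALUES (10, 1), (11, 1), (20, 2), (30, 3);
        INSERT INTO word VALUES (1, 'word1'), (2, 'word2'), (3, 'word3');
        INSERT INTO postindex VALUES (1, 10), (2, 11), (1, 20), (3, 20), (1, 30), (2, 30);
    """)
    return db


def build_exists_query(words):
    """Return (sql, params) with one correlated EXISTS per distinct search word."""
    unique_words = sorted(set(words))
    if not unique_words:
        raise ValueError("need at least one search word")
    # Each EXISTS only needs one matching post in the thread to be true.
    exists_clause = """EXISTS (
        SELECT 1
        FROM post AS p
        JOIN postindex AS pi ON pi.postid = p.postid
        JOIN word AS w ON w.wordid = pi.wordid
        WHERE w.title = ? AND p.threadid = t.threadid)"""
    conditions = "\n  AND ".join([exists_clause] * len(unique_words))
    sql = f"SELECT t.threadid FROM thread AS t\nWHERE {conditions}\nORDER BY t.threadid"
    return sql, unique_words


db = make_demo_db()
sql, params = build_exists_query(["word1", "word2"])
print([row[0] for row in db.execute(sql, params)])  # [1, 3]
```

Because there is no grouping step, a LIMIT appended to this form can in principle stop early, which is the property the GROUP BY version loses.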

Using one EXISTS subquery:

SELECT t.threadid
FROM thread AS t
WHERE EXISTS
      ( SELECT 1
        FROM post AS p1
          JOIN postindex AS pi1
            ON  pi1.postid = p1.postid
          JOIN word AS w1
            ON  w1.wordid = pi1.wordid
            AND w1.title = 'word1'

          JOIN post AS p2
            ON  p2.threadid = p1.threadid
          JOIN postindex AS pi2
            ON  pi2.postid = p2.postid
          JOIN word AS w2
            ON  w2.wordid = pi2.wordid
            AND w2.title = 'word2'

        WHERE t.threadid = p1.threadid
          AND t.threadid = p2.threadid
      ) ;

A single query with many joins, using GROUP BY only to remove duplicate threadid values:

SELECT t.threadid
FROM thread AS t

  JOIN post AS p1
    ON  p1.threadid = t.threadid
  JOIN postindex AS pi1
    ON  pi1.postid = p1.postid
  JOIN word AS w1
    ON  w1.wordid = pi1.wordid
    AND w1.title = 'word1'

  JOIN post AS p2
    ON  p2.threadid = t.threadid
  JOIN postindex AS pi2
    ON  pi2.postid = p2.postid
  JOIN word AS w2
    ON  w2.wordid = pi2.wordid
    AND w2.title = 'word2'

WHERE p1.threadid = p2.threadid        -- this line is redundant
GROUP BY t.threadid ;
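Before timing these rewrites on the real server, it helps to confirm they return the same threads as the original. The sketch below runs the question's query (without LIMIT 25) and the three rewrites, with p2 joined on `p2.threadid = t.threadid` in the last one, against a tiny SQLite stand-in built with Python's sqlite3 module. The schema subset and rows are hypothetical, and this checks results only: performance still has to be measured with EXPLAIN and timings on MySQL itself.

```python
import sqlite3


def make_demo_db():
    """Tiny in-memory stand-in for the forum tables (hypothetical rows)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE thread (threadid INTEGER PRIMARY KEY);
        CREATE TABLE post (postid INTEGER PRIMARY KEY, threadid INTEGER NOT NULL);
        CREATE TABLE word (wordid INTEGER PRIMARY KEY, title TEXT NOT NULL UNIQUE);
        CREATE TABLE postindex (wordid INTEGER NOT NULL, postid INTEGER NOT NULL,
                                UNIQUE (wordid, postid));
        INSERT INTO thread VALUES (1), (2), (3);
        INSERT INTO post VALUES (10, 1), (11, 1), (20, 2), (30, 3);
        INSERT INTO word VALUES (1, 'word1'), (2, 'word2'), (3, 'word3');
        INSERT INTO postindex VALUES (1, 10), (2, 11), (1, 20), (3, 20), (1, 30), (2, 30);
    """)
    return db


QUERIES = {
    "original GROUP BY/HAVING": """
        SELECT thread.threadid
        FROM thread AS thread
        INNER JOIN word AS word ON (word.title LIKE 'word1' OR word.title LIKE 'word2')
        INNER JOIN postindex AS postindex ON (postindex.wordid = word.wordid)
        INNER JOIN post AS postquery ON (postquery.postid = postindex.postid)
        WHERE thread.threadid = postquery.threadid
        GROUP BY thread.threadid
        HAVING COUNT(DISTINCT word.wordid) = 2""",
    "two EXISTS": """
        SELECT t.threadid FROM thread AS t
        WHERE EXISTS (SELECT 1 FROM post AS p
                      JOIN postindex AS pi ON pi.postid = p.postid
                      JOIN word AS w ON pi.wordid = w.wordid
                      WHERE w.title = 'word1' AND t.threadid = p.threadid)
          AND EXISTS (SELECT 1 FROM post AS p
                      JOIN postindex AS pi ON pi.postid = p.postid
                      JOIN word AS w ON pi.wordid = w.wordid
                      WHERE w.title = 'word2' AND t.threadid = p.threadid)""",
    "one EXISTS": """
        SELECT t.threadid FROM thread AS t
        WHERE EXISTS (SELECT 1 FROM post AS p1
                      JOIN postindex AS pi1 ON pi1.postid = p1.postid
                      JOIN word AS w1 ON w1.wordid = pi1.wordid AND w1.title = 'word1'
                      JOIN post AS p2 ON p2.threadid = p1.threadid
                      JOIN postindex AS pi2 ON pi2.postid = p2.postid
                      JOIN word AS w2 ON w2.wordid = pi2.wordid AND w2.title = 'word2'
                      WHERE t.threadid = p1.threadid AND t.threadid = p2.threadid)""",
    "joins + GROUP BY": """
        SELECT t.threadid FROM thread AS t
        JOIN post AS p1 ON p1.threadid = t.threadid
        JOIN postindex AS pi1 ON pi1.postid = p1.postid
        JOIN word AS w1 ON w1.wordid = pi1.wordid AND w1.title = 'word1'
        JOIN post AS p2 ON p2.threadid = t.threadid
        JOIN postindex AS pi2 ON pi2.postid = p2.postid
        JOIN word AS w2 ON w2.wordid = pi2.wordid AND w2.title = 'word2'
        GROUP BY t.threadid""",
}

db = make_demo_db()
# Compare sorted lists: none of these queries guarantees an output order.
results = {name: sorted(row[0] for row in db.execute(sql)) for name, sql in QUERIES.items()}
for name, threads in results.items():
    print(f"{name}: {threads}")  # each prints [1, 3]
```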

This answer was selected by the asker as the best answer.
