dqvrlgi3247 2014-09-29 04:27

MySQL SELECT query performance has degraded

I have a MySQL query that selects the clicks for each hour of a day. It worked well until we accumulated a lot of click entries in our database. Now it sometimes takes several seconds (up to 9!) to return the data...

The query is:

SELECT h.clickHour, COUNT(clicktime) AS c
      FROM ( SELECT 0 AS clickHour
             UNION ALL SELECT 1
             UNION ALL SELECT 2
             UNION ALL SELECT 3
             UNION ALL SELECT 4
             UNION ALL SELECT 5
             UNION ALL SELECT 6
             UNION ALL SELECT 7
             UNION ALL SELECT 8
             UNION ALL SELECT 9
             UNION ALL SELECT 10
             UNION ALL SELECT 11
             UNION ALL SELECT 12
             UNION ALL SELECT 13
             UNION ALL SELECT 14
             UNION ALL SELECT 15
             UNION ALL SELECT 16
             UNION ALL SELECT 17
             UNION ALL SELECT 18
             UNION ALL SELECT 19
             UNION ALL SELECT 20
             UNION ALL SELECT 21
             UNION ALL SELECT 22
             UNION ALL SELECT 23 ) AS h
    INNER JOIN links l ON l.user_id = 1
    LEFT OUTER
      JOIN clicks
        ON EXTRACT(HOUR FROM clicks.clicktime) = h.clickHour
          AND DATE(clicks.clicktime) = '2014-09-21'
          AND clicks.link_id = l.id
    GROUP
        BY h.clickHour

I use these unions because I need a count for every hour, including hours with no clicks... Please help!

OK, so we are talking about 0 to several thousand rows in the clicks table. The click time is stored as a timestamp and every click has a unique id. I see that the union approach is bad and that I have to change it.

What I am trying now is to select all clicks of a day grouped by HOUR(clicktime). But when I do so I get too many results, about 10x more than there should be.

2 answers

  • dsy19890123 2014-09-29 05:20

    I'd rewrite the query like this:

    SELECT h.clickHour
         , IFNULL(d.clickCount,0) AS c
      FROM ( SELECT 0 AS clickHour UNION ALL SELECT  1 UNION ALL SELECT  2
               UNION ALL SELECT  3 UNION ALL SELECT  4 UNION ALL SELECT  5
               UNION ALL SELECT  6 UNION ALL SELECT  7 UNION ALL SELECT  8
               UNION ALL SELECT  9 UNION ALL SELECT 10 UNION ALL SELECT 11
               UNION ALL SELECT 12 UNION ALL SELECT 13 UNION ALL SELECT 14
               UNION ALL SELECT 15 UNION ALL SELECT 16 UNION ALL SELECT 17
               UNION ALL SELECT 18 UNION ALL SELECT 19 UNION ALL SELECT 20
               UNION ALL SELECT 21 UNION ALL SELECT 22 UNION ALL SELECT 23 
           ) h
      LEFT
      JOIN ( SELECT EXTRACT(HOUR FROM c.clicktime) AS clickHour
                  , SUM(1) AS clickCount
               FROM clicks c
               JOIN links l
                 ON l.user_id = 1
                AND l.id = c.link_id
              WHERE c.clicktime >= '2014-09-21'
                AND c.clicktime <  '2014-09-21' + INTERVAL 1 DAY 
              GROUP BY EXTRACT(HOUR FROM c.clicktime)
           ) d
        ON d.clickHour = h.clickHour
    

    The approach here is to get the inline view query d to return a maximum of 24 rows. This cranks through the clicks table to get the counts. We're going to defer the join operation to the fixed set of 24 rows until after we have calculated the hourly counts. (The join to h is there only to get rows with zero counts returned, which would otherwise just be "missing" rows.)

    You can test the performance of the inline view query d and of the entire query; I suspect there won't be much difference. The cost of materializing the inline view h isn't that much (there's some overhead, but it's very likely that it will use the MEMORY storage engine; it's small enough and it's a simple integer datatype). And that join operation of 24 rows to 24 rows won't be that expensive, even without any indexes available.

    I suspect that the majority of time will be in materializing the derived table d.

    We're going to want an index with a leading column of clickTime, so that we can use a more efficient index range scan operation, to avoid evaluating expressions for every flipping row in the table.

    I changed this predicate: DATE(clickTime) = '2014-09-21' into predicates that reference the bare column. This enables MySQL to consider an efficient range scan operation on the clickTime column (to quickly eliminate a boatload of rows from consideration), rather than requiring that MySQL evaluate a function on every flipping row in the table.
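
    To make the difference concrete, here is a minimal sketch of the two forms of the date filter, reduced to a plain count on the clicks table. It assumes an index whose leading column is clicktime exists; without one, both forms scan the whole table. Prefixing each statement with EXPLAIN should show whether a range access is being considered.

    ```sql
    -- Function wrapped around the column: MySQL has to evaluate DATE()
    -- for every row, so an index on clicktime cannot be used for a range scan.
    SELECT COUNT(*)
      FROM clicks c
     WHERE DATE(c.clicktime) = '2014-09-21';

    -- Bare column compared against a half-open range: same rows,
    -- but the optimizer can consider a range scan on a clicktime index.
    SELECT COUNT(*)
      FROM clicks c
     WHERE c.clicktime >= '2014-09-21'
       AND c.clicktime <  '2014-09-21' + INTERVAL 1 DAY;
    ```

    The half-open range (>= start, < next day) covers the whole day without having to spell out a "last instant" such as 23:59:59.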

    Some performance gain may be obtained by making covering indexes available on the clicks and links tables (so that the query can be satisfied from the indexes, without a need to visit pages in the underlying table.)

    At a minimum on the clicks table:

    ON clicks (clickTime, link_id)
    

    If id is unique (or primary key) on the links table, this index may not give any performance benefit:

    ON links (id, user_id)
    

    If a covering index is used, the EXPLAIN output should show "Using index".
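
    As a sketch of how that could be checked, the statements below create the suggested index and then run EXPLAIN on the inline view query d by itself. The index names are hypothetical (they are not from the original schema), and the plan you actually get depends on your MySQL version and data distribution.

    ```sql
    -- Hypothetical index name; the column list is the one suggested above.
    CREATE INDEX clicks_ix_time_link ON clicks (clicktime, link_id);

    -- Only worth trying if links.id is NOT already the primary key or unique:
    -- CREATE INDEX links_ix_id_user ON links (id, user_id);

    -- Inspect the plan of the derived table d on its own.
    EXPLAIN
    SELECT EXTRACT(HOUR FROM c.clicktime) AS clickHour
         , SUM(1) AS clickCount
      FROM clicks c
      JOIN links l
        ON l.user_id = 1
       AND l.id = c.link_id
     WHERE c.clicktime >= '2014-09-21'
       AND c.clicktime <  '2014-09-21' + INTERVAL 1 DAY
     GROUP BY EXTRACT(HOUR FROM c.clicktime);
    ```

    In the output, look at the row for clicks: a range access type together with "Using index" in the Extra column would indicate that the date filter is resolved from the index alone.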

    I don't see a way around the "Using filesort" operation, not without adding a column to clicks table that stores the clickTime truncated to the hour. With a column like that, and an appropriate index, it's possible that we could get the GROUP BY operation optimized using the index, avoiding the "Using filesort" operation.
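
    A rough sketch of that idea follows. The column name click_hour and the index name are hypothetical, and whether the optimizer really drops "Using filesort" depends on the join order it picks, so treat this as something to verify with EXPLAIN rather than a guaranteed win.

    ```sql
    -- Hypothetical extra column holding clicktime truncated to the hour.
    ALTER TABLE clicks ADD COLUMN click_hour DATETIME NULL;

    -- One-time backfill. Assigning clicktime to itself guards against an
    -- auto-updating TIMESTAMP column being reset to the current time.
    UPDATE clicks
       SET click_hour = DATE_FORMAT(clicktime, '%Y-%m-%d %H:00:00')
         , clicktime  = clicktime;

    -- Hypothetical index name.
    CREATE INDEX clicks_ix_hour_link ON clicks (click_hour, link_id);

    -- Hourly counts grouped on the indexed column instead of an expression.
    SELECT HOUR(c.click_hour) AS clickHour
         , COUNT(*) AS clickCount
      FROM clicks c
      JOIN links l
        ON l.id = c.link_id
       AND l.user_id = 1
     WHERE c.click_hour >= '2014-09-21'
       AND c.click_hour <  '2014-09-21' + INTERVAL 1 DAY
     GROUP BY c.click_hour;
    ```

    New rows would also need click_hour filled in, either by the application at insert time or by a trigger; otherwise the column goes stale and the counts drift.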
