drslez4322 2019-06-21 15:25

Improve the performance of a MySQL query involving UNION

I have a Golang utility that reduces the number of data points per day in a table of historical data.

There are between 20 and 400 records per day, and at least 100 million records in total.

The utility trims the data down to n records per day for every day before a given date (n can range from 1 to 300 records per day).
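The trimming depends on a per-bucket key. The query in STEP 3 hard-codes `(1440/2)*60` = 43,200 seconds, i.e. two 12-hour buckets per day. A minimal Go sketch of the same key for an arbitrary n, assuming the timestamps are Unix seconds as returned by `UNIX_TIMESTAMP`; `timeKey` is a hypothetical helper name, not part of the question's code:

```go
package main

// timeKey returns the bucket index of a Unix timestamp (seconds) when each
// day is split into n equal buckets. floor(ts*n/86400) is mathematically the
// same as floor(ts/((1440/n)*60)), but it stays in integer arithmetic, so n
// does not have to divide 1440 evenly.
// Assumes ts >= 0, because Go's integer division truncates toward zero.
func timeKey(ts int64, n int64) int64 {
	const secondsPerDay = 86400
	return ts * n / secondsPerDay
}
```

With n = 2, every day d gets keys 2d and 2d+1, so a GROUP BY on this key keeps at most two rows per day. The buckets line up with UTC midnight, since `UNIX_TIMESTAMP` returns seconds since the UTC epoch, not with local calendar days.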

The method I am using is as follows:

STEP 1:

CREATE TABLE main_table_tmp LIKE main_table;

STEP 2:

ALTER TABLE main_table_tmp ADD COLUMN timekey INT;

STEP 3:

INSERT INTO main_table_tmp 
SELECT * FROM (
  SELECT *,FLOOR(UNIX_TIMESTAMP(column_name)/((1440/2)*60)) AS timekey 
  FROM main_table
  WHERE column_name <= '2018-01-01' 
  GROUP BY timekey
) m 
UNION ALL 
(SELECT *, 0 AS timekey FROM main_table WHERE column_name > 'date');

STEP 4:

ALTER TABLE main_table_tmp DROP COLUMN timekey;

DROP TABLE main_table;

RENAME TABLE main_table_tmp TO main_table;
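The four steps can be generated for any n with a small statement builder. This is a sketch only: `trimStatements` and its parameters are hypothetical, and it assumes the `UNION ALL` branch is meant to keep the rows after the same cutoff (the question writes `'date'` there). It uses MySQL's integer-division operator `DIV` so the key is an exact integer. Values are concatenated into SQL, so they must come from trusted configuration, never from user input.

```go
package main

import "fmt"

// trimStatements builds STEP 1-4 for table `table`, keeping n rows per day
// for rows whose timestamp column `col` is on or before `cutoff`.
func trimStatements(table, col, cutoff string, n int) []string {
	tmp := table + "_tmp"
	// Bucket key: n buckets per day, computed with integer division.
	key := fmt.Sprintf("UNIX_TIMESTAMP(%s)*%d DIV 86400", col, n)
	return []string{
		// STEP 1 and 2: empty copy of the table plus the helper column.
		fmt.Sprintf("CREATE TABLE %s LIKE %s", tmp, table),
		fmt.Sprintf("ALTER TABLE %s ADD COLUMN timekey INT", tmp),
		// STEP 3: one row per bucket before the cutoff, every row after it.
		fmt.Sprintf("INSERT INTO %[1]s SELECT * FROM (SELECT *, %[2]s AS timekey FROM %[3]s WHERE %[4]s <= '%[5]s' GROUP BY timekey) m UNION ALL (SELECT *, 0 AS timekey FROM %[3]s WHERE %[4]s > '%[5]s')",
			tmp, key, table, col, cutoff),
		// STEP 4: drop the helper column, then replace the original table.
		fmt.Sprintf("ALTER TABLE %s DROP COLUMN timekey", tmp),
		fmt.Sprintf("DROP TABLE %s", table),
		fmt.Sprintf("RENAME TABLE %s TO %s", tmp, table),
	}
}
```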

I run the steps above from Go:

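func somefuncname() {

  // ----
  // ----
  // ----  (code elided in the original)
  q := "CREATE TABLE " + *tablename + "_tmp LIKE " + *tablename + ";"
  // CREATE TABLE returns no result set, so use Exec rather than Query:
  // db.Query returns *sql.Rows that must be closed, and an unused
  // rows variable is a compile error in Go.
  if _, err := db.Exec(q); err != nil {
    fmt.Println(err)
  }
  //--ALTER ADD timekey
  //--INSERT INTO SELECT *....
  //--ALTER DROP timekey, DROP table and rename

}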

The query is currently very slow.

Some results:
Total records: 2 million
Execution time: 180 seconds

This is on a machine with 16 GB of RAM; it is even slower when deployed on a low-end system.
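To put those numbers in perspective, here is a back-of-the-envelope extrapolation to the full table. It assumes cost grows linearly with row count, which is optimistic once the data no longer fits in memory; `estimateSeconds` is a hypothetical helper:

```go
package main

// estimateSeconds linearly extrapolates a measured run (sampleRows rows in
// sampleSeconds seconds) to totalRows rows. Multiplying before dividing
// keeps the integer arithmetic exact for these magnitudes.
func estimateSeconds(totalRows, sampleRows, sampleSeconds int64) int64 {
	return totalRows * sampleSeconds / sampleRows
}
```

With 2 million rows in 180 seconds (about 11,100 rows per second), 100 million rows come out at 9,000 seconds, roughly 2.5 hours, even on the 16 GB machine.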

Steps I have taken to resolve this:

  1. Looked at the indexes on all the tables and tried removing them before running the utility. Removing the indexes made the utility only 5 seconds faster.

  2. Ran the utility in stages: if the total number of records exceeds 1 million, it processes 1 million records at a time.

But after all these efforts, it looks like the main problem is the query itself.

It is just not fast enough. I need a way to make the query more efficient.

Any help is appreciated, thank you!


1 answer

  • duanguochi6194 2019-06-21 16:15

    Why add timekey and then drop it? Adding a column to an empty table is fast, but dropping it from a populated table amounts to another copy of the whole table. That's unnecessary work if we don't need the column.

    We can GROUP BY an expression; that expression doesn't have to appear in the SELECT list. For example:

    SELECT t.*
      FROM main_table t
     WHERE t.column_name <= '2018-01-01'
     GROUP 
        BY FLOOR(UNIX_TIMESTAMP(t.column_name)/((1440/2)*60))
    

    (Note that this query will cause an error if ONLY_FULL_GROUP_BY is included in sql_mode; that disables a MySQL-specific extension which allows the query to run.)
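What the GROUP BY on the bucket expression does can be illustrated in memory. With ONLY_FULL_GROUP_BY disabled, MySQL may return the non-grouped columns from any row in each group; this Go sketch instead keeps the earliest row per bucket deterministically. `keepFirstPerBucket` is a hypothetical name, and the input is assumed to be non-negative Unix timestamps sorted in ascending order.

```go
package main

// keepFirstPerBucket keeps the first timestamp of each bucket, where a
// bucket is one of n equal slices of a day (key = ts*n/86400). Because the
// input is sorted, a new key always means a new bucket has started.
func keepFirstPerBucket(ts []int64, n int64) []int64 {
	var out []int64
	last := int64(-1) // no bucket seen yet
	for _, t := range ts {
		if k := t * n / 86400; k != last {
			out = append(out, t)
			last = k
		}
	}
	return out
}
```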

    Without some table definitions (including storage engine, column datatypes, indexes) and without EXPLAIN output, we're just guessing.

    But some suggestions:

    Drop the secondary indexes on the empty table being populated, and add them after the table is loaded.
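If the utility should handle this generically, the drop and re-add statements can be generated from a list of index definitions. A sketch only: the `index` type and both helpers are hypothetical, and the example names (`noncluster_index_1`, `fee`, `fi`, ...) are the answer's own placeholders, not a real schema. Both helpers emit a single ALTER TABLE, as the answer's script does.

```go
package main

import "strings"

// index describes one secondary index.
type index struct {
	name    string
	unique  bool
	columns []string
}

// dropIndexesSQL drops every listed index in one ALTER TABLE; run it on the
// empty copy before loading.
func dropIndexesSQL(table string, idx []index) string {
	parts := make([]string, len(idx))
	for i, x := range idx {
		parts[i] = "DROP INDEX " + x.name
	}
	return "ALTER TABLE " + table + " " + strings.Join(parts, ", ")
}

// addIndexesSQL re-creates the indexes in one ALTER TABLE after the load.
func addIndexesSQL(table string, idx []index) string {
	parts := make([]string, len(idx))
	for i, x := range idx {
		kind := "ADD INDEX "
		if x.unique {
			kind = "ADD UNIQUE INDEX "
		}
		parts[i] = kind + x.name + " (" + strings.Join(x.columns, ",") + ")"
	}
	return "ALTER TABLE " + table + " " + strings.Join(parts, ", ")
}
```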

    I'd avoid the UNION. Given that one of the SELECT statements has a predicate on column_name and the other has a predicate on an entirely different column, date, we want two separate INSERT ... SELECT statements.
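The two-INSERT shape, with no timekey column and no UNION, can be generated like this. A sketch under stated assumptions: `twoInserts` is hypothetical, `bucketCol` and `keepCol` are separate because this answer reads the second predicate as being on a different column (pass the same name if it is not), a single `cutoff` is used for both statements although the answer leaves the second date as `'????-??-??'`, and the bucket key uses `DIV` with n buckets per day in place of the hard-coded `FLOOR(.../((1440/2)*60))`. Values are concatenated into SQL, so they must be trusted.

```go
package main

import "fmt"

// twoInserts returns the bucketed copy of old rows and the plain copy of
// newer rows, both as INSERT ... SELECT from src into dst.
func twoInserts(src, dst, bucketCol, keepCol, cutoff string, n int) [2]string {
	return [2]string{
		// One row per bucket (n buckets per day) up to the cutoff.
		fmt.Sprintf("INSERT INTO %s SELECT h.* FROM %s h WHERE h.%s <= '%s' GROUP BY UNIX_TIMESTAMP(h.%s)*%d DIV 86400",
			dst, src, bucketCol, cutoff, bucketCol, n),
		// Every row after the cutoff, copied unchanged.
		fmt.Sprintf("INSERT INTO %s SELECT c.* FROM %s c WHERE c.%s > '%s'",
			dst, src, keepCol, cutoff),
	}
}
```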

    CREATE TABLE main_table_tmp LIKE main_table
    ;
    
    -- for performance, remove secondary indexes, leave just the cluster index
    ALTER TABLE main_table_tmp 
        DROP INDEX noncluster_index_1
      , DROP INDEX noncluster_index_2
      , ...
    ;
    
    -- for performance, have a suitable index available on main_table 
    -- with `column_name` as the leading column
    INSERT INTO main_table_tmp 
    SELECT h.*
      FROM main_table h
     WHERE h.column_name <= '2018-01-01'
     GROUP 
        BY FLOOR(UNIX_TIMESTAMP(h.column_name)/((1440/2)*60))
    ;
    
    -- for performance, have a suitable index available on main_table
    -- with `date` as the leading column
    INSERT INTO main_table_tmp
    SELECT c.*
      FROM main_table c
     WHERE c.date > '????-??-??'
    ;
    
    -- add secondary indexes 
    ALTER TABLE main_table_tmp
        ADD UNIQUE INDEX noncluster_index_1 (fee,fi,fo)
      , ADD INDEX noncluster_index_2 (fum)
      , ...
    ;
    
    This answer was selected by the asker as the best answer.
