drslez4322 2019-06-21 15:25

Improving the performance of a MySQL query involving UNION

I have a Golang utility that reduces the number of data points per day in a table of historical data.

Each day has between 20 and 400 records. In total there are at least 100 million records.

The utility trims the data down to n records per day for all dates before a given cutoff date (n can range from 1 to 300).

The method I am using is as follows:

STEP 1:

CREATE TABLE main_table_tmp LIKE main_table;

STEP 2:

ALTER TABLE main_table_tmp ADD COLUMN timekey INT;

STEP 3:

INSERT INTO main_table_tmp 
SELECT * FROM (
  SELECT *,FLOOR(UNIX_TIMESTAMP(column_name)/((1440/2)*60)) AS timekey 
  FROM main_table
  WHERE column_name <= '2018-01-01' 
  GROUP BY timekey
) m 
UNION ALL 
(SELECT * ,0 As timekey FROM main_table where column_name > 'date') ;

STEP 4:

ALTER TABLE main_table_tmp DROP COLUMN timekey;

DROP TABLE main_table;

RENAME TABLE main_table_tmp TO main_table;

I am achieving the above using golang.

func somefuncname() {
  // ----
  // ----
  // ----
  q := "CREATE TABLE " + *tablename + "_tmp LIKE " + *tablename + ";"
  // DDL returns no rows, so Exec is used instead of Query.
  if _, err := db.Exec(q); err != nil {
    fmt.Println(err)
  }
  //--ALTER ADD timekey
  //--INSERT INTO SELECT *....
  //--ALTER DROP timekey ,DROP table and rename
}

The current response time of this query is very slow.

Some results:
Total records: 2 million
Execution time: 180 seconds

This is on a machine with 16 GB of RAM; it becomes very slow when deployed on a low-end system.

Steps I have taken to resolve this:

  1. Reviewed the indexes on all the tables. I tried removing the indexes and running the utility; that made it only about 5 seconds faster, which is not much.

  2. Executed the utility in stages: if the total number of records exceeds 1 million, the utility processes 1 million records at a time.
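
The question does not show how the staging in step 2 was done. As a minimal sketch, assuming the rows are simply cut into fixed-size windows, the stages could be computed like this (`window` and `batchWindows` are hypothetical names, not part of the original utility):

```go
package main

// window is a hypothetical (offset, limit) pair describing one stage.
type window struct {
	Offset, Limit int64
}

// batchWindows splits total rows into consecutive windows of at most size rows.
// It returns nil when either argument is not positive.
func batchWindows(total, size int64) []window {
	if total <= 0 || size <= 0 {
		return nil
	}
	var out []window
	for off := int64(0); off < total; off += size {
		limit := size
		if rest := total - off; rest < size {
			limit = rest // the last window may be shorter
		}
		out = append(out, window{Offset: off, Limit: limit})
	}
	return out
}
```

For 2.5 million rows and a stage size of 1 million this yields three windows, the last one covering 500,000 rows. One caveat with row-count windows: a time bucket that straddles two windows would be grouped once per window, so cutting on date boundaries may be the safer choice.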

But after all these efforts, it looks like the main problem is the query itself.

It is just not fast enough. I need a way to make the query more efficient.

Any help is appreciated, thank you!

1 answer

  • duanguochi6194 2019-06-21 16:15

    Why add timekey and then drop it? Adding a column to an empty table is fast, but dropping it after the table is populated can force MySQL to rebuild the table, which amounts to an extra copy. That is unnecessary work if we don't need the column.

    We can GROUP BY an expression; that expression doesn't have to appear in the SELECT list. For example:

    SELECT t.*
      FROM main_table t
     WHERE t.column_name <= '2018-01-01'
     GROUP 
        BY FLOOR(UNIX_TIMESTAMP(t.column_name)/((1440/2)*60))
    

    (Note that this query will cause an error if ONLY_FULL_GROUP_BY is included in sql_mode; that disables a MySQL-specific extension which allows the query to run.)
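
To see what the GROUP BY expression above computes, here is a small Go sketch of the same arithmetic. It assumes non-negative Unix timestamps and that `UNIX_TIMESTAMP` and the Go code agree on the time zone; `bucketKey` and `timeKey` are hypothetical helper names. The bucket width is `(1440/perDay)*60` seconds, so dividing by it is the same as multiplying by `perDay` and dividing by 86400. The match with MySQL is exact when `perDay` divides 1440 evenly; otherwise MySQL's decimal rounding of `1440/perDay` may shift bucket edges slightly.

```go
package main

import "time"

const secondsPerDay = 24 * 60 * 60

// bucketKey mirrors FLOOR(UNIX_TIMESTAMP(x)/((1440/perDay)*60)) for
// non-negative timestamps. Integer division truncates, which equals
// FLOOR for non-negative values.
func bucketKey(unixSecs int64, perDay int) int64 {
	return unixSecs * int64(perDay) / secondsPerDay
}

// timeKey is a convenience wrapper for time.Time values.
func timeKey(t time.Time, perDay int) int64 {
	return bucketKey(t.Unix(), perDay)
}
```

With `perDay` set to 2, as in the query, every timestamp from 00:00:00 to 11:59:59 UTC on a given day shares one key and 12:00:00 starts the next key, so grouping on the key keeps one row per half day.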

    Without some table definitions (including storage engine, column datatypes, indexes) and without EXPLAIN output, we're just guessing.

    But some suggestions:

    Drop the secondary indexes on the empty table being populated, and add them after the table is loaded.

    I'd avoid the UNION. One of the SELECT statements has a predicate on column_name and the other has a predicate on an entirely different column, date, so it makes sense to run them as two separate INSERT ... SELECT statements.

    CREATE TABLE main_table_tmp LIKE main_table
    ;
    
    -- for performance, remove secondary indexes, leave just the clustered index
    ALTER TABLE main_table_tmp 
        DROP INDEX noncluster_index_1
      , DROP INDEX noncluster_index_2
      , ...
    ;
    
    -- for performance, have a suitable index available on main_table 
    -- with `column_name` as the leading column
    INSERT INTO main_table_tmp 
    SELECT h.*
      FROM main_table h
     WHERE h.column_name <= '2018-01-01'
     GROUP 
        BY FLOOR(UNIX_TIMESTAMP(h.column_name)/((1440/2)*60))
    ;
    
    -- for performance, have a suitable index available on main_table
    -- with `date` as the leading column
    INSERT INTO main_table_tmp
    SELECT c.*
      FROM main_table c
     WHERE c.date > '????-??-??'
    ;
    
    -- add secondary indexes 
    ALTER TABLE main_table_tmp
        ADD UNIQUE INDEX noncluster_index_1 (fee,fi,fo)
      , ADD INDEX noncluster_index_2 (fum)
      , ...
    ;
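
Since the utility is written in Go, the statements above could be driven roughly as follows. This is only a sketch under several assumptions: both date ranges filter on the same column with the same cutoff (as in the question's STEP 3), the index DROP/ADD statements are left out because the real index names are not given, and `runner`, `dbRunner`, `runFunc`, `step`, `trimSteps` and `runSteps` are hypothetical names. A MySQL driver that accepts `?` placeholders would still need to be registered separately.

```go
package main

import (
	"database/sql"
	"fmt"
	"regexp"
)

// runner abstracts "execute one statement" so the flow can be dry-run
// without a live MySQL server.
type runner interface {
	Run(query string, args ...any) error
}

// dbRunner adapts *sql.DB. Exec is used because none of these statements
// return rows.
type dbRunner struct{ db *sql.DB }

func (r dbRunner) Run(query string, args ...any) error {
	_, err := r.db.Exec(query, args...)
	return err
}

// runFunc lets a plain function act as a runner.
type runFunc func(query string, args ...any) error

func (f runFunc) Run(query string, args ...any) error { return f(query, args...) }

// step is one statement plus its placeholder arguments.
type step struct {
	query string
	args  []any
}

// Identifiers cannot be bound as placeholders, so they are validated instead.
var identRe = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

// trimSteps builds CREATE TABLE ... LIKE plus the two INSERT ... SELECT
// statements: trimmed rows up to cutoff, then all rows after cutoff.
func trimSteps(table, tsCol, cutoff string, perDay int) ([]step, error) {
	if !identRe.MatchString(table) || !identRe.MatchString(tsCol) {
		return nil, fmt.Errorf("invalid identifier: %q / %q", table, tsCol)
	}
	if perDay < 1 || perDay > 1440 {
		return nil, fmt.Errorf("perDay out of range: %d", perDay)
	}
	tmp := table + "_tmp"
	return []step{
		{query: fmt.Sprintf("CREATE TABLE %s LIKE %s", tmp, table)},
		{
			query: fmt.Sprintf("INSERT INTO %s SELECT h.* FROM %s h WHERE h.%s <= ? "+
				"GROUP BY FLOOR(UNIX_TIMESTAMP(h.%s)/((1440/%d)*60))",
				tmp, table, tsCol, tsCol, perDay),
			args: []any{cutoff},
		},
		{
			query: fmt.Sprintf("INSERT INTO %s SELECT c.* FROM %s c WHERE c.%s > ?",
				tmp, table, tsCol),
			args: []any{cutoff},
		},
	}, nil
}

// runSteps executes the steps in order and stops at the first failure.
func runSteps(r runner, steps []step) error {
	for i, s := range steps {
		if err := r.Run(s.query, s.args...); err != nil {
			return fmt.Errorf("step %d failed: %w", i+1, err)
		}
	}
	return nil
}
```

With an open `*sql.DB`, the call would look like `runSteps(dbRunner{db}, steps)`. The ALTER statements that drop and re-add secondary indexes would slot in before and after the two inserts, and the DROP/RENAME swap from the question would follow once the load succeeds.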
    
    This answer was selected by the asker as the best answer.
