drslez4322 2019-06-21 15:25

Improving the performance of a MySQL query involving a UNION

I have a Golang utility that reduces the number of data points per day in a table of historical data.

Each day has between 20 and 400 records. In total there are at least 100 million records.

The utility trims the data down to n records per day for dates before a given cutoff (n can range from 1 to 300).

The method I am using is as follows:

STEP 1:

CREATE TABLE main_table_tmp LIKE main_table;

STEP 2:

ALTER TABLE main_table_tmp ADD COLUMN timekey INT;

STEP 3:

INSERT INTO main_table_tmp 
SELECT * FROM (
  SELECT *,FLOOR(UNIX_TIMESTAMP(column_name)/((1440/2)*60)) AS timekey 
  FROM main_table
  WHERE column_name <= '2018-01-01' 
  GROUP BY timekey
) m 
UNION ALL 
(SELECT * ,0 As timekey FROM main_table where column_name > 'date') ;

STEP 4:

ALTER TABLE main_table_tmp DROP COLUMN timekey;

DROP TABLE main_table;

RENAME TABLE main_table_tmp TO main_table;
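The timekey expression in step 3 divides the Unix timestamp by a bucket width, so (1440/2)*60 is 43200 seconds, i.e. two buckets per day. A minimal Go sketch of the same arithmetic follows, mainly to show how the bucket width relates to n. The function names are hypothetical, and it assumes n is at least 1 and divides 86400 evenly (otherwise Go's integer division truncates where the SQL expression would not). Buckets computed this way are aligned to UTC.

```go
package main

import (
	"fmt"
	"time"
)

// bucketSeconds returns the bucket width in seconds when each day is split
// into n equal buckets. For n = 2 this matches (1440/2)*60 = 43200.
// Assumption: n >= 1 and n divides 86400 evenly.
func bucketSeconds(n int64) int64 {
	return 86400 / n
}

// timeKeyUnix mirrors FLOOR(UNIX_TIMESTAMP(ts) / bucketSeconds(n)) for a
// timestamp given as non-negative Unix seconds.
func timeKeyUnix(sec, n int64) int64 {
	return sec / bucketSeconds(n)
}

// timeKey is a convenience wrapper for time.Time values.
func timeKey(ts time.Time, n int64) int64 {
	return timeKeyUnix(ts.Unix(), n)
}

func main() {
	morning := time.Date(2018, 1, 1, 9, 30, 0, 0, time.UTC)
	evening := time.Date(2018, 1, 1, 21, 30, 0, 0, time.UTC)
	// With n = 2 the two timestamps fall into different 12-hour buckets.
	fmt.Println(timeKey(morning, 2), timeKey(evening, 2))
}
```

Rows that share a timekey collapse into one row under GROUP BY, which is how the bucket count per day bounds the number of records kept per day.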

I run the steps above from Go:

func somefuncname() {
	// ...
	// ...
	// ...

	// DDL returns no rows, so use Exec rather than Query.
	q := "CREATE TABLE " + *tablename + "_tmp LIKE " + *tablename + ";"
	if _, err := db.Exec(q); err != nil {
		fmt.Println(err)
		return
	}
	// ALTER TABLE ... ADD timekey
	// INSERT INTO ... SELECT * ...
	// ALTER TABLE ... DROP timekey, DROP TABLE and RENAME
}
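As a rough sketch of how such a sequence of statements might be driven from Go, the block below runs a list of statements in order and stops at the first failure. The `execer` interface, `runSteps`, `quoteIdent`, and the `printer` stand-in are all hypothetical names, not part of the original utility. A real `*sql.DB` satisfies `execer`, so `printer` is only there to keep the example runnable without a database.

```go
package main

import (
	"database/sql"
	"fmt"
	"strings"
)

// execer is the subset of *sql.DB (and *sql.Tx) used here.
type execer interface {
	Exec(query string, args ...any) (sql.Result, error)
}

// quoteIdent wraps a MySQL identifier in backticks, doubling any embedded
// backtick, because identifiers cannot be passed as query parameters.
func quoteIdent(name string) string {
	return "`" + strings.ReplaceAll(name, "`", "``") + "`"
}

// runSteps executes the statements in order and stops at the first error.
func runSteps(db execer, stmts []string) error {
	for i, q := range stmts {
		if _, err := db.Exec(q); err != nil {
			return fmt.Errorf("step %d (%s): %w", i+1, q, err)
		}
	}
	return nil
}

// printer is a stand-in execer that only prints each statement.
type printer struct{}

func (printer) Exec(query string, args ...any) (sql.Result, error) {
	fmt.Println(query)
	return nil, nil
}

func main() {
	table := "main_table"
	tmp := quoteIdent(table + "_tmp")
	stmts := []string{
		"CREATE TABLE " + tmp + " LIKE " + quoteIdent(table),
		"ALTER TABLE " + tmp + " ADD COLUMN timekey INT",
	}
	if err := runSteps(printer{}, stmts); err != nil {
		fmt.Println(err)
	}
}
```

Returning on the first error matters here because the later steps drop and rename tables, which should not run after a failed INSERT.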

The query currently runs very slowly.

Some results:
Total records: 2 million
Execution time: 180 seconds

This is on a machine with 16 GB of RAM. It is very slow when deployed on a low-end system.

Steps I have taken to resolve this:

  1. Looked at the indexes on all the tables and tried removing them before running the utility. That made the utility 5 seconds faster, which is not much.

  2. Executed the utility in stages: if the total number of records exceeds 1 million, the utility processes 1 million records at a time.
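The post does not say how the stages in step 2 are delimited. One possible reading, sketched below with hypothetical names, splits a total row count into offset and count pairs of at most 1 million rows each. This is an assumption for illustration only. On a large table, stages bounded by a date or key range would likely be cheaper than row offsets.

```go
package main

import "fmt"

// batch describes one stage as a zero-based row offset and a row count.
type batch struct {
	Offset, Count int64
}

// splitBatches divides total rows into stages of at most size rows each.
// It returns nil when either argument is not positive.
func splitBatches(total, size int64) []batch {
	if total <= 0 || size <= 0 {
		return nil
	}
	var out []batch
	for off := int64(0); off < total; off += size {
		n := size
		if total-off < size {
			n = total - off // the final stage may be shorter
		}
		out = append(out, batch{Offset: off, Count: n})
	}
	return out
}

func main() {
	for _, b := range splitBatches(2_500_000, 1_000_000) {
		fmt.Printf("offset=%d count=%d\n", b.Offset, b.Count)
	}
}
```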

But after all these efforts, it looks like the main problem is the query itself.

It is just not fast enough. I need a way to make the query more efficient.

Any help appreciated, thank you guys!!

1 answer

  • duanguochi6194 2019-06-21 16:15

    Why add timekey and then drop it? Adding a column to an empty table is fast, but dropping it after the table is populated amounts to an extra copy of the table. That is unnecessary work if we don't need the column.

    We can do a GROUP BY on an expression; that expression doesn't have to appear in the SELECT list. For example:

    SELECT t.*
      FROM main_table t
     WHERE t.column_name <= '2018-01-01'
     GROUP 
        BY FLOOR(UNIX_TIMESTAMP(t.column_name)/((1440/2)*60))
    

    (Note that this query will cause an error if ONLY_FULL_GROUP_BY is included in sql_mode; that disables a MySQL-specific extension which allows the query to run.)
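Since the query depends on ONLY_FULL_GROUP_BY being absent from sql_mode, a utility could check for it before running. The sketch below only parses a comma-separated mode string such as the value returned by `SELECT @@sql_mode`. The function name and the sample mode string are hypothetical, and fetching the value from the server is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// hasSQLMode reports whether mode is listed in a comma-separated sql_mode
// value. The comparison ignores case and surrounding whitespace.
func hasSQLMode(sqlMode, mode string) bool {
	for _, m := range strings.Split(sqlMode, ",") {
		if strings.EqualFold(strings.TrimSpace(m), mode) {
			return true
		}
	}
	return false
}

func main() {
	// Sample value for illustration; a real one would come from the server.
	current := "ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_DATE"
	if hasSQLMode(current, "ONLY_FULL_GROUP_BY") {
		fmt.Println("the GROUP BY query above would be rejected")
	}
}
```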

    Without some table definitions (including storage engine, column datatypes, indexes) and without EXPLAIN output, we're just guessing.

    But some suggestions:

    Drop the secondary indexes on the empty table being populated, and add them after the table is loaded.

    I'd avoid the UNION. Given that one of the SELECT statements has a predicate on column_name and the other has a predicate on an entirely different column, date, we want two separate SELECT statements.

    CREATE TABLE main_table_tmp LIKE main_table
    ;
    
    -- for performance, remove secondary indexes, leave just the clustered index
    ALTER TABLE main_table_tmp 
        DROP INDEX noncluster_index_1
      , DROP INDEX noncluster_index_2
      , ...
    ;
    
    -- for performance, have a suitable index available on main_table 
    -- with `column_name` as the leading column
    INSERT INTO main_table_tmp 
    SELECT h.*
      FROM main_table h
     WHERE h.column_name <= '2018-01-01'
     GROUP 
        BY FLOOR(UNIX_TIMESTAMP(h.column_name)/((1440/2)*60))
    ;
    
    -- for performance, have a suitable index available on main_table
    -- with `date` as the leading column
    INSERT INTO main_table_tmp
    SELECT c.*
      FROM main_table c
     WHERE c.date > '????-??-??'
    ;
    
    -- add secondary indexes 
    ALTER TABLE main_table_tmp
        ADD UNIQUE INDEX noncluster_index_1 (fee,fi,fo)
      , ADD INDEX noncluster_index_2 (fum)
      , ...
    ;
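To connect the two INSERT statements above to the Go utility, here is a minimal sketch that builds them as strings with the cutoff left as a `?` placeholder, so the date can be passed as a query argument. The function names are hypothetical. The table and column names are taken from the example above, and the bucket width of 43200 corresponds to (1440/2)*60.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent wraps a MySQL identifier in backticks, doubling any embedded
// backtick, because identifiers cannot be bound as parameters.
func quoteIdent(name string) string {
	return "`" + strings.ReplaceAll(name, "`", "``") + "`"
}

// thinInsert builds the INSERT ... SELECT that keeps one row per time bucket
// for rows at or before the cutoff, which is bound to the ? placeholder.
func thinInsert(table, col string, bucketSecs int64) string {
	c := quoteIdent(col)
	return fmt.Sprintf(
		"INSERT INTO %s SELECT h.* FROM %s h WHERE h.%s <= ? GROUP BY FLOOR(UNIX_TIMESTAMP(h.%s)/%d)",
		quoteIdent(table+"_tmp"), quoteIdent(table), c, c, bucketSecs)
}

// copyInsert builds the INSERT ... SELECT that copies rows after the cutoff
// unchanged; the cutoff is again bound to the ? placeholder.
func copyInsert(table, col string) string {
	return fmt.Sprintf("INSERT INTO %s SELECT c.* FROM %s c WHERE c.%s > ?",
		quoteIdent(table+"_tmp"), quoteIdent(table), quoteIdent(col))
}

func main() {
	fmt.Println(thinInsert("main_table", "column_name", 43200))
	fmt.Println(copyInsert("main_table", "date"))
}
```

Each string would then be passed to `db.Exec` together with the cutoff date as its single argument.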
    