Improving the performance of a MySQL query involving a UNION

I have a Golang utility that reduces the number of data points per day in a table of historical data.

The table holds between 20 and 400 records per day, and at least 100 million records in total.

The utility trims this down to n records per day for all data prior to a given date (n can range from 1 to 300 records per day).

The method I am using is as follows:

STEP 1:

CREATE TABLE main_table_tmp LIKE main_table;

STEP 2:

ALTER TABLE main_table_tmp ADD COLUMN timekey INT;

STEP 3:

INSERT INTO main_table_tmp 
SELECT * FROM (
  SELECT *,FLOOR(UNIX_TIMESTAMP(column_name)/((1440/2)*60)) AS timekey 
  FROM main_table
  WHERE column_name <= '2018-01-01' 
  GROUP BY timekey
) m 
UNION ALL 
(SELECT * ,0 As timekey FROM main_table where column_name > 'date') ;

STEP 4:

ALTER TABLE main_table_tmp DROP COLUMN timekey;

DROP TABLE main_table;

RENAME TABLE main_table_tmp TO main_table;
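To make the bucketing in STEP 3 concrete: the divisor `(1440/2)*60` is 43,200 seconds, which gives two buckets per day, and `GROUP BY timekey` keeps one row per bucket. A minimal Go sketch of the same arithmetic, generalized to n buckets per day, might look like the following. The names `bucketSeconds` and `timeKey` are hypothetical (they are not part of the utility), and the sketch assumes n > 0 and a timestamp already expressed in Unix seconds.

```go
package main

import "math"

// bucketSeconds returns the bucket width in seconds for n rows per day,
// mirroring the (1440/n)*60 expression in the query. Assumes n > 0.
func bucketSeconds(n int) float64 {
	return (1440.0 / float64(n)) * 60.0
}

// timeKey mirrors FLOOR(UNIX_TIMESTAMP(col) / ((1440/n)*60)).
// Rows that share a key fall into the same bucket, so grouping by the key
// keeps at most one row per bucket.
func timeKey(unixSeconds int64, n int) int64 {
	return int64(math.Floor(float64(unixSeconds) / bucketSeconds(n)))
}
```

For example, with n = 2 every timestamp from 00:00:00 to 11:59:59 UTC on a given day maps to one key and 12:00:00 starts the next one. Note that MySQL's `UNIX_TIMESTAMP()` interprets a DATETIME argument in the session time zone, so bucket boundaries on the server may be shifted relative to UTC.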

I am achieving the above using golang.

func somefuncname() {
	// ----
	// ----
	// ----
	// DDL returns no rows, so Exec is used rather than Query.
	q := "CREATE TABLE " + *tablename + "_tmp LIKE " + *tablename + ";"
	if _, err := db.Exec(q); err != nil {
		fmt.Println(err)
	}
	//--ALTER ADD timekey
	//--INSERT INTO SELECT *....
	//--ALTER DROP timekey, DROP table and rename
}
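Since every step here is a statement that returns no rows, one way to structure the Go side is a small runner that executes the statements in order and stops at the first failure. This is only a sketch under assumptions: `runAll`, `execWith`, and `errNoStatements` are hypothetical names, and the only `database/sql` call used is `(*sql.DB).ExecContext`.

```go
package main

import (
	"context"
	"database/sql"
	"errors"
	"fmt"
)

// errNoStatements is returned when runAll is given nothing to execute.
var errNoStatements = errors.New("no statements to run")

// runAll executes stmts in order through exec and stops at the first error,
// reporting which statement failed.
func runAll(stmts []string, exec func(query string) error) error {
	if len(stmts) == 0 {
		return errNoStatements
	}
	for i, q := range stmts {
		if err := exec(q); err != nil {
			return fmt.Errorf("statement %d (%q) failed: %w", i+1, q, err)
		}
	}
	return nil
}

// execWith adapts a *sql.DB to the exec callback expected by runAll.
// ExecContext is used because none of these statements return rows.
func execWith(ctx context.Context, db *sql.DB) func(string) error {
	return func(q string) error {
		_, err := db.ExecContext(ctx, q)
		return err
	}
}
```

A call would then look like `runAll(stmts, execWith(ctx, db))`. Passing the executor as a function keeps the ordering logic testable without a live database.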

The current response time of this query is very slow.

Sample result:
Total records: 2 million
Execution time: 180 seconds

This is on a machine with 16 GB of RAM. It becomes very slow when deployed on a lower-end system.

Steps I have taken to resolve this:

Looked into the indexes of all the tables. Tried removing the indexes and running the utility. Removing the indexes made the utility only 5 seconds faster, which is not much.
Executed the utility in stages: if the total number of records exceeds 1 million, run the utility on 1 million records at a time.
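The staged run described above can be sketched as a helper that splits a total row count into fixed-size batches. The question does not say how each batch is mapped onto SQL (primary-key ranges, date windows, or LIMIT/OFFSET), so the `batch` type and `batches` function below are hypothetical and only compute the ranges.

```go
package main

// batch is a half-open range of row positions [Start, End).
type batch struct {
	Start, End int64
}

// batches splits total rows into consecutive ranges of at most size rows.
// It returns nil when total or size is not positive.
func batches(total, size int64) []batch {
	if total <= 0 || size <= 0 {
		return nil
	}
	out := make([]batch, 0, (total+size-1)/size)
	for start := int64(0); start < total; start += size {
		end := start + size
		if end > total {
			end = total // the final batch may be shorter
		}
		out = append(out, batch{Start: start, End: end})
	}
	return out
}
```

With 2.5 million rows and a batch size of 1 million this yields three ranges, the last one covering the remaining 500,000 rows.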

But after all these efforts, it looks like the main problem is the query itself.

It is just not fast enough. I need a way to make the query more efficient.

Any help is appreciated, thank you!

duanguochi6194 2019-06-21 16:15
关注
Why are we adding timekey and then dropping it? Adding it to an empty table is fast, but dropping it from a table after it's populated, that's like an extra copy of the table. That's unnecessary work, if we don't need it.

We can do a GROUP BY on an expression; that expression doesn't have to appear in the SELECT list. For example:

SELECT t.* FROM main_table t WHERE t.column_name <= '2018-01-01' GROUP BY FLOOR(UNIX_TIMESTAMP(t.column_name)/((1440/2)*60))

(Note that this query will cause an error if ONLY_FULL_GROUP_BY is included in sql_mode; that disables a MySQL-specific extension which allows the query to run.)
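Because the query depends on ONLY_FULL_GROUP_BY being absent from sql_mode, a utility could check the mode before running it. The sketch below only parses the comma-separated mode string (for example the value returned by `SELECT @@SESSION.sql_mode`); the function names are hypothetical, and how the utility fetches or sets the mode is left as an assumption.

```go
package main

import "strings"

const onlyFullGroupBy = "ONLY_FULL_GROUP_BY"

// hasOnlyFullGroupBy reports whether the comma-separated sql_mode value
// contains ONLY_FULL_GROUP_BY.
func hasOnlyFullGroupBy(sqlMode string) bool {
	for _, m := range strings.Split(sqlMode, ",") {
		if strings.EqualFold(strings.TrimSpace(m), onlyFullGroupBy) {
			return true
		}
	}
	return false
}

// withoutOnlyFullGroupBy returns sqlMode with ONLY_FULL_GROUP_BY removed,
// keeping the remaining modes in their original order.
func withoutOnlyFullGroupBy(sqlMode string) string {
	var kept []string
	for _, m := range strings.Split(sqlMode, ",") {
		m = strings.TrimSpace(m)
		if m != "" && !strings.EqualFold(m, onlyFullGroupBy) {
			kept = append(kept, m)
		}
	}
	return strings.Join(kept, ",")
}
```

If the trimmed value is applied with `SET SESSION sql_mode = ...`, keep in mind that `database/sql` pools connections, so the setting and the INSERT would need to run on the same connection.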

Without some table definitions (including storage engine, column datatypes, indexes) and without EXPLAIN output, we're just guessing.

But some suggestions:

Drop the secondary indexes on the empty table being populated, and add them after the table is loaded.

I'd avoid the UNION. Given that one of the SELECT statements has a predicate on column_name and the other has a predicate on an entirely different column, date, we do want two separate statements.

CREATE TABLE main_table_tmp LIKE main_table ;

-- for performance, remove secondary indexes, leave just the cluster index
ALTER TABLE main_table_tmp
  DROP INDEX noncluster_index_1
, DROP INDEX noncluster_index_2
, ...
;

-- for performance, have a suitable index available on main_table
-- with `column_name` as the leading column
INSERT INTO main_table_tmp
SELECT h.*
  FROM main_table h
 WHERE h.column_name <= '2018-01-01'
 GROUP BY FLOOR(UNIX_TIMESTAMP(h.column_name)/((1440/2)*60))
;

-- for performance, have a suitable index available on main_table
-- with `date` as the leading column
INSERT INTO main_table_tmp
SELECT c.*
  FROM main_table c
 WHERE c.date > '????-??-??'
;

-- add secondary indexes
ALTER TABLE main_table_tmp
  ADD UNIQUE INDEX noncluster_index_1 (fee,fi,fo)
, ADD INDEX noncluster_index_2 (fum)
, ...
;
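On the Go side, the outline above could be turned into a list of statements built from the utility's inputs. The following is a rough sketch, not the answer's code: `thinningStatements` and `identRE` are hypothetical names, it assumes the same column and cutoff date are used for both INSERTs (the answer treats the second predicate as a different column), and it leaves out the index DROP/ADD steps because the real index names are not known. Identifiers cannot be bound as query parameters, so they are checked against a strict whitelist before being interpolated.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// identRE is a deliberately strict whitelist for table and column names.
var identRE = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

// thinningStatements builds the statements that copy thinned historic rows
// and untouched recent rows into <table>_tmp, then swap it in.
func thinningStatements(table, col, cutoff string, perDay int) ([]string, error) {
	if !identRE.MatchString(table) || !identRE.MatchString(col) {
		return nil, fmt.Errorf("invalid identifier: %q / %q", table, col)
	}
	if _, err := time.Parse("2006-01-02", cutoff); err != nil {
		return nil, fmt.Errorf("invalid cutoff date %q: %w", cutoff, err)
	}
	if perDay < 1 {
		return nil, fmt.Errorf("perDay must be at least 1, got %d", perDay)
	}
	tmp := table + "_tmp"
	return []string{
		fmt.Sprintf("CREATE TABLE `%s` LIKE `%s`", tmp, table),
		// One row per time bucket for the historic range. This relies on
		// ONLY_FULL_GROUP_BY being disabled, as the answer notes.
		fmt.Sprintf("INSERT INTO `%s` SELECT h.* FROM `%s` h WHERE h.`%s` <= '%s' "+
			"GROUP BY FLOOR(UNIX_TIMESTAMP(h.`%s`)/((1440/%d)*60))",
			tmp, table, col, cutoff, col, perDay),
		// Recent rows are copied unchanged by a second, separate INSERT.
		fmt.Sprintf("INSERT INTO `%s` SELECT c.* FROM `%s` c WHERE c.`%s` > '%s'",
			tmp, table, col, cutoff),
		fmt.Sprintf("DROP TABLE `%s`", table),
		fmt.Sprintf("RENAME TABLE `%s` TO `%s`", tmp, table),
	}, nil
}
```

Calling `thinningStatements("main_table", "column_name", "2018-01-01", 2)` reproduces the shape of the statements above with two rows per day; the resulting slice can then be executed one statement at a time.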
This answer was selected by the asker as the best answer.
