Improving the performance of a MySQL query involving UNION

I have a Golang utility that reduces the number of data points per day in a table of historical data.

The table holds between 20 and 400 records per day, and at least 100 million records in total.

The utility trims the data down to n records per day for everything prior to a given date (n can range from 1 to 300).

The method I am using is as follows:

STEP 1:

CREATE TABLE main_table_tmp LIKE main_table;

STEP 2:

ALTER TABLE main_table_tmp ADD COLUMN timekey INT;

STEP 3:

INSERT INTO main_table_tmp 
SELECT * FROM (
  SELECT *,FLOOR(UNIX_TIMESTAMP(column_name)/((1440/2)*60)) AS timekey 
  FROM main_table
  WHERE column_name <= '2018-01-01' 
  GROUP BY timekey
) m 
UNION ALL 
(SELECT * ,0 As timekey FROM main_table where column_name > 'date') ;

STEP 4:

ALTER TABLE main_table_tmp DROP COLUMN timekey;

DROP TABLE main_table;

RENAME TABLE main_table_tmp TO main_table;

I am doing all of the above from Go:

func somefuncname() {
  // ... (setup elided)
  q := "CREATE TABLE " + *tablename + "_tmp LIKE " + *tablename + ";"
  // DDL returns no rows, so Exec is used instead of Query.
  if _, err := db.Exec(q); err != nil {
    fmt.Println(err)
    return
  }
  // -- ALTER ADD timekey
  // -- INSERT INTO SELECT * ....
  // -- ALTER DROP timekey, DROP table and rename
}
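
A minimal sketch of how the steps could be run in order from Go, assuming db is a *sql.DB. DDL and INSERT ... SELECT statements return no rows, so Exec is used rather than Query, and the run stops at the first failure instead of continuing with a half-built table. The execer interface, the runAll helper, and the fakeDB stand-in are hypothetical names introduced only for illustration.

```go
package main

import (
	"database/sql"
	"errors"
	"fmt"
)

// execer is the subset of *sql.DB used here; *sql.DB and *sql.Tx both satisfy it.
type execer interface {
	Exec(query string, args ...any) (sql.Result, error)
}

// runAll executes the statements in order and stops at the first error.
func runAll(db execer, stmts []string) error {
	for i, q := range stmts {
		if _, err := db.Exec(q); err != nil {
			return fmt.Errorf("step %d (%q): %w", i+1, q, err)
		}
	}
	return nil
}

// fakeDB records statements instead of sending them to MySQL,
// and fails on the statement equal to failOn (if any).
type fakeDB struct {
	seen   []string
	failOn string
}

func (f *fakeDB) Exec(query string, args ...any) (sql.Result, error) {
	f.seen = append(f.seen, query)
	if f.failOn != "" && query == f.failOn {
		return nil, errors.New("simulated failure")
	}
	return nil, nil
}

func main() {
	stmts := []string{
		"CREATE TABLE main_table_tmp LIKE main_table",
		"ALTER TABLE main_table_tmp ADD COLUMN timekey INT",
	}
	db := &fakeDB{}
	if err := runAll(db, stmts); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("statements executed:", len(db.seen))
}
```

In the real utility a *sql.DB would be passed in place of fakeDB.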

The utility currently runs very slowly.

Sample result: 2 million records in total, execution time 180 seconds.

This is on a machine with 16 GB of RAM. It is very slow when deployed on a low-end system.

Steps I have taken to resolve this:

Looked into the indexes on all the tables. Tried removing the indexes and running the utility; that made it about 5 seconds faster, which is not much.
Ran the utility in stages: if the total number of records exceeds 1 million, process 1 million at a time.

But after all these efforts, it looks like the main problem is the query itself.

It is just not fast enough. I need a way to make the query more efficient.

Any help is appreciated, thank you!

1 answer

duanguochi6194 2019-06-21 16:15
Why are we adding timekey and then dropping it? Adding it to an empty table is fast, but dropping it from the table after it's populated amounts to an extra copy of the table. That's unnecessary work if we don't need the column.

We can do a GROUP BY on an expression; that expression doesn't have to appear in the SELECT list. For example:

SELECT t.*
  FROM main_table t
 WHERE t.column_name <= '2018-01-01'
 GROUP BY FLOOR(UNIX_TIMESTAMP(t.column_name)/((1440/2)*60))

(Note that this query will cause an error if ONLY_FULL_GROUP_BY is included in sql_mode; that disables a MySQL-specific extension which allows the query to run.)
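
Since the query depends on that setting, the utility could check the session's sql_mode before running it. The sketch below assumes the mode string has already been read (for example with SELECT @@SESSION.sql_mode) and only shows the check itself; hasMode is a hypothetical helper, and the mode value in main is an example, not taken from the question.

```go
package main

import (
	"fmt"
	"strings"
)

// hasMode reports whether the comma-separated sql_mode value contains mode.
func hasMode(sqlMode, mode string) bool {
	for _, m := range strings.Split(sqlMode, ",") {
		if strings.EqualFold(strings.TrimSpace(m), mode) {
			return true
		}
	}
	return false
}

func main() {
	// Example value; a real run would read it from the server.
	mode := "ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION"
	if hasMode(mode, "ONLY_FULL_GROUP_BY") {
		fmt.Println("the GROUP BY query above would be rejected in this session")
	}
}
```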

Without some table definitions (including storage engine, column datatypes, indexes) and without EXPLAIN output, we're just guessing.

But some suggestions:

Drop the secondary indexes on the empty table being populated, and add them after the table is loaded.

I'd avoid the UNION. Given that one of the SELECT statements has a predicate on column_name and the other has a predicate on an entirely different column, date, we want two separate SELECT statements.

CREATE TABLE main_table_tmp LIKE main_table ;

-- for performance, remove secondary indexes, leave just the cluster index
ALTER TABLE main_table_tmp
  DROP INDEX noncluster_index_1
, DROP INDEX noncluster_index_2
, ...
;

-- for performance, have a suitable index available on main_table
-- with `column_name` as the leading column
INSERT INTO main_table_tmp
SELECT h.*
  FROM main_table h
 WHERE h.column_name <= '2018-01-01'
 GROUP BY FLOOR(UNIX_TIMESTAMP(h.column_name)/((1440/2)*60))
;

-- for performance, have a suitable index available on main_table
-- with `date` as the leading column
INSERT INTO main_table_tmp
SELECT c.*
  FROM main_table c
 WHERE c.date > '????-??-??'
;

-- add secondary indexes
ALTER TABLE main_table_tmp
  ADD UNIQUE INDEX noncluster_index_1 (fee,fi,fo)
, ADD INDEX noncluster_index_2 (fum)
, ...
;

This answer was selected by the asker as the best answer.
