How to optimize SQL code that selects 300,000+ rows

I need to make XML files for a table that contains 300k+ records.

The code takes around 3~4s to finish (Is this acceptable?).

On top of that, retrieving the data from MySQL takes about 32s (is this acceptable?):

Query

SELECT `id`, `join_at` 
FROM girls g 
WHERE g.del_flg = 0 
ORDER BY g.join_at, g.id ASC

If I run this single query directly in Navicat for MySQL, it still takes about 20s.

What I tried:

At first, the select query failed with a "memory exhausted" error (php.ini: memory_limit = 128M).
After that I changed memory_limit to -1, but I see many people say that setting memory_limit to -1 is bad practice.

So how can I optimize the select query for 300k+ records in each of these cases:

1. Using PHP, SQL, and DOMDocument code only
2. Using the options from #1 combined with an indexed column in the database
3. Anything else that you know of

PHP code with SQL query:

public function getInfo() {
    MySQL::connect();
    // Initialize the result so an empty result set returns an empty array
    // instead of an undefined variable.
    $values = array();
    try {
        $select = 'SELECT `id`, `join_at`';
        $sql = ' FROM girls g';
        $sql .= ' WHERE g.del_flg = 0';
        $sql .= ' ORDER BY g.join_at, g.id ASC';
        MySQL::$sth = MySQL::$pdo->prepare($select . $sql);
        MySQL::$sth->execute();
        // Fetch one row at a time and keep only the two columns needed.
        while ($row = MySQL::$sth->fetch(\PDO::FETCH_ASSOC)) {
            $values[] = array('id' => $row['id'], 'join_at' => $row['join_at']);
        }
        // $rows = MySQL::$sth->fetchAll(\PDO::FETCH_ASSOC);
    } catch (\PDOException $e) {
        return null;
    }
    return $values;
}

I found out that the ORDER BY g.join_at, g.id ASC clause dominates the execution time. When I remove it and sort in PHP instead, the total execution time drops from ~50s to ~5s.

One more thing is that if I set memory_limit to 128M it leads to a "memory exhausted" error (512M will work). Is there any other solution for this problem?
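One possible direction for the memory problem, as a minimal sketch rather than a definitive fix: write each row to the XML file as soon as it is read, instead of collecting all 300k+ rows in an array first. The sketch below uses PHP's XMLWriter in place of DOMDocument, because DOMDocument keeps the whole tree in memory. The function name writeRowsAsXml and the element names girls and girl are assumptions made for illustration; they do not come from the original code.

```php
<?php
// Hypothetical helper (not part of the original code): streams rows into an
// XML file one at a time, so memory use does not grow with the row count.
function writeRowsAsXml(iterable $rows, string $uri): int
{
    $writer = new XMLWriter();
    $writer->openUri($uri);                 // write straight to the target file
    $writer->startDocument('1.0', 'UTF-8');
    $writer->startElement('girls');         // root element name is an assumption

    $count = 0;
    foreach ($rows as $row) {
        $writer->startElement('girl');      // per-row element name is an assumption
        $writer->writeElement('id', (string) $row['id']);
        $writer->writeElement('join_at', (string) $row['join_at']);
        $writer->endElement();

        // Flush XMLWriter's internal buffer to the file every 1000 rows.
        if (++$count % 1000 === 0) {
            $writer->flush();
        }
    }

    $writer->endElement();
    $writer->endDocument();
    $writer->flush();
    return $count;                          // number of rows written
}

// Possible use with the PDO statement from getInfo() (assumes the pdo_mysql driver):
//   MySQL::$pdo->setAttribute(\PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
//   MySQL::$sth->setFetchMode(\PDO::FETCH_ASSOC);
//   MySQL::$sth->execute();
//   writeRowsAsXml(MySQL::$sth, '/path/to/output.xml');
```

Turning off PDO::MYSQL_ATTR_USE_BUFFERED_QUERY matters here because, by default, the MySQL driver copies the entire result set into PHP's memory before the first fetch. With an unbuffered query, the connection cannot run another statement until all rows have been read.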

Here are the indexes I currently have on the table:

[Screenshot of the table's indexes; image not preserved]


1 answer

duankuangxie9070 2019-02-02 10:54
Sorting is better done on the database side, but you need to define the proper indexes. The ORDER BY is expensive in your case: although the index you have on del_flg can prevent a full table scan, it cannot be used to return the rows in the desired order, so MySQL still has to sort the matching rows.

So I would suggest changing the del_flg index so that it includes the sort columns as well:

DROP INDEX del_flg ON girls;
CREATE INDEX del_flg ON girls (del_flg, join_at, id);

If that does not improve the execution time, then create an index that does not include del_flg:

CREATE INDEX join_at ON girls (join_at, id);
This answer was selected by the asker as the best answer.
