Is it better to filter a result set using a WHERE clause or using application code?

OK, here is a simple abstraction of the problem:

Two variables (male_users and female_users) store two groups of users, i.e. male and female.

One way is to use two queries to select them:

select * from users where gender = 'male' and then store the result in male_users

select * from users where gender = 'female' and then store the result in female_users

Another way is to run only one query:

'select * from users' and then loop over the result set to separate the male and female users in the program. A PHP code snippet would be something like this:

$result = mysql_query('select * from users');

while ($row = mysql_fetch_assoc($result)) {
  if ($row['gender'] == 'male') {
    // add to male_users
  } else if ($row['gender'] == 'female') {
    // add to female_users
  }
}

Which one is more efficient and considered the better approach?

This is just a simple illustration of the problem. The real project may have larger tables to query and more filter options.

Thanks in advance!

douting1871 2010-02-24 06:19
The rule of thumb for any application is to let the DB do the things it does well: filtering, sorting, and joining.

Separate the queries into their own functions or class methods:

$men = $foo->fetchMaleUsers();
$women = $foo->fetchFemaleUsers();
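A minimal sketch of what such methods might look like follows. The UserRepository class name, the two-column users table, and the use of PDO with an in-memory SQLite database (which needs the pdo_sqlite extension) are all assumptions made so the example runs without a server; the original answer does not say how $foo is implemented.

```php
<?php
// Hypothetical repository: class name, table layout, and PDO usage are
// assumptions for illustration, not part of the original answer.
final class UserRepository
{
    private PDO $db;

    public function __construct(PDO $db)
    {
        $this->db = $db;
    }

    /** Return all rows for one gender; the database does the filtering. */
    private function fetchByGender(string $gender): array
    {
        $stmt = $this->db->prepare(
            'SELECT id, gender FROM users WHERE gender = :gender'
        );
        $stmt->execute([':gender' => $gender]);
        return $stmt->fetchAll(PDO::FETCH_ASSOC);
    }

    public function fetchMaleUsers(): array
    {
        return $this->fetchByGender('male');
    }

    public function fetchFemaleUsers(): array
    {
        return $this->fetchByGender('female');
    }
}

// In-memory SQLite stands in for the real MySQL connection here.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, gender TEXT NOT NULL)');
$db->exec("INSERT INTO users (gender) VALUES ('male'), ('female'), ('male')");

$foo = new UserRepository($db);
$men = $foo->fetchMaleUsers();
$women = $foo->fetchFemaleUsers();
```

Keeping the WHERE clause inside one private method means that adding another filter later only changes the query, not the calling code.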

Update

I took Steven's PostgreSQL demonstration, in which a full table scan query performed twice as well as two separate indexed queries, and mimicked it using MySQL (which is used in the actual question):

Schema

CREATE TABLE `gender_test` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `gender` enum('male','female') NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=26017396 DEFAULT CHARSET=utf8

I changed the gender column from VARCHAR(20) to an ENUM, as that is more realistic for the purpose of this column. I also provided a primary key, as you would expect on a table, instead of an arbitrary DOUBLE value.

Unindexed Results

mysql> select sql_no_cache * from gender_test WHERE gender = 'male';
12995993 rows in set (31.72 sec)

mysql> select sql_no_cache * from gender_test WHERE gender = 'female';
13004007 rows in set (31.52 sec)

mysql> select sql_no_cache * from gender_test;
26000000 rows in set (32.95 sec)

I trust this needs no explanation.

Indexed Results

ALTER TABLE gender_test ADD INDEX (gender);

...

mysql> select sql_no_cache * from gender_test WHERE gender = 'male';
12995993 rows in set (15.97 sec)

mysql> select sql_no_cache * from gender_test WHERE gender = 'female';
13004007 rows in set (15.65 sec)

mysql> select sql_no_cache * from gender_test;
26000000 rows in set (27.80 sec)

The results shown here are radically different from Steven's data. Each indexed query runs almost twice as fast as the full table scan. This is from a properly indexed table using common-sense column definitions. I don't know PostgreSQL at all, but there must be some significant misconfiguration in Steven's example for it not to show similar results.

Given PostgreSQL's reputation for doing things better than MySQL, or at least as well, I daresay that PostgreSQL would demonstrate similar performance if properly used.

Also note that, on this same machine, an overly simplified for loop doing 52 million comparisons takes an additional 7.3 seconds to execute.

<?php
$N = 52000000;
for ($i = 0; $i < $N; $i++) {
    if (true == true) {
    }
}
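To reproduce that kind of measurement on your own machine, a loop like the one above can be wrapped in a timer. This is only a sketch: the timeComparisonLoop function name and the smaller iteration count are assumptions for illustration, and the elapsed time will differ from the 7.3 seconds quoted here depending on hardware and PHP version.

```php
<?php
// Hypothetical helper: times an empty comparison loop of $n iterations
// and returns the elapsed wall-clock time in seconds.
function timeComparisonLoop(int $n): float
{
    $start = microtime(true);
    for ($i = 0; $i < $n; $i++) {
        if ($i === $i) {
            // Intentionally empty: only the comparison cost is measured.
        }
    }
    return microtime(true) - $start;
}

// One million iterations keeps the run short; raise it to match your data.
$elapsed = timeComparisonLoop(1000000);
printf("1,000,000 comparisons took %.4f s\n", $elapsed);
```

A real filtering loop also pays for fetching every row into PHP, so an empty loop like this gives a lower bound on the application-side cost.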

I think it's rather obvious what is the better approach given this data.