duankuai6991 2010-02-24 06:11

Is it better to filter a result set using a WHERE clause or using application code?

OK, here is a simple abstraction of the problem:

Two variables (male_users and female_users) store two groups of users, i.e. male and female.

  1. One way is to use two queries to select them:

select * from users where gender = 'male', then store the result in male_users

select * from users where gender = 'female', then store the result in female_users

  2. Another way is to run only one query:

select * from users, then loop over the result set and filter the male and female users in application code. A PHP snippet would look something like this:

$result = mysql_query('select * from users');

$male_users = $female_users = array();
while (($row = mysql_fetch_assoc($result)) !== false) {
    if ($row['gender'] == 'male') {
        $male_users[] = $row;   // add to male_users
    } else if ($row['gender'] == 'female') {
        $female_users[] = $row; // add to female_users
    }
}

Which one is more efficient and considered the better approach?

This is just a simple illustration of the problem. The real project may have larger tables to query and more filter options.

Thanks in advance!


  • douting1871 2010-02-24 06:19

    The rule of thumb for any application is to let the DB do the things it does well: filtering, sorting, and joining.

    Separate the queries into their own functions or class methods:

    $men = $foo->fetchMaleUsers();
    $women = $foo->fetchFemaleUsers();
    

    Update

    I took Steven's PostgreSQL demonstration, in which a full table scan query performed twice as well as two separate indexed queries, and reproduced it using MySQL (which is what the actual question uses):

    Schema

    CREATE TABLE `gender_test` (
      `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `gender` enum('male','female') NOT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB AUTO_INCREMENT=26017396 DEFAULT CHARSET=utf8
    

    I changed the gender column from VARCHAR(20) to an ENUM, which is more realistic for the purpose of this column. I also added a primary key, as you would expect on a table, instead of an arbitrary DOUBLE value.

    Unindexed Results

    mysql> select sql_no_cache * from gender_test WHERE gender = 'male';
    
    12995993 rows in set (31.72 sec)
    
    mysql> select sql_no_cache * from gender_test WHERE gender = 'female';
    
    13004007 rows in set (31.52 sec)
    
    mysql> select sql_no_cache * from gender_test;
    
    26000000 rows in set (32.95 sec)
    

    I trust this needs no explanation.

    Indexed Results

    ALTER TABLE gender_test ADD INDEX (gender);
    

    ...

    mysql> select sql_no_cache * from gender_test WHERE gender = 'male';
    
    12995993 rows in set (15.97 sec)
    
    mysql> select sql_no_cache * from gender_test WHERE gender = 'female';
    
    13004007 rows in set (15.65 sec)
    
    mysql> select sql_no_cache * from gender_test;
    
    26000000 rows in set (27.80 sec)
    

    The results shown here are radically different from Steven's data. The indexed queries perform almost twice as fast as the full table scan. This is from a properly indexed table using common sense column definitions. I don't know PostgreSQL at all, but there must be some significant misconfiguration in Steven's example to not show similar results.

    Given PostgreSQL's reputation for doing things better than MySQL, or at least as well, I daresay that PostgreSQL would demonstrate similar performance if properly used.

    Also note, on this same machine an overly simplified for loop doing 52 million comparisons takes an additional 7.3 seconds to execute.

    <?php
    $N = 52000000;
    for($i = 0; $i < $N; $i++) {
        if (true == true) {
        }
    }
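
    To measure the application-side cost on your own machine, a slightly less simplified variant can time the actual partitioning step. This is only a sketch: the synthetic $rows array stands in for a fetched result set, and the row count $n is an arbitrary, deliberately small assumption so the script fits in memory. Timings will vary by machine and PHP version.

```php
<?php
// Synthetic rows standing in for a result set (assumption: alternating genders).
$n = 100000;
$rows = [];
for ($i = 0; $i < $n; $i++) {
    $rows[] = ['id' => $i + 1, 'gender' => ($i % 2 === 0) ? 'male' : 'female'];
}

$maleUsers = [];
$femaleUsers = [];

// Time only the filtering loop, i.e. the work a WHERE clause would avoid.
$start = microtime(true);
foreach ($rows as $row) {
    if ($row['gender'] === 'male') {
        $maleUsers[] = $row;
    } elseif ($row['gender'] === 'female') {
        $femaleUsers[] = $row;
    }
}
$elapsed = microtime(true) - $start;

printf("partitioned %d rows in %.4f s\n", $n, $elapsed);
```

    Note that this counts only the comparison loop. Fetching every row over the connection in the single-query approach is a separate cost that the query timings above already include.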
    

    I think it's rather obvious what is the better approach given this data.

    This answer was selected by the asker as the best answer.