duan5362 2012-12-31 22:04

Which is better: manipulating data with SQL queries, or manipulating the data in arrays in PHP? [duplicate]

Possible Duplicate:
When to use a query or code

I have a large database of users and their details, and I want to find the people who are most similar to each other. Which approach is preferable for the comparison: writing SQL queries that do the manipulation inside the database and return only the filtered (matched) records, or retrieving the raw data into a PHP array first and then applying the comparison operations to that array? Let's say I have 1000 users and I want to find the people with the most mutual friends.

UserA has {a,b,r,c,g,h,r,q,l}
UserB has {x,y,z}
UserC has {a,c,r,g,q}
.
.
.
UserN has {x,y....n}

So I want to find the users who have the most mutual friends with UserA. Would that mean comparing each element of UserA with every element of every other user's array in the whole database? In the above case, UserC should be the most similar user to UserA. I don't know how to accomplish this.


1 Answer

  • duangeli1334 2012-12-31 23:22

    Suppose you represent friendships in a table like this:

    Table: friendships
    from_id  |  to_id
    ----------------------------
          a  |      a
          a  |      b
          a  |      r
          a  |      c
     ...etc...
          b  |      x
          b  |      y
          b  |      z
     ...etc...
    

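    Now you can write a query to answer your question (MySQL syntax):

    -- For every other user, count the friends they share with 'a',
    -- then keep the row with the highest count. ORDER BY ... LIMIT 1 is
    -- used instead of an outer MAX(cnt): without a GROUP BY, the user_id
    -- returned next to MAX(cnt) would come from an arbitrary row.
    SELECT f2.from_id AS user_id_with_most_common_friends,
           COUNT(*) AS number_of_common_friends
    FROM friendships f1
    JOIN friendships f2 ON (f1.to_id = f2.to_id AND f2.from_id <> 'a')
    WHERE f1.from_id = 'a'
    GROUP BY f2.from_id
    ORDER BY number_of_common_friends DESC
    LIMIT 1;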
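
To try the idea end to end, here is a minimal sketch that loads the three sample users from the question into the `friendships` table and asks the database for the user sharing the most friends with `a`. It uses Python's built-in `sqlite3` module in place of MySQL purely so the example is self-contained; the `ORDER BY ... LIMIT 1` form, the composite primary key, the index name, and the helper names `build_db` and `most_similar_user` are all assumptions made for illustration, not part of the original answer.

```python
import sqlite3

# Sample data from the question: user id -> set of friend ids.
# (The duplicate "r" in UserA's list collapses because this is a set.)
FRIENDS = {
    "a": {"a", "b", "r", "c", "g", "h", "q", "l"},
    "b": {"x", "y", "z"},
    "c": {"a", "c", "r", "g", "q"},
}

# Count shared friends per other user and keep the row with the highest count.
QUERY = """
SELECT f2.from_id AS user_id, COUNT(*) AS common_friends
FROM friendships f1
JOIN friendships f2
  ON f1.to_id = f2.to_id AND f2.from_id <> :user
WHERE f1.from_id = :user
GROUP BY f2.from_id
ORDER BY common_friends DESC
LIMIT 1
"""


def build_db(friends):
    """Create an in-memory friendships table and fill it from a dict of sets."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE friendships ("
        " from_id TEXT NOT NULL,"
        " to_id TEXT NOT NULL,"
        " PRIMARY KEY (from_id, to_id))"
    )
    # Hypothetical index so the self-join can look rows up by to_id.
    conn.execute(
        "CREATE INDEX idx_friendships_to_id ON friendships (to_id, from_id)"
    )
    rows = [(user, friend) for user, fs in friends.items() for friend in fs]
    conn.executemany("INSERT INTO friendships VALUES (?, ?)", rows)
    return conn


def most_similar_user(conn, user):
    """Return (user_id, common_friends), or None if nobody shares a friend."""
    return conn.execute(QUERY, {"user": user}).fetchone()


if __name__ == "__main__":
    conn = build_db(FRIENDS)
    print(most_similar_user(conn, "a"))  # ('c', 5)
```

With the sample data, UserC shares a, c, r, g and q with UserA, so it comes out on top, which matches what the question expects. If several users tie for the highest count, `LIMIT 1` returns just one of them.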
    

    As for performance, empirical testing in your specific situation will give the most reliable answer. For a large database, though, I would expect the SQL query above to be much faster than fetching every row and calculating the result in PHP, for these reasons:

    1. If from_id and to_id are indexed, the RDBMS may not need to touch every row.
    2. Returning all records guarantees that every row is touched and makes the PHP code consume a lot of memory (which adds overhead for memory allocation, and will slow things to a crawl if you run out of RAM).
    3. Once your PHP code has the data, it is unlikely to perform the individual steps of the computation any faster than the RDBMS could.
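
For comparison, the application-side approach from the question looks roughly like the sketch below. It is written in Python rather than PHP, and the function name `most_similar_in_app` and the `(from_id, to_id)` row format are assumptions made for illustration. Note that every row has to be fetched and held in memory before any comparison can start, which is the cost described in reason 2.

```python
from collections import defaultdict


def most_similar_in_app(rows, user):
    """Find the user sharing the most friends with `user`, outside the database.

    `rows` is an iterable of (from_id, to_id) pairs, i.e. the whole
    friendships table as it would arrive from "SELECT from_id, to_id ...".
    Returns (user_id, common_friends), or None if nobody shares a friend.
    """
    # Step 1: load every row into memory, grouped by user.
    friends = defaultdict(set)
    for from_id, to_id in rows:
        friends[from_id].add(to_id)

    # Step 2: intersect the target's friend set with every other user's set.
    target = friends.get(user, set())
    best = None
    for other, theirs in friends.items():
        if other == user:
            continue
        common = len(target & theirs)
        if common and (best is None or common > best[1]):
            best = (other, common)
    return best


if __name__ == "__main__":
    sample = (
        [("a", f) for f in "abrcghql"]
        + [("b", f) for f in "xyz"]
        + [("c", f) for f in "acrgq"]
    )
    print(most_similar_in_app(sample, "a"))  # ('c', 5)
```

The set intersection itself is cheap; the expensive part for a large table is transferring and storing all the rows, which the SQL version avoids by returning a single row.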

    Unless your PHP code has something up its sleeve to compensate for these disadvantages (such as an algorithm with lower complexity that cannot be expressed in SQL, or a call out to specialized C code, e.g. for image processing), it will typically be faster to keep as much of the work as possible inside the RDBMS.
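
Since measuring is the most reliable way to settle the question, here is a rough timing harness, sketched under several assumptions: it uses Python and an in-memory SQLite database rather than PHP and MySQL, the data is randomly generated (1000 users with 20 friends each, both numbers arbitrary), and the helper names are hypothetical. The absolute numbers will not carry over to another stack, so treat it as a template for testing against your own data, not as evidence for either side.

```python
import random
import sqlite3
import time
from collections import defaultdict

SQL = """
SELECT f2.from_id, COUNT(*) AS cnt
FROM friendships f1
JOIN friendships f2 ON f1.to_id = f2.to_id AND f2.from_id <> ?
WHERE f1.from_id = ?
GROUP BY f2.from_id
ORDER BY cnt DESC
LIMIT 1
"""


def make_db(n_users=1000, n_friends=20, seed=0):
    """Build an in-memory table of random friendships (synthetic data)."""
    rng = random.Random(seed)
    ids = range(n_users)
    rows = [(u, f) for u in ids for f in rng.sample(ids, n_friends)]
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE friendships (from_id INTEGER, to_id INTEGER,"
        " PRIMARY KEY (from_id, to_id))"
    )
    conn.execute("CREATE INDEX idx_to_id ON friendships (to_id, from_id)")
    conn.executemany("INSERT INTO friendships VALUES (?, ?)", rows)
    return conn


def time_sql(conn, user):
    """Let the database do the work; return (best_row, seconds)."""
    start = time.perf_counter()
    row = conn.execute(SQL, (user, user)).fetchone()
    return row, time.perf_counter() - start


def time_app(conn, user):
    """Fetch every row and compare in application code; return (best, seconds)."""
    start = time.perf_counter()
    friends = defaultdict(set)
    for from_id, to_id in conn.execute("SELECT from_id, to_id FROM friendships"):
        friends[from_id].add(to_id)
    target = friends[user]
    counts = {o: len(target & fs) for o, fs in friends.items() if o != user}
    best = max(counts.items(), key=lambda kv: kv[1])
    return best, time.perf_counter() - start


if __name__ == "__main__":
    conn = make_db()
    sql_row, sql_secs = time_sql(conn, 0)
    app_best, app_secs = time_app(conn, 0)
    print(f"SQL: {sql_row} in {sql_secs:.4f}s")
    print(f"App: {app_best} in {app_secs:.4f}s")
```

When several users tie for the highest count, the two approaches may report different user ids, so compare the counts rather than the ids when checking that both agree.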

