duan5362 2012-12-31 22:04

Which is better: manipulating data with SQL queries, or manipulating the data in PHP arrays? [duplicate]

Possible Duplicate:
When to use a query or code

I have a large database of users and their details, and I want to find the people who are most similar to each other. Which approach is preferable for the comparison: writing SQL queries that do the manipulation inside the database and return only the filtered (matched) records, or retrieving the raw data into a PHP array first and then applying the comparison operations to that array? Let's say I have 1000 users, and I want to find the people with the most mutual friends.

UserA has {a,b,r,c,g,h,r,q,l}
UserB has {x,y,z}
UserC has {a,c,r,g,q}
.
.
.
UserN has {x,y....n}

So I want to find the users who have the most mutual friends with UserA, for example by comparing each element of UserA's set against all the elements of every other user's array in the whole database. In the above case UserC should be the most similar user to UserA. I don't know how to accomplish this.


1 answer

  • duangeli1334 2012-12-31 23:22

    Suppose you represent friendships in a table like this:

    Table: friendships
    from_id  |  to_id
    ----------------------------
          a  |      a
          a  |      b
          a  |      r
          a  |      c
     ...etc...
          b  |      x
          b  |      y
          b  |      z
     ...etc...
    

    Now you can write a query to answer your question (written for MySQL):

    SELECT f2.from_id AS user_id_with_most_common_friends,
           COUNT(*) AS number_of_common_friends
    FROM friendships f1
    JOIN friendships f2 ON (f1.to_id = f2.to_id AND f2.from_id <> 'a')
    WHERE f1.from_id = 'a'
    GROUP BY f2.from_id
    -- Sort by the count and keep only the top row, so the returned
    -- user_id belongs to the row that actually has the highest count.
    ORDER BY number_of_common_friends DESC
    LIMIT 1;
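
To try the query approach on the question's sample data, here is a minimal sketch. It assumes Python with the standard-library sqlite3 module as a stand-in for PHP and MySQL; the index names and the sample rows are illustrative only. It picks the top row with ORDER BY ... LIMIT 1, so the reported user is the one that owns the highest count.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friendships (from_id TEXT, to_id TEXT)")
# Hypothetical index names; indexing both columns lets the join avoid full scans.
conn.execute("CREATE INDEX idx_friendships_from ON friendships (from_id)")
conn.execute("CREATE INDEX idx_friendships_to ON friendships (to_id)")

# Sample data from the question (UserA's repeated 'r' is stored once).
sample = {"a": "abrcghql", "b": "xyz", "c": "acrgq"}
conn.executemany(
    "INSERT INTO friendships (from_id, to_id) VALUES (?, ?)",
    [(user, friend) for user, friends in sample.items() for friend in friends],
)

query = """
SELECT f2.from_id AS user_id, COUNT(*) AS cnt
FROM friendships f1
JOIN friendships f2 ON f1.to_id = f2.to_id AND f2.from_id <> ?
WHERE f1.from_id = ?
GROUP BY f2.from_id
ORDER BY cnt DESC
LIMIT 1
"""
# Both placeholders receive the user being compared against.
top = conn.execute(query, ("a", "a")).fetchone()
print(top)  # ('c', 5)
```

Users with no friends in common with 'a' (such as 'b' here) never appear in the join, so they are simply absent from the result.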
    

    As to the performance question, empirical testing in your specific situation will give the most reliable answer, but for a large database I would expect using the above SQL query to be much faster than querying every row and calculating the results in PHP, for these reasons:

    1. If from_id and to_id are indexed, the RDBMS may not need to touch every row.
    2. Returning all records guarantees that every row is touched, and it makes the PHP code consume a lot of memory (which adds overhead for memory allocation and will slow things to a crawl if you run out of RAM).
    3. Once your PHP code has the data, it is unlikely to perform the individual steps of the computation any faster than the database could.

    Unless your PHP code has something up its sleeve to compensate for these disadvantages (like an algorithm with lower complexity that cannot be expressed in SQL, or a call out to specialized C code, e.g. for image processing), it is typically going to be faster to keep as much of the work as possible inside the RDBMS.
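
For comparison, the application-side alternative amounts to fetching every row and intersecting sets in memory. A rough sketch in Python rather than PHP, with hypothetical hard-coded rows standing in for the full result of SELECT from_id, to_id FROM friendships:

```python
from collections import defaultdict

# Stand-in for every row of the friendships table (sample data from the question).
sample = {"a": "abrcghql", "b": "xyz", "c": "acrgq"}
rows = [(user, friend) for user, friends in sample.items() for friend in friends]


def most_similar(rows, target):
    """Return (user_id, shared_count) for the user sharing the most friends with target."""
    friends = defaultdict(set)
    for from_id, to_id in rows:  # every row is read and kept in memory
        friends[from_id].add(to_id)

    best = None
    for user, user_friends in friends.items():
        if user == target:
            continue
        shared = len(user_friends & friends[target])  # set intersection
        if best is None or shared > best[1]:
            best = (user, shared)
    return best


result = most_similar(rows, "a")
print(result)  # ('c', 5)
```

On a sample this small the cost is negligible; the concern raised above only shows up as the table grows, because this version has to load and hold every row before it can compare anything.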
