Working on a tiny script that cross-references data between users. Each user has a count per entry of a "feature". For instance, feature A is (sad, happy, angry) and feature B is (sunny, clouded, thunder). The current row count for these features altogether is 200,000 across 1,000 users. The algorithm is simple: for each user, count the feature entries, determine per feature which entry is the most frequent (e.g. A -> sad, B -> thunder), and express that as a percentage of the user's total feature count. What I end up with is a "rating" for each user, which I then compare against every other user in the database (yes, by recomputing all of this each time). The assumption is that comparing the set of highest features per user against every other user's set gives some sort of relative similarity between users.
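To make the logic concrete, here is roughly what the per-user rating and the pairwise comparison look like. This is a sketch in Python (my actual code is PHP) with made-up data and a naive distance measure, just to illustrate the shape of the computation:

```python
from collections import Counter, defaultdict

def user_profile(rows):
    """rows: list of (feature, entry) pairs for one user,
    e.g. [("A", "sad"), ("A", "happy"), ("B", "thunder"), ...]."""
    total = len(rows)
    counts = defaultdict(Counter)
    for feature, entry in rows:
        counts[feature][entry] += 1
    # Per feature, take the most frequent entry and express its count
    # as a percentage of all feature rows for this user.
    return {
        feature: c.most_common(1)[0][1] / total * 100
        for feature, c in counts.items()
    }

def similarity(p1, p2):
    """Naive distance between two profiles: sum of percentage
    differences over shared features (smaller = more alike).
    This is the part that runs for every pair of users."""
    shared = set(p1) & set(p2)
    return sum(abs(p1[f] - p2[f]) for f in shared)
```

The pairwise comparison is what makes this expensive: with 1,000 users it is on the order of 500,000 profile comparisons, on top of recomputing the profiles each time.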
The thing is, I'm doing this with PHP on a 4-core Xen instance from Linode, and it's not very fast. Currently one core is maxed out, another sits at around 30%, and the rest are idle. The script itself can be optimized, but what I really want to figure out is how to set things up so that it goes faster. What kind of architecture do I need for this?
I realize this is a very broad question, but I hope someone can give me some pointers. Any help is greatly appreciated!
Kind regards,
Reinder