I have a conceptual programming/database (PHP/MYSQL) question for a project I'm working on. Just wondering if anyone can help with a solution. I've been racking my brains for ages trying to work out the best way to solve this issue and am pretty sure I've looked everywhere for an answer on SO.
I can't be particularly verbose about the specifics of what the site does due to the concept being the property of someone else. But I'll try to explain what I can...
I have a project that will involve a large amount of data being quickly queried. We're talking a minimum of 2 Million Records...This in itself isn't a massive issue, but the way I need to query and interact with this data is.
Take the following as an example work flow for this site.
1) User Visits Website
2) User Logs In
3) User Clicks on the 'Random' Link - A bit like Google's 'I'm feeling Lucky'
4) User is randomly displayed an item from the database that he/she hasn't seen before from the 2Mil+ List of items in the db. This should have the ability to filter by item location/type.
5) User gives 'feedback' on the item displayed, then clicks next
6) the next random item in the table is displayed and the process repeats
The storage of 2 million records isn't the problem. The real problem is, how do I hold the feedback data so it is then not included in Generate Link system from that point forth for that specific User.
That is, how would I do this in such a way that the user doesn't have to wait a long time between items for the query to complete. I understand that Splits and Joins could be used, but this still doesn't seem very fast when I've tried it.
To be 100% Clear as an example, this is the closest I've come to a solution before grinding to a halt.
Say for instance we have five tables:
item_list
- ItemId (int, inc, PK)
- ItemName (varchar)
- ItemCity (int, rel)
- ItemCountry (int, rel)
item_city
- LocationId (int, inc, PK)
- LocationName (varchar)
item_country
- LocationId (int, inc, PK)
- LocationName (varchar)
user_list
- UserId (int, inc, PK)
- UserName (varchar)
user_input
- InputId (int, inc, PK)
- UserId (int, rel)
- ItemId (int, rel)
- ItemFeedback (text)
When a user gives feedback on an item within item_list, a record is added to the user_input table with the relevant UserId and ItemId. I could then do a select on item_list where ItemId is not in the user_input table against their UserId. This strikes me as a bad way to go though. If you have 2 million items, that's 1 user with 2 mil records in the user_input table if he completes every item...However, I expect at least 50 users which equates to 100 million records if every user gives feedback on every item. Same goes for a comma separated list of values.
I just have no idea how to achieve this without massive slowdowns or lag as soon as you reach 1000+ pieces of feedback from a user.
I really hope this makes sense let me know if you have any questions and I will try my hardest to answer them.
Thanks -Dave