Currently I build a website, that will contain lot's of active and inactive contents. When a content is published, it is active and after some time it becomes inactive. I want to track all users who opend the contents in the active period. What is the best way to store these information in the database?
I need to query the database following questions: 1. Which users opened a specific content? This is queried rarly, maybe two or three times by an admin for each content. 2. Which active contents were opened by a specific user? 3. Which inactive contents (contents, which were active when they were opened and are inactive in the meantime) were opened by a specific user? Both queries are queried more often than the first one (maybe every two or three weeks). The user can check in his/her profile, which content he/she opened.
First I thought about a database table with three columns: "user id", "content id", "status". Where "status" is true for active contents and false for inactive contents. "User id" and "content id" are the primary key and "status" is a foreign key from the content table. But then I thought about the size of this table. In some years there (hopefully) might be 2 million nodes and 250k users, so that there could be 500 Billion rows... I think, that would decrease the performance heavily.
Therefore I thought about another way: I create two database tables, one is called "content_opened_by_users" and the other one is called "user_opened_content". The former will contain two collumns: "content id" and "users". Where "users" is an serialized array containing all the user ids, which opened the corresponding content id in the same row. The second table will contain three columns: "user id", "active content", "inactive content". Where "active content" and "inactive content" are also serialized arrays containing the content ids of active/inactive contents the corresponding user opened. So, whenever a content is opened by a user, the server loads the corresponding "users"-array from the "content_opened_by_users" table and the corresponding "active content"-array from the "user_opened_content" table. If the user id does not already exist in the "users"-array, it will be added and if the content id does not already exist in the "active content"-array, it will also be added. Then both arrays will be overwritten in the database. Now, if I query all users opened a certain content, I get an array from the "content_opened_by_users" table. And if I query all active/inactive nodes which were opened by a specific user, I get the "active content" and "inactive content" arrays from the "user_opened_content" table. Then I check, if the "active content" array contains some content ids, which are meanwhile inactive and transfer them to the "inactive content" array and pass them back to the database.
I know, that I create redundant data in this two tables, but I hope this will increase the performance.
Well, would this be a convient way to accomplish the user tracking? Or might there be another, more efficient way?
I would welcome every advice! Thank you very much. Daniel