dte49889 2014-11-02 16:28
Answer accepted

Best-practice database structure for tracking whether a user has opened a piece of content?

I am currently building a website that will contain lots of active and inactive content. When a piece of content is published, it is active; after some time it becomes inactive. I want to track all users who opened each piece of content while it was active. What is the best way to store this information in the database?

I need to answer the following questions from the database:

1. Which users opened a specific piece of content? This is queried rarely, maybe two or three times per content item by an admin.
2. Which active content items were opened by a specific user?
3. Which inactive content items (items that were active when they were opened and have since become inactive) were opened by a specific user?

Queries 2 and 3 run more often than the first one (maybe every two or three weeks). Users can check in their profile which content they opened.

At first I thought about a table with three columns: "user id", "content id" and "status", where "status" is true for active content and false for inactive content. "user id" and "content id" form the primary key, and "status" is a foreign key from the content table. But then I thought about the size of this table. In a few years there might (hopefully) be 2 million nodes and 250k users, so if every user opened every node the table could reach 500 billion rows (2,000,000 × 250,000)... I think that would hurt performance heavily.

So I thought about another approach: two tables, "content_opened_by_users" and "user_opened_content". The former has two columns, "content id" and "users", where "users" is a serialized array of all user ids that opened the content in that row. The latter has three columns, "user id", "active content" and "inactive content", where "active content" and "inactive content" are also serialized arrays of the ids of the active/inactive content the user opened.

Whenever a user opens a piece of content, the server loads the corresponding "users" array from "content_opened_by_users" and the corresponding "active content" array from "user_opened_content". If the user id is not yet in the "users" array, it is added; if the content id is not yet in the "active content" array, it is added as well. Then both arrays are written back to the database.

To find all users who opened a certain piece of content, I read one array from "content_opened_by_users". To find all active/inactive content opened by a specific user, I read the "active content" and "inactive content" arrays from "user_opened_content". Then I check whether the "active content" array contains ids of content that has since become inactive, move those ids to the "inactive content" array, and write both arrays back to the database.
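The update step described above can be sketched as follows. This is a minimal illustration only, assuming SQLite and JSON as the "serialized array" format; the table and column names are adapted from the description, and `record_open` and `users_who_opened` are hypothetical helper names.

```python
import json
import sqlite3

# Sketch of the denormalized scheme from the question (assumptions: SQLite,
# JSON-encoded arrays, names adapted from the description).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE content_opened_by_users (
    content_id INTEGER PRIMARY KEY,
    users      TEXT NOT NULL DEFAULT '[]'          -- JSON array of user ids
);
CREATE TABLE user_opened_content (
    user_id          INTEGER PRIMARY KEY,
    active_content   TEXT NOT NULL DEFAULT '[]',   -- JSON array of content ids
    inactive_content TEXT NOT NULL DEFAULT '[]'
);
""")


def record_open(conn, user_id, content_id):
    """Load both arrays, add the ids if missing, and write both arrays back."""
    with conn:  # run the whole read-modify-write in one transaction
        conn.execute("INSERT OR IGNORE INTO content_opened_by_users (content_id) VALUES (?)",
                     (content_id,))
        conn.execute("INSERT OR IGNORE INTO user_opened_content (user_id) VALUES (?)",
                     (user_id,))
        (users_json,) = conn.execute(
            "SELECT users FROM content_opened_by_users WHERE content_id = ?",
            (content_id,)).fetchone()
        (active_json,) = conn.execute(
            "SELECT active_content FROM user_opened_content WHERE user_id = ?",
            (user_id,)).fetchone()
        users = json.loads(users_json)
        active = json.loads(active_json)
        if user_id not in users:
            users.append(user_id)
        if content_id not in active:
            active.append(content_id)
        conn.execute("UPDATE content_opened_by_users SET users = ? WHERE content_id = ?",
                     (json.dumps(users), content_id))
        conn.execute("UPDATE user_opened_content SET active_content = ? WHERE user_id = ?",
                     (json.dumps(active), user_id))


def users_who_opened(conn, content_id):
    """Question 1: deserialize the single 'users' array for one content id."""
    row = conn.execute("SELECT users FROM content_opened_by_users WHERE content_id = ?",
                       (content_id,)).fetchone()
    return json.loads(row[0]) if row else []
```

Every open rewrites both arrays in full, so the cost of a write grows with the array length, and the database cannot index or join on individual ids stored inside the serialized text.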

I know that I am creating redundant data in these two tables, but I hope this will improve performance.

Would this be a convenient way to implement the user tracking, or is there another, more efficient way?

I would welcome any advice! Thank you very much. Daniel


1 answer

  • dora12345678 2014-11-02 16:49

    First, you might want to read about database normalization.

    In order to track the information you described, I would use separate tables for users, content, and access like this:

    table    | columns
    -------------------
    users    | id, login, ...
    content  | id, title, active, ...
    access   | id, user_id, content_id, timestamp, ...
    

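    You can then store users and content items independently of who opened which item. The access table contains one entry for each time a user opened a content item, together with the time of access. Because rows are only created for actual opens, the table grows with the number of recorded opens rather than with users × content items, and it still lets you compute many kinds of statistics without storing anything about a user not opening an item.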
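The three-table layout above could be declared roughly like this. It is a sketch in SQLite with column types chosen for illustration; `record_access` is a hypothetical helper showing that opening an item adds exactly one `access` row.

```python
import sqlite3

# Sketch of the users / content / access layout (SQLite; types are assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id    INTEGER PRIMARY KEY,
    login TEXT NOT NULL UNIQUE
);
CREATE TABLE content (
    id     INTEGER PRIMARY KEY,
    title  TEXT NOT NULL,
    active INTEGER NOT NULL DEFAULT 1  -- SQLite has no separate BOOLEAN type; 1 = active
);
-- REFERENCES clauses are only enforced in SQLite with PRAGMA foreign_keys = ON.
CREATE TABLE access (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(id),
    content_id INTEGER NOT NULL REFERENCES content(id),
    timestamp  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
""")


def record_access(conn, user_id, content_id):
    """Append one row per open; nothing is stored for items a user never opened."""
    with conn:
        conn.execute("INSERT INTO access (user_id, content_id) VALUES (?, ?)",
                     (user_id, content_id))
```

With this design, the row count depends only on how many opens are recorded, not on how many user/content combinations are possible.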

    The main queries would be:

    1. Which users opened an item

      SELECT DISTINCT users.login FROM access JOIN users ON access.user_id = users.id JOIN content ON access.content_id = content.id WHERE content.title LIKE '%test%'
      
    2. Active content opened by user

      SELECT DISTINCT content.title FROM access JOIN users ON access.user_id = users.id JOIN content ON access.content_id = content.id WHERE content.active = true AND users.login = 'testuser'
      
    3. Inactive content opened by a user

      SELECT DISTINCT content.title FROM access JOIN users ON access.user_id = users.id JOIN content ON access.content_id = content.id WHERE content.active = false AND users.login = 'testuser'
      

      This assumes that content can only be accessed while it is active. To track when a status change happened, add another table that records the timestamp of each change; you can then look up the timestamp of the last change and compare it with the timestamps of the access entries.
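The status-change table suggested above can be sketched as follows, here applied to question 3 from the original post (content that was active when opened and is inactive now). The table name `content_status_change` and the query constant are hypothetical; timestamps are assumed to be stored as ISO-8601 text so that string comparison orders them chronologically, and the answer's assumption that content is only opened while active is kept.

```python
import sqlite3

# Sketch: record status changes and compare them with access timestamps.
# content_status_change is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users   (id INTEGER PRIMARY KEY, login TEXT NOT NULL UNIQUE);
CREATE TABLE content (id INTEGER PRIMARY KEY, title TEXT NOT NULL, active INTEGER NOT NULL);
CREATE TABLE access (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(id),
    content_id INTEGER NOT NULL REFERENCES content(id),
    timestamp  TEXT NOT NULL              -- ISO-8601, e.g. '2014-11-02 16:49:00'
);
-- One row per status change of a content item.
CREATE TABLE content_status_change (
    content_id INTEGER NOT NULL REFERENCES content(id),
    active     INTEGER NOT NULL,          -- new status after the change
    changed_at TEXT NOT NULL              -- same ISO-8601 format as access.timestamp
);
""")

# Inactive content this user opened before its most recent deactivation.
# Items with no deactivation row yield NULL in the subquery and are excluded.
INACTIVE_OPENED_WHILE_ACTIVE = """
SELECT DISTINCT content.title
FROM access
JOIN users   ON access.user_id = users.id
JOIN content ON access.content_id = content.id
WHERE users.login = ?
  AND content.active = 0
  AND access.timestamp < (
        SELECT MAX(changed_at) FROM content_status_change
        WHERE content_status_change.content_id = content.id
          AND content_status_change.active = 0)
ORDER BY content.title
"""
```

Comparing against the most recent deactivation is a simplification; content that is reactivated and deactivated several times would need the full history of changes.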

    Note that these queries are not optimized at all. You should also add the corresponding FOREIGN KEY constraints to ensure data consistency, and consider adding indexes to improve performance.
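As one possible starting point for the constraints and indexes mentioned above, here is a SQLite sketch. The index names are hypothetical, and which indexes actually help should be confirmed against your own query plans. Note that SQLite only enforces FOREIGN KEY constraints after `PRAGMA foreign_keys = ON` on each connection.

```python
import sqlite3

# Sketch of FOREIGN KEY constraints plus indexes for both lookup directions.
# Index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
conn.executescript("""
CREATE TABLE users   (id INTEGER PRIMARY KEY, login TEXT NOT NULL UNIQUE);
CREATE TABLE content (id INTEGER PRIMARY KEY, title TEXT NOT NULL,
                      active INTEGER NOT NULL DEFAULT 1);
CREATE TABLE access (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(id),
    content_id INTEGER NOT NULL REFERENCES content(id),
    timestamp  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Queries 2 and 3 start from a user: look up access rows by user_id.
CREATE INDEX access_user_content ON access (user_id, content_id);
-- Query 1 starts from a content item: look up access rows by content_id.
CREATE INDEX access_content_user ON access (content_id, user_id);
""")
```

Since query 1 runs only a few times per item, the second index is a trade-off between faster admin lookups and slightly more expensive writes on every open.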

    The asker selected this answer as the best answer.
