dousi4148 2014-01-17 13:05
浏览 57

通过MySQL获取最不随机的记录集

I am trying to find a proper way of getting X random records. Also I want to check usage of a specific record so I won't use the same random record as often as others.

I am testing the set with these 3 tables, one table for questions, one table for users, and one table for the served question for a specific user. I want to make this perform with around 6000 questions.

CREATE TABLE `questions` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `question` varchar(128) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;


CREATE TABLE `served` (
  `user` int(11) NOT NULL DEFAULT '0',
  `question` int(11) NOT NULL DEFAULT '0',
  `count` varchar(128) DEFAULT NULL,
  PRIMARY KEY (`user`,`question`),
  KEY `count` (`count`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;


CREATE TABLE `users` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `nickname` varchar(128) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

The query I found to work very well with getting random records from the question table is as following:

SELECT  id, question
        FROM    (
                SELECT  @cnt := COUNT(*) + 1,
                        @lim := 10
                FROM    questions
                ) vars
        STRAIGHT_JOIN
                (
                SELECT  q.*,
                        @lim := @lim - 1
                FROM    questions q 
                WHERE   (@cnt := @cnt - 1)
                        AND RAND() < @lim / @cnt
                ) i

But now I would like to incorporate the served table to make sure the random values being picked from the questions that were served the least. The query I thought of was as following:

SELECT  id, question, count
        FROM    (
                SELECT  @cnt := COUNT(*) + 1,
                        @lim := 10
                FROM    questions
                ) vars
        STRAIGHT_JOIN
                (
                SELECT  q.*,
                        s.count,
                        @lim := @lim - 1
                FROM    questions q
                LEFT JOIN served s
                ON s.question = q.id
                WHERE   (@cnt := @cnt - 1)
                        AND RAND() < @lim / @cnt
                ORDER BY count ASC) i

The problem with this query is that it never gives my limit of 10 results + it never gives the records I would want. Could anyone push me in the right direction?

As requested a SQL Fiddle with some data to test with: http://sqlfiddle.com/#!2/3e5ed/5. I would expect the results to be 10 questions where the "count" of served for user 1 is the least (or not existing offcourse).

I have ended up using a modified query, it had to be quick:

SELECT q.*, s1.count AS count_a, s2.count AS count_b
FROM questions q
LEFT JOIN served s1
ON (s1.question = q.id AND s1.user = 1)
LEFT JOIN served s2
ON (s2.question = q.id AND s2.user = 2)
WHERE q.categorie = 1
ORDER BY IFNULL(s1.count, 0) + IFNULL(s2.count, 0) + RAND()
LIMIT 10
  • 写回答

1条回答 默认 最新

  • dongxiaoguang9108 2014-01-17 13:34
    关注

    A common way that people get random records in MySQL is like this:

    To get 10 random records:

    SELECT * FROM questions
    ORDER BY RAND()
    LIMIT 10
    

    Of course, as may be obvious, this gets all the records in the database, then sorts them randomly in order to get 10 records. It doesn't actually just choose 10 random records from the database. However, this method does easily prevent duplicates.

    Now, using the same technique, if you wanted to favor less served questions, you could do something like this:

    SELECT questions.* FROM questions
    LEFT JOIN served
    ON served.question = questions.id
    ORDER BY IFNULL(served.count, 0) + RAND()
    LIMIT 10
    

    Tweak the algorithm to alter the amount you favor the serve count.

    There are more performant ways to fetch random records, such as getting the maximum primary key value (assuming auto_increment) then use RAND() on that, then pick just one record. You can use LIMIT 1 just in case RAND() returns a gap in your keys. However, then you could have duplicates if you repeat this process to return more than one record.

    If you have contiguous auto_increment values, you could easily leverage PHP to choose a random set of keys then fetch each record individually. If they're not contiguous, you first fetch a list of keys.

    These techniques are covered in more detail in Chapter 16 Random Selection, in the book SQL Antipatterns.

    评论

报告相同问题?

悬赏问题

  • ¥15 BP神经网络控制倒立摆
  • ¥20 要这个数学建模编程的代码 并且能完整允许出来结果 完整的过程和数据的结果
  • ¥15 html5+css和javascript有人可以帮吗?图片要怎么插入代码里面啊
  • ¥30 Unity接入微信SDK 无法开启摄像头
  • ¥20 有偿 写代码 要用特定的软件anaconda 里的jvpyter 用python3写
  • ¥20 cad图纸,chx-3六轴码垛机器人
  • ¥15 移动摄像头专网需要解vlan
  • ¥20 access多表提取相同字段数据并合并
  • ¥20 基于MSP430f5529的MPU6050驱动,求出欧拉角
  • ¥20 Java-Oj-桌布的计算