MySQL: fetch the most recently active users in batches, but prevent duplicate users in the next batch

I'm trying to extract the users contributing to a specific topic on a message board.

Each request fetches a batch of 10 unique users.

The problem is that users who were part of a previous batch can occur in the next batch too.

SELECT MAX(p.post_id) AS id, p.author AS uid, a.name
FROM posts p
INNER JOIN users a
ON a.id = p.author
AND p.topic_id = __TOPIC_ID__
AND p.post_id < __OFFSET_POST_ID__
GROUP BY p.author
ORDER BY MAX(p.post_id) DESC
LIMIT 10

My question is how I can prevent these possible duplicates, or at least get the lowest post_id.

Let's assume a single topic with 100 contributing users and 50000 posts written by them, where the third user made only one of the first posts.

With a LIMIT of 10 it should be possible to get all 100 users in 10 queries. But this is not how the query above works:

If posts 10000 to 50000 were made by only ten users, my AJAX requests would return those same users over and over for many, many requests. And even worse:

I could throw away all those requests, because every one of them would contain only duplicates.
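The scenario above can be reproduced on a small scale. The following is a minimal sketch, not the asker's actual code: it uses Python's built-in sqlite3 module in place of MySQL, a hypothetical five-user data set, LIMIT 2 instead of 10, and it assumes the client passes the lowest post_id of the previous batch as the next __OFFSET_POST_ID__.

```python
import sqlite3

# Hypothetical miniature data set: users 3, 4 and 5 each wrote one early post,
# then users 1 and 2 alternate for the remaining 20 posts (post_id 4..23).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (post_id INTEGER PRIMARY KEY, topic_id INTEGER, author INTEGER);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1, 6)])
authors = [3, 4, 5] + [1, 2] * 10
conn.executemany("INSERT INTO posts VALUES (?, 1, ?)",
                 list(enumerate(authors, start=1)))

# Same shape as the query in the question, with a batch size of 2.
QUERY = """
    SELECT MAX(p.post_id) AS id, p.author AS uid, a.name
    FROM posts p
    INNER JOIN users a ON a.id = p.author
    WHERE p.topic_id = ? AND p.post_id < ?
    GROUP BY p.author
    ORDER BY MAX(p.post_id) DESC
    LIMIT 2
"""

def fetch_batches(conn, topic_id, start_offset):
    """Page by post_id; the next offset is the lowest post_id just returned (assumption)."""
    batches, offset = [], start_offset
    while True:
        rows = conn.execute(QUERY, (topic_id, offset)).fetchall()
        if not rows:
            return batches
        batches.append([uid for _id, uid, _name in rows])
        offset = min(row[0] for row in rows)

batches = fetch_batches(conn, 1, 24)
for number, batch in enumerate(batches, start=1):
    print(number, batch)
```

With this data the two busy users fill batch after batch before any of the three early contributors shows up, which is the behaviour described above.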

What would be the "best" option to reduce the number of queries?

One possible solution would be to query the n, 10 users but take each user's lowest matching post_id instead of the max() id used here. That would reduce the number of requests a bit, but only in some cases.

Another way would be to use a:

AND p.author NOT IN( list of all uids queried before )

But I guess this would make the problem even worse, since the list keeps growing. Like:

SELECT * FROM X WHERE author_id NOT IN(1..to..4000000)...
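For illustration, the NOT IN idea could look roughly like the sketch below. It is an assumption-laden example rather than a recommendation: it uses Python's sqlite3 module instead of MySQL, a hypothetical five-user data set, a batch size of 2, and a made-up helper named next_batch. The client has to remember every uid it has already received and send the whole list back with each request, which is the growth problem mentioned above.

```python
import sqlite3

# Hypothetical data: users 3, 4, 5 posted once early; users 1 and 2 wrote the rest.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (post_id INTEGER PRIMARY KEY, topic_id INTEGER, author INTEGER);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1, 6)])
authors = [3, 4, 5] + [1, 2] * 10
conn.executemany("INSERT INTO posts VALUES (?, 1, ?)",
                 list(enumerate(authors, start=1)))

def next_batch(conn, topic_id, seen_uids, limit=2):
    """Return the next users by latest post, skipping uids already delivered."""
    exclude = ""
    if seen_uids:
        # One bound placeholder per uid; the list grows with every batch.
        placeholders = ", ".join("?" for _ in seen_uids)
        exclude = f"AND p.author NOT IN ({placeholders})"
    sql = f"""
        SELECT p.author AS uid, a.name, MAX(p.post_id) AS last_post
        FROM posts p
        INNER JOIN users a ON a.id = p.author
        WHERE p.topic_id = ? {exclude}
        GROUP BY p.author
        ORDER BY last_post DESC
        LIMIT ?
    """
    return conn.execute(sql, (topic_id, *seen_uids, limit)).fetchall()

seen, batches = [], []
while True:
    rows = next_batch(conn, 1, seen)
    if not rows:
        break
    batch = [uid for uid, _name, _last_post in rows]
    batches.append(batch)
    seen.extend(batch)

print(batches)
```

Every batch is free of duplicates here, but the parameter list sent to the database is as long as the number of users already seen.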

1 answer

关注

码龄粉丝数原力等级 --

被采纳

被点赞

采纳率
douhuijun3776 2015-02-16 18:28
You're iterating over posts, while you need to iterate over users. I think this might do the trick:

SELECT u.id, u.name, MAX(p.post_id)
FROM users u
INNER JOIN posts p ON p.author = u.id
WHERE p.topic_id = :topic_id
GROUP BY u.id
ORDER BY MAX(p.post_id) DESC
LIMIT 10 OFFSET :offset;

As you can see, I group by users.id (the primary key) rather than posts.author, which is not a primary or unique key but just a foreign key to users. You get duplicates because your query pages by post_id, so each batch is computed over a different slice of posts instead of over the list of users.
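To see the effect of paging over users with OFFSET, here is a small sketch of the query above. The setup is assumed for illustration only: Python's sqlite3 module stands in for MySQL, the five-user data set and the helper name fetch_page are hypothetical, and the batch size is 2 instead of 10.

```python
import sqlite3

# Hypothetical data: users 3, 4, 5 posted once early; users 1 and 2 wrote the rest.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (post_id INTEGER PRIMARY KEY, topic_id INTEGER, author INTEGER);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1, 6)])
authors = [3, 4, 5] + [1, 2] * 10
conn.executemany("INSERT INTO posts VALUES (?, 1, ?)",
                 list(enumerate(authors, start=1)))

def fetch_page(conn, topic_id, offset, limit=2):
    """One batch of users ordered by their latest post, paged with OFFSET."""
    sql = """
        SELECT u.id, u.name, MAX(p.post_id) AS last_post
        FROM users u
        INNER JOIN posts p ON p.author = u.id
        WHERE p.topic_id = ?
        GROUP BY u.id
        ORDER BY last_post DESC
        LIMIT ? OFFSET ?
    """
    return conn.execute(sql, (topic_id, limit, offset)).fetchall()

batches, offset = [], 0
while True:
    rows = fetch_page(conn, 1, offset)
    if not rows:
        break
    batches.append([uid for uid, _name, _last_post in rows])
    offset += len(rows)  # advance by the number of users, not by post_id

print(batches)
```

All five users arrive in three requests with no repeats. One caveat worth keeping in mind: if new posts are written between requests, the ordering can shift, so OFFSET paging may still skip or repeat a user.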
This answer was selected by the asker as the best answer.