I'm trying to extract users contributing to a specific topic in a message board.
Each request gets a batch of 10 Unique users.
The problem is that if some users where part of a previous batch they can occur in the next batch too.
SELECT p.post_id as id, p.author as uid, a.name
FROM posts p
INNER JOIN users a
ON a.id = p.author
AND p.topic_id = __TOPIC_ID__
AND p.post_id < __OFFSET_POST_ID__
GROUP BY p.author
ORDER BY MAX(p.post_id)
DESC LIMIT 10
My question is how I'm able to prevent those possible duplicates or at least get the lowest post_id.
Let's assume a single topic with 100 contributing users and 50000 posts written by them where only one of the first posts was made by the third user.
With a LIMIT of 10 it would be possible to get all 100 users in 10 queries. But this is not the way the above queries works:
If post 10000 up to 50000 were made by only ten users my ajax queries would get these users multiple times for many many requests. AND even worse...:
I could throw away all those requests because they would only contain duplicates every time.
What would be the "best" option to reduce the amount of queries?
One possible solution would be to query the n, 10 users but get the lowest post_id matching not as here the max() id. This way I could reduce the requests a bit in some cases but only in some cases.
Another way would be to use a:
AND p.author NOT IN( list of all uids queried before )
But This would make the problem even worse I guess...^^ Like:
SELECT * FROM X WHERE author_id NOT IN(1..to..4000000)...