I have a query that, using two JOINs, returns a list in this format:
unique_id | non_unique_id | timestamp
The full list is big (thousands of rows), but the result of the query is only a few dozen rows, because the query has WHERE timestamp >= 'some timestamp in the past'.
So now, I have the list like this:
89 | 286 | 1406219705
87 | 286 | 1406219518
79 | 922 | 1406216949
78 | 228 | 1406216871
77 | 126 | 1406216748
76 | 939 | 1406216722
74 | 126 | 1406216352
64 | 939 | 1406212540
63 | 126 | 1406212522
49 | 228 | 1406205715
48 | 228 | 1406204851
37 | 228 | 1406196435
32 | 228 | 1406190209
23 | 126 | 1406182577 <- 'limiting timestamp'
18 | 871 | 1406181991
10 | 922 | 1406178816
9 | 764 | 1406178778
7 | 609 | 1406178699
5 | 126 | 1406177398
4 | 871 | 1406177379 <- 'some timestamp in the past'
So now, I only need to select the rows between the 'limiting timestamp' and the end of the list ('some timestamp in the past'). I could have put the 'limiting timestamp' into the WHERE condition of the original query, but here is the problem: the resulting set must contain no records whose non_unique_id has already appeared in the list above the 'limiting timestamp'. This is how the result of the query should look:
<- 'limiting timestamp'
18 | 871 | 1406181991
9 | 764 | 1406178778
7 | 609 | 1406178699
<- 'some timestamp in the past'
So the result will contain 3 rows, all with non_unique_id values that did not appear higher in the list. And if a non_unique_id appears more than once between the 'limiting timestamp' and 'some timestamp in the past', only its first occurrence should be kept. Note: this last condition is optional, as it will be pretty easy to strip the duplicates from the final list.
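To make the filtering rule unambiguous, here is how I would express it in plain Python over the already-fetched example rows (the values are exactly the ones from my listing above; only the variable names are mine):

```python
# Rows from the example: (unique_id, non_unique_id, timestamp),
# newest first, covering everything >= 'some timestamp in the past'.
rows = [
    (89, 286, 1406219705), (87, 286, 1406219518), (79, 922, 1406216949),
    (78, 228, 1406216871), (77, 126, 1406216748), (76, 939, 1406216722),
    (74, 126, 1406216352), (64, 939, 1406212540), (63, 126, 1406212522),
    (49, 228, 1406205715), (48, 228, 1406204851), (37, 228, 1406196435),
    (32, 228, 1406190209), (23, 126, 1406182577),  # <- 'limiting timestamp'
    (18, 871, 1406181991), (10, 922, 1406178816), (9, 764, 1406178778),
    (7, 609, 1406178699), (5, 126, 1406177398), (4, 871, 1406177379),
]
LIMIT_TS = 1406182577  # the 'limiting timestamp'

# non_unique_ids that already appear at or above the limiting timestamp
seen = {nid for _, nid, ts in rows if ts >= LIMIT_TS}

result = []
for uid, nid, ts in rows:
    if ts < LIMIT_TS and nid not in seen:
        result.append((uid, nid, ts))
        seen.add(nid)  # keep only the first occurrence below the limit

print(result)
# [(18, 871, 1406181991), (9, 764, 1406178778), (7, 609, 1406178699)]
```

This reproduces exactly the 3 rows shown above, including the optional dedup step (row 4 | 871 is dropped because 18 | 871 already occurred below the limit).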
So far, the only solution I have come up with is a JOIN between the list with >= 'some timestamp in the past' and the list with > 'limiting timestamp'. That way I can see whether any occurrences from the top list also appear in the bottom list. However, it should be assumed that the query is complex, so the time needed to produce its result should not be doubled by running it a second time with a slightly different condition.
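To show what I mean by that JOIN, here is a sketch using SQLite through Python's sqlite3. The 'results' table is a stand-in for the output of my real two-JOIN query, and all names here are placeholders, not my actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (unique_id INT, non_unique_id INT, ts INT)")
conn.executemany("INSERT INTO results VALUES (?, ?, ?)", [
    (89, 286, 1406219705), (87, 286, 1406219518), (79, 922, 1406216949),
    (78, 228, 1406216871), (77, 126, 1406216748), (76, 939, 1406216722),
    (74, 126, 1406216352), (64, 939, 1406212540), (63, 126, 1406212522),
    (49, 228, 1406205715), (48, 228, 1406204851), (37, 228, 1406196435),
    (32, 228, 1406190209), (23, 126, 1406182577),
    (18, 871, 1406181991), (10, 922, 1406178816), (9, 764, 1406178778),
    (7, 609, 1406178699), (5, 126, 1406177398), (4, 871, 1406177379),
])
LIMIT_TS = 1406182577  # the 'limiting timestamp'

# Anti-join: keep rows below the limiting timestamp whose non_unique_id
# does not occur at or above it. 'full_list' stands in for my expensive
# query; the point is that it is referenced twice, which is the part
# I want to avoid paying for twice.
rows = conn.execute("""
    WITH full_list AS (SELECT * FROM results)
    SELECT f.unique_id, f.non_unique_id, f.ts
    FROM full_list AS f
    LEFT JOIN full_list AS above
        ON above.non_unique_id = f.non_unique_id AND above.ts >= ?
    WHERE f.ts < ? AND above.unique_id IS NULL
    ORDER BY f.ts DESC
""", (LIMIT_TS, LIMIT_TS)).fetchall()
print(rows)
```

On the example data this returns 18 | 871, 9 | 764, 7 | 609 and additionally 4 | 871, since the anti-join alone does not handle the optional "first occurrence only" rule; as noted above, that duplicate is easy to strip afterwards.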