dongxixian7803 2014-07-24 17:08
浏览 66
已采纳

选择列表中值的第一个外观(DISTINCT / GROUP BY)

I have a query, which using two JOINs, returns me a list in this format:

unique_id | non_unique_id | timestamp

The full list is big (thousands of rows), the result of the query is only a few dozens of rows, as the query has WHERE timestamp >= 'some timestamp in the past'

So now, I have the list like this:

89 | 286 | 1406219705
87 | 286 | 1406219518
79 | 922 | 1406216949
78 | 228 | 1406216871
77 | 126 | 1406216748
76 | 939 | 1406216722
74 | 126 | 1406216352
64 | 939 | 1406212540
63 | 126 | 1406212522
49 | 228 | 1406205715
48 | 228 | 1406204851
37 | 228 | 1406196435
32 | 228 | 1406190209
23 | 126 | 1406182577  <- 'limiting timestamp'
18 | 871 | 1406181991
10 | 922 | 1406178816
 9 | 764 | 1406178778
 7 | 609 | 1406178699
 5 | 126 | 1406177398
 4 | 871 | 1406177379  <- 'some timestamp in the past'

So now, I only need to select rows between the 'limiting timestamp' and the end of the list ('some timestamp in the past'). I could have specified the 'limiting timestamp' in the WHERE condition for the original query, but the problem is: I need the resulting set to have no records with non_unique_id, that have already appeared in the list above the 'limiting timestamp'. This is how the result of the query should look like:

                       <- 'limiting timestamp'
18 | 871 | 1406181991

 9 | 764 | 1406178778
 7 | 609 | 1406178699

                       <- 'some timestamp in the past'

So the result will return 3 rows, which all have non_unique_id that did not appear in the results above. But if the 'non_unique_id' had already appeared in the list between 'limiting timestamp' and the 'some timestamp in the past', then only the first occurrence should be kept. Note: the last part condition is optional, as it will be pretty easy to extract the duplicate from the final list.

So far I was only able to come up with the solution of doing a JOIN between the list >= 'some timestamp in the past' and > 'limiting timestamp'. This way I'll see if there are any occurrences of the top list in the bottom list. However, it can be assumed that the query is complex and time needed to produce it's results shouldn't be doubled by running it again, but with a slightly different condition.

  • 写回答

2条回答 默认 最新

  • doumu2172 2014-07-24 18:31
    关注

    You can try this if SQL Server 2008+ (sqlfiddle: http://sqlfiddle.com/#!3/0bc33/3):

    WITH cteOrdered
    AS
    (
        SELECT    ROW_NUMBER() OVER (PARTITION BY t1.Non_Unique_ID ORDER BY t1.Timestamp) AS RID,
                  t1.*
        FROM      Table1 t1 LEFT JOIN
                  (SELECT  Non_Unique_ID
                   FROM    Table1
                   WHERE   Timestamp < 1406177379 OR
                           Timestamp > 1406182577) t2
                  ON t1.Non_Unique_ID = t2.Non_Unique_ID
        WHERE     t2.Non_Unique_ID IS NULL AND
                  t1.Timestamp > 1406177379 AND 
                  t1.Timestamp < 1406182577
     )
    
    SELECT    Unique_ID,
              Non_Unique_ID,
              Timestamp
    FROM      cteOrdered
    WHERE     RID = 1;
    

    I've added another row into the data

    (18, 871, 1406181990),
    

    to see if the query was producing you want. You said if there are duplicate non_unique_id's within the search range only the "first" occurrence should be kept. I take this is the one with the EARLIEST timestamp? If opposite, you can change this line

    SELECT    ROW_NUMBER() OVER (PARTITION BY t1.Non_Unique_ID ORDER BY t1.Timestamp) AS RID,
    

    to

    SELECT    ROW_NUMBER() OVER (PARTITION BY t1.Non_Unique_ID ORDER BY t1.Timestamp DESC) AS RID,
    

    and that will flip the order to retain the LATEST timestamp for the duplicates.

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
查看更多回答(1条)

报告相同问题?

悬赏问题

  • ¥100 Jenkins自动化部署—悬赏100元
  • ¥15 关于#python#的问题:求帮写python代码
  • ¥20 MATLAB画图图形出现上下震荡的线条
  • ¥15 关于#windows#的问题:怎么用WIN 11系统的电脑 克隆WIN NT3.51-4.0系统的硬盘
  • ¥15 perl MISA分析p3_in脚本出错
  • ¥15 k8s部署jupyterlab,jupyterlab保存不了文件
  • ¥15 ubuntu虚拟机打包apk错误
  • ¥199 rust编程架构设计的方案 有偿
  • ¥15 回答4f系统的像差计算
  • ¥15 java如何提取出pdf里的文字?