dongxixian7803 2014-07-24 17:08
浏览 66
已采纳

选择列表中值的第一个外观(DISTINCT / GROUP BY)

I have a query, which using two JOINs, returns me a list in this format:

unique_id | non_unique_id | timestamp

The full list is big (thousands of rows), the result of the query is only a few dozens of rows, as the query has WHERE timestamp >= 'some timestamp in the past'

So now, I have the list like this:

89 | 286 | 1406219705
87 | 286 | 1406219518
79 | 922 | 1406216949
78 | 228 | 1406216871
77 | 126 | 1406216748
76 | 939 | 1406216722
74 | 126 | 1406216352
64 | 939 | 1406212540
63 | 126 | 1406212522
49 | 228 | 1406205715
48 | 228 | 1406204851
37 | 228 | 1406196435
32 | 228 | 1406190209
23 | 126 | 1406182577  <- 'limiting timestamp'
18 | 871 | 1406181991
10 | 922 | 1406178816
 9 | 764 | 1406178778
 7 | 609 | 1406178699
 5 | 126 | 1406177398
 4 | 871 | 1406177379  <- 'some timestamp in the past'

So now, I only need to select rows between the 'limiting timestamp' and the end of the list ('some timestamp in the past'). I could have specified the 'limiting timestamp' in the WHERE condition for the original query, but the problem is: I need the resulting set to have no records with non_unique_id, that have already appeared in the list above the 'limiting timestamp'. This is how the result of the query should look like:

                       <- 'limiting timestamp'
18 | 871 | 1406181991

 9 | 764 | 1406178778
 7 | 609 | 1406178699

                       <- 'some timestamp in the past'

So the result will return 3 rows, which all have non_unique_id that did not appear in the results above. But if the 'non_unique_id' had already appeared in the list between 'limiting timestamp' and the 'some timestamp in the past', then only the first occurrence should be kept. Note: the last part condition is optional, as it will be pretty easy to extract the duplicate from the final list.

So far I was only able to come up with the solution of doing a JOIN between the list >= 'some timestamp in the past' and > 'limiting timestamp'. This way I'll see if there are any occurrences of the top list in the bottom list. However, it can be assumed that the query is complex and time needed to produce it's results shouldn't be doubled by running it again, but with a slightly different condition.

  • 写回答

2条回答 默认 最新

  • doumu2172 2014-07-24 18:31
    关注

    You can try this if SQL Server 2008+ (sqlfiddle: http://sqlfiddle.com/#!3/0bc33/3):

    WITH cteOrdered
    AS
    (
        SELECT    ROW_NUMBER() OVER (PARTITION BY t1.Non_Unique_ID ORDER BY t1.Timestamp) AS RID,
                  t1.*
        FROM      Table1 t1 LEFT JOIN
                  (SELECT  Non_Unique_ID
                   FROM    Table1
                   WHERE   Timestamp < 1406177379 OR
                           Timestamp > 1406182577) t2
                  ON t1.Non_Unique_ID = t2.Non_Unique_ID
        WHERE     t2.Non_Unique_ID IS NULL AND
                  t1.Timestamp > 1406177379 AND 
                  t1.Timestamp < 1406182577
     )
    
    SELECT    Unique_ID,
              Non_Unique_ID,
              Timestamp
    FROM      cteOrdered
    WHERE     RID = 1;
    

    I've added another row into the data

    (18, 871, 1406181990),
    

    to see if the query was producing you want. You said if there are duplicate non_unique_id's within the search range only the "first" occurrence should be kept. I take this is the one with the EARLIEST timestamp? If opposite, you can change this line

    SELECT    ROW_NUMBER() OVER (PARTITION BY t1.Non_Unique_ID ORDER BY t1.Timestamp) AS RID,
    

    to

    SELECT    ROW_NUMBER() OVER (PARTITION BY t1.Non_Unique_ID ORDER BY t1.Timestamp DESC) AS RID,
    

    and that will flip the order to retain the LATEST timestamp for the duplicates.

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
查看更多回答(1条)

报告相同问题?

悬赏问题

  • ¥15 merge函数占用内存过大
  • ¥15 Revit2020下载问题
  • ¥15 使用EMD去噪处理RML2016数据集时候的原理
  • ¥15 神经网络预测均方误差很小 但是图像上看着差别太大
  • ¥15 Oracle中如何从clob类型截取特定字符串后面的字符
  • ¥15 想通过pywinauto自动电机应用程序按钮,但是找不到应用程序按钮信息
  • ¥15 如何在炒股软件中,爬到我想看的日k线
  • ¥15 seatunnel 怎么配置Elasticsearch
  • ¥15 PSCAD安装问题 ERROR: Visual Studio 2013, 2015, 2017 or 2019 is not found in the system.
  • ¥15 (标签-MATLAB|关键词-多址)