weixin_39682511
2020-12-09 09:54

TAJO-921: Add STDDEV_SAMP and STDDEV_POP window functions

This patch implements STDDEV_SAMP() and STDDEV_POP() as aggregation functions.
- The calculation is based on the method first proposed by B. P. Welford in 1962 and described by Donald Knuth in The Art of Computer Programming, Volume 2: Seminumerical Algorithms. (See http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm for a basic explanation.)
- I also renamed the test methods for window functions (for example, from rowNumber1 to testRowNumber1).
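Here is a minimal Java sketch of Welford's online recurrence, for readers who have not seen it. The class name WelfordStdDev and its methods are hypothetical and are not Tajo's actual aggregation-function API. The sketch shows the per-row update and that the two functions differ only in the final divisor: n for STDDEV_POP and n - 1 for STDDEV_SAMP. Returning null for fewer than two rows (SAMP) or zero rows (POP) is an assumption that mirrors SQL NULL semantics.

```java
/**
 * Hypothetical sketch of Welford's online algorithm for STDDEV_POP / STDDEV_SAMP.
 * Not Tajo's implementation; it only illustrates the recurrence.
 */
public final class WelfordStdDev {
    private long count = 0;    // number of values folded in so far
    private double mean = 0.0; // running mean
    private double m2 = 0.0;   // running sum of squared deviations from the mean

    /** Folds one value into the running state. */
    public void add(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        // Uses the updated mean, as in Welford's recurrence.
        m2 += delta * (x - mean);
    }

    /** Population standard deviation; null when no values were seen. */
    public Double stddevPop() {
        if (count == 0) return null;
        return Math.sqrt(m2 / count);
    }

    /** Sample standard deviation; null when fewer than two values were seen. */
    public Double stddevSamp() {
        if (count < 2) return null;
        return Math.sqrt(m2 / (count - 1));
    }

    public static void main(String[] args) {
        WelfordStdDev s = new WelfordStdDev();
        for (double x : new double[] {2, 4, 4, 4, 5, 5, 7, 9}) {
            s.add(x);
        }
        System.out.println("stddev_pop:  " + s.stddevPop());
        System.out.println("stddev_samp: " + s.stddevSamp());
    }
}
```

Each update touches only count, mean and m2. That means the state can be kept per group without storing the rows, and it avoids the cancellation error of the naive sum-of-squares formula.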

I confirmed that 'mvn clean install' passes with this patch.

This question comes from the open-source project apache/tajo.


7 replies

  • weixin_39608398 (5 months ago)

    I'll review today.

  • weixin_39608398 (5 months ago)

    Hi, thanks for your patch. I ran the same queries as in the newly added unit tests on pgsql, but got different results. Could you check it? Here are the pgsql results:

    • result of testStdDevPop1

    
     linenumber_stddev_pop  | suppkey_stddev_pop | extendedprice_stddev_pop | discount_stddev_pop 
    ------------------------+--------------------+--------------------------+---------------------
                          0 |                  0 |                        0 |                   0
     0.50000000000000000000 |   197.500000000000 |                12407.465 |  0.0250000022351742
                          0 |                  0 |                        0 |                   0
                          0 |                  0 |                        0 |                   0
     0.50000000000000000000 |  2371.000000000000 |         3630.79000000004 |  0.0200000014156103
    (5 rows)
    
    • result of testStdDevSamp1
    
     linenumber_stddev_samp | suppkey_stddev_samp | extendedprice_stddev_samp | discount_stddev_samp 
    ------------------------+---------------------+---------------------------+----------------------
                            |                     |                           |                     
     0.70710678118654752440 |    279.307178568686 |          17546.8052776695 |    0.035355342220341
                            |                     |                           |                     
                            |                     |                           |                     
     0.70710678118654752440 |   3353.100356386608 |          5134.71246012867 |   0.0282842732494372
    (5 rows)
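The following short derivation is added for illustration and is not part of the original review. It explains the pattern in the pgsql rows above. For a frame of n values, the two functions differ only in the divisor:

```latex
\sigma_{\mathrm{pop}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2}, \qquad
\sigma_{\mathrm{samp}} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}, \qquad
\frac{\sigma_{\mathrm{samp}}}{\sigma_{\mathrm{pop}}} = \sqrt{\frac{n}{n-1}} \quad (\sigma_{\mathrm{pop}} \neq 0)

n = 2: \qquad \sigma_{\mathrm{pop}} = \frac{|x_1 - x_2|}{2}, \qquad
\sigma_{\mathrm{samp}} = \frac{|x_1 - x_2|}{\sqrt{2}} = \sqrt{2}\,\sigma_{\mathrm{pop}}
```

A one-row frame (n = 1) gives STDDEV_POP = 0 and an undefined (NULL) STDDEV_SAMP. The ratio sqrt(n/(n-1)) equals sqrt(2) only when n = 2. Every non-NULL STDDEV_SAMP value above is sqrt(2) times the matching STDDEV_POP value, up to rounding (for example, 197.5 × 1.41421... ≈ 279.307). So those rows come from two-row frames.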
    
  • weixin_39682511 (5 months ago)

    Thank you for the comment. I'll look into it.

  • weixin_39682511 (5 months ago)

    Hi,
    The difference above comes from how the default window frame is handled. In Tajo, window functions currently behave as if the frame were "ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING", whereas the default in other DBMSs such as PostgreSQL is "RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW".

    I have already posted a patch that supports the correct window frame (TAJO-1415). Once that patch is in, the results will be the same. Alternatively, you can test by explicitly setting the frame to "ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING" for all window functions.
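To illustrate the frame difference described above, the following hypothetical Java sketch evaluates STDDEV_SAMP over a single sorted partition under both frames. It assumes the ORDER BY keys are distinct, so "RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW" covers exactly the rows up to the current one. With duplicate keys, RANGE would also include the current row's peers, and this sketch does not model that. Class and method names are illustrative only, not Tajo code.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration (not Tajo code): STDDEV_SAMP over one partition
 * under two window frames. Assumes rows are already sorted by distinct
 * ORDER BY keys, so the RANGE running frame is simply a prefix.
 */
public final class FrameComparison {

    /** Sum of squared deviations of values[0..end), via Welford's recurrence. */
    private static double sumSquaredDeviations(double[] values, int end) {
        double mean = 0.0;
        double m2 = 0.0;
        for (int i = 0; i < end; i++) {
            double delta = values[i] - mean;
            mean += delta / (i + 1);
            m2 += delta * (values[i] - mean);
        }
        return m2;
    }

    /** Population standard deviation of values[0..end); null if the frame is empty. */
    static Double stddevPop(double[] values, int end) {
        if (end < 1) return null;
        return Math.sqrt(sumSquaredDeviations(values, end) / end);
    }

    /** Sample standard deviation of values[0..end); null if the frame has fewer than two rows. */
    static Double stddevSamp(double[] values, int end) {
        if (end < 2) return null;
        return Math.sqrt(sumSquaredDeviations(values, end) / (end - 1));
    }

    /** RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (distinct keys assumed). */
    static List<Double> runningSamp(double[] partition) {
        List<Double> out = new ArrayList<>();
        for (int row = 0; row < partition.length; row++) {
            out.add(stddevSamp(partition, row + 1));
        }
        return out;
    }

    /** ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: one value per partition. */
    static List<Double> wholePartitionSamp(double[] partition) {
        Double all = stddevSamp(partition, partition.length);
        List<Double> out = new ArrayList<>();
        for (int row = 0; row < partition.length; row++) {
            out.add(all);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] partition = {1.0, 2.0}; // e.g. two line numbers within one partition
        System.out.println("running frame:   " + runningSamp(partition));
        System.out.println("whole partition: " + wholePartitionSamp(partition));
    }
}
```

With the running frame, the first row sees only itself. STDDEV_SAMP is therefore null there (and STDDEV_POP would be 0), which matches the pattern of blank and zero cells in the pgsql output. With the whole-partition frame, every row in the partition receives the same value.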

  • weixin_39608398 (5 months ago)

    Thanks for the quick response. I'll finish my review soon.

  • weixin_39608398 (5 months ago)

    +1 Thanks for your nice work!

  • weixin_39682511 (5 months ago)

    Thank you for the review.

    I just committed the patch to master.

