斗士狗 2009-07-23 23:09

How to efficiently calculate a running standard deviation?

I have an array of lists of numbers, e.g.:

[0] (0.01, 0.01, 0.02, 0.04, 0.03)
[1] (0.00, 0.02, 0.02, 0.03, 0.02)
[2] (0.01, 0.02, 0.02, 0.03, 0.02)
     ...
[n] (0.01, 0.00, 0.01, 0.05, 0.03)

What I would like to do is efficiently calculate the mean and standard deviation at each index of a list, across all array elements.

To do the mean, I have been looping through the array and summing the value at a given index of a list. At the end, I divide each value in my "averages list" by n.

To do the standard deviation, I loop through again, now that I have the mean calculated.

I would like to avoid going through the array twice, once for the mean and then once for the SD (after I have a mean).

Is there an efficient method for calculating both values, only going through the array once? Any code in an interpreted language (e.g. Perl or Python) or pseudocode is fine.

Reposted from: https://stackoverflow.com/questions/1174984/how-to-efficiently-calculate-a-running-standard-deviation


  • 三生石@ 2009-08-28 18:24

    The answer is to use Welford's algorithm, which is very clearly defined after the "naive methods" in:

    It's more numerically stable than either the two-pass or the online simple sum-of-squares collectors suggested in other responses. The stability only really matters when you have lots of values that are close to each other, as they lead to what is known as "catastrophic cancellation" in the floating-point literature.
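As a rough illustration of the one-pass approach for the array-of-lists layout in the question, here is a minimal Python sketch of Welford's update applied independently at each list index. The names `RunningStats` and `column_stats` are made up for this example, and it assumes every list has the same length.

```python
import math


class RunningStats:
    """Welford's online mean/variance for a single stream of numbers."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def push(self, x):
        # Update the mean first, then accumulate using both the old
        # and the new deviation; this avoids subtracting large sums.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self, ddof=1):
        # ddof=1 divides by N-1 (sample), ddof=0 divides by N (population).
        if self.n <= ddof:
            return float("nan")
        return self.m2 / (self.n - ddof)

    def std(self, ddof=1):
        return math.sqrt(self.variance(ddof))


def column_stats(rows):
    """Single pass over rows; returns (means, sample std devs) per index."""
    stats = None
    for row in rows:
        if stats is None:
            stats = [RunningStats() for _ in row]
        for s, x in zip(stats, row):
            s.push(x)
    if stats is None:
        return [], []
    return [s.mean for s in stats], [s.std() for s in stats]
```

Calling `column_stats(rows)` on the array from the question returns one list of means and one list of standard deviations, each with an entry per list index, after reading every row exactly once.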

    You might also want to brush up on the difference between dividing by the number of samples (N) and N-1 in the variance calculation (squared deviation). Dividing by N-1 leads to an unbiased estimate of variance from the sample, whereas dividing by N on average underestimates variance (because it doesn't take into account the variance between the sample mean and the true mean).
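To make the N versus N-1 distinction concrete, here is a small Python sketch that computes both from the same sum of squared deviations. The helper name `both_variances` and the sample values are invented for illustration only.

```python
def both_variances(sample):
    """Return (population variance, sample variance) for a list of numbers."""
    n = len(sample)
    mean = sum(sample) / n
    # Sum of squared deviations from the sample mean.
    ss = sum((x - mean) ** 2 for x in sample)
    population_variance = ss / n      # divide by N
    sample_variance = ss / (n - 1)    # divide by N-1
    return population_variance, sample_variance


pop_var, samp_var = both_variances([2, 4, 4, 4, 5, 5, 7, 9])
```

For this sample the mean is 5 and the squared deviations sum to 32, so dividing by N gives 4.0 while dividing by N-1 gives 32/7, roughly 4.571. The two converge as N grows, so the choice matters most for small samples.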

    I wrote two blog entries on the topic which go into more details, including how to delete previous values online:

    You can also take a look at my Java implementation; the javadoc, source, and unit tests are all online:
