如何有效地计算一个正在运行的标准差？

I have an array of lists of numbers, e.g.:

[0] (0.01, 0.01, 0.02, 0.04, 0.03)
[1] (0.00, 0.02, 0.02, 0.03, 0.02)
[2] (0.01, 0.02, 0.02, 0.03, 0.02)
     ...
[n] (0.01, 0.00, 0.01, 0.05, 0.03)

What I would like to do is efficiently calculate the mean and standard deviation at each index of a list, across all array elements.

To do the mean, I have been looping through the array and summing the value at a given index of a list. At the end, I divide each value in my "averages list" by n.

To do the standard deviation, I loop through again, now that I have the mean calculated.

I would like to avoid going through the array twice, once for the mean and then once for the SD (after I have a mean).

Is there an efficient method for calculating both values, only going through the array once? Any code in an interpreted language (e.g. Perl or Python) or pseudocode is fine.

转载于:https://stackoverflow.com/questions/1174984/how-to-efficiently-calculate-a-running-standard-deviation

写回答
好问题 0 提建议
追加酬金
关注问题
分享
邀请回答
编辑收藏删除结题
收藏举报

14条回答默认最新

关注

码龄粉丝数原力等级 --

被采纳

被点赞

采纳率
笑故挽风 2009-08-28 18:24
关注
The answer is to use Welford's algorithm, which is very clearly defined after the "naive methods" in:

Wikipedia: Algorithms for calculating variance

It's more numerically stable than either the two-pass or online simple sum of squares collectors suggested in other responses. The stability only really matters when you have lots of values that are close to each other as they lead to what is known as "catastrophic cancellation" in the floating point literature.

You might also want to brush up on the difference between dividing by the number of samples (N) and N-1 in the variance calculation (squared deviation). Dividing by N-1 leads to an unbiased estimate of variance from the sample, whereas dividing by N on average underestimates variance (because it doesn't take into account the variance between the sample mean and the true mean).

I wrote two blog entries on the topic which go into more details, including how to delete previous values online:

Computing Sample Mean and Variance Online in One Pass

Deleting Values in Welford’s Algorithm for Online Mean and Variance

You can also take a look at my Java implement; the javadoc, source, and unit tests are all online:

Javadoc: stats.OnlineNormalEstimator

Source: stats.OnlineNormalEstimator.java

JUnit Source: test.unit.stats.OnlineNormalEstimatorTest.java

LingPipe Home Page
本回答被题主选为最佳回答 , 对您是否有帮助呢?

解决无用
评论打赏
分享
举报

评论

按下Enter换行，Ctrl+Enter发表内容

查看更多回答(13条)

报告相同问题？

关注问题

R语言基因相关性分析警告标准差为零，求大家帮帮我！哭了😭 r语言有问必答
2022-03-14 23:29

回答 2 已采纳关于第一个问题，出现警告信息与fpkm_t[i,]及y两个变量有关，其中一个变量可能值都相同，或者原数据就存在问题不适宜做相关分析，输出两个变量看看，并检查前面数据处理是否有问题。对于这种报错可以看看
这个计算标准差的写法怎么改不知道错哪了 c语言
2021-11-13 08:20

回答 2 已采纳 1、问题出在for语句，计算平均值ave应该在输入完所有数据之后，所以不能放在输入数据的for语句中，而标准差需要用到ave，所以也要放在for语句之外，另外再建一个for语句求各值与ave的差的和，
目前来看，学习哪门编程语言最有未来前景？(语言-开发语言) 学习方法开发语言蓝桥杯
2022-09-15 23:23

回答 5 已采纳您好，您孩子多大岁数呢？学习编程，兴趣最关键。。然后，要做好长期不断学习的心理准备。第一阶段：12岁前，岁数较小时，要学好数学，空余时间可以学一些少儿编程方面的资料，培养培育孩子的逻辑思维、数据思维能
怎么用python编写程序计算标准差_自学生信Python（第五天）|如何计算标准差？...
2021-01-11 23:08

今融道的博客如何计算标准差？本人是一枚生物学的学生，由于对生物信息学特别感兴趣，于是想自学生物信息学(新手莫怪)。了解到生物信息学要有编程基础，尤其是要会一门编程语言，例如：R语言、Python、Perl等，还要熟悉Linux系统...
JQ 时间计算天数差？ html javascript jquery
2022-09-27 14:34

回答 4 已采纳 (new Date('2022-09-23') - new Date('2022-09-21')) / 1000 / 60/ 60 / 24
计算基本数值（平均值、标准差、中位数） python
2022-04-10 11:15

回答 1 已采纳 import numpy as np m=input() ls=[] ls=m.split(",") for i in range(len(ls)): ls[i] = float(ls[i])
计算标准差，输入多行，以空行结束输入 python
2022-06-18 23:01

回答 1 已采纳 from math import sqrt def getnum(): nums=[] inumstr=input("请输入数字(直接输入回车退出):") while inu
R语言使用sd函数计算向量数据的标准差
2022-04-24 07:28

sdgfbhgfj的博客 R语言使用sd函数计算向量数据的标准差
2023十大最牛编程语言排行榜以及各语言的优缺点
2023-07-27 06:00

哈哥撩编程的博客我们掌握不了所有的编程语言，但我们掌握的语言越多，在未来的发展与可塑性上就越强，就越容易受到企业的青睐，如果是作为自由开发者的话，也就越会有客户和开发团队与我们合作。考虑到这一点，希望各位小伙伴能...
编程语言发展史之：编程语言与量子计算
2023-09-25 01:18

AI天才研究院的博客在探索新的计算方式时，工程师们需要掌握一些编程语言知识，例如掌握哪些编程语言比较适合量子计算相关的任务。由于我国人工智能领域的蓬勃发展，计算机技术日渐成熟。本文将介绍现代编程语言发展历史、语言之间的...
编程语言发展史之：编程语言的未来趋势
2023-09-25 01:00

AI天才研究院的博客 编程语言”这个概念在近几年间已经成为现代科技领域的一个热门话题。它从诞生到今天已经经历了几百年的历史，各个编程语言都各不相同，但其中的共同点无疑就是可以实现一些程序功能。而“未来趋势”，则指的是这一...
【编程实践】编程语言之 Smalltalk
2023-04-01 12:31

AI天才研究院的博客 Smalltalk，被公认为历史上第二个面向对象的程序设计语言，和第一个真正的集成开发环境（IDE）。Smalltalk由艾伦·凯，Dan Ingalls，Ted Kaehler，Adele Goldberg等于70年代初在Xerox PARC开发。Smalltalk对其它众多...
2022年编程语言排名，官方数据来了，让人大开眼界。
2022-01-09 01:32

_金欣的博客之所以说这件事，就是想告诉同学们，努力固然重要，但选择必须要对，一旦选择错了，那很多努力都是白费。这篇文章就是来给同学们提个醒，2022 年最好的编程语言是什么？看完后你就知道该如何地去选择了。 ......
没有解决我的问题, 去提问

悬赏问题

¥30 Matlab打开默认名称带有/的光谱数据
¥50 easyExcel模板动态单元格合并列
¥15 res.rows如何取值使用
¥15 在odoo17开发环境中，怎么实现库存管理系统，或独立模块设计与AGV小车对接？开发方面应如何设计和开发？请详细解释MES或WMS在与AGV小车对接时需完成的设计和开发
¥15 CSP算法实现EEG特征提取，哪一步错了？
¥15 游戏盾如何溯源服务器真实ip?需要30个字。后面的字是凑数的
¥15 vue3前端取消收藏的不会引用collectId
¥15 delphi7 HMAC_SHA256方式加密
¥15 关于#qt#的问题：我想实现qcustomplot完成坐标轴
¥15 下列c语言代码为何输出了多余的空格

如何有效地计算一个正在运行的标准差？

14条回答 默认 最新

悬赏问题

14条回答默认最新