Grouping functions (tapply, by, aggregate) and the *apply family

Whenever I want to do something "map"py in R, I usually try to use a function in the apply family.

However, I've never quite understood the differences between them -- how {sapply, lapply, etc.} apply the function to the input/grouped input, what the output will look like, or even what the input can be -- so I often just go through them all until I get what I want.

Can someone explain how to use which one when?

My current (probably incorrect/incomplete) understanding is...

sapply(vec, f): input is a vector. output is a vector/matrix, where element i is f(vec[i]), giving you a matrix if f has a multi-element output
lapply(vec, f): same as sapply, but output is a list?
apply(matrix, 1/2, f): input is a matrix. output is a vector, where element i is f(row/col i of the matrix)
tapply(vector, grouping, f): output is a matrix/array, where an element in the matrix/array is the value of f at a grouping g of the vector, and g gets pushed to the row/col names
by(dataframe, grouping, f): let g be a grouping. apply f to each column of the group/dataframe. pretty print the grouping and the value of f at each column.
aggregate(matrix, grouping, f): similar to by, but instead of pretty printing the output, aggregate sticks everything into a dataframe.

Side question: I still haven't learned plyr or reshape -- would plyr or reshape replace all of these entirely?

Reposted from: https://stackoverflow.com/questions/3505701/grouping-functions-tapply-by-aggregate-and-the-apply-family

胖鸭 2011-08-21 22:50
R has many *apply functions which are ably described in the help files (e.g. ?apply). There are enough of them, though, that beginning useRs may have difficulty deciding which one is appropriate for their situation or even remembering them all. They may have a general sense that "I should be using an *apply function here", but it can be tough to keep them all straight at first.

Despite the fact (noted in other answers) that much of the functionality of the *apply family is covered by the extremely popular plyr package, the base functions remain useful and worth knowing.

This answer is intended to act as a sort of signpost for new useRs to help direct them to the correct *apply function for their particular problem. Note, this is not intended to simply regurgitate or replace the R documentation! The hope is that this answer helps you to decide which *apply function suits your situation and then it is up to you to research it further. With one exception, performance differences will not be addressed.

apply - When you want to apply a function to the rows or columns of a matrix (and higher-dimensional analogues); not generally advisable for data frames as it will coerce to a matrix first.

    # Two-dimensional matrix
    M <- matrix(seq(1, 16), 4, 4)

    # Apply min to rows
    apply(M, 1, min)
    #> [1] 1 2 3 4

    # Apply max to columns
    apply(M, 2, max)
    #> [1]  4  8 12 16

    # Three-dimensional array
    M <- array(seq(32), dim = c(4, 4, 2))

    # Apply sum across each M[*, , ], i.e. sum across the 2nd and 3rd dimensions
    apply(M, 1, sum)
    # Result is one-dimensional
    #> [1] 120 128 136 144

    # Apply sum across each M[*, *, ], i.e. sum across the 3rd dimension
    apply(M, c(1, 2), sum)
    # Result is two-dimensional
    #>      [,1] [,2] [,3] [,4]
    #> [1,]   18   26   34   42
    #> [2,]   20   28   36   44
    #> [3,]   22   30   38   46
    #> [4,]   24   32   40   48
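
To see why apply is a poor fit for data frames, here is a minimal sketch using a small made-up data frame (df and its columns are illustrative assumptions, not part of the original answer). apply converts the data frame to a matrix first, so with mixed column types every value arrives as a character string, whereas sapply walks the columns one at a time and keeps their types.

```r
# Hypothetical mixed-type data frame, for illustration only
df <- data.frame(id = 1:3, name = c("a", "b", "c"), stringsAsFactors = FALSE)

# apply() coerces df to a character matrix, so each row is a character vector
row_classes <- apply(df, 1, class)

# sapply() treats df as a list of columns, so each column keeps its own type
col_classes <- sapply(df, class)

print(row_classes)
print(col_classes)
```

If you need row-wise work on a data frame with mixed types, it is usually safer to select the numeric columns first or to iterate over row indices.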

If you want row/column means or sums for a 2D matrix, be sure to investigate the highly optimized, lightning-quick colMeans, rowMeans, colSums, rowSums.
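
As a quick check, the sketch below compares the apply versions with the dedicated functions on the same 4 x 4 matrix used above. The variable names are only illustrative.

```r
M <- matrix(seq(1, 16), 4, 4)

# Row sums, computed two ways
row_totals_apply <- apply(M, 1, sum)
row_totals_fast  <- rowSums(M)

# Column means, computed two ways
col_means_apply <- apply(M, 2, mean)
col_means_fast  <- colMeans(M)

# Both pairs hold the same values
print(all(row_totals_apply == row_totals_fast))
print(all(col_means_apply == col_means_fast))
```

The results agree in value; rowSums returns doubles even when the input is integer, so compare values rather than using identical().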

lapply - When you want to apply a function to each element of a list in turn and get a list back.

This is the workhorse of many of the other *apply functions. Peel back their code and you will often find lapply underneath.

    x <- list(a = 1, b = 1:3, c = 10:100)

    lapply(x, FUN = length)
    #> $a
    #> [1] 1
    #>
    #> $b
    #> [1] 3
    #>
    #> $c
    #> [1] 91

    lapply(x, FUN = sum)
    #> $a
    #> [1] 1
    #>
    #> $b
    #> [1] 6
    #>
    #> $c
    #> [1] 5005
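
Two further details, sketched here with illustrative names: the input does not have to be a list (a plain vector works, and the result is still a list), and any extra arguments given after the function are passed on to every call.

```r
x <- list(a = 1, b = 1:3, c = 10:100)

# A plain vector as input: one list element per vector element
squares <- lapply(1:3, function(i) i^2)

# Extra arguments (here n = 2) are forwarded to head() on every call
first_two <- lapply(x, head, n = 2)

str(squares)
str(first_two)
```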

sapply - When you want to apply a function to each element of a list in turn, but you want a vector back, rather than a list.

If you find yourself typing unlist(lapply(...)), stop and consider sapply.

    x <- list(a = 1, b = 1:3, c = 10:100)

    # Compare with above; a named vector, not a list
    sapply(x, FUN = length)
    #>  a  b  c
    #>  1  3 91

    sapply(x, FUN = sum)
    #>    a    b    c
    #>    1    6 5005
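
The unlist(lapply(...)) remark can be checked directly. In this small sketch (variable names are illustrative) both routes give the same named integer vector when every call returns a single value.

```r
x <- list(a = 1, b = 1:3, c = 10:100)

# The long way: build a list, then flatten it
via_lapply <- unlist(lapply(x, length))

# The short way: let sapply simplify the result
via_sapply <- sapply(x, length)

print(identical(via_lapply, via_sapply))
```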

In more advanced uses of sapply it will attempt to coerce the result to a multi-dimensional array, if appropriate. For example, if our function returns vectors of the same length, sapply will use them as columns of a matrix:

sapply(1:5,function(x) rnorm(3,x))

If our function returns a 2 dimensional matrix, sapply will do essentially the same thing, treating each returned matrix as a single long vector:

sapply(1:5,function(x) matrix(x,2,2))

Unless we specify simplify = "array", in which case it will use the individual matrices to build a multi-dimensional array:

sapply(1:5,function(x) matrix(x,2,2), simplify = "array")

Each of these behaviors is of course contingent on our function returning vectors or matrices of the same length or dimension.
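
The shapes described above can be inspected with dim(). The sketch below also adds one case that is an assumption of this example rather than part of the original answer: a function whose results differ in length, where sapply cannot simplify and falls back to returning a list.

```r
set.seed(1)  # only so the random draws are reproducible

# Equal-length vectors become the columns of a 3 x 5 matrix
m_vec <- sapply(1:5, function(x) rnorm(3, x))

# Each 2 x 2 matrix is flattened into a column of length 4
m_flat <- sapply(1:5, function(x) matrix(x, 2, 2))

# With simplify = "array" the matrices are stacked into a 2 x 2 x 5 array
m_arr <- sapply(1:5, function(x) matrix(x, 2, 2), simplify = "array")

# Results of different lengths cannot be simplified, so a list comes back
ragged <- sapply(1:3, function(i) seq_len(i))

print(dim(m_vec))
print(dim(m_flat))
print(dim(m_arr))
print(is.list(ragged))
```

Because the return type depends on the data, it is worth checking the shape of an sapply result before relying on it inside other code.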

vapply - When you want to use sapply but perhaps need to squeeze some more speed out of your code.

For vapply, you basically give R an example of what sort of thing your function will return, which can save some time coercing returned values to fit in a single atomic vector.

    x <- list(a = 1, b = 1:3, c = 10:100)

    # Note that since the advantage here is mainly speed, this
    # example is only for illustration. We're telling R that
    # everything returned by length() should be an integer of
    # length 1.
    vapply(x, FUN = length, FUN.VALUE = 0L)
    #>  a  b  c
    #>  1  3 91
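
The "example of what your function will return" is the FUN.VALUE argument. The sketch below, with illustrative variable names, shows a length-1 and a length-2 template. It also shows two behaviours documented in ?vapply that go beyond the speed point made above: a result that does not match the template raises an error, and empty input still yields a vector of the declared type.

```r
x <- list(a = 1, b = 1:3, c = 10:100)

# Template: one integer per element
lens <- vapply(x, length, FUN.VALUE = integer(1))

# Template: two doubles per element, giving a 2 x 3 matrix
ranges <- vapply(x, function(v) as.numeric(range(v)), FUN.VALUE = numeric(2))

# A template that does not match the actual result is an error
mismatch <- tryCatch(
  vapply(x, length, FUN.VALUE = character(1)),
  error = function(e) e
)

# On empty input vapply keeps the declared type, while sapply returns list()
empty_v <- vapply(list(), length, FUN.VALUE = integer(1))
empty_s <- sapply(list(), length)

print(lens)
print(ranges)
print(inherits(mismatch, "error"))
```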

mapply - For when you have several data structures (e.g. vectors, lists) and you want to apply a function to the 1st elements of each, and then the 2nd elements of each, etc., coercing the result to a vector/array as in sapply.

This is multivariate in the sense that your function must accept multiple arguments.

    # Sums the 1st elements, the 2nd elements, etc.
    mapply(sum, 1:5, 1:5, 1:5)
    #> [1]  3  6  9 12 15

    # To do rep(1,4), rep(2,3), etc.
    mapply(rep, 1:4, 4:1)
    #> [[1]]
    #> [1] 1 1 1 1
    #>
    #> [[2]]
    #> [1] 2 2 2
    #>
    #> [[3]]
    #> [1] 3 3
    #>
    #> [[4]]
    #> [1] 4
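
To make the "multiple arguments" point concrete, here is a minimal sketch with a made-up two-argument function (power is hypothetical and exists only for this example). The vectors are walked in parallel, and arguments can be matched by name.

```r
# Hypothetical two-argument function, for illustration only
power <- function(base, exponent) base^exponent

# Pairs up 2^1, 3^2, 4^3
powers <- mapply(power, 2:4, 1:3)

# The same call with arguments matched by name
powers_named <- mapply(power, base = 2:4, exponent = 1:3)

# SIMPLIFY = FALSE skips the sapply-style simplification and returns a list
powers_list <- mapply(power, 2:4, 1:3, SIMPLIFY = FALSE)

print(powers)
print(is.list(powers_list))
```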

Map - A wrapper to mapply with SIMPLIFY = FALSE, so it is guaranteed to return a list.

    Map(sum, 1:5, 1:5, 1:5)
    #> [[1]]
    #> [1] 3
    #>
    #> [[2]]
    #> [1] 6
    #>
    #> [[3]]
    #> [1] 9
    #>
    #> [[4]]
    #> [1] 12
    #>
    #> [[5]]
    #> [1] 15
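
The wrapper relationship can be verified in a couple of lines (variable names are illustrative):

```r
# Map() and mapply(..., SIMPLIFY = FALSE) should give the same list
via_Map    <- Map(sum, 1:5, 1:5, 1:5)
via_mapply <- mapply(sum, 1:5, 1:5, 1:5, SIMPLIFY = FALSE)

print(identical(via_Map, via_mapply))
```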

rapply - For when you want to apply a function to each element of a nested list structure, recursively.

To give you some idea of how uncommon rapply is, I forgot about it when first posting this answer! Obviously, I'm sure many people use it, but YMMV. rapply is best illustrated with a user-defined function to apply:

    # Append ! to string, otherwise increment
    myFun <- function(x) {
      if (is.character(x)) {
        return(paste(x, "!", sep = ""))
      } else {
        return(x + 1)
      }
    }

    # A nested list structure
    l <- list(a = list(a1 = "Boo", b1 = 2, c1 = "Eeek"),
              b = 3, c = "Yikes",
              d = list(a2 = 1, b2 = list(a3 = "Hey", b3 = 5)))

    # Result is named vector, coerced to character
    rapply(l, myFun)

    # Result is a nested list like l, with values altered
    rapply(l, myFun, how = "replace")
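
As a variation on the example above, rapply also has a classes argument that restricts the function to leaves of a given class, which can replace the type check inside the function. This is a sketch based on the documented arguments in ?rapply, not something the original answer shows; the variable names are illustrative.

```r
# Same nested list as in the example above
l <- list(a = list(a1 = "Boo", b1 = 2, c1 = "Eeek"),
          b = 3, c = "Yikes",
          d = list(a2 = 1, b2 = list(a3 = "Hey", b3 = 5)))

# Increment only the numeric leaves; other leaves are left unchanged
incremented <- rapply(l, function(x) x + 1, classes = "numeric", how = "replace")

# Upper-case only the character leaves and flatten; other leaves are dropped
shouted <- rapply(l, toupper, classes = "character", how = "unlist")

str(incremented)
print(shouted)
```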

tapply - For when you want to apply a function to subsets of a vector and the subsets are defined by some other vector, usually a factor.

The black sheep of the *apply family, of sorts. The help file's use of the phrase "ragged array" can be a bit confusing, but it is actually quite simple.

A vector:

x <- 1:20

A factor (of the same length!) defining groups:

y <- factor(rep(letters[1:5], each = 4))

Add up the values in x within each subgroup defined by y:

    tapply(x, y, sum)
    #>  a  b  c  d  e
    #> 10 26 42 58 74

More complex examples can be handled where the subgroups are defined by the unique combinations of a list of several factors. tapply is similar in spirit to the split-apply-combine functions that are common in R (aggregate, by, ave, ddply, etc.). Hence its black sheep status.
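
A rough sketch of both points, reusing x and y from above. The second factor z and the data frame d are assumptions added for illustration. The first call groups by two factors at once; the rest compute the same per-group sums with the base split-apply-combine functions, which differ mainly in the shape of what they return.

```r
x <- 1:20
y <- factor(rep(letters[1:5], each = 4))

# Hypothetical second grouping factor, for illustration only
z <- factor(rep(c("odd", "even"), times = 10))

# Two factors: a 5 x 2 matrix with one cell per combination of levels
two_way <- tapply(x, list(y, z), sum)

# The same one-factor sums as tapply(x, y, sum), in different containers
d <- data.frame(x = x, y = y)
agg_sums <- aggregate(x ~ y, data = d, FUN = sum)   # a data frame
by_sums  <- by(d, d$y, function(g) sum(g$x))        # a "by" object
ave_sums <- ave(x, y, FUN = sum)                    # same length as x

print(two_way)
print(agg_sums)
```

Which one to use depends on the output you want: a named vector or array (tapply), a data frame (aggregate), a printed per-group report (by), or a vector aligned with the original data (ave).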