lrony* 2012-03-25 12:27 采纳率: 0%
浏览 256
已采纳

为什么'['比'子集'更好?

When I need to filter a data.frame, i.e., extract rows that meet certain conditions, I prefer to use the subset function:

subset(airquality, Month == 8 & Temp > 90)

Rather than the [ function:

airquality[airquality$Month == 8 & airquality$Temp > 90, ]

There are two main reasons for my preference:

  1. I find the code reads better, from left to right. Even people who know nothing about R could tell what the subset statement above is doing.

  2. Because columns can be referred to as variables in the select expression, I can save a few keystrokes. In my example above, I only had to type airquality once with subset, but three times with [.

So I was living happy, using subset everywhere because it is shorter and reads better, even advocating its beauty to my fellow R coders. But yesterday my world broke apart. While reading the subset documentation, I notice this section:

Warning

This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences.

Could someone help clarify what the authors mean?

First, what do they mean by "for use interactively"? I know what an interactive session is, as opposed to a script run in BATCH mode but I don't see what difference it should make.

Then, could you please explain "the non-standard evaluation of argument subset" and why it is dangerous, maybe provide an example?

转载于:https://stackoverflow.com/questions/9860090/why-is-better-than-subset

  • 写回答

2条回答 默认 最新

  • YaoRaoLov 2012-03-25 19:03
    关注

    This question was answered in well in the comments by @James, pointing to an excellent explanation by Hadley Wickham of the dangers of subset (and functions like it) [here]. Go read it!

    It's a somewhat long read, so it may be helpful to record here the example that Hadley uses that most directly addresses the question of "what can go wrong?":

    Hadley suggests the following example: suppose we want to subset and then reorder a data frame using the following functions:

    scramble <- function(x) x[sample(nrow(x)), ]
    
    subscramble <- function(x, condition) {
      scramble(subset(x, condition))
    }
    
    subscramble(mtcars, cyl == 4)
    

    This returns the error:

    Error in eval(expr, envir, enclos) : object 'cyl' not found

    because R no longer "knows" where to find the object called 'cyl'. He also points out the truly bizarre stuff that can happen if by chance there is an object called 'cyl' in the global environment:

    cyl <- 4
    subscramble(mtcars, cyl == 4)
    
    cyl <- sample(10, 100, rep = T)
    subscramble(mtcars, cyl == 4)
    

    (Run them and see for yourself, it's pretty crazy.)

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
查看更多回答(1条)

报告相同问题?

悬赏问题

  • ¥15 itunes恢复数据最后一步发生错误
  • ¥15 关于#windows#的问题:2024年5月15日的win11更新后资源管理器没有地址栏了顶部的地址栏和文件搜索都消失了
  • ¥15 看一下OPENMV原理图有没有错误
  • ¥100 H5网页如何调用微信扫一扫功能?
  • ¥15 讲解电路图,付费求解
  • ¥15 有偿请教计算电磁学的问题涉及到空间中时域UTD和FDTD算法结合的
  • ¥15 vite打包后,页面出现h.createElement is not a function,但本地运行正常
  • ¥15 Java,消息推送配置
  • ¥15 Java计划序号重编制功能,此功能会对所有序号重新排序,排序后不改变前后置关系。
  • ¥15 关于哈夫曼树应用得到一些问题