2010-05-17 16:52

Convert data.frame columns from factors to characters

I have a data frame. Let's call him bob:

> head(bob)
                 phenotype                         exclusion
GSM399350 3- 4- 8- 25- 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399351 3- 4- 8- 25- 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399352 3- 4- 8- 25- 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399353 3- 4- 8- 25+ 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399354 3- 4- 8- 25+ 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399355 3- 4- 8- 25+ 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-

I'd like to concatenate the rows of this data frame (this will be another question). But look:

> class(bob$phenotype)
[1] "factor"

Bob's columns are factors. So, for example:

> as.character(head(bob))
[1] "c(3, 3, 3, 6, 6, 6)"       "c(3, 3, 3, 3, 3, 3)"      
[3] "c(29, 29, 29, 30, 30, 30)"

I don't begin to understand this, but I guess these are indices into the levels of the factors of the columns (of the court of king caractacus) of bob? Not what I need.

Strangely I can go through the columns of bob by hand, and do

bob$phenotype <- as.character(bob$phenotype)

which works fine. And, after some typing, I can get a data.frame whose columns are characters rather than factors. So my question is: how can I do this automatically? How do I convert a data.frame with factor columns into a data.frame with character columns without having to manually go through each column?

Bonus question: why does the manual approach work?

Reposted from: https://stackoverflow.com/questions/2851015/convert-data-frame-columns-from-factors-to-characters


13 answers

  • 狐狸.fox 2010-05-17 17:21
    Accepted

    Just following on Matt and Dirk. If you want to recreate your existing data frame without changing the global option, you can recreate it with an apply statement:

    bob <- data.frame(lapply(bob, as.character), stringsAsFactors=FALSE)
    

    This converts all variables to class "character". If you want to convert only the factors, see Marek's solution below.

    As @hadley points out, the following is more concise.

    bob[] <- lapply(bob, as.character)
    

    In both cases, lapply returns a list. In the second case, however, assigning into bob[] replaces the columns of the existing object rather than the object itself, so bob keeps its data.frame class. That removes the need to convert back to a data.frame using as.data.frame with the argument stringsAsFactors = FALSE.
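
    As a minimal sketch of the two forms side by side, the following uses a small made-up data frame called toy (a hypothetical stand-in, not the asker's real bob). Note that both forms convert every column, including the integer one:

    ```r
    # Toy stand-in for bob (hypothetical data, not the asker's real table)
    toy <- data.frame(
      phenotype = factor(c("3- 4- 8-", "3- 4- 8-", "25+ 44+")),
      count     = c(1L, 2L, 3L)
    )

    # Form 1: rebuild a data frame from the list that lapply returns
    a <- data.frame(lapply(toy, as.character), stringsAsFactors = FALSE)

    # Form 2: assign the list into the columns of the existing data frame
    b <- toy
    b[] <- lapply(b, as.character)

    sapply(a, class)   # every column is "character"
    sapply(b, class)   # same result
    is.data.frame(b)   # TRUE: the [] assignment kept the data.frame class
    ```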

  • ℡Wang Yan 2010-05-17 17:00

    Update: Here's an example of something that doesn't work. I thought it would, but I think that the stringsAsFactors option only works on character strings - it leaves the factors alone.

    Try this:

    bob2 <- data.frame(bob, stringsAsFactors = FALSE)
    

    Generally speaking, whenever you're having problems with factors that should be characters, there's a stringsAsFactors setting somewhere to help you (including a global setting).
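
    A quick check of the point in the update, using a hypothetical one-column data frame named toy. The column is built with factor() explicitly so the result does not depend on the default of stringsAsFactors in your R version:

    ```r
    # Hypothetical example: a data frame that already holds a factor column
    toy <- data.frame(phenotype = factor(c("a", "b", "a")))

    # stringsAsFactors only governs how character vectors are treated;
    # a column that is already a factor passes through unchanged
    toy2 <- data.frame(toy, stringsAsFactors = FALSE)

    class(toy2$phenotype)   # "factor"
    ```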

  • ℡Wang Yan 2010-05-17 17:02

    The global option

    stringsAsFactors: The default setting for arguments of data.frame and read.table.

    may be something you want to set to FALSE in your startup files (e.g. ~/.Rprofile). Please see help(options).

  • 胖鸭 2010-05-17 17:15

    Another way is to convert it using apply

    bob2 <- apply(bob,2,as.character)
    

    And a better one (the result of the previous call is a character matrix, not a data frame):

    bob2 <- as.data.frame(as.matrix(bob), stringsAsFactors = FALSE)
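
    A small sketch of the difference between the two calls, on a hypothetical mixed-type data frame named toy. One caveat worth checking on your own data: both routes go through a character matrix, so numeric columns also come back as strings, and their values may pick up formatting such as padding along the way.

    ```r
    # Hypothetical data: one factor column and one numeric column
    toy <- data.frame(x = factor(c("a", "b")), n = c(1.5, 10))

    m <- apply(toy, 2, as.character)
    is.matrix(m)         # TRUE: apply returns a matrix, not a data frame
    is.character(m)      # TRUE

    d <- as.data.frame(as.matrix(toy), stringsAsFactors = FALSE)
    is.data.frame(d)     # TRUE
    sapply(d, class)     # both columns are "character", including n
    ```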
    
  • 妄徒之命 2010-05-17 17:49

    Or you can try transform:

    newbob <- transform(bob, phenotype = as.character(phenotype))
    

    Just be sure to put every factor you'd like to convert to character.

    Or you can do something like this and kill all the pests with one blow:

    newbob_char <- as.data.frame(lapply(bob[sapply(bob, is.factor)], as.character), stringsAsFactors = FALSE)
    newbob_rest <- bob[!(sapply(bob, is.factor))]
    newbob <- cbind(newbob_char, newbob_rest)
    

    It's not a good idea to shove the data into the code like this; I could do the sapply part separately (actually, it's much easier to do it like that), but you get the point... I haven't checked the code, 'cause I'm not at home, so I hope it works! =)

    This approach, however, has a downside... you must reorder the columns afterwards, while with transform you can do whatever you like, but at the cost of "pedestrian-style" code writing...

    So there... =)

  • 三生石@ 2010-05-17 22:08

    To replace only factors:

    i <- sapply(bob, is.factor)
    bob[i] <- lapply(bob[i], as.character)
    

    Version 0.5.0 of the dplyr package introduced a new function, mutate_if:

    library(dplyr)
    bob %>% mutate_if(is.factor, as.character) -> bob
    

    Package purrr from RStudio gives another alternative:

    library(purrr)
    library(dplyr)
    bob %>% map_if(is.factor, as.character) %>% as_tibble -> bob
    

    (keep in mind it's a fresh package)
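
    For readers on a newer dplyr, here is a hedged sketch of the same conversion written with across(), which dplyr 1.0.0 and later document as the replacement for scoped verbs such as mutate_if. It assumes dplyr >= 1.0.0 is installed; toy is a hypothetical data frame.

    ```r
    library(dplyr)  # assumes dplyr >= 1.0.0, which provides across() and where()

    # Hypothetical data: one factor column and one integer column
    toy <- data.frame(x = factor(c("a", "b")), n = 1:2)

    # Convert only the factor columns; other columns are left untouched
    toy <- toy %>% mutate(across(where(is.factor), as.character))

    sapply(toy, class)   # x is "character", n is still "integer"
    ```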

  • 叼花硬汉 2012-01-05 06:04

    If you want a new data frame bobc where every factor vector in bobf is converted to a character vector, try this:

    bobc <- rapply(bobf, as.character, classes="factor", how="replace")
    

    If you then want to convert it back, you can create a logical vector marking which columns are factors, and use it to apply factor selectively:

    f <- sapply(bobf, is.factor)
    bobc[f] <- lapply(bobc[f], factor)
    
  • 喵-见缝插针 2013-01-10 22:25

    I typically make this function a part of all my projects. Quick and easy.

    unfactorize <- function(df){
      # convert every factor column (ordered factors included) to character
      for(i in which(sapply(df, is.factor))) df[[i]] <- as.character(df[[i]])
      return(df)
    }
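
    A usage sketch for a helper of this kind. The name unfactorize_demo and the toy data are hypothetical. The column test uses is.factor() rather than comparing class() to "factor", because an ordered factor reports two classes, so sapply(df, class) no longer yields one class string per column.

    ```r
    # Hypothetical variant of the helper above; the name is illustrative only
    unfactorize_demo <- function(df) {
      idx <- vapply(df, is.factor, logical(1))   # one TRUE/FALSE per column
      df[idx] <- lapply(df[idx], as.character)
      df
    }

    toy <- data.frame(
      size = factor(c("S", "M", "S"), levels = c("S", "M"), ordered = TRUE),
      kind = factor(c("x", "y", "x")),
      id   = 1:3
    )

    is.list(sapply(toy, class))   # TRUE: size has classes "ordered" and "factor"

    out <- unfactorize_demo(toy)
    sapply(out, class)            # size and kind are "character", id is "integer"
    ```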
    
  • YaoRaoLov 2013-03-21 17:40

    I know this answer is a bit late, but if you understand how factors are stored, you can avoid using apply-based functions to accomplish this. Which isn't at all to imply that the apply solutions don't work well.

    Factors are structured as numeric indices tied to a list of 'levels'. This can be seen if you convert a factor to numeric. So:

    > fact <- as.factor(c("a","b","a","d"))
    > fact
    [1] a b a d
    Levels: a b d
    
    > as.numeric(fact)
    [1] 1 2 1 3
    

    The numbers returned in the last line correspond to the levels of the factor.

    > levels(fact)
    [1] "a" "b" "d"
    

    Notice that levels() returns a character vector. You can use this fact to easily and compactly convert factors to strings or numerics like this:

    > fact_character <- levels(fact)[as.numeric(fact)]
    > fact_character
    [1] "a" "b" "a" "d"
    

    This also works for numeric values, provided you wrap your expression in as.numeric().

    > num_fact <- factor(c(1,2,3,6,5,4))
    > num_fact
    [1] 1 2 3 6 5 4
    Levels: 1 2 3 4 5 6
    > num_num <- as.numeric(levels(num_fact)[as.numeric(num_fact)])
    > num_num
    [1] 1 2 3 6 5 4
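
    The example above happens to have level codes equal to the values, which hides the usual pitfall. A short sketch with a hypothetical factor whose values differ from their codes:

    ```r
    # Hypothetical factor whose numeric values differ from their level codes
    num_fact <- factor(c(10, 20, 20, 5))

    levels(num_fact)                         # "5" "10" "20"
    as.numeric(num_fact)                     # 2 3 3 1: the level codes, not the values
    as.numeric(as.character(num_fact))       # 10 20 20 5: the original values
    as.numeric(levels(num_fact))[num_fact]   # 10 20 20 5: converts each level only once
    ```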
    
  • 撒拉嘿哟木头 2014-10-24 16:00

    This works for me. I finally figured out a one-liner:

    df <- as.data.frame(lapply(df, function(y) if (is.factor(y)) as.character(y) else y), stringsAsFactors = FALSE)
    
  • ℙℕℤℝ 2015-12-09 20:55

    If you use the data.table package for operations on a data.frame, the problem does not arise.

    library(data.table)
    dt = data.table(col1 = c("a","b","c"), col2 = 1:3)
    sapply(dt, class)
    #       col1        col2 
    #"character"   "integer" 
    

    If you already have factor columns in your dataset and want to convert them to character, you can do the following.

    library(data.table)
    dt = data.table(col1 = factor(c("a","b","c")), col2 = 1:3)
    sapply(dt, class)
    #     col1      col2 
    # "factor" "integer" 
    upd.cols = sapply(dt, is.factor)
    dt[, names(dt)[upd.cols] := lapply(.SD, as.character), .SDcols = upd.cols]
    sapply(dt, class)
    #       col1        col2 
    #"character"   "integer" 
    
  • 乱世@小熊 2016-01-16 15:21

    When you create your data frame, include stringsAsFactors = FALSE to avoid the problem in the first place.

  • csdnceshi62 2017-11-13 16:46

    This function does the trick

    df <- stacomirtools::killfactor(df)
    