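xiaotu9316 2024-06-14 09:43 Acceptance rate: 88.6%

Closed

R: delete rows from table 2 based on two columns of table 1, then split table 2 by column into two tables according to given rules

I. The current directory holds two tables, read in as ol and df; their column names are shown below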

> setwd("D:/Study/视频课/2024.4.20_医工科研/自-教-2-91炎症蛋白/2.去除Radial离群值")
> library(readxl)
> library(readr)
> 
> 
> # Read the Excel file
> ol <- readxl::read_excel("RadialMR离群值唯一值(自动化).xlsx")
> 
> # Read the CSV file
> df <- read_csv("04.最终用于MR分析的工具变量数据.csv")
Rows: 266 Columns: 42                                                                                                                 
── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (12): SNP, effect_allele.exposure, other_allele.exposure, effect_allele.outcome, other_allele.outcome, id.outcome, outcome, pv...
dbl (22): beta.exposure, beta.outcome, eaf.exposure, eaf.outcome, se.outcome, pval.outcome, samplesize.outcome, chr.exposure, pos....
lgl  (8): remove, palindromic, ambiguous, mr_keep.outcome, mr_keep, units.outcome, units.exposure, steiger_dir

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
> 

> names(ol)
[1] "SNP"         "Q_statistic" "p.value"     "exposure"    "method"      "ID"         

> names(df)
 [1] "SNP"                    "effect_allele.exposure" "other_allele.exposure"  "effect_allele.outcome"  "other_allele.outcome"  
 [6] "beta.exposure"          "beta.outcome"           "eaf.exposure"           "eaf.outcome"            "remove"                
[11] "palindromic"            "ambiguous"              "id.outcome"             "se.outcome"             "pval.outcome"          
[16] "samplesize.outcome"     "outcome"                "mr_keep.outcome"        "pval_origin.outcome"    "data_source.outcome"   
[21] "id.exposure"            "chr.exposure"           "pos.exposure"           "se.exposure"            "pval.exposure"         
[26] "var_id"                 "samplesize.exposure"    "exposure"               "action"                 "mr_keep"               
[31] "units.outcome"          "units.exposure"         "rsq.exposure"           "effective_n.exposure"   "rsq.outcome"           
[36] "effective_n.outcome"    "steiger_dir"            "steiger_pval"           "R2"                     "F"                     
[41] "FMean"                  "F_pow_beta_se"         
>

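II. From df, delete every row that meets both conditions ["SNP" equals the "SNP" value of a row in ol, and "id.exposure" equals the "exposure" value of that same ol row]. I have already checked in Excel: ol has 10 rows, all of them can be found in df, so 10 rows should be deleted from df. Save the result as a new table, df1.

III. Split df1 into two new tables, ex and out:
1. Every column whose name contains ".exposure" goes into ex
2. Every column whose name contains ".outcome" goes into out
3. Both tables include the "SNP" column
4. ex includes the "exposure" column
5. out includes the "outcome" column

IV. Rename the columns of ex and out:
1. Column names matching "effect_allele.*" become "effect allele": for example, "effect_allele.exposure" in ex becomes "effect allele", and "effect_allele.outcome" in out also becomes "effect allele"
2. Column names matching "other_allele.*" become "other allele"
3. Column names matching "eaf.*" become "eaf"
4. Column names matching "beta.*" become "beta"
5. Column names matching "se.*" become "se"
6. Column names matching "pval.*" become "pval"
7. Column names matching "samplesize.*" become "samplesize"
(Items 2-7 apply to both tables and are handled the same way as item 1)
8. In ex, rename the "exposure" column to "Phenotype"
9. In out, rename the "outcome" column to "Phenotype"

V. Set every value in the "Phenotype" column of out to "BMJ"

VI. Save ex and out as CSV files with the same names, in the current directory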

专家-赤兔 [online], featured creator: programming frameworks, 2024-06-14 09:43
Quoted from the free WeChat mini-program 皆我百晓生

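The task can be done in the following steps:

Step 1: Delete the matching rows from df

First, find the rows of df whose SNP and id.exposure values match the SNP and exposure values of a row in ol, and drop those rows from df.

# Find the matching rows and drop them
df1 <- df[!(paste(df$SNP, df$id.exposure) %in% paste(ol$SNP, ol$exposure)), ]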
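To check the row filter on something small first, here is a minimal sketch on made-up data. The toy `ol` and `df` below and all their values are assumptions for illustration, not the asker's files. It builds a composite key from the SNP and the exposure ID with an explicit separator and counts how many rows were dropped.

```r
# Toy stand-ins for the real tables (values are invented for illustration)
ol <- data.frame(SNP = c("rs1", "rs2"),
                 exposure = c("expA", "expB"),
                 stringsAsFactors = FALSE)
df <- data.frame(SNP = c("rs1", "rs1", "rs2", "rs3"),
                 id.exposure = c("expA", "expB", "expB", "expA"),
                 beta.exposure = c(0.1, 0.2, 0.3, 0.4),
                 stringsAsFactors = FALSE)

# Composite key: a row counts as an outlier only if BOTH the SNP and the
# exposure ID match the same row of ol
key_df <- paste(df$SNP, df$id.exposure, sep = "|")
key_ol <- paste(ol$SNP, ol$exposure, sep = "|")

df1 <- df[!(key_df %in% key_ol), ]

# Sanity check: how many rows were removed
n_removed <- nrow(df) - nrow(df1)
print(n_removed)
print(df1)
```

On the real data the asker expects `n_removed` to be 10. A different number would suggest duplicated SNP/exposure pairs in `df`, or whitespace or case differences between the two files.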

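Step 2: Split df1 into the ex and out tables

Next, split df1 according to whether a column name ends in .exposure or .outcome. The dot is escaped so it matches literally, and only the column named exactly "exposure" or "outcome" is renamed to "Phenotype".

# Split df1 into ex and out by column-name suffix
ex_cols <- grep("\\.exposure$", names(df1), value = TRUE)
out_cols <- grep("\\.outcome$", names(df1), value = TRUE)
ex <- df1[, c("SNP", ex_cols, "exposure")]
out <- df1[, c("SNP", out_cols, "outcome")]
# Rename only the exact "exposure" / "outcome" column, not every column ending in that word
colnames(ex)[colnames(ex) == "exposure"] <- "Phenotype"
colnames(out)[colnames(out) == "outcome"] <- "Phenotype"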
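One detail worth checking in this step is how the columns are picked and which column ends up as "Phenotype". The sketch below works on an invented one-row table: it selects columns by literal suffix and renames only the column whose name is exactly `exposure` or `outcome`. The helper name `split_by_suffix` is hypothetical and does not come from the thread.

```r
# Invented one-row table using a few of the column names from the question
df1 <- data.frame(SNP = "rs1",
                  beta.exposure = 0.1, se.exposure = 0.01, exposure = "expA",
                  beta.outcome = 0.2, se.outcome = 0.02, outcome = "out1",
                  mr_keep = TRUE,
                  stringsAsFactors = FALSE)

# Hypothetical helper: keep SNP, every "<name>.<suffix>" column and the bare
# "<suffix>" column, then rename that bare column to "Phenotype"
split_by_suffix <- function(data, suffix) {
  cols <- names(data)[endsWith(names(data), paste0(".", suffix))]
  part <- data[, c("SNP", cols, suffix), drop = FALSE]
  names(part)[names(part) == suffix] <- "Phenotype"
  part
}

ex  <- split_by_suffix(df1, "exposure")
out <- split_by_suffix(df1, "outcome")

print(names(ex))
print(names(out))
```

A looser pattern such as `exposure$` would also match `beta.exposure` and `se.exposure`, so an exact comparison is the safer way to target the single phenotype column.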

Step 3: Rename the columns

Give the columns of ex and out the same standard names. The pattern is anchored to the full column name so that "se" does not also match "samplesize", and "pval" does not also match "pval_origin".

rename_columns <- function(df) {
  # Drop the ".exposure" / ".outcome" suffix from the seven target columns only
  names(df) <- sub("^(effect_allele|other_allele|eaf|beta|se|pval|samplesize)\\.(exposure|outcome)$", "\\1", names(df))
  # The question asks for a space in the two allele column names
  names(df) <- sub("_allele$", " allele", names(df))
  df
}
ex <- rename_columns(ex)
out <- rename_columns(out)
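Short prefixes such as `se` and `pval` also occur inside longer names like `samplesize.outcome` and `pval_origin.outcome`, so it helps to test the renaming pattern on a plain character vector before touching the tables. This is a minimal sketch using column names copied from the question's `names(df)` output; the variable names `targets` and `pattern` are my own.

```r
# A few column names taken from the question's output of names(df)
cols <- c("SNP", "effect_allele.outcome", "other_allele.outcome",
          "eaf.outcome", "beta.outcome", "se.outcome", "pval.outcome",
          "samplesize.outcome", "pval_origin.outcome", "id.outcome")

# Strip the suffix only when the whole prefix is one of the seven targets
targets <- "effect_allele|other_allele|eaf|beta|se|pval|samplesize"
pattern <- paste0("^(", targets, ")\\.(exposure|outcome)$")
renamed <- sub(pattern, "\\1", cols)

# The question asks for a space instead of "_" in the two allele names
renamed <- sub("_allele$", " allele", renamed)

print(renamed)
```

Columns outside the seven targets, such as `pval_origin.outcome` and `id.outcome`, keep their original names, which avoids creating two columns both called `pval`.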

Step 4: Set the values of the "Phenotype" column in out

out$Phenotype <- "BMJ"

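Step 5: Save ex and out as CSV files

write_csv(ex, "ex.csv")
write_csv(out, "out.csv")

Together these steps cover everything requested: the matching rows are removed from df, the remaining data is split into two tables, the column names are standardized, and both tables are saved as CSV files in the current directory.
This answer was accepted by the asker as the best answer.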
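Since two of the new column names contain a space, it can be worth confirming that they survive a write/read round trip. This sketch uses base R's `write.csv` and `read.csv` on an invented table and a temporary file instead of the thread's `write_csv`, purely so that it runs without extra packages.

```r
# Invented two-row table using the renamed column layout
ex <- data.frame(SNP = c("rs1", "rs2"),
                 `effect allele` = c("A", "G"),
                 beta = c(0.1, 0.2),
                 Phenotype = c("expA", "expA"),
                 check.names = FALSE,
                 stringsAsFactors = FALSE)

# Write to a temporary file rather than the working directory
path <- file.path(tempdir(), "ex.csv")
write.csv(ex, path, row.names = FALSE)

# check.names = FALSE stops read.csv from turning "effect allele"
# into "effect.allele" on the way back in
back <- read.csv(path, check.names = FALSE, stringsAsFactors = FALSE)

print(names(back))
```

With base `read.csv`, leaving `check.names` at its default would rewrite the name to `effect.allele`, so the argument matters whenever the saved files are read back with base R.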


Question events

Closed by the system: June 22
Answer accepted: June 14
Question created: June 14
