R语言:根据表1的两列值,删除表2中相关的行;将表2根据指定要求,按列拆分成两个表
一、当前目录有两个表:读取表为:ol、df,查看列名
> setwd("D:/Study/视频课/2024.4.20_医工科研/自-教-2-91炎症蛋白/2.去除Radial离群值")
> library(readxl)
> library(readr)
>
>
> # 读取Excel文件
> ol <- readxl::read_excel("RadialMR离群值唯一值(自动化).xlsx")
>
> # 读取CSV文件内容
> df <- read_csv("04.最终用于MR分析的工具变量数据.csv")
Rows: 266 Columns: 42
── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (12): SNP, effect_allele.exposure, other_allele.exposure, effect_allele.outcome, other_allele.outcome, id.outcome, outcome, pv...
dbl (22): beta.exposure, beta.outcome, eaf.exposure, eaf.outcome, se.outcome, pval.outcome, samplesize.outcome, chr.exposure, pos....
lgl (8): remove, palindromic, ambiguous, mr_keep.outcome, mr_keep, units.outcome, units.exposure, steiger_dir
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
>
> names(ol)
[1] "SNP" "Q_statistic" "p.value" "exposure" "method" "ID"
> names(df)
[1] "SNP" "effect_allele.exposure" "other_allele.exposure" "effect_allele.outcome" "other_allele.outcome"
[6] "beta.exposure" "beta.outcome" "eaf.exposure" "eaf.outcome" "remove"
[11] "palindromic" "ambiguous" "id.outcome" "se.outcome" "pval.outcome"
[16] "samplesize.outcome" "outcome" "mr_keep.outcome" "pval_origin.outcome" "data_source.outcome"
[21] "id.exposure" "chr.exposure" "pos.exposure" "se.exposure" "pval.exposure"
[26] "var_id" "samplesize.exposure" "exposure" "action" "mr_keep"
[31] "units.outcome" "units.exposure" "rsq.exposure" "effective_n.exposure" "rsq.outcome"
[36] "effective_n.outcome" "steiger_dir" "steiger_pval" "R2" "F"
[41] "FMean" "F_pow_beta_se"
>
二、将df表中, 符合条件【"SNP"列值=ol表的"SNP"列值 并且 "id.exposure"列值 = ol表的"exposure"列值】 的行,全部删除(已在excel筛查,ol共10行,df中都能查到,应在df表删除10行),生成新表df1
四、将df1拆分成两个新表ex、out:
1、凡是列名中包含“.exposure”的,全部归入ex表
2、凡是列名中包含“.outcome”的,全部归入out表
3、每个表都包含“SNP”列
4、ex表中包含"exposure"列
5、out表中包含“outcome”列
三、将ex、out表分别替换列名:
1、包含“effect_allele.*”的列名替换为“effect allele”:如ex表中的"effect_allele.exposure"列名,替换为“effect allele”,out表中的 "effect_allele.outcome" ,也替换为“effect allele”
2、包含“other_allele.*”的列名替换为“other allele”
3、包含“eaf.*”的列名替换为“eaf”
4、包含“beta.*”的列名替换为“beta”
5、包含“se.*”的列名替换为“se”
6、包含“pval.*”的列名替换为“pval”
7、包含“samplesize.*”的列名替换为“samplesize”
(2-7两个表都有,修改方法同1)
8、ex表中的"exposure"列名修改为“Phenotype”
9、out表中的“outcome”列名修改为“Phenotype”
四、修改out表中“Phenotype”列的列值为:"BMJ"
五、将ex、out表分别保存为同名的csv表,存在当前目录下