R programming question
All of this data sits in a single column of a data frame, but I only need the Dbxref=GeneID:<number> part. How can I extract it and print it?
You can try doing this with a regular expression.
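As a minimal sketch of the idea, the pattern below is applied to a single string. The string `x` is hypothetical: it assumes the attributes are joined with semicolons, which the question does not confirm. The digit class `[0-9]+` keeps the match from running past the numeric ID.

```r
# Hypothetical attribute string; the ';' separator is an assumption
x <- "ID=CD266144.1:1;Dbxref=GeneID:101095186;gbkey=Src"

# [0-9]+ matches one or more digits, so the match ends before the next ';'
pattern <- "Dbxref=GeneID:[0-9]+"

m <- regexpr(pattern, x)   # position and length of the first match
hit <- regmatches(x, m)    # the matched substring itself
print(hit)
```

Here `hit` holds the full `Dbxref=GeneID:101095186` piece, prefix included.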
1. Iterating with a for loop
# Create sample data
data <- c("ID=CD266144.1:1", "Name=CD266144.1", "Dbxref=GeneID:101095186", "gbkey=Src", "chromosome=X",
          "ID=CD266144.1:2", "Name=CD266144.1", "Dbxref=GeneID:101095187", "gbkey=Src", "chromosome=X")
# Pre-allocate the result vector (elements without a match stay "")
geneIDs <- character(length(data))
# Extract the GeneID values
for (i in seq_along(data)) {
  # [0-9]+ matches only the digits of the ID
  matches <- regmatches(data[i], regexpr("Dbxref=GeneID:[0-9]+", data[i], ignore.case = TRUE))
  if (length(matches) > 0) {
    # Strip the prefix so only the number remains
    geneIDs[i] <- sub("Dbxref=GeneID:", "", matches, ignore.case = TRUE)
  }
}
# Print the result
print(geneIDs)
2. Iterating with lapply()
# Create sample data
data <- c("ID=CD266144.1:1", "Name=CD266144.1", "Dbxref=GeneID:101095186", "gbkey=Src", "chromosome=X",
          "ID=CD266144.1:2", "Name=CD266144.1", "Dbxref=GeneID:101095187", "gbkey=Src", "chromosome=X")
# Extract the GeneID values
geneIDs <- unlist(lapply(data, function(x) {
  # [0-9]+ matches only the digits of the ID
  matches <- regmatches(x, regexpr("Dbxref=GeneID:[0-9]+", x, ignore.case = TRUE))
  if (length(matches) > 0) {
    # Strip the prefix so only the number remains
    sub("Dbxref=GeneID:", "", matches, ignore.case = TRUE)
  } else {
    # No match: return an empty string so the result keeps the input length
    ""
  }
}))
# Print the result
print(geneIDs)
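Since the question mentions a data frame column, here is a sketch that works on the column directly, without an explicit loop. The data frame `df`, the column names `attributes`, `gene_id` and `gene_number`, and the semicolon-separated layout are all assumptions made for illustration; adjust them to the real data. Rows without a match get NA.

```r
# Hypothetical data frame; column name and ';' separator are assumptions
df <- data.frame(
  attributes = c(
    "ID=CD266144.1:1;Name=CD266144.1;Dbxref=GeneID:101095186;gbkey=Src",
    "ID=CD266144.1:2;Name=CD266144.1;gbkey=Src",
    "ID=CD266144.1:3;Name=CD266144.1;Dbxref=GeneID:101095187;gbkey=Src"
  ),
  stringsAsFactors = FALSE
)

pattern <- "Dbxref=GeneID:[0-9]+"

# regexpr() is vectorized: one match position per row, -1 where nothing matches
m <- regexpr(pattern, df$attributes)

# Start with NA in every row, then fill in only the rows that matched
df$gene_id <- NA_character_
df$gene_id[m != -1] <- regmatches(df$attributes, m)

# Optional: keep just the number by dropping the fixed prefix
df$gene_number <- sub("Dbxref=GeneID:", "", df$gene_id, fixed = TRUE)

print(df[, c("gene_id", "gene_number")])
```

Using NA rather than "" for missing rows makes it easy to filter them afterwards, for example with `df[!is.na(df$gene_id), ]`.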
Hope this helps!