【以下回答由 GPT 生成】
问题1:查看两个table表之间基因共有ID,各自特有ID。如果有共有基因,分别输出各自的共有结果以及合并的共有结果。
解决方案:
use strict;
use warnings;
# 读取第一个表格的数据
my %table1;
open my $fh1, '<', '/mnt/sdc/shguest/hui/jiao1/program/table/1.DEG.xls' or die "Cannot open file: 1.DEG.xls";
while (my $line = <$fh1>) {
chomp $line;
my @fields = split /\t/, $line;
my $gene_id = $fields[0];
$table1{$gene_id} = 1;
}
close $fh1;
# 读取第二个表格的数据
my %table2;
open my $fh2, '<', '/mnt/sdc/shguest/hui/jiao1/program/table/2.DEG.xls' or die "Cannot open file: 2.DEG.xls";
while (my $line = <$fh2>) {
chomp $line;
my @fields = split /\t/, $line;
my $gene_id = $fields[0];
$table2{$gene_id} = 1;
}
close $fh2;
# 输出各自的共有结果
print "First table unique genes:";
foreach my $gene_id (sort keys %table1) {
if (exists $table2{$gene_id}) {
print "$gene_id ";
}
}
print "\n";
print "Second table unique genes:";
foreach my $gene_id (sort keys %table2) {
if (!exists $table1{$gene_id}) {
print "$gene_id ";
}
}
print "\n";
# 输出合并的共有结果
print "Combined unique genes:";
foreach my $gene_id (sort keys %table1, %table2) {
print "$gene_id ";
}
print "\n";
问题2:查找pep.faa文件和1.DEG.xls第一列共有的编号,分别输出共有和特有的结果。
解决方案:
use strict;
use warnings;
# 读取pep.faa文件的数据
my %pep;
open my $fh_pep, '<', '/mnt/sdc/shguest/hui/jiao1/program/table/pep.faa' or die "Cannot open file: pep.faa";
while (my $line = <$fh_pep>) {
chomp $line;
if ($line =~ /^>(\w+)/) {
my $gene_id = $1;
$pep{$gene_id} = 1;
}
}
close $fh_pep;
# 读取1.DEG.xls的第一列数据
my %deg;
open my $fh_deg, '<', '/mnt/sdc/shguest/hui/jiao1/program/table/1.DEG.xls' or die "Cannot open file: 1.DEG.xls";
while (my $line = <$fh_deg>) {
chomp $line;
my @fields = split /\t/, $line;
my $gene_id = $fields[0];
$deg{$gene_id} = 1;
}
close $fh_deg;
# 输出共有的结果
print "Common genes between pep.faa and 1.DEG.xls:";
foreach my $gene_id (sort keys %pep) {
if (exists $deg{$gene_id}) {
print "$gene_id ";
}
}
print "\n";
# 输出pep.faa特有的结果
print "Unique genes in pep.faa:";
foreach my $gene_id (sort keys %pep) {
if (!exists $deg{$gene_id}) {
print "$gene_id ";
}
}
print "\n";
# 输出1.DEG.xls特有的结果
print "Unique genes in 1.DEG.xls:";
foreach my $gene_id (sort keys %deg) {
if (!exists $pep{$gene_id}) {
print "$gene_id ";
}
}
print "\n";
问题3:将1.DEG.xls第一列基因编号加上对应的功能描述,如果功能未知则用“-”代替。基因功能文件:all.gene.func.txt。
解决方案:
use strict;
use warnings;
my %gene_func;
open my $fh_gene_func, '<', '/mnt/sdc/shguest/hui/jiao1/program/table/all.gene.func.txt' or die "Cannot open file: all.gene.func.txt";
while (my $line = <$fh_gene_func>) {
chomp $line;
my ($gene_id, $func) = split /\t/, $line;
$gene_func{$gene_id} = $func;
}
close $fh_gene_func;
open my $fh_deg, '<', '/mnt/sdc/shguest/hui/jiao1/program/table/1.DEG.xls' or die "Cannot open file: 1.DEG.xls";
while (my $line = <$fh_deg>) {
chomp $line;
my @fields = split /\t/, $line;
my $gene_id = $fields[0];
my $func = exists $gene_func{$gene_id} ? $gene_func{$gene_id} : '-';
print "$gene_id\t$func\n";
}
close $fh_deg;
问题4:找出snp.anno.xls文件中targetScanS、esp6500si_all、GnomAD_EAS_AF和CADD这些表头分别位于第几列,并进行输出。
解决方案:
use strict;
use warnings;
open my $fh_snp_anno, '<', '/mnt/sdc/shguest/hui/jiao1/program/table/snp.anno.xls' or die "Cannot open file: snp.anno.xls";
my $header = <$fh_snp_anno>;
chomp $header;
my @header_fields = split /\t/, $header;
my %header_col;
for (my $i = 0; $i < scalar @header_fields; $i++) {
$header_col{$header_fields[$i]} = $i+1;
}
# 输出targetScanS的列号
my $targetScanS_col = $header_col{'targetScanS'};
if ($targetScanS_col) {
print "Column number for targetScanS: $targetScanS_col\n";
} else {
print "Column number for targetScanS not found\n";
}
# 输出esp6500si_all的列号
my $esp6500si_all_col = $header_col{'esp6500si_all'};
if ($esp6500si_all_col) {
print "Column number for esp6500si_all: $esp6500si_all_col\n";
} else {
print "Column number for esp6500si_all not found\n";
}
# 输出GnomAD_EAS_AF的列号
my $GnomAD_EAS_AF_col = $header_col{'GnomAD_EAS_AF'};
if ($GnomAD_EAS_AF_col) {
print "Column number for GnomAD_EAS_AF: $GnomAD_EAS_AF_col\n";
} else {
print "Column number for GnomAD_EAS_AF not found\n";
}
# 输出CADD的列号
my $CADD_col = $header_col{'CADD'};
if ($CADD_col) {
print "Column number for CADD: $CADD_col\n";
} else {
print "Column number for CADD not found\n";
}
close $fh_snp_anno;
问题5:删除snp.anno.xls文件中的Ori_REF、Ori_ALT、shared_hom、shared_het和OMIM这些列。
解决方案:
use strict;
use warnings;
open my $fh_snp_anno, '<', '/mnt/sdc/shguest/hui/jiao1/program/table/snp.anno.xls' or die "Cannot open file: snp.anno.xls";
my $header = <$fh_snp_anno>;
chomp $header;
my @header_fields = split /\t/, $header;
my @keep_fields;
foreach my $field (@header_fields) {
if ($field eq 'Ori_REF' || $field eq 'Ori_ALT' || $field eq 'shared_hom' || $field eq 'shared_het' || $field eq 'OMIM') {
next;
}
push @keep_fields, $field;
}
my $new_header = join "\t", @keep_fields;
print "$new_header\n";
while (my $line = <$fh_snp_anno>) {
chomp $line;
my @fields = split /\t/, $line;
my @keep_values;
foreach my $i (0..$#header_fields) {
if ($header_fields[$i] eq 'Ori_REF' || $header_fields[$i] eq 'Ori_ALT' || $header_fields[$i] eq 'shared_hom' || $header_fields[$i] eq 'shared_het' || $header_fields[$i] eq 'OMIM') {
next;
}
push @keep_values, $fields[$i];
}
my $new_line = join "\t", @keep_values;
print "$new_line\n";
}
close $fh_snp_anno;
以上便是对于问题的具体解决方案,如果还有其他问题,请随时提问。
如果你已经解决了该问题, 非常希望你能够分享一下解决方案, 写成博客, 将相关链接放在评论区, 以帮助更多的人 ^-^