Linux: remove specific newlines and keep the longest sequence per name 5C

Input file: a.txt

CP007439.1 Serratia plymuthica strain V4 genome
ATGGTCAGCACGATCCTTGGCCGCAAGCTTGGGATGAC
CGCACGGCTCCCGCAACCAGCGTCGCCCGGGTTCCATC
ATGTACGGTCACATGGGCGACGAGCGCGTGACGGTCAA
GCTCGTCAAGGGCGCTGTCCCCGGCGGCAAGAACGCTC
CP007439.1 Serratia plymuthica strain V4 genome
CGCCCGGTACAGCACCCTTAACCAGCAGCAGGTTGCGC
CGCTCGTCACCCAGGTGGCCAGCCATTTTCTTGCCTTT
AACGCGGTG
CP007439.1 Serratia plymuthica strain V4 genome
GAAGTTCCAGCGCTTTACAGTGCCGGCAAAACCTTT
CP002775.1 Serratia sp. AS13, complete genome
GAAGTTCCAGCGCTTTACAGTGCCGGCAAAACCTTT

CP007439.1 Serratia plymuthica strain V4 genome

ATGGTCAGCACGATCCTTGGCCGCAAGCTTGGGATGAC (remove this newline)
CGCACGGCTCCCGCAACCAGCGTCGCCCGGGTTCCATC (remove this newline)
ATGTACGGTCACATGGGCGACGAGCGCGTGACGGTCAA (remove this newline)
GCTCGTCAAGGGCGCTGTCCCCGGCGGCAAGAACGCTC (keep this newline)
CP007439.1 Serratia plymuthica strain V4 genome

CGCCCGGTACAGCACCCTTAACCAGCAGCAGGTTGCGC (remove this newline)
CGCTCGTCACCCAGGTGGCCAGCCATTTTCTTGCCTTT (remove this newline)
AACGCGGTG (keep this newline)
CP007439.1 Serratia plymuthica strain V4 genome

GAAGTTCCAGCGCTTTACAGTGCCGGCAAAACCTTT (keep this newline)
CP002775.1 Serratia sp. AS13, complete genome

GAAGTTCCAGCGCTTTACAGTGCCGGCAAAACCTTT (keep this newline)

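Requirements:
1. The four sequences above contain many superfluous newlines, which need to be removed so that each sequence sits on a single line.
2. Among sequences that share the same name, drop the shorter ones and keep only the longest. Sequence names all start with ">".

Expected output format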

CP007439.1 Serratia plymuthica strain V4 genome
ATGGTCAGCACGATCCTTGGCCGCAAGCTTGGGATGACCGCACGGCTCCCGCAACCAGCGTCGCCCGGGTTCCATCATGTACGGTCACATGGGCGACGAGCGCGTGACGGTCAAGCTCGTCAAGGGCGCTGTCCCCGGCGGCAAGAACGCTC
CP002775.1 Serratia sp. AS13, complete genome
GAAGTTCCAGCGCTTTACAGTGCCGGCAAAACCTTT

4 answers

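# Handles requirement 1 only: joins wrapped sequence lines into one line per record.
with open('input.txt', 'r') as f, open('output.txt', 'w') as fo:
    first = True
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('CP'):
            # Header line: finish the previous sequence, then write the header.
            if not first:
                fo.write('\n')
            fo.write(line + '\n')
            first = False
        else:
            # Sequence fragment: append it without a newline.
            fo.write(line)
    if not first:
        fo.write('\n')  # terminate the last sequence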

weixin_42056387: I'm a beginner and can't get the Python script to run...
python delete.python [4:13 PM]
  File "delete.python", line 1
    f = open('Atopobium_vaginae.txt', 'r')
    ^
IndentationError: unexpected indent
(about a year ago)

Python

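# Handles requirement 1 only: joins wrapped sequence lines into one line per record.
with open('input.txt', 'r') as f, open('output.txt', 'w') as fo:
    first = True
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('CP'):
            # Header line: finish the previous sequence, then write the header.
            if not first:
                fo.write('\n')
            fo.write(line + '\n')
            first = False
        else:
            # Sequence fragment: append it without a newline.
            fo.write(line)
    if not first:
        fo.write('\n')  # terminate the last sequence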

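Indentation must line up exactly, and tabs and spaces must not be mixed. The "unexpected indent" error reported above means line 1 of the script began with whitespace; top-level statements have to start in the first column.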
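
As a small illustration of this point, the sketch below reproduces both failure modes with Python's built-in compile(), which parses source text without running it. The snippet strings are made up for the demonstration (the first one mimics the pasted line from the error report), and the helper name compiles is just for this example.

```python
def compiles(source):
    """Return True if `source` parses as a module, False on an indentation error."""
    try:
        compile(source, "<pasted>", "exec")  # parse only; nothing is executed
        return True
    except IndentationError:  # TabError is a subclass, so it is caught as well
        return False


# Hypothetical paste with one stray space before the first statement.
leading_space = " f = open('input.txt', 'r')\nfo = open('output.txt', 'w')\n"

# Hypothetical block whose body lines are indented with a tab, then with spaces.
mixed = "if True:\n\tx = 1\n        y = 2\n"

print(compiles(leading_space))           # False: "unexpected indent"
print(compiles(leading_space.lstrip()))  # True once the stray space is removed
print(compiles(mixed))                   # False: tabs and spaces are mixed
```

Most editors can show whitespace characters, which makes a stray leading space or a tab among spaces easy to spot before running the script.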

Use shell scripting: match and remove the unwanted newlines, then compare the sequence lengths.
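
This answer gives no code. A minimal sketch of the same two steps (join each record's lines, then compare lengths per name) is given here in Python rather than shell, to match the other answers. The function names read_records, longest_per_name and write_longest are invented for this example, and treating a line as a header when it starts with ">" or "CP" is an assumption based on the question text and the sample data.

```python
def is_header(line):
    # Assumption: names start with ">" (per the question) or "CP" (per the sample).
    return line.startswith((">", "CP"))


def read_records(lines):
    """Yield (name, sequence) pairs, joining wrapped sequence lines."""
    name, parts = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if is_header(line):
            if name is not None:
                yield name, "".join(parts)
            name, parts = line, []
        elif name is not None:
            parts.append(line)
    if name is not None:
        yield name, "".join(parts)


def longest_per_name(lines):
    """Map each name to its longest sequence; on a tie the first one wins."""
    best = {}  # dicts keep insertion order, so names stay in first-seen order
    for name, seq in read_records(lines):
        if name not in best or len(seq) > len(best[name]):
            best[name] = seq
    return best


def write_longest(src_path, dst_path):
    """Read src_path and write one header line and one sequence line per name."""
    with open(src_path) as src:
        best = longest_per_name(src)
    with open(dst_path, "w") as dst:
        for name, seq in best.items():
            dst.write(f"{name}\n{seq}\n")
```

Calling write_longest('a.txt', 'output.txt') would process the input file from the question. Whole sequences are held in memory, which is fine for a handful of genomes but may need rethinking for very large files.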
