Input file: a.txt
CP007439.1 Serratia plymuthica strain V4 genome
ATGGTCAGCACGATCCTTGGCCGCAAGCTTGGGATGAC
CGCACGGCTCCCGCAACCAGCGTCGCCCGGGTTCCATC
ATGTACGGTCACATGGGCGACGAGCGCGTGACGGTCAA
GCTCGTCAAGGGCGCTGTCCCCGGCGGCAAGAACGCTC
CP007439.1 Serratia plymuthica strain V4 genome
CGCCCGGTACAGCACCCTTAACCAGCAGCAGGTTGCGC
CGCTCGTCACCCAGGTGGCCAGCCATTTTCTTGCCTTT
AACGCGGTG
CP007439.1 Serratia plymuthica strain V4 genome
GAAGTTCCAGCGCTTTACAGTGCCGGCAAAACCTTT
CP002775.1 Serratia sp. AS13, complete genome
GAAGTTCCAGCGCTTTACAGTGCCGGCAAAACCTTT
The same input, annotated to show which line breaks should be removed:
CP007439.1 Serratia plymuthica strain V4 genome
ATGGTCAGCACGATCCTTGGCCGCAAGCTTGGGATGAC (remove this line break)
CGCACGGCTCCCGCAACCAGCGTCGCCCGGGTTCCATC (remove this line break)
ATGTACGGTCACATGGGCGACGAGCGCGTGACGGTCAA (remove this line break)
GCTCGTCAAGGGCGCTGTCCCCGGCGGCAAGAACGCTC (keep this line break)
CP007439.1 Serratia plymuthica strain V4 genome
CGCCCGGTACAGCACCCTTAACCAGCAGCAGGTTGCGC (remove this line break)
CGCTCGTCACCCAGGTGGCCAGCCATTTTCTTGCCTTT (remove this line break)
AACGCGGTG (keep this line break)
CP007439.1 Serratia plymuthica strain V4 genome
GAAGTTCCAGCGCTTTACAGTGCCGGCAAAACCTTT (keep this line break)
CP002775.1 Serratia sp. AS13, complete genome
GAAGTTCCAGCGCTTTACAGTGCCGGCAAAACCTTT (keep this line break)
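Requirements:
1. The four sequences above contain many redundant line breaks inside the sequence data; these need to be removed so that each sequence sits on a single line.
2. Among sequences that share the same name (sequence names all start with ">"), discard the shorter ones and keep only the longest.
Expected output file format: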
CP007439.1 Serratia plymuthica strain V4 genome
ATGGTCAGCACGATCCTTGGCCGCAAGCTTGGGATGACCGCACGGCTCCCGCAACCAGCGTCGCCCGGGTTCCATCATGTACGGTCACATGGGCGACGAGCGCGTGACGGTCAAGCTCGTCAAGGGCGCTGTCCCCGGCGGCAAGAACGCTC
CP002775.1 Serratia sp. AS13, complete genome
GAAGTTCCAGCGCTTTACAGTGCCGGCAAAACCTTT
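
One possible way to do this in Python is sketched below. It is only a sketch under a few assumptions: header lines in the real a.txt start with ">" as stated in requirement 2 (the sample as shown above lacks that marker), two records count as "the same name" only when their whole header line is identical, and when two sequences with the same name are equally long the first one is kept. The function names `read_records`, `keep_longest`, and `dedupe_fasta` are made up for illustration.

```python
def read_records(lines):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            # Sequence line: collect it without its line break.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def keep_longest(records):
    """Return {header: sequence}, keeping the longest sequence per header."""
    best = {}  # dicts preserve insertion order, so first-seen order is kept
    for header, seq in records:
        # Strict ">" means the first record wins when lengths are equal.
        if header not in best or len(seq) > len(best[header]):
            best[header] = seq
    return best


def dedupe_fasta(in_path, out_path):
    """Read in_path, then write one header line and one sequence line per name."""
    with open(in_path) as src:
        best = keep_longest(read_records(src))
    with open(out_path, "w") as dst:
        for header, seq in best.items():
            dst.write(f"{header}\n{seq}\n")
```

Calling `dedupe_fasta("a.txt", "b.txt")` would then write the result to b.txt (the output name is a placeholder). Because the file is read line by line and only the longest sequence per name is held in memory, this should also cope with inputs much larger than the sample.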