weixin_42056387
2018-04-24 05:52 · 759 views

Linux: removing specific line breaks and keeping the longest sequence for each name


Input file: a.txt

CP007439.1 Serratia plymuthica strain V4 genome
ATGGTCAGCACGATCCTTGGCCGCAAGCTTGGGATGAC
CGCACGGCTCCCGCAACCAGCGTCGCCCGGGTTCCATC
ATGTACGGTCACATGGGCGACGAGCGCGTGACGGTCAA
GCTCGTCAAGGGCGCTGTCCCCGGCGGCAAGAACGCTC
CP007439.1 Serratia plymuthica strain V4 genome
CGCCCGGTACAGCACCCTTAACCAGCAGCAGGTTGCGC
CGCTCGTCACCCAGGTGGCCAGCCATTTTCTTGCCTTT
AACGCGGTG
CP007439.1 Serratia plymuthica strain V4 genome
GAAGTTCCAGCGCTTTACAGTGCCGGCAAAACCTTT
CP002775.1 Serratia sp. AS13, complete genome
GAAGTTCCAGCGCTTTACAGTGCCGGCAAAACCTTT

CP007439.1 Serratia plymuthica strain V4 genome

ATGGTCAGCACGATCCTTGGCCGCAAGCTTGGGATGAC (remove this line break)
CGCACGGCTCCCGCAACCAGCGTCGCCCGGGTTCCATC (remove this line break)
ATGTACGGTCACATGGGCGACGAGCGCGTGACGGTCAA (remove this line break)
GCTCGTCAAGGGCGCTGTCCCCGGCGGCAAGAACGCTC (keep this line break)
CP007439.1 Serratia plymuthica strain V4 genome

CGCCCGGTACAGCACCCTTAACCAGCAGCAGGTTGCGC (remove this line break)
CGCTCGTCACCCAGGTGGCCAGCCATTTTCTTGCCTTT (remove this line break)
AACGCGGTG (keep this line break)
CP007439.1 Serratia plymuthica strain V4 genome

GAAGTTCCAGCGCTTTACAGTGCCGGCAAAACCTTT (keep this line break)
CP002775.1 Serratia sp. AS13, complete genome

GAAGTTCCAGCGCTTTACAGTGCCGGCAAAACCTTT (keep this line break)

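Requirements:
1. The four sequences above contain many superfluous line breaks, which need to be removed.
2. Among sequences that share the same name, drop the shorter ones and keep only the longest. Every sequence name begins with ">".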

Expected output format:

CP007439.1 Serratia plymuthica strain V4 genome
ATGGTCAGCACGATCCTTGGCCGCAAGCTTGGGATGACCGCACGGCTCCCGCAACCAGCGTCGCCCGGGTTCCATCATGTACGGTCACATGGGCGACGAGCGCGTGACGGTCAAGCTCGTCAAGGGCGCTGTCCCCGGCGGCAAGAACGCTC
CP002775.1 Serratia sp. AS13, complete genome
GAAGTTCCAGCGCTTTACAGTGCCGGCAAAACCTTT


4 answers

  • lingxf 凌空跃 2018-04-24 07:23

    Python

    f = open('input.txt', 'r')
    fo = open('output.txt', 'w')
    for line in f:
        line = line.rstrip('\n')
        tl = len(line)
        if line.startswith('CP'):
            fo.write('\n')
            fo.write(line)
            fo.write('\n')
            continue
        fo.write(line)
        if tl == 0:
            continue
        if tl < 38:
            fo.write('\n')
    f.close()
    fo.close()

  • lingxf 凌空跃 2018-04-24 07:24
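    # Joins wrapped sequence lines (requirement 1 only).
    # Header lines are assumed to start with 'CP', as in the sample.
    with open('input.txt', 'r') as f, open('output.txt', 'w') as fo:
        first = True
        for line in f:
            line = line.rstrip('\n')
            if not line:
                continue  # skip blank lines
            if line.startswith('CP'):
                # Close the previous sequence, then write the header on its own line.
                if not first:
                    fo.write('\n')
                fo.write(line + '\n')
                first = False
                continue
            # Sequence fragment: written without a newline so fragments are joined.
            # This replaces the fixed 38-character width check, which emitted
            # blank lines after short fragments.
            fo.write(line)
        if not first:
            fo.write('\n')  # terminate the last sequence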
    
    
  • lingxf 凌空跃 2018-04-24 10:28

    The indentation must line up exactly; do not mix tabs and spaces.

  • longyung longyung 2018-04-25 00:29

    Use shell scripting: match and remove the line breaks, then compare the sequences.
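
The answers above only cover joining the wrapped lines. Here is a rough sketch of both steps together (join first, then compare lengths per name), written in Python to match the earlier answer. It is not a tested solution for the real file: the names `longest_per_name`, `rewrite_file` and `is_header` are made up for illustration, and the header test is an assumption. The question says names start with ">", while the sample as shown starts with "CP", so the sketch accepts either.

```python
def is_header(line):
    # Assumption: a header starts with '>' (per the question text)
    # or with 'CP' (as in the sample data shown in the question).
    return line.startswith((">", "CP"))


def longest_per_name(lines):
    """Join wrapped sequence lines and keep the longest sequence per name.

    Returns a dict mapping header line -> longest joined sequence,
    in order of first appearance. On a tie, the earlier sequence is kept.
    """
    best = {}
    name, parts = None, []

    def keep(current_name, current_parts):
        # Record the finished sequence if it beats the one already stored.
        if current_name is None:
            return
        seq = "".join(current_parts)
        if current_name not in best or len(seq) > len(best[current_name]):
            best[current_name] = seq

    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if is_header(line):
            keep(name, parts)          # close the previous record
            name, parts = line, []
        else:
            parts.append(line)         # sequence fragment
    keep(name, parts)                  # close the last record
    return best


def rewrite_file(src, dst):
    """Read src, write one header line and one sequence line per name to dst."""
    with open(src) as fin:
        best = longest_per_name(fin)
    with open(dst, "w") as fout:
        for name, seq in best.items():
            fout.write(f"{name}\n{seq}\n")
```

Calling `rewrite_file("a.txt", "output.txt")` would then produce one header line and one sequence line per name. Everything is held in memory, which should be fine for a handful of records but may need rethinking for very large genome files.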

