shinezhu1995 2022-04-19 08:29 采纳率: 80%
浏览 254
已结题

怎么用python修改fasta序列的ID名字

WP_018731760.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]
MAADNREPVTVIGLGGMGSALARAFLARGHPTTVWNRTPNKADDLVSQGALRATTVADAMSAGKLIVVCV
LDYRAMREIINSTDDTAADRVIVNLTSGTPADARATAAWAGERGMSYLDGAIMAIPPMIGSEEALIFYGG
PQEVYETHAETLRSVAGSGTYLGPDAGLPSLYDVALLGLMWTTWTGFMQATALLASEEVPAANFLPYAQA
WFEHVISPEMPTLAGQVDTGAYPDHESTLGMQAVAIAHLVHASRTQGVDAALAEFLHARAEQAIRRGHAD
DGFGAVFEVLRNPTAQ
WP_018731760.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]
MAADNREPVTVIGLGGMGSALARAFLARGHPTTVWNRTPNKADDLVSQGALRATTVADAMSAGKLIVVCV
LDYRAMREIINSTDDTAADRVIVNLTSGTPADARATAAWAGERGMSYLDGAIMAIPPMIGSEEALIFYGG
PQEVYETHAETLRSVAGSGTYLGPDAGLPSLYDVALLGLMWTTWTGFMQATALLASEEVPAANFLPYAQA
WFEHVISPEMPTLAGQVDTGAYPDHESTLGMQAVAIAHLVHASRTQGVDAALAEFLHARAEQAIRRGHAD
DGFGAVFEVLRNPTAQ

想请教怎么用python写一个脚本实现一次性修改上千条上面这个格式序列的名字,将其修改为WP+数字(1到最后一条序列的序号),比如第一条改为>WP1, 第二条为WP2. 感恩。

  • 写回答

1条回答 默认 最新

  • 䴺矏 2022-04-19 11:31
    关注
    import re
    def run():
        fasta_str = """WP_018731760.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]
            MAADNREPVTVIGLGGMGSALARAFLARGHPTTVWNRTPNKADDLVSQGALRATTVADAMSAGKLIVVCV
            LDYRAMREIINSTDDTAADRVIVNLTSGTPADARATAAWAGERGMSYLDGAIMAIPPMIGSEEALIFYGG
            PQEVYETHAETLRSVAGSGTYLGPDAGLPSLYDVALLGLMWTTWTGFMQATALLASEEVPAANFLPYAQA
            WFEHVISPEMPTLAGQVDTGAYPDHESTLGMQAVAIAHLVHASRTQGVDAALAEFLHARAEQAIRRGHAD
            DGFGAVFEVLRNPTAQ
            WP_018731761.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]
            MAADNREPVTVIGLGGMGSALARAFLARGHPTTVWNRTPNKADDLVSQGALRATTVADAMSAGKLIVVCV
            LDYRAMREIINSTDDTAADRVIVNLTSGTPADARATAAWAGERGMSYLDGAIMAIPPMIGSEEALIFYGG
            PQEVYETHAETLRSVAGSGTYLGPDAGLPSLYDVALLGLMWTTWTGFMQATALLASEEVPAANFLPYAQA
            WFEHVISPEMPTLAGQVDTGAYPDHESTLGMQAVAIAHLVHASRTQGVDAALAEFLHARAEQAIRRGHAD
            DGFGAVFEVLRNPTAQ
            """
    
        f_index = re.finditer('WP_[0-9]+', fasta_str)
        f_index.__next__()
        start_index = 0
        fasta_list = []
        number = 0
        replace = lambda x: re.sub("WP_", f"WP_{x[0] + 1}_", fasta_str[start_index: x[1]], count=1)
        for number, i in enumerate(f_index):
            fasta_index = i.span()[0]
            fasta_list.append(replace([number, fasta_index]))
            start_index = fasta_index
        fasta_list.append(replace([number + 1, len(fasta_str)]))
        for i in fasta_list:
            print(i)
    
    # 结果
    '''
    WP_1_018731760.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]
            MAADNREPVTVIGLGGMGSALARAFLARGHPTTVWNRTPNKADDLVSQGALRATTVADAMSAGKLIVVCV
            LDYRAMREIINSTDDTAADRVIVNLTSGTPADARATAAWAGERGMSYLDGAIMAIPPMIGSEEALIFYGG
            PQEVYETHAETLRSVAGSGTYLGPDAGLPSLYDVALLGLMWTTWTGFMQATALLASEEVPAANFLPYAQA
            WFEHVISPEMPTLAGQVDTGAYPDHESTLGMQAVAIAHLVHASRTQGVDAALAEFLHARAEQAIRRGHAD
            DGFGAVFEVLRNPTAQ
            
    WP_2_018731761.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]
            MAADNREPVTVIGLGGMGSALARAFLARGHPTTVWNRTPNKADDLVSQGALRATTVADAMSAGKLIVVCV
            LDYRAMREIINSTDDTAADRVIVNLTSGTPADARATAAWAGERGMSYLDGAIMAIPPMIGSEEALIFYGG
            PQEVYETHAETLRSVAGSGTYLGPDAGLPSLYDVALLGLMWTTWTGFMQATALLASEEVPAANFLPYAQA
            WFEHVISPEMPTLAGQVDTGAYPDHESTLGMQAVAIAHLVHASRTQGVDAALAEFLHARAEQAIRRGHAD
            DGFGAVFEVLRNPTAQ
    
    '''
    
    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论 编辑记录

报告相同问题?

问题事件

  • 系统已结题 4月27日
  • 已采纳回答 4月19日
  • 创建了问题 4月19日

悬赏问题

  • ¥30 代码本地运行正常,但是TOMCAT部署时闪退
  • ¥15 关于#python#的问题
  • ¥15 主机可以ping通路由器但是连不上网怎么办
  • ¥15 数据库一张以时间排好序的表中,找出多次相邻的那些行
  • ¥50 关于DynamoRIO处理多线程程序时候的问题
  • ¥15 kubeadm部署k8s出错
  • ¥15 Abaqus打不开cae文件怎么办?
  • ¥20 双系统开机引导中windows系统消失问题?
  • ¥15 小程序准备上线,软件开发公司需要提供哪些资料给甲方
  • ¥15 关于生产日期批次退货退款,库存回退的问题