shinezhu1995 2022-04-19 08:29

How do I rename FASTA sequence IDs with Python?

>WP_018731760.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]
MAADNREPVTVIGLGGMGSALARAFLARGHPTTVWNRTPNKADDLVSQGALRATTVADAMSAGKLIVVCV
LDYRAMREIINSTDDTAADRVIVNLTSGTPADARATAAWAGERGMSYLDGAIMAIPPMIGSEEALIFYGG
PQEVYETHAETLRSVAGSGTYLGPDAGLPSLYDVALLGLMWTTWTGFMQATALLASEEVPAANFLPYAQA
WFEHVISPEMPTLAGQVDTGAYPDHESTLGMQAVAIAHLVHASRTQGVDAALAEFLHARAEQAIRRGHAD
DGFGAVFEVLRNPTAQ
>WP_018731760.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]
MAADNREPVTVIGLGGMGSALARAFLARGHPTTVWNRTPNKADDLVSQGALRATTVADAMSAGKLIVVCV
LDYRAMREIINSTDDTAADRVIVNLTSGTPADARATAAWAGERGMSYLDGAIMAIPPMIGSEEALIFYGG
PQEVYETHAETLRSVAGSGTYLGPDAGLPSLYDVALLGLMWTTWTGFMQATALLASEEVPAANFLPYAQA
WFEHVISPEMPTLAGQVDTGAYPDHESTLGMQAVAIAHLVHASRTQGVDAALAEFLHARAEQAIRRGHAD
DGFGAVFEVLRNPTAQ

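I'd like to know how to write a Python script that renames, in one pass, the IDs of a thousand or more sequences in the format above, changing each one to WP followed by a number (from 1 up to the position of the last sequence): for example, the first becomes >WP1 and the second >WP2. Thanks in advance.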
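
A minimal sketch of the renaming described above, assuming standard FASTA where every header line starts with '>' and only the header is replaced (sequence lines pass through untouched). The function name rename_headers and its prefix parameter are illustrative choices, not part of the original post.

```python
def rename_headers(lines, prefix="WP"):
    """Yield FASTA lines with each header replaced by >{prefix}1, >{prefix}2, ...

    Hypothetical helper (not from the original post). Assumes standard FASTA:
    a header is any line starting with '>'. Other lines are yielded unchanged.
    """
    count = 0
    for line in lines:
        if line.startswith(">"):
            count += 1
            # Keep the original line ending (if any) after the new ID
            ending = "\n" if line.endswith("\n") else ""
            yield f">{prefix}{count}{ending}"
        else:
            yield line


# In-memory example (sequences shortened for brevity)
example = """>WP_018731760.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]
MAADNREPVTVIGLGGMGSALARAFLARGHPTTVWNRTPNKADDLVSQGALRATTVADAMSAGKLIVVCV
DGFGAVFEVLRNPTAQ
>WP_018731761.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]
MAADNREPVTVIGLGGMGSALARAFLARGHPTTVWNRTPNKADDLVSQGALRATTVADAMSAGKLIVVCV
DGFGAVFEVLRNPTAQ
"""
renamed = "".join(rename_headers(example.splitlines(keepends=True)))
print(renamed)
```

Because the generator accepts any iterable of lines, an open input file object can be passed in place of the list and the result written with out.writelines(...), so memory use stays flat however many records there are.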

1 answer

  • 䴺矏 2022-04-19 11:31
    import re

    def run(fasta_str):
        # Offset of every header: optional '>' then "WP_" and digits at the start of a line
        starts = [m.start() for m in re.finditer(r"^>?WP_[0-9]+", fasta_str, flags=re.MULTILINE)]
        # Each record runs from its own header to the next header (or to the end of the text)
        ends = starts[1:] + [len(fasta_str)]
        fasta_list = []
        for number, (start, end) in enumerate(zip(starts, ends), start=1):
            # Insert the running number after the record's first "WP_": WP_1_..., WP_2_...
            fasta_list.append(re.sub("WP_", f"WP_{number}_", fasta_str[start:end], count=1))
        for record in fasta_list:
            print(record)

    fasta_str = """WP_018731760.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]
    MAADNREPVTVIGLGGMGSALARAFLARGHPTTVWNRTPNKADDLVSQGALRATTVADAMSAGKLIVVCV
    LDYRAMREIINSTDDTAADRVIVNLTSGTPADARATAAWAGERGMSYLDGAIMAIPPMIGSEEALIFYGG
    PQEVYETHAETLRSVAGSGTYLGPDAGLPSLYDVALLGLMWTTWTGFMQATALLASEEVPAANFLPYAQA
    WFEHVISPEMPTLAGQVDTGAYPDHESTLGMQAVAIAHLVHASRTQGVDAALAEFLHARAEQAIRRGHAD
    DGFGAVFEVLRNPTAQ
    WP_018731761.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]
    MAADNREPVTVIGLGGMGSALARAFLARGHPTTVWNRTPNKADDLVSQGALRATTVADAMSAGKLIVVCV
    LDYRAMREIINSTDDTAADRVIVNLTSGTPADARATAAWAGERGMSYLDGAIMAIPPMIGSEEALIFYGG
    PQEVYETHAETLRSVAGSGTYLGPDAGLPSLYDVALLGLMWTTWTGFMQATALLASEEVPAANFLPYAQA
    WFEHVISPEMPTLAGQVDTGAYPDHESTLGMQAVAIAHLVHASRTQGVDAALAEFLHARAEQAIRRGHAD
    DGFGAVFEVLRNPTAQ
    """

    run(fasta_str)

    # Output
    '''
    WP_1_018731760.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]
    MAADNREPVTVIGLGGMGSALARAFLARGHPTTVWNRTPNKADDLVSQGALRATTVADAMSAGKLIVVCV
    LDYRAMREIINSTDDTAADRVIVNLTSGTPADARATAAWAGERGMSYLDGAIMAIPPMIGSEEALIFYGG
    PQEVYETHAETLRSVAGSGTYLGPDAGLPSLYDVALLGLMWTTWTGFMQATALLASEEVPAANFLPYAQA
    WFEHVISPEMPTLAGQVDTGAYPDHESTLGMQAVAIAHLVHASRTQGVDAALAEFLHARAEQAIRRGHAD
    DGFGAVFEVLRNPTAQ

    WP_2_018731761.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]
    MAADNREPVTVIGLGGMGSALARAFLARGHPTTVWNRTPNKADDLVSQGALRATTVADAMSAGKLIVVCV
    LDYRAMREIINSTDDTAADRVIVNLTSGTPADARATAAWAGERGMSYLDGAIMAIPPMIGSEEALIFYGG
    PQEVYETHAETLRSVAGSGTYLGPDAGLPSLYDVALLGLMWTTWTGFMQATALLASEEVPAANFLPYAQA
    WFEHVISPEMPTLAGQVDTGAYPDHESTLGMQAVAIAHLVHASRTQGVDAALAEFLHARAEQAIRRGHAD
    DGFGAVFEVLRNPTAQ

    '''
    
    This answer was accepted by the asker as the best answer.
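
The accepted answer works on a string embedded in the script and produces IDs such as WP_1_018731760.1, while the question asks for bare >WP1, >WP2, ... headers across a large file. One possible sketch, assuming a standard FASTA input file whose headers start with '>': it streams the file line by line and also writes a tab-separated table mapping each new ID back to its original header, so accessions such as WP_018731760.1 are not lost. The function name, file names and table layout are hypothetical.

```python
def rename_fasta_file(in_path, out_path, map_path, prefix="WP"):
    """Copy FASTA file in_path to out_path, replacing each header with
    >{prefix}1, >{prefix}2, ... and writing a tab-separated table that maps
    every new ID back to the original header text. Returns the record count.

    Hypothetical helper (not from the original answer).
    """
    count = 0
    with open(in_path) as src, open(out_path, "w") as dst, open(map_path, "w") as table:
        table.write("new_id\toriginal_header\n")
        for line in src:  # streams one line at a time, so large files are fine
            if line.startswith(">"):
                count += 1
                new_id = f"{prefix}{count}"
                # Original header without the leading '>' and trailing whitespace/newline
                table.write(f"{new_id}\t{line[1:].rstrip()}\n")
                dst.write(f">{new_id}\n")
            else:
                # Sequence lines are copied unchanged
                dst.write(line)
    return count
```

Example call with hypothetical file names: rename_fasta_file("input.fasta", "renamed.fasta", "id_map.tsv") writes the renamed records and the ID table, and returns the number of records renamed.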

Question history

  • The system closed the question on April 27
  • Answer accepted on April 19
  • Question created on April 19
