shinezhu1995 2022-04-19 08:29 Acceptance rate: 80%

How to change FASTA sequence ID names with Python

>WP_018731760.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]
MAADNREPVTVIGLGGMGSALARAFLARGHPTTVWNRTPNKADDLVSQGALRATTVADAMSAGKLIVVCV
LDYRAMREIINSTDDTAADRVIVNLTSGTPADARATAAWAGERGMSYLDGAIMAIPPMIGSEEALIFYGG
PQEVYETHAETLRSVAGSGTYLGPDAGLPSLYDVALLGLMWTTWTGFMQATALLASEEVPAANFLPYAQA
WFEHVISPEMPTLAGQVDTGAYPDHESTLGMQAVAIAHLVHASRTQGVDAALAEFLHARAEQAIRRGHAD
DGFGAVFEVLRNPTAQ
>WP_018731760.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]
MAADNREPVTVIGLGGMGSALARAFLARGHPTTVWNRTPNKADDLVSQGALRATTVADAMSAGKLIVVCV
LDYRAMREIINSTDDTAADRVIVNLTSGTPADARATAAWAGERGMSYLDGAIMAIPPMIGSEEALIFYGG
PQEVYETHAETLRSVAGSGTYLGPDAGLPSLYDVALLGLMWTTWTGFMQATALLASEEVPAANFLPYAQA
WFEHVISPEMPTLAGQVDTGAYPDHESTLGMQAVAIAHLVHASRTQGVDAALAEFLHARAEQAIRRGHAD
DGFGAVFEVLRNPTAQ

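How can I write a Python script that renames, in one pass, the headers of more than a thousand sequences in the format above? Each name should become WP plus a number, running from 1 up to the index of the last sequence. For example, the first sequence becomes >WP1 and the second becomes >WP2. Thanks.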
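
The renaming described above can be sketched as a single pass over the file: copy sequence lines unchanged and replace each header line with the prefix plus a running counter. This is only a minimal sketch. It assumes header lines start with `>` as in standard FASTA, and the names `renumber_headers` and `renumber_file` are illustrative, not taken from the thread.

```python
from typing import Iterable, Iterator


def renumber_headers(lines: Iterable[str], prefix: str = "WP") -> Iterator[str]:
    """Yield FASTA lines with every header replaced by '>' + prefix + running index."""
    count = 0
    for line in lines:
        if line.startswith(">"):  # assumption: header lines begin with '>'
            count += 1
            yield f">{prefix}{count}\n"
        else:
            yield line  # sequence lines pass through unchanged


def renumber_file(src_path: str, dst_path: str, prefix: str = "WP") -> None:
    """Stream src_path to dst_path so a large file is never held fully in memory."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(renumber_headers(src, prefix))


if __name__ == "__main__":
    # Small in-memory demonstration with shortened, made-up records.
    demo = [
        ">WP_018731760.1 first protein\n", "MAADNREPVT\n", "DGFGAVFE\n",
        ">WP_018731761.1 second protein\n", "MAADNREPVT\n",
    ]
    print("".join(renumber_headers(demo)), end="")
```

For a real file, a call such as `renumber_file("input.fasta", "renamed.fasta")` (both paths hypothetical) writes the renamed records to a new file and leaves the original untouched.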


1 answer

  • 䴺矏 2022-04-19 11:31
    import re
    def run():
        fasta_str = """WP_018731760.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]
            MAADNREPVTVIGLGGMGSALARAFLARGHPTTVWNRTPNKADDLVSQGALRATTVADAMSAGKLIVVCV
            LDYRAMREIINSTDDTAADRVIVNLTSGTPADARATAAWAGERGMSYLDGAIMAIPPMIGSEEALIFYGG
            PQEVYETHAETLRSVAGSGTYLGPDAGLPSLYDVALLGLMWTTWTGFMQATALLASEEVPAANFLPYAQA
            WFEHVISPEMPTLAGQVDTGAYPDHESTLGMQAVAIAHLVHASRTQGVDAALAEFLHARAEQAIRRGHAD
            DGFGAVFEVLRNPTAQ
            WP_018731761.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]
            MAADNREPVTVIGLGGMGSALARAFLARGHPTTVWNRTPNKADDLVSQGALRATTVADAMSAGKLIVVCV
            LDYRAMREIINSTDDTAADRVIVNLTSGTPADARATAAWAGERGMSYLDGAIMAIPPMIGSEEALIFYGG
            PQEVYETHAETLRSVAGSGTYLGPDAGLPSLYDVALLGLMWTTWTGFMQATALLASEEVPAANFLPYAQA
            WFEHVISPEMPTLAGQVDTGAYPDHESTLGMQAVAIAHLVHASRTQGVDAALAEFLHARAEQAIRRGHAD
            DGFGAVFEVLRNPTAQ
            """
    
        # Start offset of every header; each header begins with "WP_<digits>".
        starts = [m.start() for m in re.finditer(r'WP_[0-9]+', fasta_str)]
        # A record runs from its header to the next header, or to the end of the text.
        ends = starts[1:] + [len(fasta_str)]
        fasta_list = []
        for number, (start, end) in enumerate(zip(starts, ends), start=1):
            # Insert the running index after the record's first "WP_" only.
            fasta_list.append(re.sub("WP_", f"WP_{number}_", fasta_str[start:end], count=1))
        for record in fasta_list:
            print(record)
    
    run()
    
    # Result
    '''
    WP_1_018731760.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]
            MAADNREPVTVIGLGGMGSALARAFLARGHPTTVWNRTPNKADDLVSQGALRATTVADAMSAGKLIVVCV
            LDYRAMREIINSTDDTAADRVIVNLTSGTPADARATAAWAGERGMSYLDGAIMAIPPMIGSEEALIFYGG
            PQEVYETHAETLRSVAGSGTYLGPDAGLPSLYDVALLGLMWTTWTGFMQATALLASEEVPAANFLPYAQA
            WFEHVISPEMPTLAGQVDTGAYPDHESTLGMQAVAIAHLVHASRTQGVDAALAEFLHARAEQAIRRGHAD
            DGFGAVFEVLRNPTAQ
            
    WP_2_018731761.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]
            MAADNREPVTVIGLGGMGSALARAFLARGHPTTVWNRTPNKADDLVSQGALRATTVADAMSAGKLIVVCV
            LDYRAMREIINSTDDTAADRVIVNLTSGTPADARATAAWAGERGMSYLDGAIMAIPPMIGSEEALIFYGG
            PQEVYETHAETLRSVAGSGTYLGPDAGLPSLYDVALLGLMWTTWTGFMQATALLASEEVPAANFLPYAQA
            WFEHVISPEMPTLAGQVDTGAYPDHESTLGMQAVAIAHLVHASRTQGVDAALAEFLHARAEQAIRRGHAD
            DGFGAVFEVLRNPTAQ
    
    '''
    
    This answer was selected as the best answer by the asker.
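
The answer above keeps the original accession and inserts the index after `WP_`, which gives `WP_1_018731760.1 ...`. If the shorter `>WP1`, `>WP2` form from the question is wanted instead, one possible variant replaces each whole header line using `re.sub` with a callback. This is a hedged sketch. It assumes every header line begins, after optional indentation and an optional `>`, with an accession like `WP_018731760.1`, and that sequence lines never contain `WP_`. The names `HEADER` and `renumber_text` are illustrative.

```python
import itertools
import re

# Assumption: a header is any line that starts (after optional indentation and
# an optional '>') with "WP_<digits>", optionally followed by ".<version>".
HEADER = re.compile(r"^[ \t]*>?WP_\d+(?:\.\d+)?.*$", re.MULTILINE)


def renumber_text(fasta_text: str, prefix: str = "WP") -> str:
    """Replace each whole header line with '>' + prefix + running index."""
    counter = itertools.count(1)
    # The callback is invoked once per header, in order of appearance.
    return HEADER.sub(lambda match: f">{prefix}{next(counter)}", fasta_text)


# Shortened sample records, for demonstration only.
sample = (
    "WP_018731760.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]\n"
    "MAADNREPVTVIGLGG\n"
    "WP_018731761.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]\n"
    "MAADNREPVTVIGLGG\n"
)
print(renumber_text(sample), end="")
```

Because the pattern consumes the entire header line, the description text is dropped along with the old accession. If the description should be kept, the callback would need to re-append the relevant part of `match.group(0)`.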


Question events

  • Closed by the system on April 27
  • Answer accepted on April 19
  • Question created on April 19
