shinezhu1995 2022-04-19 08:29

How to rename the IDs of FASTA sequences with Python

>WP_018731760.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]
MAADNREPVTVIGLGGMGSALARAFLARGHPTTVWNRTPNKADDLVSQGALRATTVADAMSAGKLIVVCV
LDYRAMREIINSTDDTAADRVIVNLTSGTPADARATAAWAGERGMSYLDGAIMAIPPMIGSEEALIFYGG
PQEVYETHAETLRSVAGSGTYLGPDAGLPSLYDVALLGLMWTTWTGFMQATALLASEEVPAANFLPYAQA
WFEHVISPEMPTLAGQVDTGAYPDHESTLGMQAVAIAHLVHASRTQGVDAALAEFLHARAEQAIRRGHAD
DGFGAVFEVLRNPTAQ
>WP_018731760.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]
MAADNREPVTVIGLGGMGSALARAFLARGHPTTVWNRTPNKADDLVSQGALRATTVADAMSAGKLIVVCV
LDYRAMREIINSTDDTAADRVIVNLTSGTPADARATAAWAGERGMSYLDGAIMAIPPMIGSEEALIFYGG
PQEVYETHAETLRSVAGSGTYLGPDAGLPSLYDVALLGLMWTTWTGFMQATALLASEEVPAANFLPYAQA
WFEHVISPEMPTLAGQVDTGAYPDHESTLGMQAVAIAHLVHASRTQGVDAALAEFLHARAEQAIRRGHAD
DGFGAVFEVLRNPTAQ

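How can I write a Python script that renames thousands of sequences in the format above in one go? Each name should become WP plus a number (from 1 up to the index of the last sequence): for example, the first becomes >WP1, the second >WP2, and so on. Thanks a lot!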
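A minimal, standard-library-only sketch of exactly what is asked (headers become >WP1, >WP2, ...). It assumes a standard FASTA file in which every header line starts with ">" and every other line is sequence. The function names `rename_fasta_ids` and `rename_file` and the file names `input.fasta`/`output.fasta` are hypothetical, and the demo input is a shortened two-record sample built from the question's data. The original ID and description are dropped, since the question asks for bare WP1, WP2, ...

```python
import io
from typing import Iterable, Iterator


def rename_fasta_ids(lines: Iterable[str], prefix: str = "WP") -> Iterator[str]:
    """Replace every FASTA header with >{prefix}{n}, n = 1, 2, 3, ...

    Sequence lines are passed through unchanged; the old ID and
    description on each header line are discarded.
    """
    n = 0
    for line in lines:
        if line.startswith(">"):
            n += 1
            yield f">{prefix}{n}\n"
        else:
            yield line


def rename_file(src: str, dst: str, prefix: str = "WP") -> None:
    # Stream line by line, so files with thousands of records need little memory
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        fout.writelines(rename_fasta_ids(fin, prefix))


if __name__ == "__main__":
    # Shortened two-record sample taken from the question
    sample = io.StringIO(
        ">WP_018731760.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]\n"
        "MAADNREPVTVIGLGGMGSALARAFLARGHPTTVWNRTPNKADDLVSQGALRATTVADAMSAGKLIVVCV\n"
        "DGFGAVFEVLRNPTAQ\n"
        ">WP_018731761.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]\n"
        "DGFGAVFEVLRNPTAQ\n"
    )
    print("".join(rename_fasta_ids(sample)), end="")
    # For a real file (hypothetical names):
    # rename_file("input.fasta", "output.fasta")
```

For a real file, calling `rename_file("input.fasta", "output.fasta")` writes the renamed copy and leaves the original untouched. If the old IDs might still be needed later, it may be worth also saving an old-to-new mapping before discarding them.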

1 answer

  • 䴺矏 2022-04-19 11:31
    import re
    from itertools import count
    
    
    def run():
        # Sample input in the question's format (two records).
        # To process a real file, read it into fasta_str instead,
        # e.g. with open(path) as f: fasta_str = f.read()
        fasta_str = """WP_018731760.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]
    MAADNREPVTVIGLGGMGSALARAFLARGHPTTVWNRTPNKADDLVSQGALRATTVADAMSAGKLIVVCV
    LDYRAMREIINSTDDTAADRVIVNLTSGTPADARATAAWAGERGMSYLDGAIMAIPPMIGSEEALIFYGG
    PQEVYETHAETLRSVAGSGTYLGPDAGLPSLYDVALLGLMWTTWTGFMQATALLASEEVPAANFLPYAQA
    WFEHVISPEMPTLAGQVDTGAYPDHESTLGMQAVAIAHLVHASRTQGVDAALAEFLHARAEQAIRRGHAD
    DGFGAVFEVLRNPTAQ
    WP_018731761.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]
    MAADNREPVTVIGLGGMGSALARAFLARGHPTTVWNRTPNKADDLVSQGALRATTVADAMSAGKLIVVCV
    LDYRAMREIINSTDDTAADRVIVNLTSGTPADARATAAWAGERGMSYLDGAIMAIPPMIGSEEALIFYGG
    PQEVYETHAETLRSVAGSGTYLGPDAGLPSLYDVALLGLMWTTWTGFMQATALLASEEVPAANFLPYAQA
    WFEHVISPEMPTLAGQVDTGAYPDHESTLGMQAVAIAHLVHASRTQGVDAALAEFLHARAEQAIRRGHAD
    DGFGAVFEVLRNPTAQ
    """
    
        counter = count(1)  # running record number: 1, 2, 3, ...
        # Insert the record number after "WP_" at the start of each header line
        # (an optional leading ">" is kept). re.MULTILINE lets ^ match at the
        # start of every line, so only header lines are renamed.
        renamed = re.sub(
            r"^(>?)WP_",
            lambda m: f"{m.group(1)}WP_{next(counter)}_",
            fasta_str,
            flags=re.MULTILINE,
        )
        print(renamed, end="")
    
    
    if __name__ == "__main__":
        run()
    
    # Result
    '''
    WP_1_018731760.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]
    MAADNREPVTVIGLGGMGSALARAFLARGHPTTVWNRTPNKADDLVSQGALRATTVADAMSAGKLIVVCV
    LDYRAMREIINSTDDTAADRVIVNLTSGTPADARATAAWAGERGMSYLDGAIMAIPPMIGSEEALIFYGG
    PQEVYETHAETLRSVAGSGTYLGPDAGLPSLYDVALLGLMWTTWTGFMQATALLASEEVPAANFLPYAQA
    WFEHVISPEMPTLAGQVDTGAYPDHESTLGMQAVAIAHLVHASRTQGVDAALAEFLHARAEQAIRRGHAD
    DGFGAVFEVLRNPTAQ
    WP_2_018731761.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]
    MAADNREPVTVIGLGGMGSALARAFLARGHPTTVWNRTPNKADDLVSQGALRATTVADAMSAGKLIVVCV
    LDYRAMREIINSTDDTAADRVIVNLTSGTPADARATAAWAGERGMSYLDGAIMAIPPMIGSEEALIFYGG
    PQEVYETHAETLRSVAGSGTYLGPDAGLPSLYDVALLGLMWTTWTGFMQATALLASEEVPAANFLPYAQA
    WFEHVISPEMPTLAGQVDTGAYPDHESTLGMQAVAIAHLVHASRTQGVDAALAEFLHARAEQAIRRGHAD
    DGFGAVFEVLRNPTAQ
    '''
    
    This answer was accepted by the asker as the best answer.

Question timeline

  • Closed by the system on April 27
  • Answer accepted on April 19
  • Question created on April 19