shinezhu1995 2022-04-19 08:29

How to change the ID names of FASTA sequences with Python

WP_018731760.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]
MAADNREPVTVIGLGGMGSALARAFLARGHPTTVWNRTPNKADDLVSQGALRATTVADAMSAGKLIVVCV
LDYRAMREIINSTDDTAADRVIVNLTSGTPADARATAAWAGERGMSYLDGAIMAIPPMIGSEEALIFYGG
PQEVYETHAETLRSVAGSGTYLGPDAGLPSLYDVALLGLMWTTWTGFMQATALLASEEVPAANFLPYAQA
WFEHVISPEMPTLAGQVDTGAYPDHESTLGMQAVAIAHLVHASRTQGVDAALAEFLHARAEQAIRRGHAD
DGFGAVFEVLRNPTAQ
WP_018731760.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]
MAADNREPVTVIGLGGMGSALARAFLARGHPTTVWNRTPNKADDLVSQGALRATTVADAMSAGKLIVVCV
LDYRAMREIINSTDDTAADRVIVNLTSGTPADARATAAWAGERGMSYLDGAIMAIPPMIGSEEALIFYGG
PQEVYETHAETLRSVAGSGTYLGPDAGLPSLYDVALLGLMWTTWTGFMQATALLASEEVPAANFLPYAQA
WFEHVISPEMPTLAGQVDTGAYPDHESTLGMQAVAIAHLVHASRTQGVDAALAEFLHARAEQAIRRGHAD
DGFGAVFEVLRNPTAQ

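How can I write a Python script that renames, in one pass, more than a thousand sequences in the format above? Each name should become WP plus a number (from 1 up to the index of the last sequence), so the first becomes >WP1, the second >WP2, and so on. Thanks.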

1 answer

  • 䴺矏 2022-04-19 11:31
    import re
    def run():
        fasta_str = """WP_018731760.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]
            MAADNREPVTVIGLGGMGSALARAFLARGHPTTVWNRTPNKADDLVSQGALRATTVADAMSAGKLIVVCV
            LDYRAMREIINSTDDTAADRVIVNLTSGTPADARATAAWAGERGMSYLDGAIMAIPPMIGSEEALIFYGG
            PQEVYETHAETLRSVAGSGTYLGPDAGLPSLYDVALLGLMWTTWTGFMQATALLASEEVPAANFLPYAQA
            WFEHVISPEMPTLAGQVDTGAYPDHESTLGMQAVAIAHLVHASRTQGVDAALAEFLHARAEQAIRRGHAD
            DGFGAVFEVLRNPTAQ
            WP_018731761.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]
            MAADNREPVTVIGLGGMGSALARAFLARGHPTTVWNRTPNKADDLVSQGALRATTVADAMSAGKLIVVCV
            LDYRAMREIINSTDDTAADRVIVNLTSGTPADARATAAWAGERGMSYLDGAIMAIPPMIGSEEALIFYGG
            PQEVYETHAETLRSVAGSGTYLGPDAGLPSLYDVALLGLMWTTWTGFMQATALLASEEVPAANFLPYAQA
            WFEHVISPEMPTLAGQVDTGAYPDHESTLGMQAVAIAHLVHASRTQGVDAALAEFLHARAEQAIRRGHAD
            DGFGAVFEVLRNPTAQ
            """
    
        # Locate the start of every record: each header begins with a WP_ accession.
        starts = [m.start() for m in re.finditer('WP_[0-9]+', fasta_str)]
        # Each record runs up to the start of the next one (or the end of the text).
        ends = starts[1:] + [len(fasta_str)]
        for number, (start, end) in enumerate(zip(starts, ends), start=1):
            # Insert the running number after the prefix: WP_018... -> WP_1_018...
            print(re.sub("WP_", f"WP_{number}_", fasta_str[start:end], count=1))

    run()
    
    # Output
    '''
    WP_1_018731760.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]
            MAADNREPVTVIGLGGMGSALARAFLARGHPTTVWNRTPNKADDLVSQGALRATTVADAMSAGKLIVVCV
            LDYRAMREIINSTDDTAADRVIVNLTSGTPADARATAAWAGERGMSYLDGAIMAIPPMIGSEEALIFYGG
            PQEVYETHAETLRSVAGSGTYLGPDAGLPSLYDVALLGLMWTTWTGFMQATALLASEEVPAANFLPYAQA
            WFEHVISPEMPTLAGQVDTGAYPDHESTLGMQAVAIAHLVHASRTQGVDAALAEFLHARAEQAIRRGHAD
            DGFGAVFEVLRNPTAQ
            
    WP_2_018731761.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]
            MAADNREPVTVIGLGGMGSALARAFLARGHPTTVWNRTPNKADDLVSQGALRATTVADAMSAGKLIVVCV
            LDYRAMREIINSTDDTAADRVIVNLTSGTPADARATAAWAGERGMSYLDGAIMAIPPMIGSEEALIFYGG
            PQEVYETHAETLRSVAGSGTYLGPDAGLPSLYDVALLGLMWTTWTGFMQATALLASEEVPAANFLPYAQA
            WFEHVISPEMPTLAGQVDTGAYPDHESTLGMQAVAIAHLVHASRTQGVDAALAEFLHARAEQAIRRGHAD
            DGFGAVFEVLRNPTAQ
    
    '''
    
    The asker selected this answer as the best answer.
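The accepted answer works on a hard-coded string and produces IDs such as WP_1_018731760.1, whereas the question asks for WP1, WP2, ... across a whole file. A line-by-line approach may be simpler for that case. The following is only a minimal sketch, and it rests on an assumption: header lines start with `>` as in standard FASTA (the sample on this page shows none, but the question's `>WP1` suggests the markers were lost in display). The names `rename_fasta_ids`, `rename_fasta_file` and `keep_description` are hypothetical, chosen for illustration.

```python
def rename_fasta_ids(lines, prefix="WP", keep_description=False):
    """Yield FASTA lines with each header ID replaced by prefix + running number.

    Assumes header lines start with '>' (standard FASTA); all other lines
    are treated as sequence data and passed through unchanged.
    """
    count = 0
    for line in lines:
        if line.startswith(">"):
            count += 1
            # The ID is the first space-separated token; the rest is the description.
            _, _, description = line[1:].rstrip("\n").partition(" ")
            new_header = f">{prefix}{count}"
            if keep_description and description:
                new_header += " " + description
            yield new_header + "\n"
        else:
            yield line


def rename_fasta_file(in_path, out_path, prefix="WP"):
    """Stream in_path to out_path line by line, renumbering the headers."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(rename_fasta_ids(src, prefix))


if __name__ == "__main__":
    sample = [
        ">WP_018731760.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]\n",
        "MAADNREPVTVIGLGGMGSALARAFLARGHPTTVWNRTPNKADDLVSQGALRATTVADAMSAGKLIVVCV\n",
        ">WP_018731761.1 NAD(P)-binding domain-containing protein [Salinispora pacifica]\n",
        "MAADNREPVTVIGLGGMGSALARAFLARGHPTTVWNRTPNKADDLVSQGALRATTVADAMSAGKLIVVCV\n",
    ]
    print("".join(rename_fasta_ids(sample)), end="")
```

For a real file, something like `rename_fasta_file("input.fasta", "renamed.fasta")` would apply the same renaming (both file names are placeholders). Because the file is read one line at a time, memory use should stay small even with thousands of sequences.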


Question events

  • Closed by the system on April 27
  • Answer accepted on April 19
  • Question created on April 19
