笑*星*星* 2019-07-21 12:45 Acceptance rate: 0%
Views: 316

Counting dipeptide occurrences in protein sequences with Python

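I have been using Python for less than two weeks, but I urgently need to solve a problem.
I have a batch of text files (at least 20,000 of them). Each file holds the protein sequences of one species, with a header line followed by the sequence, as shown below: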

YP_009440948.1 NADH dehydrogenase subunit 6 (mitochondrion) [Absidia glauca]
MNAILLDLLAFGSVLSGILVITSRNPIISVLFLIAVFVNVACYLILLGINFIGLTYLIIYVGAIAILFLFVVMMLNIKLVELQDSAENYSNPYPLAFVLGTLFVSGLGLSNSNISKIDLPSIFDSINLFSFKSNKLETLFVSHSNWDNVFVSLDQINSVGQVLYTSHALFLVIASMILLLAMVGPIVLCLKPTKRLS
YP_009440949.1 GIY-YIG endonuclease (mitochondrion) [Absidia glauca]
MKNNSFVQTVLTDNGWTQEESLVSIHPLSSNDTQYHSFTFKSTPVKVYHNCEINAQLILDEIRDKFGIYLWLNTVNGIMYVGSAKDLSKRLINYWTPFKSVSQCIIEMNINRNIIYK
YP_009440950.1 NADH dehydrogenase subunit 1 (mitochondrion) [Absidia glauca]
MLLSLIEVLIVIVPLLLSVAFMTIAERKAMGSMQRRLGPNRVGYYGLLQPVADALKLFVKESVLPAHSNKALFLLAPVISLIVSLVSWGVMPFGSGLTLADLSLGMLYLLAVSSLGVYGVIFAGWAANSKYAFLGSLRSTAQMVSYEVVMGLIILTVVLLVGSLNLTEIIQSQISIWYIIPLLPLSLMFLISAIAETNRAPFDLPEAESELVAGFFTEHSSVPFVMFFLGEYASIILMSSLVSILFLGGYLVPFVSFENPTFVSFEGLSLGLKTSLILFIYIWVRASFPRLRYDQLMSFTWTGMLPLALGFIILVPCILVAFEIA
YP_009440951.1 GIY-YIG endonuclease (mitochondrion) [Absidia glauca]
MLNNKFYYYGSSKDLGTRLKYHYYVTPKDSNKFGLFLKTVGWDYFSVTIVELCDSKDLAERETWYLQKYRPLLNTLFEVGEWPGVKFHSESTKTLISKTLTGKTHSEETKLKMSQSHQGEKNIFFNKSLPKATLDAAALVNSNLVWVYNAETKTLLKESPISSKRQTAKILGISYNSVVKYLDTDKSFKGFLMYSKEKAPV
YP_009440952.1 ATP synthase F0 subunit 8 (mitochondrion) [Absidia glauca]
MPQLVPFYFLNQVSFAFLLLMVLLYVVSKYILPNILLVQSARMFLASK

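For each text file, I want to count the total number of times a given pair of adjacent amino acids (a dipeptide, e.g. LL) occurs across all sequences of that species. Note that every peptide bond counts as one occurrence, so overlapping matches are included: the stretch --LLLL-- should count as 3. How should I write this program?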

Thanks, everyone!
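
A note on the counting rule described above: Python's built-in `str.count` only counts non-overlapping matches, so it reports 2 for `LLLL` rather than the 3 asked for here. Below is a minimal sketch of an overlapping count. The helper name `count_dipeptide` is made up for illustration and is not part of any library.

```python
def count_dipeptide(sequence, dipeptide):
    """Count overlapping occurrences of a two-residue string in a sequence."""
    # Slide a window of width 2 along the sequence. Each adjacent pair of
    # residues corresponds to one peptide bond.
    return sum(
        1
        for i in range(len(sequence) - 1)
        if sequence[i:i + 2] == dipeptide
    )


print("LLLL".count("LL"))             # 2: str.count skips overlapping matches
print(count_dipeptide("LLLL", "LL"))  # 3: one per peptide bond
```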

2 answers

  • dabocaiqq 2019-07-21 14:36
    Iterate over each sequence and check every adjacent pair of residues.
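
The "iterate and check" idea could be sketched roughly as follows. This is one possible approach, not a definitive implementation, and it rests on assumptions taken from the sample in the question: sequence lines consist only of uppercase letters, and any other non-empty line is a header that starts a new record. The function names (`read_sequences`, `dipeptide_counts`, `count_directory`) and the `*.txt` file pattern are hypothetical.

```python
from collections import Counter
from pathlib import Path


def read_sequences(path):
    """Yield one protein sequence per record from a file like the sample."""
    parts = []
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue
            # Assumption: sequence lines are purely uppercase letters, while
            # header lines contain spaces, digits or punctuation.
            if line.isalpha() and line.isupper():
                parts.append(line)
            else:
                # A header closes the previous record, if there was one.
                if parts:
                    yield "".join(parts)
                parts = []
    if parts:
        yield "".join(parts)


def dipeptide_counts(sequences):
    """Tally every overlapping dipeptide across the given sequences."""
    counts = Counter()
    for seq in sequences:
        # zip(seq, seq[1:]) pairs each residue with its successor, so pairs
        # never span two different proteins.
        counts.update(a + b for a, b in zip(seq, seq[1:]))
    return counts


def count_directory(folder, pattern="*.txt"):
    """Map each matching file name in folder to its dipeptide Counter."""
    return {
        path.name: dipeptide_counts(read_sequences(path))
        for path in sorted(Path(folder).glob(pattern))
    }


# Small in-memory check of the counting rule from the question.
print(dipeptide_counts(["LLLL"])["LL"])  # 3
```

With the files in one folder, `count_directory(folder)[file_name]["LL"]` would then give the LL total for that species. Counting all dipeptides in one pass means each of the 20,000 files is read only once, even if several different pairs are of interest.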
