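I started learning Python less than two weeks ago, but I urgently need to solve a problem.
I'd like to ask the experts here: I have a batch of text files (at least 20,000). Each file holds the different protein sequences of one species, as shown below: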
YP_009440948.1 NADH dehydrogenase subunit 6 (mitochondrion) [Absidia glauca]
MNAILLDLLAFGSVLSGILVITSRNPIISVLFLIAVFVNVACYLILLGINFIGLTYLIIYVGAIAILFLFVVMMLNIKLVELQDSAENYSNPYPLAFVLGTLFVSGLGLSNSNISKIDLPSIFDSINLFSFKSNKLETLFVSHSNWDNVFVSLDQINSVGQVLYTSHALFLVIASMILLLAMVGPIVLCLKPTKRLS
YP_009440949.1 GIY-YIG endonuclease (mitochondrion) [Absidia glauca]
MKNNSFVQTVLTDNGWTQEESLVSIHPLSSNDTQYHSFTFKSTPVKVYHNCEINAQLILDEIRDKFGIYLWLNTVNGIMYVGSAKDLSKRLINYWTPFKSVSQCIIEMNINRNIIYK
YP_009440950.1 NADH dehydrogenase subunit 1 (mitochondrion) [Absidia glauca]
MLLSLIEVLIVIVPLLLSVAFMTIAERKAMGSMQRRLGPNRVGYYGLLQPVADALKLFVKESVLPAHSNKALFLLAPVISLIVSLVSWGVMPFGSGLTLADLSLGMLYLLAVSSLGVYGVIFAGWAANSKYAFLGSLRSTAQMVSYEVVMGLIILTVVLLVGSLNLTEIIQSQISIWYIIPLLPLSLMFLISAIAETNRAPFDLPEAESELVAGFFTEHSSVPFVMFFLGEYASIILMSSLVSILFLGGYLVPFVSFENPTFVSFEGLSLGLKTSLILFIYIWVRASFPRLRYDQLMSFTWTGMLPLALGFIILVPCILVAFEIA
YP_009440951.1 GIY-YIG endonuclease (mitochondrion) [Absidia glauca]
MLNNKFYYYGSSKDLGTRLKYHYYVTPKDSNKFGLFLKTVGWDYFSVTIVELCDSKDLAERETWYLQKYRPLLNTLFEVGEWPGVKFHSESTKTLISKTLTGKTHSEETKLKMSQSHQGEKNIFFNKSLPKATLDAAALVNSNLVWVYNAETKTLLKESPISSKRQTAKILGISYNSVVKYLDTDKSFKGFLMYSKEKAPV
YP_009440952.1 ATP synthase F0 subunit 8 (mitochondrion) [Absidia glauca]
MPQLVPFYFLNQVSFAFLLLMVLLYVVSKYILPNILLVQSARMFLASK
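For each text file, I want to count the total number of times a given pair of adjacent amino acids (for example LL) occurs across the whole species. (PS: each peptide bond counts as one occurrence, so the peptide --LLLL-- should count as 3.) How should I write this program?
Thanks, everyone!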
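
One possible starting point for the counting rule described above is sketched here; it is a rough sketch under stated assumptions, not a definitive solution. It assumes that sequence lines consist only of uppercase letters while header lines contain spaces, digits, or brackets (as in the sample), that each file is UTF-8 text with a `.txt` extension, and that a pair spanning two different proteins should not be counted. The function names and the folder name `proteomes` are made up for illustration.

```python
from pathlib import Path


def count_pair(sequence: str, pair: str = "LL") -> int:
    """Count overlapping occurrences of `pair` in a single sequence."""
    # str.count() skips overlapping matches, so check every start position.
    return sum(
        1
        for i in range(len(sequence) - len(pair) + 1)
        if sequence.startswith(pair, i)
    )


def is_sequence_line(line: str) -> bool:
    # Assumption: a sequence line is uppercase letters only; headers such as
    # "YP_009440948.1 NADH dehydrogenase ... [Absidia glauca]" contain
    # spaces, digits, or brackets and therefore fail this check.
    return line.isalpha() and line.isupper()


def count_pair_in_text(text: str, pair: str = "LL") -> int:
    """Sum the per-protein counts for one file's contents."""
    total = 0
    current = []  # consecutive sequence lines that belong to one protein
    for raw in text.splitlines():
        line = raw.strip()
        if is_sequence_line(line):
            current.append(line)
        else:
            # A header (or blank line) ends the current protein.
            total += count_pair("".join(current), pair)
            current = []
    total += count_pair("".join(current), pair)  # last protein in the file
    return total


def count_pair_in_folder(folder: str, pair: str = "LL", pattern: str = "*.txt") -> dict:
    """Map each file name in `folder` to its total count for `pair`."""
    return {
        path.name: count_pair_in_text(path.read_text(encoding="utf-8"), pair)
        for path in sorted(Path(folder).glob(pattern))
    }
```

For example, `count_pair("LLLL")` gives 3, matching the rule in the question. Calling `count_pair_in_folder("proteomes")` on a (hypothetical) folder of the text files would return one total per file. Counting is done per protein, so an L at the end of one sequence and an L at the start of the next are not treated as a pair.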