Counting DNA sequences of length L: how can this be solved with C programming ideas and techniques?

Problem Description
sevenzero is very interested in bioinformatics and has done some research on it. One day, sevenzero found a phenomenon called Microgene. A Microgene is a special fragment in the DNA, and different Microgenes may have the same hereditary effect. Microgenes take effect if and only if the DNA contains more than one Microgene occurrence (occurrences may overlap) with the same hereditary effect. To finish his paper, sevenzero wants to know how many different DNAs of length L contain the hereditary effect caused by Microgenes.

To simplify the problem, a DNA or a Microgene is considered a string consisting of the characters 'A', 'T', 'C' and 'G'. A Microgene is in the DNA if the Microgene string is a substring of the DNA string. All given Microgenes are distinct and have the same hereditary effect.

Input
There are several test cases in the input. Each case begins with a line containing two integers N (1 ≤ N ≤ 6) and L (1 ≤ L ≤ 1000000), denoting the number of Microgenes and the length of the DNA. The following N lines contain the N Microgene strings. The length of each Microgene is at most 5. The input is terminated by EOF.

Output
One line for each case, the answer modulo 10007.

Sample Input
2 3
AT
TC
2 3
ATC
T
3 1000000
ATCG
TCGT
CTAG

Sample Output
1
11
5063
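One way to approach this (a sketch, not a verified accepted solution): build an Aho-Corasick automaton over the at most 6 Microgenes (at most 1 + 6 × 5 = 31 states), pair each state with the number of occurrences seen so far capped at 2 (0, 1, or 2+), and raise the resulting transition matrix (at most 93 × 93) to the L-th power modulo 10007. Occurrences are counted per position, so "TT" counts the Microgene "T" twice, which is what the second sample (11) requires. The answer is the number of length-L walks from the start state that end with a count of 2+. The function name `microgene_count` is made up for illustration and the code assumes well-formed input.

```c
#include <string.h>

#define MOD 10007
#define MAXNODE 31                 /* root + at most 6 patterns x 5 characters */
#define MAXS (MAXNODE * 3)         /* (automaton state, occurrences so far: 0, 1, 2+) */

typedef struct { int a[MAXS][MAXS]; } Mat;

static int go_[MAXNODE][4], fail_[MAXNODE], hits[MAXNODE], nodes, S;

static int base_id(char c) { return c == 'A' ? 0 : c == 'T' ? 1 : c == 'C' ? 2 : 3; }

/* r = x * y modulo MOD over the first S rows/columns; r may alias x or y. */
static void mat_mul(Mat *r, const Mat *x, const Mat *y) {
    static Mat t;
    for (int i = 0; i < S; i++)
        for (int j = 0; j < S; j++) {
            long long s = 0;
            for (int k = 0; k < S; k++) s += (long long)x->a[i][k] * y->a[k][j];
            t.a[i][j] = (int)(s % MOD);
        }
    *r = t;
}

/* Number of length-L strings over {A,T,C,G} with at least two (possibly
   overlapping) occurrences of the patterns, modulo 10007. */
int microgene_count(const char *pat[], int n, long long L) {
    static Mat M, R;
    int q[MAXNODE], head = 0, tail = 0, ans = 0;

    nodes = 1;                                         /* 1. trie of the patterns */
    memset(go_[0], 0, sizeof go_[0]);
    hits[0] = 0;
    for (int i = 0; i < n; i++) {
        int u = 0;
        for (const char *s = pat[i]; *s; s++) {
            int x = base_id(*s);
            if (!go_[u][x]) {
                memset(go_[nodes], 0, sizeof go_[nodes]);
                hits[nodes] = 0;
                go_[u][x] = nodes++;
            }
            u = go_[u][x];
        }
        hits[u]++;
    }
    for (int x = 0; x < 4; x++)                        /* 2. suffix links, BFS order */
        if (go_[0][x]) { fail_[go_[0][x]] = 0; q[tail++] = go_[0][x]; }
    while (head < tail) {
        int u = q[head++];
        hits[u] += hits[fail_[u]];                     /* patterns ending here via suffixes */
        for (int x = 0; x < 4; x++) {
            int v = go_[u][x];
            if (v) { fail_[v] = go_[fail_[u]][x]; q[tail++] = v; }
            else go_[u][x] = go_[fail_[u]][x];
        }
    }
    S = nodes * 3;                                     /* 3. one-step transition matrix */
    memset(&M, 0, sizeof M);
    memset(&R, 0, sizeof R);
    for (int u = 0; u < nodes; u++)
        for (int c = 0; c < 3; c++)
            for (int x = 0; x < 4; x++) {
                int v = go_[u][x], nc = c + hits[v];
                M.a[u * 3 + c][v * 3 + (nc > 2 ? 2 : nc)]++;
            }
    for (int i = 0; i < S; i++) R.a[i][i] = 1;
    for (; L > 0; L >>= 1) {                           /* 4. R = M^L by repeated squaring */
        if (L & 1) mat_mul(&R, &R, &M);
        mat_mul(&M, &M, &M);
    }
    for (int v = 0; v < nodes; v++) ans = (ans + R.a[0][v * 3 + 2]) % MOD;
    return ans;
}
```

A main() would loop on `scanf("%d %lld", &n, &L) == 2`, read the N strings into small buffers and print `microgene_count(...)`. Each squaring costs about S³ operations with S ≤ 93, and L = 1000000 needs only about 20 squarings.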

Other related questions
Newbie asking for help with a C++ problem: similar gene sequences

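Description
As is well known, a human gene can be viewed as a sequence of base pairs built from 4 kinds of nucleotides, abbreviated A, C, G and T. Consider the gene sequence "ACCAGGTT", which consists of 8 nucleotides: positions 1 and 4 are base A, positions 2 and 3 are base C, positions 5 and 6 are base G, and positions 7 and 8 are base T. Tom builds the following 0/1 matrix:
1, 0, 0, 1, 0, 0, 0, 0
0, 1, 1, 0, 0, 0, 0, 0
0, 1, 1, 0, 0, 0, 0, 0
1, 0, 0, 1, 0, 0, 0, 0
0, 0, 0, 0, 1, 1, 0, 0
0, 0, 0, 0, 1, 1, 0, 0
0, 0, 0, 0, 0, 0, 1, 1
0, 0, 0, 0, 0, 0, 1, 1
The entry in row i, column j is 1 if the base at position i is the same as the base at position j, and 0 otherwise. If gene sequences X and Y have the same length and the same 0/1 matrix, Tom considers X and Y similar. Given two gene sequences of length N, help Tom decide whether they are similar.

Input
There may be multiple test cases. For each case, line 1 contains a positive integer N (1 ≤ N ≤ 1000000), and lines 2 and 3 each contain a gene sequence of length N (made only of the characters A, C, G and T). The input ends with N = 0.

Output
For each case, print one line: YES if the two sequences are similar, otherwise NO.

Sample Input
2
AA
TG
6
ACCGTT
GAATCC
0

Sample Output
NO
YES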
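The two 0/1 matrices are equal exactly when X[i] == X[j] holds if and only if Y[i] == Y[j] for every pair i, j. That is the same as saying the letter pairs (X[i], Y[i]) form a one-to-one mapping between the letters of X and the letters of Y, so the answer can be checked in O(N) without ever building the N × N matrix. A minimal sketch in C (the function name `similar` is hypothetical):

```c
#include <string.h>

/* Returns 1 if x and y have the same 0/1 "equal letters" matrix, else 0.
   Equivalent test: the position-wise pairing x[i] <-> y[i] is one-to-one. */
int similar(const char *x, const char *y) {
    unsigned char fwd[256] = {0}, back[256] = {0};   /* 0 means "not mapped yet" */
    size_t n = strlen(x);
    if (n != strlen(y)) return 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char a = (unsigned char)x[i], b = (unsigned char)y[i];
        if (!fwd[a] && !back[b]) { fwd[a] = b; back[b] = a; }  /* first time both letters are seen */
        else if (fwd[a] != b || back[b] != a) return 0;        /* contradicts an earlier pairing */
    }
    return 1;
}
```

With N up to 1000000, read each sequence into a static buffer of at least 1000001 characters (for example with `scanf("%1000000s", buf)`) rather than a local array, then print YES when `similar` returns 1.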

Approach to a hard DNA genetics problem: how can it be solved in C?

Problem Description
Every kind of living creature has a kind of DNA. The nucleotide bases from which DNA is built are A (adenine), C (cytosine), G (guanine), and T (thymine). Sometimes, if the DNA of two living creatures share a substring whose length is beyond a certain percentage of the whole length, we may consider whether the two creatures have the same ancestor. We can then temporarily put them into one species for our research, and we say the two creatures are similar. Note that if A is similar to B and B is similar to C, but C is not similar to A, we still put A, B and C into one kind, because aberrations happen during evolution. Given some kinds of living creatures and their DNA, tell us how many kinds we can separate them into.

Input
There are many cases. In each case, the first line contains two numbers N and P. N is the number of kinds of living creatures. Two DNA are similar if they share a substring whose length is beyond P percent of the length of any DNA of the pair; P is that percentage. 1 <= N <= 100 and 1 <= P < 100 (P = 100 would mean two DNA are similar if and only if they are identical, so P is guaranteed to be smaller than 100). The length of each DNA won't exceed 100.

Output
For each case, print how many kinds of living creatures we can separate them into.

Sample Input
3 10.0
AAA
AA
CCC

Sample Output
Case 1: 2
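A sketch under two stated assumptions: "beyond" is read as strictly greater than, and the shared substring must exceed P percent of both lengths. The longest common substring of two strings of length at most 100 comes from an O(100 × 100) dynamic program, so all pairs for N ≤ 100 cost roughly 5 × 10^7 steps. A union-find structure then merges similar creatures transitively, which is exactly the "A ~ B and B ~ C, so A, B and C are one kind" rule. All names are illustrative.

```c
#include <string.h>

#define MAXN 100
#define MAXLEN 100

static int parent_[MAXN];

static int find(int x) { return parent_[x] == x ? x : (parent_[x] = find(parent_[x])); }

/* Length of the longest common substring: dp[i][j] is the length of the
   longest common suffix of a[0..i) and b[0..j). Row/column 0 stay 0. */
static int longest_common_substring(const char *a, const char *b) {
    static int dp[MAXLEN + 1][MAXLEN + 1];
    int la = (int)strlen(a), lb = (int)strlen(b), best = 0;
    for (int i = 1; i <= la; i++)
        for (int j = 1; j <= lb; j++) {
            dp[i][j] = a[i - 1] == b[j - 1] ? dp[i - 1][j - 1] + 1 : 0;
            if (dp[i][j] > best) best = dp[i][j];
        }
    return best;
}

/* Number of kinds after joining every similar pair (transitively). */
int count_kinds(const char *dna[], int n, double p) {
    int groups = n;
    for (int i = 0; i < n; i++) parent_[i] = i;
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++) {
            double c = longest_common_substring(dna[i], dna[j]) * 100.0;
            /* assumption: strictly more than p percent of BOTH lengths */
            if (c > p * strlen(dna[i]) && c > p * strlen(dna[j])) {
                int a = find(i), b = find(j);
                if (a != b) { parent_[a] = b; groups--; }
            }
        }
    return groups;
}
```

For the sample, "AAA" and "AA" share "AA" (2 > 10% of 3 and of 2), while "CCC" shares nothing with either, so `count_kinds` gives 2, printed as "Case 1: 2".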

Writing a C program to generate all possible DNA sequences of length 10 bp

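That is, sequences built from the four letters A, G, C and T. How do I do this? I've only just started learning and really can't figure it out. Please teach me, thank you very much!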
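With four letters, every sequence of length k corresponds to a k-digit base-4 number, so counting from 0 to 4^k - 1 and turning each number's digits into letters enumerates every sequence exactly once (4^10 = 1048576 sequences for 10 bp). A minimal sketch; the letter order A, G, C, T follows the question, and the function names are illustrative:

```c
#include <stdio.h>

/* Writes the idx-th sequence of length k (0 <= idx < 4^k) into out, reading
   idx as a k-digit base-4 number with digits A=0, G=1, C=2, T=3. */
void kth_sequence(unsigned long idx, int k, char *out) {
    static const char letters[] = "AGCT";
    for (int i = k - 1; i >= 0; i--) {   /* least significant digit goes last */
        out[i] = letters[idx % 4];
        idx /= 4;
    }
    out[k] = '\0';
}

/* Prints all 4^k sequences of length k, one per line (keep k <= 15). */
void print_all(int k, FILE *fp) {
    char buf[32];
    unsigned long total = 1UL << (2 * k);   /* 4^k */
    for (unsigned long i = 0; i < total; i++) {
        kth_sequence(i, k, buf);
        fprintf(fp, "%s\n", buf);
    }
}
```

Calling `print_all(10, stdout)` from main prints all 1048576 sequences from AAAAAAAAAA to TTTTTTTTTT; redirect the output to a file, since it is 11 MiB.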

C: palindrome check gives a runtime error, please help

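#include <stdio.h>
#include <string.h>

int main() {
    int i, *pi, l, *pl, mark, *pmark;
    pi = &i;
    pl = &l;
    pmark = &mark;
    char str[100000], *pstr;
    pstr = str;
    while ((scanf("%s", pstr) != EOF) && strcmp(str, "2013") != 0) {
        *pl = strlen(pstr);
        mark = 1;
        /* Fixed: the original "*pi++" advanced the pointer pi instead of
           incrementing i, so i stayed 0 and later reads through pi touched
           unrelated memory (the runtime error). */
        for (*pi = 0; *pi < (l / 2); (*pi)++) {
            if (*(pstr + i) != *(pstr + (*pl - *pi - 1))) {
                mark = 0;
                break;
            }
        }
        if (mark) {
            printf("YES\n");
        } else {
            printf("NO\n");
        }
    }
    return 0;
}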

DNA Translation: a DNA sequence problem

Description
Deoxyribonucleic acid (DNA) is composed of a sequence of nucleotide bases paired together to form a double-stranded helix structure. Through a series of complex biochemical processes the nucleotide sequences in an organism's DNA are translated into the proteins it requires for life. The object of this problem is to write a computer program which accepts a DNA strand and reports the protein generated, if any, from the DNA strand.

The nucleotide bases from which DNA is built are adenine, cytosine, guanine, and thymine (hereafter referred to as A, C, G, and T, respectively). These bases bond together in a chain to form half of a DNA strand. The other half of the DNA strand is a similar chain, but each nucleotide is replaced by its complementary base. The bases A and T are complementary, as are the bases C and G. These two "half-strands" of DNA are then bonded by the pairing of the complementary bases to form a strand of DNA.

Typically a DNA strand is listed by simply writing down the bases which form the primary strand (the complementary strand can always be created by writing the complements of the bases in the primary strand). For example, the sequence TACTCGTAATTCACT represents a DNA strand whose complement would be ATGAGCATTAAGTGA. Note that A is always paired with T, and C is always paired with G.

From a primary strand of DNA, a strand of ribonucleic acid (RNA) known as messenger RNA (mRNA for short) is produced in a process known as transcription. The transcribed mRNA is identical to the complementary DNA strand with the exception that thymine is replaced by a nucleotide known as uracil (hereafter referred to as U). For example, the mRNA strand for the DNA in the previous paragraph would be AUGAGCAUUAAGUGA.

It is the sequence of bases in the mRNA which determines the protein that will be synthesized. The bases in the mRNA can be viewed as a collection of codons, each codon having exactly three bases. The codon AUG marks the start of a protein sequence, and any of the codons UAA, UAG, or UGA marks the end of the sequence. The one or more codons between the start and termination codons represent the sequence of amino acids to be synthesized to form a protein. For example, the mRNA codon AGC corresponds to the amino acid serine (Ser), AUU corresponds to isoleucine (Ile), and AAG corresponds to lysine (Lys). So, the protein formed from the example mRNA in the previous paragraph is, in its abbreviated form, Ser-Ile-Lys.

The complete genetic code from which codons are translated into amino acids is shown in the table below (note that only the amino acid abbreviations are shown). It should also be noted that the sequence AUG, which has already been identified as the start sequence, can also correspond to the amino acid methionine (Met). So, the first AUG in a mRNA strand is the start sequence, but subsequent AUG codons are translated normally into the Met amino acid.

First base   Second base in codon         Third base
in codon     U      C      A      G       in codon
U            Phe    Ser    Tyr    Cys     U
             Phe    Ser    Tyr    Cys     C
             Leu    Ser    ---    ---     A
             Leu    Ser    ---    Trp     G
C            Leu    Pro    His    Arg     U
             Leu    Pro    His    Arg     C
             Leu    Pro    Gln    Arg     A
             Leu    Pro    Gln    Arg     G
A            Ile    Thr    Asn    Ser     U
             Ile    Thr    Asn    Ser     C
             Ile    Thr    Lys    Arg     A
             Met    Thr    Lys    Arg     G
G            Val    Ala    Asp    Gly     U
             Val    Ala    Asp    Gly     C
             Val    Ala    Glu    Gly     A
             Val    Ala    Glu    Gly     G

Input
The input for this program consists of strands of DNA sequences, one strand per line, from which the protein it generates, if any, should be determined and output. The given DNA strand may be either the primary or the complementary DNA strand, and it may appear in either forward or reverse order, and the start and termination sequences do not necessarily appear at the ends of the strand. For example, a given input DNA strand to form the protein Ser-Ile-Lys could be any of ATACTCGTAATTCACTCC, CCTCACTTAATGCTCATA, TATGAGCATTAAGTGAGG, or GGAGTGAATTACGAGTAT. The input will be terminated by a line containing a single asterisk character.

Output
You may assume the input to contain only valid, upper-case, DNA nucleotide base letters (A, C, G, and T). No input line will exceed 255 characters in length. There will be no blank lines or spaces in the input. Some sequences, though valid DNA strands, do not produce valid protein sequences; the string "*** No translatable DNA found ***" should be output when an input DNA strand does not translate into a valid protein.

Sample Input
ATACTCGTAATTCACTCC
CACCTGTACACAGAGGTAACTTAG
TTAATACGACATAATTAT
GCCTTGATATGGAGAACTCATTAGATA
AAGTGTATGTTGAATTATATAAAACGGGCATGA
ATGATGATGGCTTGA
*

Sample Output
Ser-Ile-Lys
Cys-Leu-His
Ser-Tyr
*** No translatable DNA found ***
Leu-Asn-Tyr-Ile-Lys-Arg-Ala
Met-Met-Ala
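A sketch of one way to do it: generate the four readings of the input (as given, reversed, reverse complement, complement), treat each reading as the mRNA with T standing for U, start at its first ATG, and read codons until an in-frame stop codon with at least one amino acid before it. The order in which the four readings are tried is an assumption: the statement does not specify it, and this order is simply one that reproduces all six sample outputs (the third sample also translates, to Leu-Tyr, when read as the plain complement). The 64-entry string is the genetic code table above with T in place of U; `translate` and `read_frame` are illustrative names.

```c
#include <string.h>

/* 64 codons x 3 characters; index = 16*b1 + 4*b2 + b3 with T=0, C=1, A=2, G=3
   (the U, C, A, G order of the table). "---" marks a stop codon. */
static const char CODE[] =
    "PhePheLeuLeuSerSerSerSerTyrTyr------CysCys---Trp"
    "LeuLeuLeuLeuProProProProHisHisGlnGlnArgArgArgArg"
    "IleIleIleMetThrThrThrThrAsnAsnLysLysSerSerArgArg"
    "ValValValValAlaAlaAlaAlaAspAspGluGluGlyGlyGlyGly";

static int base(char c) { return c == 'T' ? 0 : c == 'C' ? 1 : c == 'A' ? 2 : 3; }

/* Reads s as mRNA (T for U): first ATG, then codons up to a stop codon.
   Returns 1 and fills out ("Ser-Ile-Lys") if at least one amino acid is found. */
static int read_frame(const char *s, char *out) {
    const char *p = strstr(s, "ATG");
    int n = 0;
    if (!p) return 0;
    out[0] = '\0';
    for (p += 3; p[0] && p[1] && p[2]; p += 3) {
        const char *aa = CODE + 3 * (16 * base(p[0]) + 4 * base(p[1]) + base(p[2]));
        if (aa[0] == '-') return n > 0;   /* stop codon */
        if (n++) strcat(out, "-");
        strncat(out, aa, 3);
    }
    return 0;                             /* no stop codon before the end */
}

/* out needs room for at least 512 characters; dna has at most 255 letters. */
int translate(const char *dna, char *out) {
    char t[4][256];
    size_t n = strlen(dna);
    for (size_t i = 0; i < n; i++) {
        char c = dna[i];
        char comp = c == 'A' ? 'T' : c == 'T' ? 'A' : c == 'C' ? 'G' : 'C';
        t[0][i] = c;                      /* as given */
        t[1][n - 1 - i] = c;              /* reversed */
        t[2][n - 1 - i] = comp;           /* reverse complement */
        t[3][i] = comp;                   /* complement */
    }
    for (int k = 0; k < 4; k++) {
        t[k][n] = '\0';
        if (read_frame(t[k], out)) return 1;
    }
    strcpy(out, "*** No translatable DNA found ***");
    return 0;
}
```

A main() would read lines until the one containing "*", call `translate` on each and print `out` either way.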

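I have a set of DNA sequence data downloaded from UCI and want to convert it, according to some rule, into sequences made only of 0s and 1s. How should I write the code?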

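Each DNA sequence looks like this, with 60 bases:
CCAGCTGCATCACAGGAGGCCAGCGAGCAGGTCTGTTCCAAGGGCCTTCGAGCCAGTCTG
The new sequence is produced by checking whether the DNA contains particular subsequences: 1 if it does, 0 if it does not. Since many particular subsequences are checked, the result becomes a sequence of 0s and 1s. I want to turn these new sequences into a data file that TensorFlow can use directly. The TensorFlow tutorials I've found online all use datasets that are downloaded and used as-is; how do I handle data like this that has to be preprocessed first? I'd be grateful if someone could show me, or at least point me to a website where I can learn how to do it. Thanks, everyone!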
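The 0/1 encoding described here is one substring test per motif, so a minimal C sketch needs only one `strstr` call per motif, with each sequence written out as one CSV row. The motif list and the function names are placeholders, not part of the question. A CSV file of this kind can then be loaded in Python (for example with `numpy.loadtxt(path, delimiter=",")`) before it is handed to TensorFlow.

```c
#include <stdio.h>
#include <string.h>

/* feat[j] = 1 if motifs[j] occurs anywhere in seq, otherwise 0. */
void motif_features(const char *seq, const char *motifs[], int m, int feat[]) {
    for (int j = 0; j < m; j++) feat[j] = strstr(seq, motifs[j]) != NULL;
}

/* Writes one CSV row such as "1,0,1" for a single sequence. */
void write_csv_row(FILE *fp, const int feat[], int m) {
    for (int j = 0; j < m; j++) fprintf(fp, "%s%d", j ? "," : "", feat[j]);
    fputc('\n', fp);
}
```

For the 60-base example and the hypothetical motifs "CCAGC", "GAATTC" and "TTCCAAG", the row is 1,0,1: the first and third motifs occur in the sequence, the second does not.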

Why do I only get 30 points... ugh...

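Description
To learn about the functional and structural similarity of gene sequences, several different DNA sequences often need to be aligned to decide whether they are related.
Here two DNA sequences of the same length are compared. First, the two bases at the same position of the two sequences form a base pair; if the two bases of a pair are identical, it is called an identical base pair. Then compute the proportion of identical base pairs among all base pairs. If the proportion is greater than or equal to a given threshold, the two DNA sequences are judged to be related; otherwise they are unrelated.

Input
Three lines. The first line is the threshold used to decide whether the two DNA sequences are related; the next 2 lines are the two DNA sequences (length at most 500).

Output
Print "yes" if the two DNA sequences are related, otherwise "no".

Sample Input
0.85
ATCGCCGTAAGTAACGGTTTTAAATAGGCC
ATCGCCGGAAGTAACGGTCTTAAATAGGCC

Sample Output
yes

My program:
#include <iostream>
#include <cstdio>
#include <cstring>
using namespace std;

int main()
{
    double a;
    int c = 0;                  // fixed: c was never initialized, so the count started from garbage
    char b[2][502];
    int i;
    scanf("%lf", &a);
    for (i = 0; i < 2; i++)
    {
        scanf("%501s", b[i]);   // gets() is unsafe and was removed in C++14
    }
    int len = strlen(b[0]);
    for (i = 0; i < len; i++)
    {
        if (b[0][i] == b[1][i])
        {
            c++;
        }
    }
    double d = (double)c / len;
    if (d >= a)
    {
        cout << "yes";
    }
    else
    {
        cout << "no";
    }
    return 0;
}
There shouldn't be anything wrong with it...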

How do I use Python to extract the DNA sequences of specified genes from a sequence file?

File 1 contains gene names; file 2 contains all gene names with their corresponding sequences, including the sequences of the genes listed in file 1. How do I extract, from file 2, the gene names and sequences for the genes listed in file 1?
TXT1:
accBp
aceBp
acnBp
acrZp
adep
adiAp
agaRp
ahpCp2
alaEp
aldAp
alsRp2
amiAp
ansBp2
aptp
araBp
araCp
TXT2:
accAp cgcgggcttgctat
accBp tagctgttgattat
accDp ttttttatccaaag
aceBp aaattgtttttgat
acnAp2 tgttatcaaatcgt
acnBp aaacagattaacac
acpPp gggatttagttgca
acrAp gttagatttacata
acrZp aaaggggagtgctt
adep atttcaattgcaca
adiAp tcacgcgctttaca
agaRp aggtgggcttgcat
agaSp ctccattgaacttt
ahpCp acgcattagccgaa
ahpCp2 aggtgattgccctt
alaEp tttttcactaattg
aldAp tttcacgattccgt
alsRp1 ccagaaaaacaaat
alsRp2 aaaaaccagaaaaa
amiAp caatatctgacgaa
ampCp ctgacagttgtcac
ansBp2 aacgtcaaatttcc
aptp aatcgcagttgcaa
araBp ctacctgacgcttt
araCp cgtgattatagaca
araEp gacctgacacctgc

Design a class DNASequence for storing and processing DNA sequences, with the following features and functions:

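3. Design a class DNASequence for storing and processing DNA sequences, with the following features and functions:
- A translation table, defined as a class attribute named transcription_table, of type dict, used to convert the DNA symbols A, T, G and C into their corresponding symbols: A to U, T to A, G to C, and C to G.
- A restriction enzyme table, defined as a class attribute named enz_dict, of type dict. A restriction enzyme is a protein that recognizes a specific DNA sequence and cuts the DNA within the recognition site. This exercise only considers two restriction enzymes: 'EcoRI', which recognizes the sequence 'GAATTC', and 'EcoRV', which recognizes the sequence 'GATATC'.
- The constructor __init__(self, seqstring) takes a string seqstring representing a DNA sequence, and DNASequence has an instance attribute seqstring that stores this string. All characters of the argument must be converted to upper case before being stored in the instance attribute seqstring.
- An instance method transcription(self) translates the DNA sequence stored in the instance attribute seqstring symbol by symbol into the corresponding symbols; for example, "ATG" becomes "UAC".
- An instance method restriction(self, enz), where enz is the name of a restriction enzyme (a string), counts how many times the DNA sequence recognized by that enzyme occurs in the instance attribute seqstring. If it does not occur, it returns 0.
- Overload the len operation, i.e. redefine the special method __len__(self), to return the length of the instance attribute seqstring.
- Note: this exercise does not check whether the characters of the __init__ argument seqstring are valid; every character is assumed to be one of ATGC.

Example run:
>>> virus = DNASequence('atggagagccttgttcttggtgtcaa')
>>> virus.seqstring
'ATGGAGAGCCTTGTTCTTGGTGTCAA'
>>> virus.transcription()
'UACCUCUCGGAACAAGAACCACAGUU'
>>> other_virus = DNASequence('atgatatcggagaggatatcggtgtcaa')
>>> other_virus.restriction('EcoRV')
2
>>> len(virus)
26

Code skeleton: copy this skeleton and save it in a file named DNASequence.py. To test it, write the test program in a separate file. DNASequence.py may contain only the implementation of the DNASequence class and no test code. When submitting, submit only DNASequence.py. If other code interferes with the teacher's grading, you bear the resulting point deductions yourself.

class DNASequence:
    transcription_table = {}  # translation table
    enz_dict = {}             # restriction enzyme table

    def __init__(self, seqstring):
        # write your code below
        pass
        # do not modify the code below

    def __len__(self):
        # write your code below
        pass
        # do not modify the code below

    def restriction(self, enz):
        # write your code below
        pass
        # do not modify the code below

    def transcription(self):
        # write your code below
        pass
        # do not modify the code below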

String matching: the k-mer index problem for DNA sequences

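Given a DNA sequence that contains only the 4 letters A, T, C and G, such as S = "CTGTACTGTAT", and an integer k: starting at the first position of S, take k consecutive letters; this short string is called a k-mer (for k = 5 it is CTGTA). Then take another k-mer starting at the second position of S (for k = 5, TGTAC), and so on until the end of S, which gives a set containing all the k-mers. For S, the set of all 5-mers is {CTGTA, TGTAC, GTACT, TACTG, ACTGT, TGTAT}. These k-mers usually need a data indexing method so that later operations can access them quickly. For example, for 5-mers, querying CTGTA through the index should return its positions in the DNA sequence S, {1, 6}.
Problem: 1,000,000 DNA sequences are given in a file, numbered 1 to 1000000, and each gene sequence has length 100.
(1) For a given k, design and implement a data indexing method that returns the sequence numbers of the DNA sequences containing any given k-mer, together with its positions in those sequences. Each index only needs to support one value of k, not all values of k.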

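Write a C program to simulate the growth of a culture, reading in the number of days to simulate, the DNA rules, and the initial population densities of the dish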

Description
A biologist experimenting with DNA modification of bacteria has found a way to make bacterial colonies sensitive to the surrounding population density. By changing the DNA, he is able to "program" the bacteria to respond to the varying densities in their immediate neighborhood.
The culture dish is a square, divided into 400 smaller squares (20x20). Population in each small square is measured on a four-point scale (from 0 to 3). The DNA information is represented as an array D, indexed from 0 to 15, of integer values and is interpreted as follows:
In any given culture dish square, let K be the sum of that square's density and the densities of the four squares immediately to the left, right, above and below that square (squares outside the dish are considered to have density 0). Then, by the next day, that dish square's density will change by D[K] (which may be a positive, negative, or zero value). The total density cannot, however, exceed 3 nor drop below 0.
Now, clearly, some DNA programs cause all the bacteria to die off (e.g., [-3, -3, ..., -3]). Others result in immediate population explosions (e.g., [3, 3, 3, ..., 3]), and others are just plain boring (e.g., [0, 0, ..., 0]). The biologist is interested in how some of the less obvious DNA programs might behave.
Write a program to simulate the culture growth, reading in the number of days to be simulated, the DNA rules, and the initial population densities of the dish.

Input
Input to this program consists of three parts:
1. The first line will contain a single integer denoting the number of days to be simulated.
2. The second line will contain the DNA rule D as 16 integer values, ordered from D[0] to D[15], separated from one another by one or more blanks. Each integer will be in the range -3...3, inclusive.
3. The remaining twenty lines of input will describe the initial population density in the culture dish. Each line describes one row of squares in the culture dish, and will contain 20 integers in the range 0...3, separated from one another by 1 or more blanks.

Output
The program will produce exactly 20 lines of output, describing the population densities in the culture dish at the end of the simulation. Each line represents a row of squares in the culture dish, and will consist of 20 characters, plus the usual end-of-line terminator. Each character will represent the population density at a single dish square, as follows: '.' for 0, '!' for 1, 'X' for 2 and '#' for 3. No other characters may appear in the output.

Sample Input
2
0 1 1 1 2 1 0 -1 -1 -1 -2 -2 -3 -3 -3 -3
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Sample Output
##!.................
#!..................
!...................
....................
....................
....................
....................
.........!..........
........!#!.........
.......!#X#!........
........!#!.........
.........!..........
....................
....................
....................
....................
....................
....................
....................
....................
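A sketch of the simulation core, assuming the character mapping stated above ('.' = 0, '!' = 1, 'X' = 2, '#' = 3, which is what the sample output implies). Every square must be updated from the same day's values, so each step writes into a second grid and copies it back at the end; updating in place would let a square see some neighbours' new densities. Function names are illustrative, and reading the input is left to main.

```c
#include <string.h>

#define DISH 20

/* Advances the dish by one day, writing into a separate grid first. */
static void step(int g[DISH][DISH], const int D[16]) {
    int next[DISH][DISH];
    for (int r = 0; r < DISH; r++)
        for (int c = 0; c < DISH; c++) {
            int k = g[r][c];                        /* K = own density + 4 neighbours */
            if (r > 0) k += g[r - 1][c];            /* squares outside the dish count as 0 */
            if (r < DISH - 1) k += g[r + 1][c];
            if (c > 0) k += g[r][c - 1];
            if (c < DISH - 1) k += g[r][c + 1];
            int v = g[r][c] + D[k];
            next[r][c] = v < 0 ? 0 : v > 3 ? 3 : v; /* clamp to 0..3 */
        }
    memcpy(g, next, sizeof next);
}

void simulate(int g[DISH][DISH], const int D[16], int days) {
    while (days-- > 0) step(g, D);
}

/* Converts densities to output rows: 0 '.', 1 '!', 2 'X', 3 '#'. */
void render(int g[DISH][DISH], char rows[DISH][DISH + 1]) {
    for (int r = 0; r < DISH; r++) {
        for (int c = 0; c < DISH; c++) rows[r][c] = ".!X#"[g[r][c]];
        rows[r][DISH] = '\0';
    }
}
```

Since K is at most 5 × 3 = 15, `D[k]` always stays inside the 16-entry rule array.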

Python: DNA sorting by inversion count

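[Problem Description]
For a given sequence {a[1], a[2], ..., a[n]}, the inversion number of element a[i] is defined as inv(a[i]) = |{a[k] | a[i] > a[k], i < k <= n}|. The inversion number of sequence A is defined as inv(A) = inv(a[1]) + inv(a[2]) + ... + inv(a[n]).
The inversion number describes how far the elements of A are already sorted: the smaller it is, the more sorted A is. When A is fully sorted, its inversion number is 0.
When studying DNA sequences in molecular computing, bioinformaticians need to sort a number of DNA strings of equal length by their inversion numbers, from smallest to largest.
Write a program that sorts the given DNA strings of equal length in increasing order of inversion number. Characters in a DNA string are compared in alphabetical order. Read the data from the file "input.txt" and write the result to "output.txt".

[Input Format]
The first line contains two integers, the DNA length L and the number of DNA strings n. Each of the next n lines contains one DNA string. The input ends with two 0s.

[Output Format]
Output one DNA string per line, in increasing order of inversion number.

[Sample Input]
From input.txt:
10 6
AACATGAAGG
TTTTGGCCAA
TTTGGCCAAA
GATCAGATTT
CCCGGGGGGA
ATCGATGCAT
0 0

[Sample Output]
To output.txt:
CCCGGGGGGA
AACATGAAGG
GATCAGATTT
ATCGATGCAT
TTTTGGCCAA
TTTGGCCAAA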
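The question asks for Python, but the counting idea is language-independent, so here it is in C like the other sketches on this page. With only four letters, the inversion number of one string can be computed in O(L) by scanning from right to left and keeping how many A, C, G and T have been seen so far. Strings are then sorted by that count; keeping the input order for equal counts is an assumption, since the statement does not say. Names are illustrative.

```c
#include <stdlib.h>
#include <string.h>

/* Inversions of s over the order A < C < G < T, in O(L): scan right to left
   and add, for each letter, how many smaller letters appear after it. */
long inversions(const char *s) {
    long seen[4] = {0, 0, 0, 0}, inv = 0;
    for (size_t i = strlen(s); i-- > 0;) {
        int r = s[i] == 'A' ? 0 : s[i] == 'C' ? 1 : s[i] == 'G' ? 2 : 3;
        for (int j = 0; j < r; j++) inv += seen[j];
        seen[r]++;
    }
    return inv;
}

typedef struct { const char *s; long inv; int order; } Item;

static int by_inv(const void *a, const void *b) {
    const Item *x = a, *y = b;
    if (x->inv != y->inv) return x->inv < y->inv ? -1 : 1;
    return x->order - y->order;           /* assumption: ties keep input order */
}

/* Reorders dna[0..n-1] in place by increasing inversion number. */
void sort_by_inversions(const char **dna, int n) {
    Item *it = malloc((size_t)n * sizeof *it);
    for (int i = 0; i < n; i++) {
        it[i].s = dna[i];
        it[i].inv = inversions(dna[i]);
        it[i].order = i;
    }
    qsort(it, (size_t)n, sizeof *it, by_inv);
    for (int i = 0; i < n; i++) dna[i] = it[i].s;
    free(it);
}
```

For the sample, the inversion numbers are 9 (CCCGGGGGGA), 10, 11, 17, 36 and 37 (TTTGGCCAAA), which gives the order shown in the sample output.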

Computer simulation of the DNA evolution process

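I use maximum likelihood to generate descendant nodes step by step from an ancestral node. I'd like to know how the values of the parameters in the maximum-likelihood method are determined, and how the maximum-likelihood formula should be modified for this purpose.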

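Has anyone done computer simulations of the DNA sequences of real species? Is there any code I could use as a reference, preferably in Delphi?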

I have several homologous species and need to simulate their virtual DNA sequences along a virtual but realistic evolutionary tree.
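For the simulation side of the question, one common starting point (not necessarily the model the asker's maximum-likelihood setup uses) is the Jukes-Cantor (JC69) model: along a branch of length t, measured in expected substitutions per site, each site ends up different with probability p = 3/4 (1 - e^(-4t/3)), and each of the other three bases is equally likely. The sketch below takes p as an argument so the caller computes it from t; simulating a tree means calling it once per edge, from the root downward, after seeding `rand` with `srand`. Parameter estimation is not covered, and the function name is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Copies parent into child, changing each site independently with
   probability p to one of the other three bases, chosen uniformly.
   Assumes parent contains only A, C, G, T and child has room for it. */
void evolve_branch(const char *parent, char *child, double p) {
    static const char B[] = "ACGT";
    size_t i;
    for (i = 0; parent[i]; i++) {
        child[i] = parent[i];
        if (rand() / (RAND_MAX + 1.0) < p) {            /* uniform in [0, 1) */
            int own = (int)(strchr(B, parent[i]) - B);  /* index of the current base */
            int r = rand() % 3;                         /* one of the 3 other bases */
            child[i] = B[r < own ? r : r + 1];
        }
    }
    child[i] = '\0';
}
```

With p = 0 the child is an exact copy; with p = 1 every site changes. Real data usually calls for richer models (unequal base frequencies, transition/transversion bias), which would replace the uniform choice of the new base.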

A problem about modifying a DNA sequence string: how can it be solved by writing a C program?

Problem Description
Biologists have finally invented techniques for repairing DNA that contains segments causing various inherited diseases. For simplicity, a DNA is represented as a string containing the characters 'A', 'G', 'C' and 'T'. The repairing technique simply changes some characters to eliminate all segments that cause diseases. For example, we can repair the DNA "AAGCAG" to "AGGCAC" to eliminate the disease-causing segments "AAG", "AGC" and "CAG" by changing two characters. Note that the repaired DNA can still contain only the characters 'A', 'G', 'C' and 'T'.
You are to help the biologists repair a DNA by changing the least number of characters.

Input
The input consists of multiple test cases. Each test case starts with a line containing one integer N (1 ≤ N ≤ 50), the number of DNA segments that cause inherited diseases. The following N lines give N non-empty strings of length at most 20, containing only characters in "AGCT", which are the DNA segments causing inherited disease. The last line of the test case is a non-empty string of length at most 1000 containing only characters in "AGCT", which is the DNA to be repaired. The last test case is followed by a line containing a single zero.

Output
For each test case, print a line containing the test case number (beginning with 1) followed by the number of characters that need to be changed. If it's impossible to repair the given DNA, print -1.

Sample Input
2
AAA
AAG
AAAG
2
A
TG
TGAATG
4
A
G
C
T
AGT
0

Sample Output
Case 1: 1
Case 2: 4
Case 3: -1
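A common technique for this kind of problem (a sketch, not checked against the original judge) is dynamic programming over an Aho-Corasick automaton of the disease segments. A state records how much of some forbidden segment the repaired prefix currently ends with; states that complete a segment, directly or through a suffix link, are forbidden; and dp[i][s] is the fewest changes that bring the first i characters into state s. With at most 1001 states, 1000 characters and 4 letters this is about 4 × 10^6 steps per case. `repair` is an illustrative name.

```c
#include <string.h>

#define MAXNODE 1001                   /* root + 50 segments x 20 characters */
#define INF 0x3f3f3f3f

static int nxt[MAXNODE][4], fail_[MAXNODE], bad[MAXNODE], cnt_nodes;

static int id(char c) { return c == 'A' ? 0 : c == 'G' ? 1 : c == 'C' ? 2 : 3; }

static int new_node(void) {
    memset(nxt[cnt_nodes], 0, sizeof nxt[cnt_nodes]);
    bad[cnt_nodes] = 0;
    return cnt_nodes++;
}

/* Fewest character changes so that dna contains none of pat[0..n-1], or -1. */
int repair(const char *pat[], int n, const char *dna) {
    static int dp[2][MAXNODE];
    int q[MAXNODE], head = 0, tail = 0, cur = 0, best = INF;
    cnt_nodes = 0;
    new_node();                                        /* root = 0 */
    for (int i = 0; i < n; i++) {                      /* trie of forbidden segments */
        int u = 0;
        for (const char *s = pat[i]; *s; s++) {
            int x = id(*s);
            if (!nxt[u][x]) { int v = new_node(); nxt[u][x] = v; }
            u = nxt[u][x];
        }
        bad[u] = 1;
    }
    for (int x = 0; x < 4; x++)                        /* suffix links, BFS order */
        if (nxt[0][x]) { fail_[nxt[0][x]] = 0; q[tail++] = nxt[0][x]; }
    while (head < tail) {
        int u = q[head++];
        bad[u] |= bad[fail_[u]];                       /* a forbidden suffix is forbidden too */
        for (int x = 0; x < 4; x++) {
            int v = nxt[u][x];
            if (v) { fail_[v] = nxt[fail_[u]][x]; q[tail++] = v; }
            else nxt[u][x] = nxt[fail_[u]][x];
        }
    }
    for (int s = 0; s < cnt_nodes; s++) dp[cur][s] = INF;
    dp[cur][0] = 0;
    for (const char *p = dna; *p; p++) {               /* one DP layer per character */
        int nx = cur ^ 1, want = id(*p);
        for (int s = 0; s < cnt_nodes; s++) dp[nx][s] = INF;
        for (int s = 0; s < cnt_nodes; s++) {
            if (dp[cur][s] == INF) continue;
            for (int x = 0; x < 4; x++) {              /* write letter x here */
                int t = nxt[s][x], cost = dp[cur][s] + (x != want);
                if (!bad[t] && cost < dp[nx][t]) dp[nx][t] = cost;
            }
        }
        cur = nx;
    }
    for (int s = 0; s < cnt_nodes; s++) if (dp[cur][s] < best) best = dp[cur][s];
    return best == INF ? -1 : best;
}
```

A main() would number the cases and print "Case k: " followed by the return value.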

How can this problem be solved with hashing? Please help.

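This problem comes from the k-mer index problem for DNA sequences.
Given a DNA sequence that contains only the 4 letters A, T, C and G, such as S = "CTGTACTGTAT", and an integer k: starting at the first position of S, take k consecutive letters; this short string is called a k-mer (for k = 5 it is CTGTA). Then take another k-mer starting at the second position of S (for k = 5, TGTAC), and so on until the end of S, which gives a set containing all the k-mers. For S, the set of all 5-mers is {CTGTA, TGTAC, GTACT, TACTG, ACTGT, TGTAT}. These k-mers usually need a data indexing method so that later operations can access them quickly. For example, for 5-mers, querying CTGTA through the index should return its positions in the DNA sequence S, {1, 6}.
Problem: 1,000,000 DNA sequences are given in a file, numbered 1 to 1000000, and each gene sequence has length 100.
(1) For a given k, design and implement a data indexing method that returns the sequence numbers of the DNA sequences containing any given k-mer, together with its positions in those sequences. Each index only needs to support one value of k, not all values of k.
(2) Once the index is built, queries should be as fast as possible and use as little memory as possible.
(3) Give the time complexity and space complexity of building the index.
(4) Give the time complexity and space complexity of querying with the index.
(5) Assuming a memory limit of 8 GB, analyze the largest value of k the index can support and the corresponding query efficiency.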
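One hashing-based answer to part (1), sketched under assumptions: each base takes 2 bits, so a k-mer's code is an exact, collision-free hash, and for moderate k (k ≤ 12 in this sketch) the code can index a table of 4^k buckets directly. A counting sort lays out all (sequence number, position) postings grouped by bucket, so a query is one encode plus two array reads. The type and function names are made up for illustration; larger k would need a real hash table or a sorted array with binary search in place of the bucket array.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct { uint32_t seq, pos; } Posting;   /* 1-based, as in the question */
typedef struct { int k; uint32_t *start; Posting *post; } KmerIndex;

static uint32_t encode(const char *s, int k) {   /* 2 bits per base, k <= 12 here */
    uint32_t v = 0;
    for (int i = 0; i < k; i++)
        v = (v << 2) | (uint32_t)(s[i] == 'A' ? 0 : s[i] == 'C' ? 1 : s[i] == 'G' ? 2 : 3);
    return v;
}

/* seqs[i] is sequence number i+1; all have length len >= k. No error handling. */
KmerIndex build_index(const char *seqs[], size_t nseq, int len, int k) {
    size_t buckets = (size_t)1 << (2 * k), per = (size_t)(len - k + 1);
    KmerIndex ix = { k, calloc(buckets + 1, sizeof(uint32_t)),
                     malloc(nseq * per * sizeof(Posting)) };
    uint32_t *fill = malloc(buckets * sizeof *fill);
    for (size_t i = 0; i < nseq; i++)                 /* pass 1: bucket sizes */
        for (size_t p = 0; p < per; p++) ix.start[encode(seqs[i] + p, k) + 1]++;
    for (size_t b = 0; b < buckets; b++) ix.start[b + 1] += ix.start[b];  /* prefix sums */
    memcpy(fill, ix.start, buckets * sizeof *fill);
    for (size_t i = 0; i < nseq; i++)                 /* pass 2: place postings */
        for (size_t p = 0; p < per; p++) {
            uint32_t b = encode(seqs[i] + p, k);
            ix.post[fill[b]++] = (Posting){ (uint32_t)(i + 1), (uint32_t)(p + 1) };
        }
    free(fill);
    return ix;
}

/* Points *hits at all occurrences of kmer and returns how many there are. */
size_t query(const KmerIndex *ix, const char *kmer, const Posting **hits) {
    uint32_t b = encode(kmer, ix->k);
    *hits = ix->post + ix->start[b];
    return ix->start[b + 1] - ix->start[b];
}

void free_index(KmerIndex *ix) { free(ix->start); free(ix->post); }
```

As written, building takes O(N·(100-k+1)·k) time (a rolling 2-bit code would drop the factor k) and O(4^k + N·(100-k+1)) memory; a query costs O(k) plus the size of its answer. For N = 1000000 and k = 12 that is 89,000,000 postings of 8 bytes (about 712 MB) plus a start array of 4^12 + 1 four-byte entries (about 67 MB).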
