DNA Translation: a DNA sequence problem

Description

Deoxyribonucleic acid (DNA) is composed of a sequence of nucleotide bases paired together to form a double-stranded helix structure. Through a series of complex biochemical processes the nucleotide sequences in an organism's DNA are translated into the proteins it requires for life. The object of this problem is to write a computer program which accepts a DNA strand and reports the protein generated, if any, from the DNA strand.

The nucleotide bases from which DNA is built are adenine, cytosine, guanine, and thymine (hereafter referred to as A, C, G, and T, respectively). These bases bond together in a chain to form half of a DNA strand. The other half of the DNA strand is a similar chain, but each nucleotide is replaced by its complementary base. The bases A and T are complementary, as are the bases C and G. These two "half-strands" of DNA are then bonded by the pairing of the complementary bases to form a strand of DNA.

Typically a DNA strand is listed by simply writing down the bases which form the primary strand (the complementary strand can always be created by writing the complements of the bases in the primary strand). For example, the sequence TACTCGTAATTCACT represents a DNA strand whose complement would be ATGAGCATTAAGTGA. Note that A is always paired with T, and C is always paired with G.
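As an illustration of the pairing rule, the complement of a strand can be computed with a simple character mapping. This is a minimal sketch; the names `COMPLEMENT` and `complement` are chosen here for illustration and are not part of the problem statement.

```python
# Translation table for the base pairing A<->T and C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Return the complementary strand, base by base, in the same order."""
    return strand.translate(COMPLEMENT)


# The example strand from the text above.
print(complement("TACTCGTAATTCACT"))  # ATGAGCATTAAGTGA
```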

From a primary strand of DNA, a strand of ribonucleic acid (RNA) known as messenger RNA (mRNA for short) is produced in a process known as transcription. The transcribed mRNA is identical to the complementary DNA strand with the exception that thymine is replaced by a nucleotide known as uracil (hereafter referred to as U). For example, the mRNA strand for the DNA in the previous paragraph would be AUGAGCAUUAAGUGA.
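Transcription as described here amounts to taking the complement and writing U in place of T. A minimal sketch, with the hypothetical helper name `transcribe`:

```python
# Complement each base of the primary strand; the complement of A is written
# as U instead of T because mRNA uses uracil.
PRIMARY_TO_MRNA = str.maketrans("ACGT", "UGCA")


def transcribe(primary: str) -> str:
    """Return the mRNA transcribed from a primary DNA strand."""
    return primary.translate(PRIMARY_TO_MRNA)


print(transcribe("TACTCGTAATTCACT"))  # AUGAGCAUUAAGUGA
```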

It is the sequence of bases in the mRNA which determines the protein that will be synthesized. The bases in the mRNA can be viewed as a collection of codons, each codon having exactly three bases. The codon AUG marks the start of a protein sequence, and any of the codons UAA, UAG, or UGA marks the end of the sequence. The one or more codons between the start and termination codons represent the sequence of amino acids to be synthesized to form a protein. For example, the mRNA codon AGC corresponds to the amino acid serine (Ser), AUU corresponds to isoleucine (Ile), and AAG corresponds to lysine (Lys). So, the protein formed from the example mRNA in the previous paragraph is, in its abbreviated form, Ser-Ile-Lys.
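The codon reading described above can be sketched as follows. The function name `coding_codons` is illustrative, and the sketch assumes that only the first AUG in the strand is considered as the start and that at least one codon must lie between start and stop.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def coding_codons(mrna: str):
    """Return the codons between the first AUG and the next in-frame stop codon.

    Returns None when there is no start codon, no in-frame stop codon, or
    nothing between the two.
    """
    start = mrna.find("AUG")
    if start == -1:
        return None
    codons = []
    # Step through complete three-base codons that follow the start codon.
    for i in range(start + 3, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            return codons if codons else None
        codons.append(codon)
    return None  # ran off the end without meeting a stop codon


print(coding_codons("AUGAGCAUUAAGUGA"))  # ['AGC', 'AUU', 'AAG']
```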

The complete genetic code from which codons are translated into amino acids is shown in the table below (note that only the amino acid abbreviations are shown). It should also be noted that the sequence AUG, which has already been identified as the start sequence, can also correspond to the amino acid methionine (Met). So, the first AUG in an mRNA strand is the start sequence, but subsequent AUG codons are translated normally into the Met amino acid.

First base            Second base in codon            Third base
in codon        U        C        A        G          in codon

U               Phe      Ser      Tyr      Cys        U
                Phe      Ser      Tyr      Cys        C
                Leu      Ser      ---      ---        A
                Leu      Ser      ---      Trp        G

C               Leu      Pro      His      Arg        U
                Leu      Pro      His      Arg        C
                Leu      Pro      Gln      Arg        A
                Leu      Pro      Gln      Arg        G

A               Ile      Thr      Asn      Ser        U
                Ile      Thr      Asn      Ser        C
                Ile      Thr      Lys      Arg        A
                Met      Thr      Lys      Arg        G

G               Val      Ala      Asp      Gly        U
                Val      Ala      Asp      Gly        C
                Val      Ala      Glu      Gly        A
                Val      Ala      Glu      Gly        G

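One compact way to hold this table in a program is a dictionary keyed by codon. The sketch below lists the 64 entries with the first base varying slowest and the third base fastest, each in the order U, C, A, G; the names `BASES`, `AMINO` and `CODON_TABLE` are illustrative.

```python
BASES = "UCAG"

# Amino acids for UUU, UUC, UUA, UUG, UCU, ... GGG; "---" marks a stop codon.
AMINO = (
    "Phe Phe Leu Leu Ser Ser Ser Ser Tyr Tyr --- --- Cys Cys --- Trp "
    "Leu Leu Leu Leu Pro Pro Pro Pro His His Gln Gln Arg Arg Arg Arg "
    "Ile Ile Ile Met Thr Thr Thr Thr Asn Asn Lys Lys Ser Ser Arg Arg "
    "Val Val Val Val Ala Ala Ala Ala Asp Asp Glu Glu Gly Gly Gly Gly"
).split()

CODON_TABLE = {
    first + second + third: AMINO[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}

print(CODON_TABLE["AGC"], CODON_TABLE["AUU"], CODON_TABLE["AAG"])  # Ser Ile Lys
```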
Input

The input for this program consists of DNA strands, one strand per line. For each strand, the protein it generates, if any, should be determined and output. The given DNA strand may be either the primary or the complementary strand, it may appear in either forward or reverse order, and the start and termination sequences do not necessarily appear at the ends of the strand. For example, an input DNA strand that forms the protein Ser-Ile-Lys could be any of ATACTCGTAATTCACTCC, CCTCACTTAATGCTCATA, TATGAGCATTAAGTGAGG, or GGAGTGAATTACGAGTAT. The input is terminated by a line containing a single asterisk character.

You may assume the input contains only valid, upper-case DNA nucleotide base letters (A, C, G, and T). No input line will exceed 255 characters in length. There will be no blank lines or spaces in the input.

Output

For each input strand, output the protein it generates in abbreviated form (for example, Ser-Ile-Lys). Some sequences, though valid DNA strands, do not produce valid protein sequences; the string "*** No translatable DNA found ***" should be output when an input DNA strand does not translate into a valid protein.
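Because a strand may be primary or complementary, and forward or reversed, a program has four readings to consider for each input line. The sketch below builds them and checks that each of the four example strands yields the same coding sequence in one of its readings. The names `candidate_readings` and `CODING` are illustrative, and readings are kept in DNA letters (T standing in for U).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def candidate_readings(strand: str):
    """Return the four strings that could spell the mRNA (with T for U)."""
    comp = strand.translate(COMPLEMENT)
    return [strand, strand[::-1], comp[::-1], comp]


# Start codon, Ser-Ile-Lys codons and stop codon, written with T for U.
CODING = "ATGAGCATTAAGTGA"

EXAMPLES = [
    "ATACTCGTAATTCACTCC",
    "CCTCACTTAATGCTCATA",
    "TATGAGCATTAAGTGAGG",
    "GGAGTGAATTACGAGTAT",
]

results = [
    any(CODING in reading for reading in candidate_readings(strand))
    for strand in EXAMPLES
]
print(results)  # [True, True, True, True]
```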
Sample Input

ATACTCGTAATTCACTCC
CACCTGTACACAGAGGTAACTTAG
TTAATACGACATAATTAT
GCCTTGATATGGAGAACTCATTAGATA
AAGTGTATGTTGAATTATATAAAACGGGCATGA
ATGATGATGGCTTGA
*
Sample Output

Ser-Ile-Lys
Cys-Leu-His
Ser-Tyr
*** No translatable DNA found ***
Leu-Asn-Tyr-Ile-Lys-Arg-Ala
Met-Met-Ala
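
Putting the pieces together, one possible approach is sketched below. It is not a reference solution and rests on assumptions that the statement leaves open: readings are tried in the order given strand, reversed strand, reversed complement, complement; only the first ATG in each reading is treated as the start; and the first reading that yields a protein wins. With these choices the sketch reproduces the sample output above, but a judge may expect different tie-breaking. All function and constant names are illustrative.

```python
BASES = "TCAG"  # DNA letters; T stands in for the U of the mRNA
AMINO = (
    "Phe Phe Leu Leu Ser Ser Ser Ser Tyr Tyr --- --- Cys Cys --- Trp "
    "Leu Leu Leu Leu Pro Pro Pro Pro His His Gln Gln Arg Arg Arg Arg "
    "Ile Ile Ile Met Thr Thr Thr Thr Asn Asn Lys Lys Ser Ser Arg Arg "
    "Val Val Val Val Ala Ala Ala Ala Asp Asp Glu Glu Gly Gly Gly Gly"
).split()
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")
NO_PROTEIN = "*** No translatable DNA found ***"


def translate_reading(reading):
    """Translate one reading, or return None if it yields no valid protein."""
    start = reading.find("ATG")  # assumption: only the first start codon counts
    if start == -1:
        return None
    acids = []
    for i in range(start + 3, len(reading) - 2, 3):
        acid = CODON_TABLE[reading[i:i + 3]]
        if acid == "---":  # stop codon: valid only if something was translated
            return "-".join(acids) if acids else None
        acids.append(acid)
    return None  # no stop codon before the end of the strand


def protein_for(strand):
    """Try the four readings in an assumed order and report the first protein."""
    comp = strand.translate(COMPLEMENT)
    for reading in (strand, strand[::-1], comp[::-1], comp):
        protein = translate_reading(reading)
        if protein is not None:
            return protein
    return NO_PROTEIN


def solve(lines):
    """Process input lines up to the terminating asterisk."""
    answers = []
    for line in lines:
        line = line.strip()
        if line == "*":
            break
        answers.append(protein_for(line))
    return answers


SAMPLE = [
    "ATACTCGTAATTCACTCC",
    "CACCTGTACACAGAGGTAACTTAG",
    "TTAATACGACATAATTAT",
    "GCCTTGATATGGAGAACTCATTAGATA",
    "AAGTGTATGTTGAATTATATAAAACGGGCATGA",
    "ATGATGATGGCTTGA",
    "*",
]
print("\n".join(solve(SAMPLE)))
```

Note that the third sample distinguishes the reading order: its plain complement would also translate (to Leu-Tyr), so the sample output is matched only when the reversed complement is tried first.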
