problem_2b

Differences

This shows you the differences between two versions of the page.

problem_2b [2010/11/22 12:39]
ike
problem_2b [2014/01/18 07:44] (current)
Line 1: Line 1:
-====== Problem 2-2 & 2-3 - CDS search and translation (2) ======
+====== Practice 2-2 & 2-3 - ORF prediction and translation (2) ======
  
-**[Problem 2-2]**
+**[Practice 2-2]**
  
-**Refine the script from the previous problems to estimate all three patterns of possible amino acids from target coding region.**
+**Modify the script from the previous practices to estimate all three possible amino acid sequences from the target coding region.**
  
  
-**[Problem 2-3]**
+**[Practice 2-3]**
  
-**First, make a program to generate complementary strand from the given DNA sequence. Then, search for the coding regions in the complementary strand and estimate all three patterns of possible amino acid sequence. Put result of the Problem 2-2, three possible sequences from template DNA, together and choose the most suitable coding region among 6 possible reading frames.**
+**First, write a program that generates the complementary strand of the given DNA sequence. Then, search for coding regions in the complementary strand and estimate all three possible amino acid sequences. Combine these with the result of Practice 2-2 (the three possible sequences from the template DNA) and choose the most suitable coding region among the six possible reading frames.**
-===== Problem 2-2 =====
+===== Practice 2-2 =====
  
-The previous program translated codon into amino acid chain from beginning of DNA sequence. But, thinking it carefully, what would happen if translation began from not beginning but from second position, or from third poison?
+The previous program translated codons into an amino acid chain starting from the first letter of the DNA sequence. But, thinking carefully, what would happen if translation began not from the first position, but from the second position, or from the third position?
  
   C A T G C T G A C
Line 21: Line 21:
         Cys   Stop
  
-In the above figure, first and third possible reading frame translates amino acid as well as second possible reading frame. But, second possible reading frame is clearly different that the sequence starts from methionine. Moreover, third possible reading frame possesses stop codon in second position which means translation terminates if it was a coding region. Therefore, users need to estimate three possible when searching coding region.
+In the above figure, the second possible reading frame starts with a methionine, possibly indicating the start of a coding region. Likewise, the third reading frame contains a stop codon, possibly indicating the end of a coding region. The nucleotide sequence obtained in Practice 0 is an arbitrary region of the //Mycoplasma// genome, so at this point we cannot be certain at which base translation should start. All three possible reading frames should therefore be tested.
  
-There are two types of way for implementation
+There are two ways of doing this:
-  - Create sequences that DO NOT possesses first base and first two bases from the given DNA sequence, respectively.
+  - Create sequences that lack the first base or the first two bases of the given DNA sequence, respectively, and run the program created in Practice 2-1 on each.
-  - Estimate three possible patterns within a program using only given DNA sequence.
+  - Estimate the three possible patterns within one program, using only the given DNA sequence.
  
-Users can choose ether 1 or 2, but we will demonstrate 2 since program is reusable for any types of DNA sequences.
+Users can choose either 1 or 2, but we will demonstrate 2, since such a program is reusable for any type of DNA sequence.
-==== Refine previous program ====+
  
-Easy way to solve this problem is to modify previously made subroutine that translates codons into amino acids. Main point in here is to slide one or two positions in target DNA sequence so that users can search suitable coding regions from among three possible reading frames.
+==== Refine the previous program ====
  
-To be more precise, put another for statement that loops around 0 to 2 for each possible reading frame. It would be nice if users can store each reading frame into some array.
+An easy way to solve this problem is to modify the previously written subroutine that translates codons into amino acids. The main point here is to shift the start by one or two positions in the target DNA sequence, so that users can search for suitable coding regions among the three possible reading frames.
-===== Problem 2-3 =====+
  
+To be more precise, add another "for" statement that loops from 0 to 2, with each value representing one reading frame. It would be convenient to store the translation of each reading frame in an array.
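
The loop over offsets 0 to 2 described above might look like the following. This is only a minimal sketch, not the reference solution of the practice: the subroutine name translate_three_frames, the codon table construction, and the use of the standard genetic code are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a codon table for the standard genetic code (an assumption),
# using the conventional t, c, a, g ordering of the three codon positions.
my @bases = qw(t c a g);
my $amino = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon_table;
my $i = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $codon_table{"$b1$b2$b3"} = substr($amino, $i++, 1);
        }
    }
}

# Hypothetical helper: translate $seq in reading frames 0, 1 and 2
# and return the three amino acid strings as a list.
sub translate_three_frames {
    my ($seq) = @_;
    $seq = lc $seq;
    my @frames;
    for my $offset (0 .. 2) {
        my $protein = '';
        # Step through whole codons only; trailing 1 or 2 bases are ignored.
        for (my $pos = $offset; $pos + 3 <= length($seq); $pos += 3) {
            my $codon = substr($seq, $pos, 3);
            $protein .= $codon_table{$codon} // 'X';    # X marks an unknown codon
        }
        push @frames, $protein;
    }
    return @frames;
}

my @frames = translate_three_frames('catgctgac');
print "frame $_: $frames[$_]\n" for 0 .. 2;
```

For the sequence in the figure, frame 1 begins with M (methionine) and frame 2 ends with * (stop), which agrees with the second and third reading frames discussed above.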
  
-In the Problem 2-2, users dealt three possible types of reading frame, but there is one more possibility on reading frame to consider, a complementary strand.
+===== Practice 2-3 =====
  
-A complementary strand is a strand facing with a template strand.
+In Practice 2-2, you considered three possible reading frames, but there is one more possibility to consider: the complementary strand.
+
+A complementary strand is the strand complementary to the template strand in a DNA duplex.
  
     5' __G T A C G A C T G __3'
     3'   C A T G C T G A C   5'
  
-DNA has a direction for each strand in double helix which flows from 5’ end to 3’ end. From this constraint in central dogma, a gene exists in the direction from 5’ end to 3’ end.
+Each strand of the DNA double helix has a direction, running from the 5’ end to the 3’ end. Under this constraint of the central dogma, a gene lies in the direction from the 5’ end to the 3’ end.
  
-Double helix is consisted of two directionally different strands and each of strands has complementary features that adenine joins with thymine, and guanine with cytosine.
+The double helix consists of two strands running in opposite directions, and the strands are complementary, such that adenine pairs with thymine and guanine with cytosine.
  
-To attain complementary strand from the template DNA sequence, uses can simply bring up complementary features of a nucleic acid.
+To obtain the complementary strand from the template DNA sequence, users can simply apply these complementarity rules:
   - Reverse the template DNA sequence
-  - Substitute nucleotide
+  - Substitute each nucleotide with its complementary base
+
+Putting all possibilities together, there are six possible reading frames: three from the template strand and three from the complementary strand.
  
-Putting all possibilities together, there are six types possibilities for reading frame, three from the template strand and another three from complementary strand. 
 ==== Define new subroutine ====
  
  
-Now, let’s make new subroutine in the Perl to generate complementary strand from the template strand. As an implementation, it would be nice to give $seq as an argument and attain complementary sequence as returned value. The processes for the generation are reversing the given sequence and substitution of nucleotide, A to T and G to C.
+Now, let’s write a new subroutine in Perl to generate the complementary strand from the template strand. As an implementation, it would be convenient to pass $seq as an argument and obtain the complementary sequence as the return value. The steps are reversing the given sequence and substituting nucleotides: A to T and G to C, and vice versa.
  
-Main flow of the program is same as the previous problems.
+The main flow of the program is the same as in the previous practices.
  
   sub complemental () {
Line 70: Line 72:
  
  
-In the Perl there are handy function to reverse given string, reverse(). For example $a = reverse(“foobar”) returns “raboof”. So use function reverse() to reverse the template sequence.
+Perl has a handy function for reversing a given string, reverse(). For example, $a = reverse("foobar") returns "raboof". So use reverse() to reverse the template sequence.
  
 For the substitution, use the tr/// operator, which substitutes single characters.
Line 77: Line 79:
              [tacg];
  
-This sample code substitutes “a” to “t”, “t” to “a”, “g” to “c”, and “c” to “g” in the $nuc. Since the Perl is specialized in natural language processing, deeper understanding of the language lead users to higher analysis of genomic studies.
+This sample code replaces “a” with “t”, “t” with “a”, “g” with “c”, and “c” with “g” in $nuc. Since Perl is well suited to text processing, a deeper understanding of the language will lead you to a higher level of genomics studies.
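
Putting reverse() and tr/// together, a complementary-strand subroutine could be sketched as follows. The name complement_strand and the handling of upper-case letters are assumptions for illustration; the subroutine in the practice itself is only partly shown above and may differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the complementary strand of $seq,
# written in the 5' to 3' direction.
sub complement_strand {
    my ($seq) = @_;
    # "scalar" forces string reversal; in list context reverse()
    # would reverse a list of arguments instead of the characters.
    my $rev = scalar reverse $seq;
    # Swap a<->t and g<->c, for lower-case and upper-case letters alike.
    $rev =~ tr/atgcATGC/tacgTACG/;
    return $rev;
}

print complement_strand('gtacgactg'), "\n";
```

With the template strand from the figure (gtacgactg), this prints cagtcgtac, which is the lower strand of the figure read from its 5’ end.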
 ===== Search coding regions =====
-All possible reading frames is now ready. Subsequently,​ how can users estimate the suitable coding region from the six? 
  
-Obviously, coding region needs to be proved by experimental process, but here users are going to make some estimation on which parts are coding region from bioinformatics point of view.
+All possible reading frames are now ready. Next, how can users estimate the most suitable coding region among the six?
  
-Users are going to follow the following basic rule to search suitable reading frame. This rule is not the perfect rule, but it fits in the most cases.
+Of course, a coding region ultimately needs to be confirmed experimentally, but here one can make a computational estimate.
  
-  - A coding region begins from “atg” which codes methionine (M or Met).
+The following basic rules can be used to search for a suitable reading frame. These rules are far from perfect, but they are a suitable starting point in most cases.
+
+  - A coding region begins with “atg”, which codes for methionine (M or Met).
   - Sometimes in bacterial genomes a coding region begins with “gtg”, which normally codes for valine (V or Val), but as a start codon it codes for methionine.
-  - Stop codons are “taa”, “tga”, and “tag” which means coding region should end with ether sequences
+  - Stop codons are “taa”, “tga”, and “tag”, which means a coding region should end with one of these codons.
-  - If more than one candidate clears above condition, select the longest sequence.
+  - If more than one candidate satisfies the above conditions, select the longest possible sequence.
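
The rules above can be sketched as a search over all six reading frames. This is an illustrative sketch under stated assumptions, not the practice's reference answer: the subroutine names (complement_strand, longest_orf, best_orf_six_frames) are hypothetical, and the stop codon list follows the rules exactly as written. In //Mycoplasma//, “tga” is generally read as tryptophan rather than stop (NCBI translation table 4), so the stop list may need adjusting for real data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: complementary strand, written 5' to 3'.
sub complement_strand {
    my ($seq) = @_;
    my $rev = scalar reverse $seq;
    $rev =~ tr/atgcATGC/tacgTACG/;
    return $rev;
}

# Hypothetical helper: longest region in one strand that starts with
# atg or gtg and ends at the next in-frame stop codon (stop included).
sub longest_orf {
    my ($seq) = @_;
    $seq = lc $seq;
    my $best = '';
    for my $offset (0 .. 2) {
        my $start;    # position of the earliest start codon not yet closed by a stop
        for (my $pos = $offset; $pos + 3 <= length($seq); $pos += 3) {
            my $codon = substr($seq, $pos, 3);
            if (!defined $start) {
                $start = $pos if $codon eq 'atg' || $codon eq 'gtg';
            }
            elsif ($codon =~ /^(?:taa|tga|tag)$/) {
                my $orf = substr($seq, $start, $pos + 3 - $start);
                $best = $orf if length($orf) > length($best);
                undef $start;    # look for the next start codon in this frame
            }
        }
    }
    return $best;
}

# Compare the three template frames with the three complementary frames.
# On a tie, the template strand is reported.
sub best_orf_six_frames {
    my ($seq) = @_;
    my $fwd = longest_orf($seq);
    my $rev = longest_orf(complement_strand($seq));
    return length($rev) > length($fwd) ? ('complementary', $rev) : ('template', $fwd);
}

my ($strand, $orf) = best_orf_six_frames('ccatgaaatttggataagg');
print "$strand strand: $orf\n";
```

Keeping the earliest start codon before each stop implements the "longest possible sequence" rule within a frame. A candidate without an in-frame stop codon is ignored here, which is a design choice of this sketch rather than something the rules require.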
-===== Advanced programing =====+
  
+===== Advanced programming =====
  
-Computational analysis in the field of biology itself is a superior principle. But, in the actual study, writing program is process next to primary work.
+Bioinformatics is a field of biology. The application of computers and informatics to biology is not merely a method, but also defines a research philosophy. On the other hand, writing programs for bioinformatics research is rather routine work.
  
-Therefore, programs should support and aid the researcher’s work. To achieve such a specification, sharing the programs, writing readable program or reusable program is going to be an important point.
+Therefore, in order to concentrate on the biological research subject and avoid spending too much time on programming, it is critical to keep programs simple, readable, and shareable, that is, generic rather than disposable.
  
-Try to be aware of general-purpose properties of programs or replacement of redundant script into subroutine. If users became accustomed to the Perl, try to read some technical books on the Perl. Effective Perl (http://www.amazon.com/Effective-Perl-Programming-Idiomatic-Development/dp/0321496949/) is one of best book for next step Perlers. This textbook should bring users to accomplish tasks in much more efficient development cost.
+Try to avoid redundancy, and use subroutines effectively to reuse code. I recommend reading technical books on Perl, such as Effective Perl (http://www.amazon.com/Effective-Perl-Programming-Idiomatic-Development/dp/0321496949/), which is one of the best books for getting to the next level.
  
-Thank you for joining with us for the series of works, we wish your continuous success.
+Thank you for going through all of these practices, and I wish you the best in your research.
  
problem_2b.txt · Last modified: 2014/01/18 07:44 (external edit)