tutorialcodonusageenglish

Differences

This shows you the differences between two versions of the page.

tutorialcodonusageenglish [2007/08/20 13:13]
gaou
tutorialcodonusageenglish [2014/01/18 07:44] (current)
Line 10: Line 10:
 Now, before we start to analyze codon usage, let’s start the G-language System. If you use the G-language System, genome analysis is very simple.
  
-For example, if you use the data file “bsub.gbk” (the complete genome of Bacillus subtilis in the Genbank form) under the current directory to analyze, you only need to input the next two lines to get ready.
+For example, to analyze the data file “bsub” (the complete genome of //Bacillus subtilis// in GenBank format, bundled with the G-language System) under the current directory, you only need to enter the following two lines to get ready.
 
+<code perl>
   use G;
-  $gb = new G("bsub.gbk");
+  $gb = new G("bsub");
+</code>
 Let’s run the following Perl script. (Please save it as “test.pl”.)
  
Line 51: Line 52:
           GC Content :    43.52%
   ​   ​
-With the output of the Accession Number and the base content statistic, the G-language System informs that it has successfully read the data file (“bsub.gbk”).
+By printing the accession number and the base composition statistics, the G-language System indicates that it has successfully read the data file (“bsub”).
 
 Now, we will explain the script.
   * use G; imports the module G.
-  * $gb = new G("bsub.gbk"); loads the file “bsub.gbk” under the current directory, and stores the annotation and the base sequence under the variable ”$gb”.
+  * $gb = new G("bsub"); loads the file “bsub” under the current directory, and stores the annotation and the base sequence in the variable “$gb”.
  
  
Line 62: Line 63:
 === Exercise 0: ===
 
-Load the complete genome data for other bacteria (such as “ecoli.gbk”, “hbsp.gbk”, and “mgen.gbk”).
+Load the complete genome data for other bacteria (such as “ecoli”, “cyano”, and “mgen”).
->{Hint} Rewrite the line $gb->new G("bsub.gbk"); on the above script.
+>{Hint} Rewrite the line $gb = new G("bsub"); in the above script.
  
  
Line 70: Line 71:
 ====== Step 1: Examples of standard function usage (1): Analyzing codon usage among all genes. ======
 
-In the G-language System, there are several functions set for genome analysis. Now, let’s analyze codon usage for B.subtilis, using one of the standard functions “codon usage()”. Please rewrite a line using the codon usage() function in the Perl script you have created in Step0.
+The G-language System provides several functions for genome analysis. Now, let’s analyze codon usage for //B. subtilis// using one of the standard functions, “codon_usage()”. Please add a line that calls the codon_usage() function to the Perl script you created in Step 0.
 
+<code perl>
   use G;
-  $gb = new G("bsub.gbk");
+  $gb = new G("bsub");
   codon_usage($gb);
+</code>
 
 If you execute the script, the codon usage should be printed to the display, and the following codon table should be displayed.
Line 80: Line 83:
 {{http://www.g-language.org/data/haruo/codon_table.gif}}
  
-This is the chart of the calculation of the frequency of synonym codons in the B.subtilis genome. Each amino acids sum of synonym codons frequancy equals one. For example, for phenylalanine(code:F), the percentage of TTC is 0.315 compared to 0.685 for TTT. Of the two synonym codons, it heavily uses TTT. Of the three synonym codons that exist, Isoleucine(code :i) heavily uses ATT. As you can see, there is a pattern in which each amino acid is biased upon a peculiar codon.
+This chart shows the calculated frequencies of synonymous codons in the //B. subtilis// genome. For each amino acid, the frequencies of its synonymous codons sum to one. For example, for phenylalanine (code: F), the fraction of TTC is 0.315 compared to 0.685 for TTT, so of its two synonymous codons it heavily favors TTT. Of its three synonymous codons, isoleucine (code: I) heavily favors ATT. As you can see, each amino acid is biased toward a particular codon.
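
The normalization described above (fractions that sum to one within each amino acid) can be sketched in plain Perl without the G module. This is only an illustration: the two-amino-acid codon table, the function name synonymous_fractions and the toy sequence are assumptions made for the example, and it is not meant to show how codon_usage() is implemented internally.

```perl
use strict;
use warnings;

# Toy codon table restricted to phenylalanine (F) and isoleucine (I).
# Illustration only; a real table covers all 64 codons.
my %codon_to_aa = (
    ttt => 'F', ttc => 'F',
    att => 'I', atc => 'I', ata => 'I',
);

# Count the codons of a coding sequence and return a hash reference that
# maps each observed codon to its fraction among the synonymous codons
# of the same amino acid.
sub synonymous_fractions {
    my ($seq) = @_;
    my (%count, %total);
    for (my $i = 0; $i + 3 <= length($seq); $i += 3) {
        my $codon = lc(substr($seq, $i, 3));
        my $aa = $codon_to_aa{$codon};
        next unless defined $aa;    # skip codons outside the toy table
        $count{$codon}++;
        $total{$aa}++;
    }
    my %fraction;
    for my $codon (keys %count) {
        $fraction{$codon} = $count{$codon} / $total{ $codon_to_aa{$codon} };
    }
    return \%fraction;
}

# Hypothetical toy sequence: three TTT, one TTC, two ATT, one ATC.
my $toy_seq  = join '', qw(ttt ttt ttt ttc att att atc);
my $fraction = synonymous_fractions($toy_seq);
for my $codon (sort keys %$fraction) {
    printf "%s -> %s -> %.3f\n", $codon_to_aa{$codon}, $codon, $fraction->{$codon};
}
```

Dividing by the per-amino-acid total rather than by the total number of codons is what makes the values comparable between amino acids that have different numbers of synonymous codons.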
  
  
Line 87: Line 90:
 === Exercise 1: ===
 
-Compare and examine the deflection of each amino acids synonym codon usage for mycoplasma genitalium (“mgen.gbk”) and B.subtilis.
+Compare the bias in synonymous codon usage for each amino acid between //Mycoplasma genitalium// (“mgen”) and //B. subtilis//.
->[Hint]  Rewrite the line $gb->new G("bsub.gbk"); in the above script.
+>[Hint]  Rewrite the line $gb = new G("bsub"); in the above script.
  
  
Line 97: Line 100:
 The standard functions of the G-language System have several options.
 
-The function “codon usage()” has the following options.
+The function “codon_usage()” has the following options.
 
 ^option^description^
Line 110: Line 113:
 Please rewrite the script from Step 1 as in the following example.
 
+<code perl>
   use G;
-  $gb = new G("bsub.gbk");
+  $gb = new G("bsub");
   codon_usage($gb, -CDSid=>'CDS113');
+</code>
  
-“CDS113” corresponds with the gene (tufA) that codes with the elongation factor (TU, EF-Tu). If you execute the above script, the codon usage percentage for the tufA gene should be shown on the display like the following example.
+“CDS113” corresponds to the gene (//tufA//) that encodes elongation factor Tu (EF-Tu). If you execute the above script, the codon usage for the //tufA// gene should be displayed as in the following example.
 
   / -> taa -> 1   1.000
Line 163: Line 168:
  
 From the left, each line shows: the abbreviation of the amino acid -> the codon -> the codon count -> the fraction among synonymous codons.
-You can tell from this output, the deflection of synonym codons for the tufA gene, differs greatly from the pattern of the entire genome universe. For example, of the two synonym codons it possesses, phenylalanine (code: F) uses only TTC as a synonym codon. Of the four synonym codons it possesses, alanine (code :A) only uses GCT.
+This output shows that the bias in synonymous codon usage for the //tufA// gene differs greatly from the pattern of the entire genome. For example, of its two synonymous codons, phenylalanine (code: F) uses only TTC. Of its four synonymous codons, alanine (code: A) uses only GCT.
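
To compare such output between genes or genomes, it can help to split each line back into its four fields. The following is a minimal sketch that assumes the line layout shown above (amino acid -> codon -> count, then the fraction); the function name parse_codon_line is hypothetical and is not part of the G-language System.

```perl
use strict;
use warnings;

# Parse one output line such as "/ -> taa -> 1   1.000" into its fields.
# Returns a hash reference, or undef if the line does not match the
# assumed layout.
sub parse_codon_line {
    my ($line) = @_;
    if ($line =~ /^\s*(\S)\s*->\s*([acgt]{3})\s*->\s*(\d+)\s+([\d.]+)\s*$/i) {
        my %record = (
            aa       => $1,        # one-letter code; "/" marks a stop codon
            codon    => lc($2),
            count    => $3,
            fraction => $4,
        );
        return \%record;
    }
    return undef;
}

my $record = parse_codon_line('/ -> taa -> 1   1.000');
if (defined $record) {
    print "$record->{codon} was seen $record->{count} time(s)\n";
}
```

Lines that do not fit the layout, such as headers or blank lines, simply return undef, so the function can be applied to every line of the output.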
  
  
Line 171: Line 176:
 === Exercise 2: ===
 
-Please calculate the codon usage percentage for the gene  (dnaA), witch codes DnaA protein and relates to DNA replication.
+Please calculate the codon usage for the gene (//dnaA//), which encodes the DnaA protein and is involved in DNA replication.
 >Hint: rewrite the line “codon_usage($gb, -CDSid=>'CDS113');” in the above script.
  
Line 185: Line 190:
  
 and refer to the [[http://www.g-language.org/documentation/1.7.1/G.html#SYNOPSIS|perldoc documentation of G.pm]].
-Now, I will show a part of the data file (“bsub.gbk”) that we have used.
+Now, I will show a part of the data file (“bsub”) that we have used.
  
    
Line 237: Line 242:
 In the script written in Step 0, add the following line, which prints the start and end positions of ‘CDS1’, and run the script.
 
+<code perl>
   print "$gb->{CDS1}->{start}..$gb->{CDS1}->{end}";
+</code>
 
 Confirm that the output corresponds to the data file above.
Line 249: Line 256:
 $gb->cds() returns the object names of all CDSs stored in $gb as a list. For example, the script from Exercise 2, which analyzes codon usage for the //dnaA// gene, should give the same result when it is rewritten with $gb->cds() as follows.
 
+<code perl>
   use G;
-  $gb = new G("bsub.gbk");
 
   foreach $cds ($gb->cds()){
Line 257: Line 265:
     }
   }
+</code>
  
 Now, let me explain the script.
 (1) The foreach line has the following structure.
 
+<code perl>
   foreach $variable (@array){
-       some process here.
+       #some process here.
    }
+</code>
 Starting from the first, each element of the array is assigned to the variable in turn, and the block is executed once for each assignment. Therefore,
 “foreach $cds ($gb->cds()){” assigns each list element (the object name of a CDS) to the variable $cds in order and processes it. This is the basic way to process each gene in the G-language System.
Line 271: Line 281:
 This is written “variable =~ /regular expression/”.
 
+<code perl>
   if($gb->{$cds}->{gene} =~ /dnaA/){
         codon_usage($gb, -CDSid=>$cds);
    }
+</code>
+
 The script above means: if the gene name ($gb->{$cds}->{gene}) matches “dnaA”, calculate the codon usage of that CDS.
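
The loop-and-match pattern explained above can be tried without a genome file. The sketch below replaces the G object with a plain nested hash; the CDS entries, their coordinates and the helper name find_cds_by_gene are placeholders invented for the example, not data from “bsub” or functions of the G-language System.

```perl
use strict;
use warnings;

# Mock stand-in for the G object: plain nested hashes with placeholder values.
my $gb = {
    CDS1 => { gene => 'dnaA', start => 101,  end => 400  },
    CDS2 => { gene => 'dnaN', start => 501,  end => 900  },
    CDS3 => { gene => 'tufA', start => 1001, end => 1300 },
};

# Return the ids of all CDSs whose gene name matches the given pattern,
# sorted by id so that the result order is predictable.
sub find_cds_by_gene {
    my ($genome, $pattern) = @_;
    my @hits;
    foreach my $cds (sort keys %$genome) {
        push @hits, $cds if $genome->{$cds}->{gene} =~ $pattern;
    }
    return @hits;
}

# ^ and $ anchor the pattern so that only the exact name "dnaA" matches.
foreach my $cds (find_cds_by_gene($gb, qr/^dnaA$/)) {
    print "${cds}: $gb->{$cds}->{start}..$gb->{$cds}->{end}\n";
}
```

An unanchored pattern such as /dna/ would match both “dnaA” and “dnaN” in this mock data, which is why anchoring is worth considering when exactly one gene is wanted.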
    
tutorialcodonusageenglish.1187615585.txt.gz · Last modified: 2014/01/18 07:44 (external edit)