restgenomeanalysisenglish

Differences

This shows you the differences between two versions of the page.

restgenomeanalysisenglish [2014/08/10 14:50]
haruo
restgenomeanalysisenglish [2014/12/13 08:25]
haruo
Line 124: Line 124:
   * http://rest.g-language.org/NC_000913/rRNA/16S (shows a list of feature IDs of 16S rRNAs)
   * http://rest.g-language.org/NC_000913/get_geneseq/FEATURE462 (shows nucleotide sequence of 16S rRNA with FEATURE462)
 +
 +
 +====== Pattern searches ======
 +The [[http://www.g-language.org/documentation/1.9.0/lib/G/Seq/PatSearch.html|PatSearch]] class is a collection of sequence analysis methods for oligonucleotide pattern searches, including [[http://rest.g-language.org/help/oligomer_search|oligomer_search]] and [[http://rest.g-language.org/help/palindrome|palindrome]]. For example, you can search for an [[https://en.wikipedia.org/wiki/Inverted_repeat#Palindrome_vs._inverted_repeat|inverted repeat]] (5' TTACGnnnnnnCGTAA 3') and a palindrome (5' TTACGCGTAA 3') as follows.
 +
 +For //E.coli// (ecoli),
 +http://rest.g-language.org/ecoli/oligomer_search/TTACGCGTAA
 +searches for the oligomer "TTACGCGTAA" and returns a list of the positions where it is found, as follows.
 +  209570,1164188,1443204,1934579,2167198,2919269,4203297
 +
 +http://rest.g-language.org/ecoli/oligomer_search/TTACGnnnnnnCGTAA/return=both
 +searches for the inverted repeat "TTACGnnnnnnCGTAA" and returns both the positions and the matched oligomers, as follows.
 +  843936,ttacgaaacagcgtaa,3112312,ttacgcacaggcgtaa
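The example URLs above all follow the same pattern: genome identifier, method name, positional arguments, then ''key=value'' options, separated by slashes. A minimal Python sketch of that pattern is given below. The helper name ''rest_url'' and its ''options'' parameter are hypothetical and are not part of G-language; the sketch only builds the URL string and does not contact the service.

```python
from urllib.parse import quote

BASE_URL = "http://rest.g-language.org"

def rest_url(genome, method, *args, options=None):
    """Build a REST URL of the form BASE/genome/method/arg/.../key=value.

    Hypothetical helper: the path layout is inferred from the example
    URLs on this page, not from an official client library.
    """
    segments = [genome, method, *args]
    # Options are appended as key=value path segments, in insertion order.
    segments += [f"{key}={value}" for key, value in (options or {}).items()]
    # Percent-encode each segment but keep '=' readable.
    return BASE_URL + "/" + "/".join(quote(str(s), safe="=") for s in segments)

print(rest_url("ecoli", "oligomer_search", "TTACGnnnnnnCGTAA",
               options={"return": "both"}))
```

Options are passed as a dictionary because ''return'' is a reserved word in Python and cannot be used as a keyword argument.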
 +
 +The oligomer can be specified using degenerate nucleotide codes (e.g. "grtggngg") or regular expressions (e.g. "g[ag]tgg[a-z]gg").
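To illustrate how a degenerate code relates to a regular expression, the following Python sketch expands IUPAC nucleotide codes into character classes and searches a sequence locally. It is not the G-language implementation: the function names are hypothetical, "n" is expanded to "[acgt]" rather than "[a-z]", only the given strand is searched, and positions are assumed to be 1-based.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes (lowercase).
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "[ag]", "y": "[ct]", "s": "[cg]", "w": "[at]",
    "k": "[gt]", "m": "[ac]",
    "b": "[cgt]", "d": "[agt]", "h": "[act]", "v": "[acg]",
    "n": "[acgt]",
}

def degenerate_to_regex(oligo):
    """Translate a degenerate oligomer such as 'grtggngg' into a regex."""
    return "".join(IUPAC[base] for base in oligo.lower())

def oligomer_search(seq, oligo):
    """Return (position, matched oligomer) pairs on the given strand.

    Positions are 1-based (an assumption). A lookahead is used so that
    overlapping matches are also reported.
    """
    pattern = re.compile("(?=(" + degenerate_to_regex(oligo) + "))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq.lower())]

print(degenerate_to_regex("grtggngg"))
print(oligomer_search("ccttacgaaacagcgtaagg", "TTACGnnnnnnCGTAA"))
```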
 +
 +http://rest.g-language.org/plasmidf/palindrome/shortest=10
 +searches for palindromic sequences of 10 bp or longer in plasmid F (plasmidf).
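For readers who want to see what a palindrome search involves, here is a minimal Python sketch that finds sequences equal to their own reverse complement by expanding outward from each position between two bases. It is an illustration under simplifying assumptions, not the algorithm used by the palindrome method: it reports only perfect palindromes without a spacer or mismatches, the function name ''find_palindromes'' is hypothetical, and positions are assumed to be 1-based.

```python
COMPLEMENT = {"a": "t", "t": "a", "c": "g", "g": "c"}

def find_palindromes(seq, shortest=10):
    """Return (start, sequence) for maximal perfect palindromes.

    A DNA palindrome here means a sequence identical to its reverse
    complement, e.g. ttacgcgtaa. Start positions are 1-based.
    """
    seq = seq.lower()
    hits = []
    for center in range(1, len(seq)):
        left, right = center - 1, center
        # Extend while the flanking bases are complementary.
        while (left >= 0 and right < len(seq)
               and COMPLEMENT.get(seq[left]) == seq[right]):
            left -= 1
            right += 1
        length = right - left - 1
        if length >= shortest:
            hits.append((left + 2, seq[left + 1:right]))
    return hits

print(find_palindromes("acttacgcgtaaca", shortest=10))
```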
  
  
Line 317: Line 335:
 Of the existing COA methods, Within-group Correspondence Analysis (WCA) performs best because it does not mask variation in synonymous codon usage caused by amino acid composition and codon degeneracy [[http://www.ncbi.nlm.nih.gov/pubmed/18940873|(Suzuki H et al., 2008)]].
  
-codon_mva (http://rest.g-language.org/help/codon_mva) performs WCA of codon usage data for a given genome, and analyzes correlations between the WCA axes and various gene parameters (e.g. Laa, aroma, gravy, mmw, gcc3, gtc3, and P2). In the WCA plots, the first four axes (Comp1 to Comp4) obtained by WCA are shown in y-axes, and the gene features (gcc3 and gtc3) having the largest absolute correlation coefficients (|r|) are shown in x-axes.
+codon_mva (http://rest.g-language.org/help/codon_mva) performs WCA of codon usage data for a given genome, and analyzes correlations between the WCA axes and various gene parameters (e.g. Laa, aroma, gravy, mmw, gcc3, gtc3, and P2). In the WCA plots, the first four axes (Comp1 to Comp4) obtained by WCA are shown in y-axes, and the gene features (e.g. gcc3 and gtc3) having the largest absolute correlation coefficients (|r|) are shown in x-axes.
  
   * For //E.coli// (http://rest.g-language.org/ecoli/codon_mva), Comp1 (20.8% of variation) is correlated with gcc3 (G+C content at 3rd codon position) (r = 0.70). Comp2 (9.9% of variation) clearly separates highly expressed genes (red circles) from the other genes (black crosses) (the mean absolute standard score for highly expressed genes, z = 3.14), suggesting that translational selection is acting on synonymous codon usage.
  
-  * For //B.burgdorferi// (http://rest.g-language.org/bbur/codon_mva), Comp1 is correlated with gtc3 (G+T content at 3rd codon position) (r = 0.96), and it clearly separates genes located on leading and lagging strands of DNA replication. Intragenomic variation in GC skew (http://rest.g-language.org/bbur/gcskew/cumulative=1) and AT skew (http://rest.g-language.org/bbur/gcskew//cumulative=1/at=1) presumably reflects strand-specific mutational bias.
+  * For //B.burgdorferi// (http://rest.g-language.org/bbur/codon_mva), Comp1 is correlated with gtc3 (G+T content at 3rd codon position) (r = 0.96), and it clearly separates genes located on leading and lagging strands of DNA replication. Intragenomic variation in GC skew (http://rest.g-language.org/bbur/gcskew) and AT skew (http://rest.g-language.org/bbur/gcskew/at=1) presumably reflects strand-specific mutational bias.
  
   * For //M.genitalium// (http://rest.g-language.org/mgen/codon_mva), Comp1 is correlated with gcc3 (r = 0.96). Intragenomic variation in G+C content mostly reflects the existence of regions with anomalous nucleotide composition, putatively acquired by horizontal transfer. The exception to this is //M.genitalium//, in which intragenomic G+C variation is continuous along the genome (http://rest.g-language.org/mgen/gcwin).
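The gene parameters gcc3 and gtc3 and the skew measures mentioned above can be computed for a single sequence as in the following Python sketch. It is a simplified illustration, not the codon_mva or gcskew implementation: the function names are hypothetical, start and stop codons are not excluded, and the skew uses the conventional (X - Y) / (X + Y) form, whose sign convention may differ from that of the service.

```python
def third_position_content(cds, bases):
    """Fraction of third codon positions occupied by any of `bases`."""
    third = cds.lower()[2::3]  # every third base, starting at codon position 3
    if not third:
        return 0.0
    return sum(third.count(b) for b in bases) / len(third)

def gcc3(cds):
    """G+C content at the 3rd codon position."""
    return third_position_content(cds, "gc")

def gtc3(cds):
    """G+T content at the 3rd codon position."""
    return third_position_content(cds, "gt")

def skew(seq, first, second):
    """Generic skew (X - Y) / (X + Y), e.g. skew(seq, 'g', 'c') for GC skew."""
    seq = seq.lower()
    x, y = seq.count(first), seq.count(second)
    return (x - y) / (x + y) if x + y else 0.0

cds = "atggcttaa"  # toy coding sequence: atg gct taa
print(gcc3(cds), gtc3(cds), skew("ggggcc", "g", "c"))
```

Calling ''skew(seq, "a", "t")'' gives the corresponding AT skew under the same assumptions.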
restgenomeanalysisenglish.txt · Last modified: 2014/12/13 08:25 by haruo