cgr [2009/07/17 02:44]
admin
cgr [2014/01/18 07:44]
====== A Web Server for Zoomable Chaos Game Representations ======
===== Quick Demo =====

===== Overview =====
==== Chaos Game Representation ====

Chaos Game Representation (CGR) is a generalized, scale-independent Markov probability table for a sequence, and oligomer tables (see the documentation of the "kmer_table" function) can be deduced from the CGR image.

CGR is generated by the following procedure:
  * Start from position (0, 0), the origin of the two-dimensional coordinate system.

       The four nucleotides are located at the four corners:
           A: (-1, 1)  upper left
           T: (1, -1)  lower right
           G: (1, 1)   upper right
           C: (-1, -1) lower left

  * For each nucleotide, taken in reverse order (from the last letter to the first) so that the result matches the k-mer table, i.e., the forward Markov chain, move to and mark the point halfway between the current location and that nucleotide's corner. For example, if the last letter is T, the position moves from (0, 0) to the midpoint between (0, 0) and (1, -1), which is (0.5, -0.5).
  * Repeat this procedure for all nucleotides.

{{:cgr-ex.png?400|}}
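
The procedure above can be sketched in plain Perl as follows. This is an illustration under stated assumptions, not the code behind the G-language **cgr** function: the subroutine names **cgr_points** and **to_pixel** are hypothetical, letters other than A, C, G and T are simply skipped, and the pixel mapping assumes a square image whose y axis grows downwards.

<code perl>
use strict;
use warnings;

# Corners of the four nucleotides, as listed in the procedure above.
my %CORNER = (
    A => [-1,  1],    # upper left
    T => [ 1, -1],    # lower right
    G => [ 1,  1],    # upper right
    C => [-1, -1],    # lower left
);

# Return the CGR points of a DNA sequence as [x, y] pairs.
# Letters are read from the last to the first, starting at the origin,
# and each step moves halfway towards the corner of the current letter.
# Letters other than A, C, G, T are skipped (an assumption of this sketch).
sub cgr_points {
    my ($seq) = @_;
    my ($x, $y) = (0, 0);
    my @points;
    for my $base (reverse split //, uc $seq) {
        my $corner = $CORNER{$base} or next;
        $x = ($x + $corner->[0]) / 2;
        $y = ($y + $corner->[1]) / 2;
        push @points, [$x, $y];
    }
    return @points;
}

# Map a point of the [-1, 1] x [-1, 1] square to pixel coordinates of a
# square image of the given width, with y growing downwards (assumed).
sub to_pixel {
    my ($point, $width) = @_;
    my $px = int(($point->[0] + 1) / 2 * $width);
    my $py = int((1 - $point->[1]) / 2 * $width);
    # Long runs of one letter can round a coordinate to exactly 1 or -1.
    $px = $width - 1 if $px >= $width;
    $py = $width - 1 if $py >= $width;
    return ($px, $py);
}
</code>

For the sequence AT, the sketch visits T first, giving (0.5, -0.5), and then A, giving (-0.25, 0.25).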

==== k-mer table ====

This program creates an image showing the abundance of all k-mers (oligonucleotides of length k) in a given sequence. For example, for tetramers (k=4), the resulting image is composed of 4^4 = 256 boxes, each representing one oligomer. The name and abundance of each oligomer are written inside its box, and abundance is also shown by the box color, from white (absent) to black (highly frequent).

{{:kmer_table.png|}}

This k-mer table is also known as FCGR (frequency matrices extracted from the Chaos Game Representation).
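
As a rough sketch of what the table counts, the following Perl code tallies overlapping k-mers on the given strand only (whether **kmer_table** also counts the reverse strand is not stated here) and turns the counts into Markov chain probabilities of the next nucleotide given a (k-1)-letter context. The subroutines **kmer_counts** and **next_base_probs** are hypothetical helpers, not part of the G-language API.

<code perl>
use strict;
use warnings;

# Count every overlapping k-mer on the given strand of a sequence.
# Windows containing letters other than A, C, G, T are skipped
# (an assumption of this sketch).
sub kmer_counts {
    my ($seq, $k) = @_;
    die "k must be a positive integer\n" unless $k =~ /^[1-9]\d*$/;
    $seq = uc $seq;
    my %count;
    for my $i (0 .. length($seq) - $k) {
        my $word = substr($seq, $i, $k);
        next if $word =~ /[^ACGT]/;
        $count{$word}++;
    }
    return \%count;
}

# Estimate P(next nucleotide | context) from the counts of the k-mers
# that start with a (k-1)-letter context, reading the table as a Markov
# chain of order k-1. Returns undef if the context was never seen.
sub next_base_probs {
    my ($count, $context) = @_;
    my %n = map { ($_ => $count->{ $context . $_ } // 0) } qw(A C G T);
    my $total = 0;
    $total += $_ for values %n;
    return unless $total;
    my %prob = map { ($_ => $n{$_} / $total) } keys %n;
    return \%prob;
}
</code>

For example, the sequence CGACGT contains the 3-mers CGA, GAC, ACG and CGT, so the estimated probabilities after the context CG are 0.5 for A and 0.5 for T.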

The position of each oligomer is located recursively:
  * For each letter in an oligomer, a box is subdivided into four quadrants, where A is upper left, T is lower right, G is upper right, and C is lower left. Therefore, oligomer ACGT is located as follows:

      A = upper left quadrant
      C = lower left within the above quadrant
      G = upper right within the above quadrant
      T = lower right within the above quadrant

{{:kmer-ex.png?400|}}
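
The recursive subdivision maps each oligomer to a row and column of a 2^k by 2^k grid. The sketch below, built around the hypothetical subroutine **kmer_box**, derives the row and column one bit per letter; numbering rows from the top and columns from the left, starting at 0, is an assumption of this sketch.

<code perl>
use strict;
use warnings;

# Quadrant of each letter as [column bit, row bit], rows counted from the top:
# A = upper left, G = upper right, C = lower left, T = lower right.
my %QUADRANT = (
    A => [0, 0],
    G => [1, 0],
    C => [0, 1],
    T => [1, 1],
);

# Return (row, column) of an oligomer's box in the 2^k x 2^k grid, counting
# from (0, 0) at the top left. The first letter picks the outermost quadrant
# and each following letter picks a quadrant inside the previous one, so
# every letter contributes one more bit to the row and column numbers.
sub kmer_box {
    my ($oligo) = @_;
    my ($row, $col) = (0, 0);
    for my $base (split //, uc $oligo) {
        my $q = $QUADRANT{$base} or die "unexpected letter '$base'\n";
        $col = 2 * $col + $q->[0];
        $row = 2 * $row + $q->[1];
    }
    return ($row, $col);
}
</code>

For ACGT this gives row 5, column 3 of the 16 by 16 grid. The center of that box, (-0.5625, 0.3125) on the [-1, 1] scale, is exactly where the CGR procedure above lands after reading ACGT from the origin, which is one way to see why the k-mer table can be read off the CGR image.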

==== Zoomable Google Maps ====

For CGR and k-mer tables to be useful as generalized, scale-independent Markov probability tables, users must be able to locate the oligonucleotides of interest quickly within a complex image. For this purpose, we have implemented CGR as zoomable Google Maps.

{{:googlemapcgr.png|}}

You can pan and zoom the large image with the same interface as the familiar Google Maps: use the control in the top left corner to pan and zoom, or double-click or use the mouse scroll wheel to zoom.

In addition, we have implemented a search capability. The search box at the top center of the map lets you type oligonucleotide sequences for incremental search and highlighting over the map. For example, typing "CG" in the search box immediately highlights the positions of oligomers starting with the letters "CG".

{{:snapz_pro_xscreensnapz001.png|}}

Use "N" as a wildcard nucleotide. For example, searching for "NCG" highlights four regions, corresponding to "ACG", "TCG", "GCG", and "CCG". With this feature, users can quickly locate an oligonucleotide of interest and observe the Markov chain probabilities of the nucleotides that follow it.

{{:snapz_pro_xscreensnapz002.png|}}
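
The matching rule behind the search can be approximated as a prefix match in which N stands for any nucleotide. The Perl sketch below illustrates only that rule; it is not the map's own search code, and **query_regex**, **highlighted** and **all_kmers** are hypothetical names.

<code perl>
use strict;
use warnings;

# Turn a search query into a regular expression anchored at the start of an
# oligomer: N matches any nucleotide, every other letter matches itself.
sub query_regex {
    my ($query) = @_;
    my $pattern = join '',
        map { $_ eq 'N' ? '[ACGT]' : quotemeta($_) } split //, uc $query;
    return qr/^$pattern/;
}

# Return, in sorted order, the oligomers that a query would highlight.
sub highlighted {
    my ($query, @oligos) = @_;
    my $re   = query_regex($query);
    my @hits = grep { $_ =~ $re } @oligos;
    return sort @hits;
}

# List all 4^k oligomers of length k, e.g. the boxes of a k-mer table.
sub all_kmers {
    my ($k) = @_;
    my @words = ('');
    for (1 .. $k) {
        @words = map { my $w = $_; map { $w . $_ } qw(A C G T) } @words;
    }
    return @words;
}
</code>

With all 3-mers, the query NCG returns ACG, CCG, GCG and TCG, and the query CG returns CGA, CGC, CGG and CGT, the four sub-quadrants whose shading shows which nucleotide tends to follow CG.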

==== Chaos Game Representation for five prokaryotes ====

Click on the images to enlarge.

^//Escherichia coli// ^//Bacillus subtilis// |
| {{:ecoli.png?300|Escherichia coli}} | {{:bsub.png?300|Bacillus subtilis}} |
^//Mycoplasma genitalium// ^//Synechococcus sp.// |
| {{:mgen.png?300|Mycoplasma genitalium}} | {{:cyano.png?300|Synechococcus sp.}} |
^//Pyrococcus furiosus// |
| {{:pyro.png?300|Pyrococcus furiosus}}|

===== Usage =====
==== Web Service (Generation of images) ====

Our web service is accessed through URLs that follow a simple syntax.

**Syntax:**
<code>
http://rest.g-language.org/[genome]/[method]/
</code>

Here **[genome]** is a RefSeq accession number (see [[http://rest.g-language.org/organism_list/|here]] for the list of available genomes), and **[method]** is either **cgr** (for Chaos Game Representation) or **kmer_table** (for k-mer table).

For example, for the //Mycoplasma genitalium// genome (RefSeq: NC_000908):
  * [[http://rest.g-language.org/NC_000908/cgr/]]
  * [[http://rest.g-language.org/NC_000908/kmer_table/]]

For other species, simply change the genome ID. For example, for //Carsonella ruddii// (RefSeq: NC_008512):
  * [[http://rest.g-language.org/NC_008512/cgr/]]
  * [[http://rest.g-language.org/NC_008512/kmer_table/]]

In this way, all maps are generated on the fly and are always up to date. Moreover, other web pages and web databases can add CGR and k-mer tables to their sites simply by referring to our URLs.

To use your own sequence data (i.e., sequences not included in [[http://rest.g-language.org/organism_list/|our list of genomes]]), upload it at the following URL:

  * [[http://rest.g-language.org/upload/]]

Once the sequence is uploaded, you will receive a reference ID (a six-digit hexadecimal number). You can then use this ID in place of **[genome]**.

For example, if you received the ID "B619CD":
  * [[http://rest.g-language.org/B619CD/cgr/]]
  * [[http://rest.g-language.org/B619CD/kmer_table/]]

You can specify the oligonucleotide length for **kmer_table** with the **k** option, and the image width for **cgr** with the **width** option.

For example:
  * [[http://rest.g-language.org/NC_000908/cgr/width=512/]] (CGR, 512 pixels wide)
  * [[http://rest.g-language.org/NC_000908/cgr/width=256/]] (CGR, 256 pixels wide)
  * [[http://rest.g-language.org/NC_000908/cgr/width=128/]] (CGR, 128 pixels wide)
  * [[http://rest.g-language.org/NC_000908/kmer_table/k=4]] (table of tetramers)
  * [[http://rest.g-language.org/NC_000908/kmer_table/k=3]] (table of trinucleotides)
  * [[http://rest.g-language.org/NC_000908/kmer_table/k=2]] (table of dinucleotides)
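
Because every request is just a URL, a client can assemble it programmatically. The hypothetical **rest_url** subroutine below follows the syntax above and appends each option as a name=value/ segment. It assumes that a trailing slash after the last option is accepted (the **k** examples above omit it) and that several options can be combined in one URL; neither is stated on this page.

<code perl>
use strict;
use warnings;

my $BASE = 'http://rest.g-language.org';

# Build a request URL of the form http://rest.g-language.org/[genome]/[method]/
# and append one "name=value/" segment per option, e.g. width=512/.
# Options are added in sorted order so the same call always yields the same URL.
sub rest_url {
    my ($genome, $method, %opt) = @_;
    die "method must be cgr or kmer_table\n"
        unless $method eq 'cgr' || $method eq 'kmer_table';
    my $url = "$BASE/$genome/$method/";
    $url .= "$_=$opt{$_}/" for sort keys %opt;
    return $url;
}
</code>

The resulting URL can be opened in a browser, or fetched from Perl with **getstore($url, $file)** from the LWP::Simple module.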

==== Web Service (Google Maps) ====

To generate zoomable Google Maps, add the **output=gmap** option to the above URLs:
  * [[http://rest.g-language.org/NC_000908/cgr/output=gmap/]]
  * [[http://rest.g-language.org/NC_000908/kmer_table/output=gmap/]]
  * [[http://rest.g-language.org/NC_008512/cgr/output=gmap/]]
  * [[http://rest.g-language.org/NC_008512/kmer_table/output=gmap/]]

Because this feature is computationally expensive, the web service generates at most 5 zoom levels. For more zoom levels, use the Perl API.
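
The cost of extra zoom levels grows quickly. Assuming the conventional Google Maps tile layout (a single tile at level 0 and a 2^z by 2^z grid of 256-pixel tiles at level z, which this page does not confirm for the CGR maps), the hypothetical **pyramid_size** subroutine below counts the tiles and the full-resolution image width for a given number of levels.

<code perl>
use strict;
use warnings;

# Tile count and full-resolution width of a tile pyramid with the given
# number of zoom levels, assuming the conventional Google Maps layout:
# one tile at level 0 and a 2^z x 2^z grid of square tiles at level z.
sub pyramid_size {
    my ($levels, $tile_px) = @_;
    $tile_px //= 256;                    # assumed tile size in pixels
    my $tiles = 0;
    $tiles += 4**$_ for 0 .. $levels - 1;
    my $width = $tile_px * 2**($levels - 1);
    return ($tiles, $width);
}
</code>

Under these assumptions, 5 levels mean 341 tiles and a 4096-pixel-wide deepest level, while 8 levels already mean 21,845 tiles and 32,768 pixels, because each level holds four times as many tiles as the one before it.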

==== Perl API ====

To use the Perl API, first download and install G-language Genome Analysis Environment v.1.8.9 or later, following the instructions at:
http://www.g-language.org/wiki/software

Then, create the following script:
<code perl>
use strict;
use warnings;
use G;
my $genome = load("FastaFile.fasta");   # read the sequence file
cgr($genome);                           # Chaos Game Representation image
kmer_table($genome);                    # k-mer table image
</code>

The **load** function accepts most common sequence formats, such as GenBank and EMBL.
See the [[http://rest.g-language.org/help/load|API documentation for "load"]] for more details.

To change parameters, such as the oligomer length for **kmer_table** (-k option), Google Maps generation (-gmap=>1 option), or the image width for **cgr** (-width option), add these options and their values to the function call:

<code perl>
use strict;
use warnings;
use G;
my $genome = load("genbank.gbk");
cgr($genome, -width=>512, -gmap=>1);    # 512-pixel-wide CGR, plus Google Maps
kmer_table($genome, -k=>8, -gmap=>1);   # table of 8-mers, plus Google Maps
</code>

See the API documentation for further details on the optional parameters.

==== G-language Shell ====

To use the G-language Shell, first download and install G-language Genome Analysis Environment v.1.8.9 or later, following the instructions at:
http://www.g-language.org/wiki/software

Then, start the G-language Shell by typing **G** in your UNIX shell:

<code>
unix % G
</code>

Once inside the G-language Shell, follow the Perl API instructions above.

See the [[introductiontog-languageenglish#g-language_shell|G-language System tutorial]] for more details about the G-language Shell.

==== API Documentation ====

  * [[http://rest.g-language.org/help/kmer_table|"kmer_table" function documentation]]
  * [[http://rest.g-language.org/help/cgr|"cgr" function documentation]]
  * [[http://rest.g-language.org/help/load|"load" function documentation]]

  * [[rest|G-language REST API documentation]]
cgr.txt · Last modified: 2014/01/18 07:44 (external edit)