Kazuharu Arakawa, Ph.D.
G-language Project Leader
Institute for Advanced Biosciences
G-Links is a rapid data "broker" service that collects and adds related information to a given gene (or gene set).
With the availability of numerous curated databases, researchers can now use the multitude of biological data efficiently by integrating these resources through hyperlinks and cross-references. A large proportion of bioinformatics research, however, consists of labor-intensive fetching, parsing, and merging of datasets and functional annotations from dispersed databases and web-based services. Data integration is therefore one of the key challenges of bioinformatics. Here we present G-Links, a gateway server for querying and retrieving gene annotation data. The system supports rapid querying with numerous gene IDs from multiple databases, or with nucleotide/amino acid sequences, by internally centralizing gene annotations around UniProt entries. It first converts the query into a UniProt ID, either by ID conversion or by sequence similarity search, and then returns related annotations and cross-references. Moreover, users can run external web-based tools on the query gene. G-Links is implemented as a RESTful service, so users can easily access it from any web browser. The service and its documentation are freely available at http://link.g-language.org/.
This section describes the G-Links URI syntax conventions; for usage examples, scroll below. G-Links is provided through a REST interface. Database cross-reference information related to a given gene ID or sequence (nucleotide or amino acid) can be accessed through an HTTP GET/POST request using a unique URI.
URL Syntax of G-Links. G-Links is implemented as a RESTful service that can be queried by altering the URL.
HTML output example of BRCA1_HUMAN (UniProt ID of the BRCA1 gene in humans). By default, accessing G-Links with a web browser displays the results as interactive HTML, with a related image gallery implemented with CoverFlow (http://imageflow.finnrudolph.de/) at the top, followed by a large table of annotations and cross-references.
Detailed list is as follows:
One of the strengths of G-Links is its programmatic access. For example, the GO slim classification of all genes of E. coli for the GO:Process ontology can be retrieved from the URL http://link.g-language.org/NC_000913/extract=GOslim_process.
This result is shown as a formatted HTML page when viewed in a browser, but when it is accessed from the command line or from a program, it is automatically returned as a TSV file. Using this, a simple combination of UNIX commands can produce a classification summary of all genes in E. coli with GOslim:Process terms. Here is an example:
$ curl -v http://link.g-language.org/NC_000913/extract=GOslim_process |grep \# |cut -f 2,3 |grep GO: |sort |uniq -c |sort -rn
Here, G-Links is accessed from the command line with "curl -v", which writes the result to standard output, and the sections containing GO terms and their descriptions are extracted ("|grep \# |cut -f 2,3 |grep GO:"). The terms are then sorted and counted ("|sort |uniq -c") and printed in descending order of count ("|sort -rn").
The following is the output of the above command line:
1056 GO:0009058 biosynthetic process
1032 GO:0008150 biological_process
860 GO:0034641 cellular nitrogen compound metabolic process
636 GO:0044281 small molecule metabolic process
526 GO:0006810 transport
484 GO:0006950 response to stress
381 GO:0005975 carbohydrate metabolic process
374 GO:0009056 catabolic process
285 GO:0055085 transmembrane transport
273 GO:0006259 DNA metabolic process
257 GO:0006520 cellular amino acid metabolic process
190 GO:0051186 cofactor metabolic process
169 GO:0006629 lipid metabolic process
127 GO:0006464 cellular protein modification process
127 GO:0006091 generation of precursor metabolites and energy
98 GO:0006790 sulfur compound metabolic process
96 GO:0042592 homeostatic process
92 GO:0032196 transposition
84 GO:0006399 tRNA metabolic process
79 GO:0007165 signal transduction
76 GO:0071554 cell wall organization or biogenesis
72 GO:0022607 cellular component assembly
63 GO:0006412 translation
52 GO:0034655 nucleobase-containing compound catabolic process
50 GO:0051301 cell division
50 GO:0007155 cell adhesion
50 GO:0006457 protein folding
45 GO:0048870 cell motility
43 GO:0006461 protein complex assembly
39 GO:0007049 cell cycle
37 GO:0040011 locomotion
32 GO:0051604 protein maturation
31 GO:0071941 nitrogen cycle metabolic process
31 GO:0051276 chromosome organization
21 GO:0061024 membrane organization
19 GO:0019748 secondary metabolic process
18 GO:0044403 symbiosis, encompassing mutualism through parasitism
17 GO:0000003 reproduction
15 GO:0022618 ribonucleoprotein complex assembly
14 GO:0007059 chromosome segregation
14 GO:0002376 immune system process
9 GO:0006605 protein targeting
7 GO:0008219 cell death
5 GO:0042254 ribosome biogenesis
4 GO:0006397 mRNA processing
3 GO:0000902 cell morphogenesis
2 GO:0048646 anatomical structure formation involved in morphogenesis
2 GO:0030198 extracellular matrix organization
2 GO:0030154 cell differentiation
1 GO:0065003 macromolecular complex assembly
1 GO:0015979 photosynthesis
1 GO:0007267 cell-cell signaling
1 GO:0007010 cytoskeleton organization
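The same filter-and-count steps can also be sketched in Python for use inside a program. This is only an illustration: the function name count_terms and the sample input are hypothetical, and the input layout (annotation lines containing "#", with the term ID and description in tab-separated columns 2 and 3) is an assumption inferred from the grep/cut steps above, not a documented G-Links format.

```python
from collections import Counter


def count_terms(tsv_text, marker="GO:"):
    """Mimic: grep '#' | cut -f 2,3 | grep MARKER | sort | uniq -c | sort -rn."""
    counts = Counter()
    for line in tsv_text.splitlines():
        if "#" not in line:                 # grep \#
            continue
        fields = line.split("\t")
        # cut -f 2,3: keep tab-separated fields 2 and 3; like cut, pass
        # through unchanged any line that contains no tab at all.
        selected = "\t".join(fields[1:3]) if len(fields) > 1 else line
        if marker in selected:              # grep GO:
            counts[selected] += 1           # sort | uniq -c
    # sort -rn: highest count first, ties in reverse text order
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)


# Made-up input in the ASSUMED layout (not real G-Links output).
sample = "\n".join([
    "#GOslim_process\tGO:0006950\tresponse to stress",
    "#GOslim_process\tGO:0006259\tDNA metabolic process",
    "#GOslim_process\tGO:0006950\tresponse to stress",
    "gene\tvalue\ta line without the marker character",
])

for term, n in count_terms(sample):
    print(n, term, sep="\t")
```

Passing marker="ko" instead of "GO:" would correspond to the "grep ko" variant of the pipeline.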
If you have a specific set of genes, such as RECA_ECOLI,RUVB_ECOLI,LEXA_ECOLI,UMUD_ECOLI, that may be overrepresented in a microarray experiment, running the same routine with this list produces the Gene Ontology classification of the genes of interest.
$ curl -v http://link.g-language.org/RECA_ECOLI,RUVB_ECOLI,LEXA_ECOLI,UMUD_ECOLI/extract=GOslim_process |grep \# |cut -f 2,3 |grep GO: |sort |uniq -c |sort -rn
This will produce:
4 GO:0006950 response to stress
4 GO:0006259 DNA metabolic process
3 GO:0008150 biological_process
2 GO:0009058 biosynthetic process
1 GO:0051276 chromosome organization
1 GO:0048870 cell motility
1 GO:0034641 cellular nitrogen compound metabolic process
These counts can now be used directly to test for enrichment, for example with Fisher's exact test, to calculate Gene Ontology enrichment scores.
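As a rough illustration of such a test, the sketch below computes a one-sided Fisher's exact test (the upper tail of the hypergeometric distribution) for a single term. The counts for "response to stress" (4 of the 4 query genes, 484 in the whole-genome run) are taken from the outputs above, on the assumption that each counted line corresponds to one gene. The background size is NOT given in this document, so BACKGROUND_GENES is a placeholder to be replaced with the real number of genes in the whole-genome query; the function name is likewise hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided Fisher's exact test: P(X >= k), X ~ Hypergeometric(N, K, n).

    k: query genes annotated with the term
    n: query genes in total
    K: background genes annotated with the term
    N: background genes in total
    """
    total = comb(N, n)
    # Sum the probabilities of seeing k or more annotated genes in the query.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


# PLACEHOLDER: the total number of background genes is not stated in this
# document; substitute the actual size of the whole-genome gene set.
BACKGROUND_GENES = 4000

# 4 of 4 query genes vs. 484 background genes for GO:0006950 (from above).
p = enrichment_p_value(k=4, n=4, K=484, N=BACKGROUND_GENES)
print(f"GO:0006950 response to stress: p = {p:.3g}")
```

When many terms are tested at once, the resulting p-values would normally also need a multiple-testing correction.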
If an alternative classification is desired, simply change the extracted term from GOslim_process to another one, for example KEGG_Brite for the KEGG BRITE hierarchy.
$ curl -v http://link.g-language.org/NC_000913/extract=KEGG_Brite |grep \# |cut -f 2,3 |grep ko |sort |uniq -c |sort -rn
This will produce:
1452 ko00001 KEGG Orthology (KO)
1017 ko01000 Enzymes
838 ko00002 KEGG pathway modules
358 ko01000 Enzymes
282 ko02000 Transporters
197 ko02000 Transporters
129 ko03000 Transcription factors
89 ko03400 DNA repair and recombination proteins
84 ko03016 Transfer RNA biogenesis
65 ko01002 Peptidases
61 ko02035 Bacterial motility proteins
58 ko02022 Two-component system
57 ko03011 Ribosome
56 ko03011 M00178 Ribosome, bacteria
52 ko02044 Secretion system
49 ko03009 Ribosome biogenesis
45 ko01007 Amino acid related enzymes
44 ko00002 KEGG pathway modules
39 ko01005 Lipopolysaccharide biosynthesis proteins
33 ko01001 Protein kinases
31 ko03011 M00179 Ribosome, archaea
28 ko03036 Chromosome
27 ko03110 Chaperones and folding catalysts
27 ko01003 Glycosyltransferases
26 ko03036 Chromosome
26 ko03032 DNA replication proteins
25 ko02044 Secretion system
20 ko03110 Chaperones and folding catalysts
20 ko01004 Lipid biosynthesis proteins
19 ko03009 Ribosome biogenesis
15 ko03012 Translation factors
13 ko02044 M00331 Type II general secretion system
12 ko02044 M00335 Sec (secretion) system
11 ko02000 M00240 Iron complex transport system
11 ko01002 Peptidases
10 ko03021 Transcription machinery
10 ko03021 Transcription machinery
10 ko02035 Bacterial motility proteins
10 ko02000 M00324 Dipeptide transport system
9 ko03400 M00260 DNA polymerase III complex, bacteria
9 ko03032 M00260 DNA polymerase III complex, bacteria
9 ko02000 M00306 PTS system, fructose-specific II-like component
8 ko03400 DNA repair and recombination proteins
8 ko03032 DNA replication proteins
8 ko03000 Transcription factors
8 ko00194 Photosynthesis proteins
7 ko02000 M00221 Putative simple sugar transport system
7 ko01006 Prenyltransferases
6 ko02000 M00439 Oligopeptide transport system
6 ko02000 M00239 Peptides/nickel transport system
6 ko02000 M00237 Branched-chain amino acid transport system
6 ko01005 Lipopolysaccharide biosynthesis proteins
5 ko03016 Transfer RNA biogenesis
5 ko03012 Translation factors
5 ko02000 M00440 Nickel transport system
5 ko02000 M00279 PTS system, galactitol-specific II component
5 ko02000 M00229 Arginine transport system
5 ko02000 M00185 Sulfate transport system
4 ko03400 M00183 RNA polymerase, bacteria
4 ko03021 M00183 RNA polymerase, bacteria
4 ko02044 M00336 Twin-arginine translocation (Tat) system
4 ko02022 Two-component system
4 ko02000 M00349 Microcin C transport system
4 ko02000 M00348 Glutathione transport system
4 ko02000 M00300 Putrescine transport system
4 ko02000 M00299 Spermidine/putrescine transport system
4 ko02000 M00283 PTS system, ascorbate-specific II component
4 ko02000 M00238 D-Methionine transport system
4 ko02000 M00230 Glutamate/aspartate transport system
4 ko02000 M00226 Histidine transport system
4 ko02000 M00225 Lysine/arginine/ornithine transport system
4 ko02000 M00222 Phosphate transport system
4 ko02000 M00219 AI-2 transport system
4 ko02000 M00209 Osmoprotectant transport system
4 ko02000 M00198 Putative sn-glycerol-phosphate transport system
4 ko02000 M00197 Putative fructooligosaccharide transport system
4 ko02000 M00194 Maltose/maltodextrin transport system
4 ko02000 M00193 Putative spermidine/putrescine transport system
4 ko02000 M00189 Molybdate transport system
3 ko04812 Cytoskeleton proteins
3 ko02035 M00506 CheA-CheYBV (chemotaxis) two-component regulatory system
3 ko02030 M00506 CheA-CheYBV (chemotaxis) two-component regulatory system
3 ko02022 M00506 CheA-CheYBV (chemotaxis) two-component regulatory system
3 ko02022 M00474 RcsC-RcsD-RcsB (capsule synthesis) two-component regulatory system
3 ko02001 Solute carrier family
3 ko02000 M00436 Sulfonate transport system
3 ko02000 M00435 Taurine transport system
3 ko02000 M00320 Lipopolysaccharide export system
3 ko02000 M00287 PTS system, galactosamine-specific II component
3 ko02000 M00280 PTS system, glucitol/sorbitol-specific II component
3 ko02000 M00276 PTS system, mannose-specific II component
3 ko02000 M00275 PTS system, cellobiose-specific II component
3 ko02000 M00274 PTS system, mannitol-specific II component
3 ko02000 M00259 Heme transport system
3 ko02000 M00255 Lipoprotein-releasing system
3 ko02000 M00254 ABC-2 type transport system
3 ko02000 M00248 Putative antibiotic transport system
3 ko02000 M00242 Zinc transport system
3 ko02000 M00241 Vitamin B12 transport system
3 ko02000 M00234 Cystine transport system
3 ko02000 M00232 General L-amino acid transport system
3 ko02000 M00227 Glutamine transport system
3 ko02000 M00217 D-Allose transport system
3 ko02000 M00215 D-Xylose transport system
3 ko02000 M00214 Methyl-galactoside transport system
3 ko02000 M00213 L-Arabinose transport system
3 ko02000 M00212 Ribose transport system
3 ko02000 M00210 Putative ABC transport system
3 ko02000 M00208 Glycine betaine/proline transport system
3 ko02000 M00207 Putative multiple sugar transport system
3 ko02000 M00192 Putative thiamine transport system
3 ko02000 M00191 Thiamine transport system
3 ko01008 Polyketide biosynthesis proteins
2 ko04040 Ion channels
2 ko02044 M00429 Competence-related DNA transformation transporter
2 ko02042 Bacterial toxins
2 ko02022 M00502 GlrK-GlrR (amino sugar metabolism) two-component regulatory system
2 ko02022 M00500 AtoS-AtoC (complexed poly-(R)-3-hydroxybutyrate biosynthesis) two-component regulatory system
2 ko02022 M00499 HydH-HydG (metal tolerance) two-component regulatory system
2 ko02022 M00497 GlnL-GlnG (nitrogen regulation) two-component regulatory system
2 ko02022 M00488 DcuS-DcuR (aerobic C4-dicarboxylate metabolism) two-component regulatory system
2 ko02022 M00486 CitA-CitB (citrate fermentation) two-component regulatory system
2 ko02022 M00477 EvgS-EvgA (acid and drug tolerance) two-component regulatory system
2 ko02022 M00475 BarA-UvrY (central carbon metabolism) two-component regulatory system
2 ko02022 M00473 UhpB-UhpA (hexose phosphates uptake) two-component regulatory system
2 ko02022 M00472 NarQ-NarP (nitrate respiration) two-component regulatory system
2 ko02022 M00471 NarX-NarL (nitrate respiration) two-component regulatory system
2 ko02022 M00456 ArcB-ArcA (anoxic redox control) two-component regulatory system
2 ko02022 M00455 TorS-TorR (trimethylamine N-oxide respiration) two-component regulatory system
2 ko02022 M00454 KdpD-KdpE (potassium transport) two-component regulatory system
2 ko02022 M00453 QseC-QseB (quorum sensing) two-component regulatory system
2 ko02022 M00452 CusS-CusR (copper tolerance) two-component regulatory system
2 ko02022 M00451 BasS-BasR (antimicrobial peptide resistance) two-component regulatory system
2 ko02022 M00450 BaeS-BaeR (envelope stress response) two-component regulatory system
2 ko02022 M00449 CreC-CreB (phosphate regulation) two-component regulatory system
2 ko02022 M00447 CpxA-CpxR (envelope stress response) two-component regulatory system
2 ko02022 M00446 RstB-RstA two-component regulatory system
2 ko02022 M00445 EnvZ-OmpR (osmotic stress response) two-component regulatory system
2 ko02022 M00444 PhoQ-PhoP (magnesium transport) two-component regulatory system
2 ko02022 M00434 PhoR-PhoB (phosphate starvation response) two-component regulatory system
2 ko02000 M00303 PTS system, N-acetylmuramic acid-specific II component
2 ko02000 M00272 PTS system, arbutin-, cellobiose-, and salicin-specific II component
2 ko02000 M00270 PTS system, trehalose-specific II component
2 ko02000 M00266 PTS system, maltose and glucose-specific II component
2 ko02000 M00265 PTS system, glucose-specific II component
2 ko02000 M00258 Putative ABC transport system
2 ko02000 M00256 Cell division transport system
2 ko02000 M00224 Putative phosphonate transport system
2 ko02000 M00223 Phosphonate transport system
2 ko02000 M00211 Putative ABC transport system
1 ko04121 Ubiquitin system
1 ko04090 Cellular antigens
1 ko03051 Proteasome
1 ko02044 M00571 AlgE-type Mannuronan C-5-Epimerase transport system
1 ko02044 M00339 RaxAB-RaxC type I secretion system
1 ko02044 M00326 RTX toxin transport system
1 ko02000 M00491 Putative arabinogalactan oligomer transport system
1 ko02000 M00325 alpha-Hemolysin/cyclolysin transport system
1 ko02000 M00305 PTS system, 2-O-A-mannosyl-D-glycerate-specific II component
1 ko02000 M00277 PTS system, N-acetylgalactosamine-specific II component
1 ko02000 M00273 PTS system, fructose-specific II component
1 ko02000 M00271 PTS system, beta-glucosides-specific II component
1 ko02000 M00268 PTS system, arbutin-like II component
1 ko02000 M00267 PTS system, N-acetylglucosamine-specific II component
1 ko02000 M00190 Iron(III) transport system
1 ko00194 Photosynthesis proteins
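All three example queries on this page follow the same URL pattern: the base address, a comma-separated list of IDs, and an "extract=" segment naming the classification. A small helper can make that pattern explicit when building queries from a program. This is a minimal sketch covering only the pattern seen in the examples here; the function name glinks_url is hypothetical, and other G-Links options are not handled.

```python
BASE_URL = "http://link.g-language.org"


def glinks_url(ids, extract=None):
    """Build a G-Links query URL of the form BASE/ID1,ID2,.../extract=TERM."""
    if isinstance(ids, str):
        ids = [ids]                       # allow a single ID as a plain string
    url = f"{BASE_URL}/{','.join(ids)}"
    if extract:
        url += f"/extract={extract}"      # e.g. GOslim_process or KEGG_Brite
    return url


# The three queries used in the examples on this page.
print(glinks_url("NC_000913", extract="GOslim_process"))
print(glinks_url(["RECA_ECOLI", "RUVB_ECOLI", "LEXA_ECOLI", "UMUD_ECOLI"],
                 extract="GOslim_process"))
print(glinks_url("NC_000913", extract="KEGG_Brite"))
```

The resulting string can be handed to curl or to any HTTP client; whether a given client receives TSV rather than HTML depends on how the server detects non-browser access, which is not specified here.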