G-Links is a rapid data "broker" service that collects and adds related information to a given gene (or gene set).

With numerous curated databases now available, researchers can make efficient use of a multitude of biological data by integrating these resources through hyperlinks and cross-references. A large proportion of bioinformatics research, however, consists of labor-intensive fetching, parsing, and merging of datasets and functional annotations from dispersed databases and web-based services. Data integration is therefore one of the key challenges of bioinformatics. Here we present G-Links, a gateway server for querying and retrieving gene annotation data. The system supports rapid querying with numerous gene IDs from multiple databases, or with nucleotide or amino acid sequences, by internally centralizing gene annotations around UniProt entries. It first converts the query into a UniProt ID, either by ID conversion or by sequence similarity search, and then returns the related annotations and cross-references. Users can also run external web-based tools on the query gene. G-Links is implemented as a RESTful service, so it can be accessed easily from any web browser. The service and its documentation are freely available at

Base URL

Quick Start

The quick-start form takes an input sequence (amino acid or nucleotide) or a gene ID, together with the following options: output format, E-value, identity, and "Feeling lucky".


Usage (Syntax)

This section describes the G-Links URI syntax conventions; for usage examples, see below. G-Links is provided through a REST interface. Database cross-reference information related to a given gene ID or sequence (nucleotide or amino acid) can be accessed through an HTTP GET/POST request to a unique URI.

REST URI conventions

URL Syntax of G-Links. G-Links is implemented as a RESTful service that can be queried by altering the URL.

HTML output

HTML output example for BRCA1_HUMAN (the UniProt ID of the human BRCA1 gene). By default, accessing G-Links with a web browser displays the results as interactive HTML, with a related image gallery implemented with CoverFlow at the top, followed by a large table of annotations and cross-references.


Standard qualifiers
  • GENE
    • Sequence (nucleotide or amino acid)
    • Gene ID (see the list of available IDs)
      • Note: an NCBI Entrez Gene ID must be specified as "GeneID:\d" (the prefix "GeneID:" followed by the numeric ID), because G-Links treats numbers-only IDs as taxonomy IDs.
    • NCBI tax ID (e.g. 9606)
    • RefSeq Genome ID (e.g. NC_000913)
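
The rules above can be sketched as a small classifier. This is only an illustration of how the qualifier types could be told apart, not G-Links' actual implementation: the function name `classify_query`, the category labels, and the sequence heuristic (letters only, at least 20 characters) are assumptions made for this example.

```python
import re

# Hypothetical helper: names, labels, and the sequence heuristic are
# assumptions for illustration, not taken from the G-Links source.
def classify_query(query: str) -> str:
    q = query.strip()
    if re.fullmatch(r"\d+", q):
        # Numbers-only input is read as an NCBI taxonomy ID (e.g. 9606).
        return "taxonomy_id"
    if re.fullmatch(r"GeneID:\d+", q):
        # Entrez Gene IDs need the explicit "GeneID:" prefix.
        return "entrez_gene_id"
    if re.fullmatch(r"NC_\d+(\.\d+)?", q):
        # RefSeq genome accession such as NC_000913.
        return "refseq_genome_id"
    if len(q) >= 20 and re.fullmatch(r"[A-Za-z*\-\s]+", q):
        # Assumed heuristic: a long run of letters is treated as a sequence.
        return "sequence"
    # Anything else is treated as a gene ID from one of the supported databases.
    return "gene_id"


for example in ["9606", "GeneID:672", "NC_000913", "BRCA1_HUMAN"]:
    print(example, "->", classify_query(example))
```

Under these rules a bare "672" is classified as a taxonomy ID, which is why the "GeneID:" prefix is required for Entrez Gene IDs.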
Optional qualifiers

List of available databases (IDs)

Overview of supported databases and web services in G-Links:

Detailed list is as follows:

Usage Examples

Query URLs

Sample Scripts

    • Ruby script to get "DISEASE" and SNP (dbSNP & SNPedia) information about H. sapiens "cancer" genes that have a "GOslim component" annotation related to "metabolic", in slimed TSV format.

Example of programmatic access

One of the strengths of G-Links is its programmatic access. For example, the GO slim classification of all E. coli genes for the GO:Process ontology can be retrieved from the following URL:

This result is shown as a formatted HTML page when viewed in a browser, but when it is accessed from the command line or from a program, it is automatically returned as a TSV file. With this, a simple combination of UNIX commands can produce a classification summary of all E. coli genes by GOslim:Process terms. Here is an example:

$ curl -v  |grep \# |cut -f 2,3 |grep GO: |sort |uniq -c |sort -rn

Here, G-Links is accessed from the command line with "curl -v", which writes the result to standard output, and the lines containing GO terms and their descriptions are extracted ("|grep \# |cut -f 2,3 |grep GO:"). The terms are then sorted and counted ("|sort |uniq -c") and printed in descending order of count ("|sort -rn").

The following is the output of the above command line:

1056 GO:0009058	biosynthetic process
1032 GO:0008150	biological_process
 860 GO:0034641	cellular nitrogen compound metabolic process
 636 GO:0044281	small molecule metabolic process
 526 GO:0006810	transport
 484 GO:0006950	response to stress
 381 GO:0005975	carbohydrate metabolic process
 374 GO:0009056	catabolic process
 285 GO:0055085	transmembrane transport
 273 GO:0006259	DNA metabolic process
 257 GO:0006520	cellular amino acid metabolic process
 190 GO:0051186	cofactor metabolic process
 169 GO:0006629	lipid metabolic process
 127 GO:0006464	cellular protein modification process
 127 GO:0006091	generation of precursor metabolites and energy
  98 GO:0006790	sulfur compound metabolic process
  96 GO:0042592	homeostatic process
  92 GO:0032196	transposition
  84 GO:0006399	tRNA metabolic process
  79 GO:0007165	signal transduction
  76 GO:0071554	cell wall organization or biogenesis
  72 GO:0022607	cellular component assembly
  63 GO:0006412	translation
  52 GO:0034655	nucleobase-containing compound catabolic process
  50 GO:0051301	cell division
  50 GO:0007155	cell adhesion
  50 GO:0006457	protein folding
  45 GO:0048870	cell motility
  43 GO:0006461	protein complex assembly
  39 GO:0007049	cell cycle
  37 GO:0040011	locomotion
  32 GO:0051604	protein maturation
  31 GO:0071941	nitrogen cycle metabolic process
  31 GO:0051276	chromosome organization
  21 GO:0061024	membrane organization
  19 GO:0019748	secondary metabolic process
  18 GO:0044403	symbiosis, encompassing mutualism through parasitism
  17 GO:0000003	reproduction
  15 GO:0022618	ribonucleoprotein complex assembly
  14 GO:0007059	chromosome segregation
  14 GO:0002376	immune system process
   9 GO:0006605	protein targeting
   7 GO:0008219	cell death
   5 GO:0042254	ribosome biogenesis
   4 GO:0006397	mRNA processing
   3 GO:0000902	cell morphogenesis
   2 GO:0048646	anatomical structure formation involved in morphogenesis
   2 GO:0030198	extracellular matrix organization
   2 GO:0030154	cell differentiation
   1 GO:0065003	macromolecular complex assembly
   1 GO:0015979	photosynthesis
   1 GO:0007267	cell-cell signaling
   1 GO:0007010	cytoskeleton organization
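
The same counting can also be done inside a script once the TSV text has been fetched. The following is a minimal sketch that mirrors the grep/cut/sort/uniq steps of the command line above; the function name `count_go_terms` and the sample rows are invented for illustration, and the row layout (a "#"-marked line with the GO ID in the second column and its description in the third) is an assumption inferred from the pipeline, not a description of the actual G-Links TSV format.

```python
from collections import Counter


def count_go_terms(tsv_text: str) -> list[tuple[int, str]]:
    """Mimic: grep '#' | cut -f 2,3 | grep GO: | sort | uniq -c | sort -rn."""
    counts = Counter()
    for line in tsv_text.splitlines():
        if "#" not in line:  # grep \#
            continue
        fields = line.split("\t")
        # cut -f 2,3 keeps columns 2 and 3; a line without tabs passes unchanged.
        picked = "\t".join(fields[1:3]) if len(fields) > 1 else line
        if "GO:" in picked:  # grep GO:
            counts[picked] += 1
    # Highest count first; ties fall back to descending text order.
    return sorted(((n, term) for term, n in counts.items()), reverse=True)


# Invented sample rows; the real layout may differ.
sample = (
    "#goslim_process\tGO:0009058\tbiosynthetic process\n"
    "#goslim_process\tGO:0006810\ttransport\n"
    "plain line without a marker\n"
    "#goslim_process\tGO:0009058\tbiosynthetic process\n"
    "#other\tXYZ\tnot a GO line\n"
)

for n, term in count_go_terms(sample):
    print(f"{n:4d} {term}")
```

Counting in a script rather than in the shell makes it straightforward to feed the result into further analysis, at the cost of a few more lines than the one-liner.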



