
Problem 2-0 & 2-1 – CDS search and translation (1)

[Problem 2-0]

Define a Perl function trans_codon that translates a set of three nucleotides (a codon) into an amino acid according to the codon table.


[Problem 2-1]

Write a script that loads a genome sequence and translates its codons into amino acids.

Overview of the two problems

We expect users to work on two aspects through these exercises:

  1. Estimate the coding regions in a given sequence
  2. Translate the codons in the estimated coding regions into amino acids

In Problem 2-1, users implement the essential part of this process: translating codons into amino acids.

Codon and amino acid translation

In the central dogma, one gene basically encodes one protein. How, then, does a cell produce a protein from a gene?

A protein is a long chain of amino acids, called a polypeptide, built from 20 types of amino acids. The features of a protein are determined by the order of its amino acids. Each amino acid is specified by three consecutive nucleotides in the DNA, called a codon, so the amino acid sequence of a protein can be read from the order of the codons in its gene. For example, a coding sequence of 300 nucleotides contains 100 codons and therefore encodes a protein of about 100 amino acids (99 if the last codon is a stop codon).

In short, one codon corresponds to one amino acid: a gene is a sequence of codons, and the protein it encodes is the corresponding sequence of amino acids.
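To make the codon-to-amino-acid correspondence concrete, here is a minimal sketch that splits a sequence into consecutive, non-overlapping codons with a regular expression. The subroutine name split_codons is hypothetical and not part of the exercises; a trailing fragment shorter than three nucleotides is simply dropped.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the exercises): split a DNA sequence
# into consecutive, non-overlapping codons of three nucleotides.
# A trailing fragment shorter than three nucleotides is dropped.
sub split_codons {
    my $seq    = shift;
    my @codons = ($seq =~ /(...)/g);   # list-context //g returns every 3-character match
    return @codons;
}

my @codons = split_codons('atgcttctggtg');
print scalar(@codons), " codons: @codons\n";   # prints "4 codons: atg ctt ctg gtg"
```

Twelve nucleotides give four codons, just as 300 nucleotides give 100.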

Problem 2-0

Now, let’s translate given DNA sequence into protein sequence.

The first task is to define a Perl function that translates a codon into its unique amino acid.

Look up the codon table in a biology textbook that best describes your target species. For example, the amino acid that corresponds to the DNA codon “ctt” is leucine (Leu).

The example above covers only one codon. Now let’s define a subroutine that works for all 64 codons.

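sub trans_codon {    # no "()" prototype, because the subroutine takes an argument
      my $nucleotides = shift;   # a codon such as 'ctt'
      # ... return the one-letter code of the amino acid that the codon encodes
}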

To call the subroutine, put a “&” in front of its name, as follows.

my $amino_acid = &trans_codon('ctt');

The codon “ctt” corresponds to leucine, so the sample code is expected to assign “L”, the one-letter code for leucine, to the variable $amino_acid. As a next step, it would be nice to translate any DNA sequence passed as the argument; for example, the argument “atgcttctggtg” should return the amino acids “MLLV”. The codon table is easily described with a hash, as follows.

my %CodonTable = (
                    'ctt', 'L',  'cct', 'P',  'cat', 'H',  'cgt', 'R',
                    'ctc', 'L',  'ccc', 'P',  'cac', 'H',  'cgc', 'R',
                    # ... (the remaining 56 codons)
                 );
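For reference, the following sketch fills in the complete %CodonTable for the standard genetic code (NCBI translation table 1), laid out in the same order as the rows above: one block per first base, columns by second base (t, c, a, g), rows by third base. Marking the three stop codons with '*' is an assumption of this sketch; if your target species uses a different genetic code, adjust the entries.

```perl
use strict;
use warnings;

# Standard genetic code (NCBI translation table 1).
# Each block fixes the first base; columns = second base t/c/a/g,
# rows = third base t/c/a/g. '*' marks a stop codon (assumption).
my %CodonTable = (
    'ttt', 'F',  'tct', 'S',  'tat', 'Y',  'tgt', 'C',
    'ttc', 'F',  'tcc', 'S',  'tac', 'Y',  'tgc', 'C',
    'tta', 'L',  'tca', 'S',  'taa', '*',  'tga', '*',
    'ttg', 'L',  'tcg', 'S',  'tag', '*',  'tgg', 'W',

    'ctt', 'L',  'cct', 'P',  'cat', 'H',  'cgt', 'R',
    'ctc', 'L',  'ccc', 'P',  'cac', 'H',  'cgc', 'R',
    'cta', 'L',  'cca', 'P',  'caa', 'Q',  'cga', 'R',
    'ctg', 'L',  'ccg', 'P',  'cag', 'Q',  'cgg', 'R',

    'att', 'I',  'act', 'T',  'aat', 'N',  'agt', 'S',
    'atc', 'I',  'acc', 'T',  'aac', 'N',  'agc', 'S',
    'ata', 'I',  'aca', 'T',  'aaa', 'K',  'aga', 'R',
    'atg', 'M',  'acg', 'T',  'aag', 'K',  'agg', 'R',

    'gtt', 'V',  'gct', 'A',  'gat', 'D',  'ggt', 'G',
    'gtc', 'V',  'gcc', 'A',  'gac', 'D',  'ggc', 'G',
    'gta', 'V',  'gca', 'A',  'gaa', 'E',  'gga', 'G',
    'gtg', 'V',  'gcg', 'A',  'gag', 'E',  'ggg', 'G',
);

print scalar(keys %CodonTable), " codons\n";   # prints "64 codons"
print $CodonTable{'ctt'}, "\n";                # prints "L"
```

The keys are lowercase, so an uppercase sequence (as found in many FASTA files) should be converted with lc before lookup.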

Therefore, the sample code below

my $amino_acid = $CodonTable{'ctt'};   # 'L'

easily assigns the amino acid for a given codon of three nucleotides with a single hash lookup.

Next, use a for statement to cut the sequence into pieces three nucleotides long.

sub trans_codon {
      my $nucleotides = shift;  # Assign the loaded sequence to $nucleotides
      my $amino = '';
      my %CodonTable = (
                        'ctt', 'L',  'cct', 'P',  'cat', 'H',  'cgt', 'R',
                        'ctc', 'L',  'ccc', 'P',  'cac', 'H',  'cgc', 'R',
                        # ... (the remaining 56 codons)
                       );

      for (?????) {
                ?????;  # Cut out the next three nucleotides (one codon)
                ?????;  # Translate the codon into an amino acid
                ?????;  # Append the amino acid to $amino
      }
      return $amino;
}
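To see how a for statement can step through a sequence three nucleotides at a time, here is a minimal sketch that only collects each codon with substr; the translation step is left out. The C-style loop header is one possible choice, not necessarily the intended answer.

```perl
use strict;
use warnings;

my $nucleotides = 'atgcttctggtg';
my @codons;

# $i visits the codon start offsets 0, 3, 6, ...; the condition
# $i + 3 <= length stops before an incomplete trailing codon.
for (my $i = 0; $i + 3 <= length($nucleotides); $i += 3) {
    push @codons, substr($nucleotides, $i, 3);   # three nucleotides starting at offset $i
}

print join(' ', @codons), "\n";   # prints "atg ctt ctg gtg"
```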

Problem 2-1

The translator is ready, so let’s write a script that loads a genome and translates its codons into amino acids, as follows.

  1. Load the target DNA sequence with the script you wrote in Problem 1 and assign it to the variable $seq
  2. Translate $seq into amino acids with the function trans_codon()
  3. Print the result!
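The three steps can be sketched as the script below. The Problem 1 loader is not shown on this page, so load_fasta is a hypothetical stand-in that assumes FASTA input; the codon table is built from the compact 64-letter string of the standard genetic code (NCBI table 1) instead of being typed out as 64 pairs, and mapping unknown codons (for example, ones containing “n”) to “X” is an assumption of this sketch.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for the Problem 1 loader (assumes FASTA input):
# skips '>' header lines, removes whitespace, lowercases the sequence.
sub load_fasta {
    my $fh  = shift;
    my $seq = '';
    while (my $line = <$fh>) {
        next if $line =~ /^>/;   # skip header lines
        $line =~ s/\s+//g;       # drop the newline and other whitespace
        $seq .= lc $line;        # the codon table uses lowercase keys
    }
    return $seq;
}

# Standard genetic code (NCBI table 1): the letter at position
# 16*first + 4*second + third (bases ordered t, c, a, g); '*' = stop.
my @bases = qw(t c a g);
my $aa    = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %CodonTable;
my $k = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $CodonTable{"$b1$b2$b3"} = substr($aa, $k++, 1);
        }
    }
}

sub trans_codon {
    my $nucleotides = shift;
    my $amino       = '';
    for (my $i = 0; $i + 3 <= length($nucleotides); $i += 3) {
        # 'X' for codons missing from the table (assumption of this sketch)
        $amino .= $CodonTable{ substr($nucleotides, $i, 3) } // 'X';
    }
    return $amino;
}

# Run only when a file name is given on the command line.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $seq = load_fasta($fh);          # 1. load the sequence into $seq
    close $fh;
    my $protein = trans_codon($seq);    # 2. translate it
    print "$protein\n";                 # 3. print the result
}
```

For example, `perl translate.pl genome.fasta` (both names hypothetical). Because no prototype is declared, trans_codon can be called with or without the “&”. Note that this translates the whole sequence in the first reading frame; restricting translation to the estimated coding regions is the CDS-search part of these problems.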

Refine Problem 2-1

Here, we provide some advanced Perl techniques to refine Problems 2-0 and 2-1.

The body of the for statement in the subroutine trans_codon spans several lines, which is redundant. By dropping the intermediate variables, you can combine these steps into a single line, as follows.

$amino .= $CodonTable{substr($seq, $i, 3)};

The for statement itself can also be written in either of the following ways.

for (?????) { $amino .= $CodonTable{substr($seq, $i, 3)}; }
$amino .= $CodonTable{substr($seq, $_, 3)} for ?????;   # a statement-modifier for sets $_, not $i
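Putting the refinements together, a compact trans_codon might look like the sketch below. It fills the statement-modifier blank with one possible list of codon start offsets (0, 3, 6, ...) generated with map; because the statement-modifier form sets $_, the offset is $_ rather than $i. The codon table is abbreviated to the four codons of the example, so use the full 64-codon table in practice.

```perl
use strict;
use warnings;

# Abbreviated table for this demo only; use the full 64-codon table in practice.
my %CodonTable = (
    'atg', 'M',  'ctt', 'L',  'ctg', 'L',  'gtg', 'V',
);

sub trans_codon {
    my $seq   = shift;
    my $amino = '';
    # int(length($seq) / 3) is the number of complete codons;
    # map turns codon indices 0, 1, 2, ... into offsets 0, 3, 6, ...
    $amino .= $CodonTable{substr($seq, $_, 3)}
        for map { $_ * 3 } 0 .. int(length($seq) / 3) - 1;
    return $amino;
}

print trans_codon('atgcttctggtg'), "\n";   # prints "MLLV"
```

An incomplete trailing codon is ignored, since only complete codons generate an offset.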
problem_2a.1290242225.txt.gz · Last modified: 2014/01/18 07:44 (external edit)