
problem_12

Differences

This shows you the differences between two versions of the page.

problem_12 [2010/11/17 23:26]
ike
problem_12 [2014/01/18 07:44] (current)
Line 1: Line 1:
-====== ​Problem ​1-2 - Basic DNA sequence analysis (2) ====== +====== ​Practice ​1-2 - Basic DNA sequence analysis (2) ====== 
-**Refine previous program that enables to load whole genome sequence of M. genitalium and compare the sequential difference (bias) through nucleotide usage. +**Refine the previous program so that it can load the whole genome sequence of //M. genitalium// and compare the compositional difference (bias).
  
-Problems 1-1 and 1-2 are similar major difference comes from the type of DNA sequence. In Problem 1-1 data are already prepare by cutting and pasting target sequence from data, but in Problem 1-2, users need to acquire DAN sequence from genome data format which includes comments and other information. If the DNA sequence is ready in both case count up process is same. +Practices 1-1 and 1-2 are similar; the major difference is the format of the DNA sequence. In Practice 1-1 the data were already prepared by cutting and pasting the target sequence out of the data, but in Practice 1-2 users need to extract the DNA sequence from a genome data format that also includes annotations and other information. Once the DNA sequence is parsed out, counting can be done in the same way as in 1-1.
  
-Again, read the passage and think logically, try to think about meanings ​and procedures. Then make a basic design of program.**+Again, read the passage and think logically, try to think about the meanings ​of every procedure. Then make a basic design of program.**
  
 ===== 1 Programming design ===== ===== 1 Programming design =====
 ==== 1.1 Improve data loading ==== ==== 1.1 Improve data loading ====
-The script for the Problem ​1-1 looks like this.+The script for the Practice ​1-1 looks like this.
  
   #​!/​usr/​local/​bin/​perl   #​!/​usr/​local/​bin/​perl
Line 27: Line 27:
  
  
-Look at the genome data to see what needs to be done for acquiring ​DAN sequence.+Look at the genome data to see what needs to be done for acquiring ​DNA sequence.
  
  
Line 43: Line 43:
   //   //
  
-Firstly, there are comments such as “LOCUS”, “DEFINITION” and “ACCESSION” in the front of data. Target DNA sequence is in the last part of the file and one line above the sequence is comment “ORIGIN”. So, skip until the comment comes out. Next, look at the very last line of the sequence. There are two slashes which indicates end of the DNA sequence. +Firstly, there are annotation tags such as “LOCUS”, “DEFINITION” and “ACCESSION” at the beginning of the data. The target DNA sequence is in the last part of the file, and the line directly above the sequence carries the tag “ORIGIN”. So, the strategy is to skip lines until the “ORIGIN” tag appears, and then read the sequence lines until the very last line of the data, which is marked with two slashes.
  
- +From these, loading data in the program can be improved to the following:
-From these points of view, loading data in the program can be improved to following ​way.+
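The page's own improved loading script is not shown in this comparison view. As a rough sketch only, the strategy described above (skip to “ORIGIN”, read until the two slashes) might look like the following. The helper name extract_sequence and the tiny sample record are made up for illustration, and an in-memory filehandle stands in for the real genome file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read GenBank-formatted text from a filehandle and
# return the DNA sequence found between the ORIGIN tag and the "//" line.
sub extract_sequence {
    my ($fh) = @_;
    my $seq       = '';
    my $in_origin = 0;
    while (my $line = <$fh>) {
        if (!$in_origin) {
            # Skip annotation lines until the ORIGIN tag appears.
            $in_origin = 1 if $line =~ /^ORIGIN/;
            next;
        }
        last if $line =~ m{^//};    # two slashes mark the end of the data
        $line =~ s/[^A-Za-z]//g;    # drop position numbers, spaces, newline
        $seq .= lc $line;
    }
    return $seq;
}

# Made-up record standing in for the real genome file.
my $sample = <<'END';
LOCUS       EXAMPLE   20 bp    DNA
DEFINITION  Made-up record for illustration.
ORIGIN
        1 taagttatta tttagttaat
//
END

open(my $fh, '<', \$sample) or die "Cannot open in-memory sample: $!";
my $seq = extract_sequence($fh);
close($fh);
print length($seq), " bases: $seq\n";
```

To use it on a real file, open the genome file by name instead of the in-memory sample; the rest stays the same.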
  
  
Line 76: Line 75:
 ==== 2.1 Decimal calculation ==== ==== 2.1 Decimal calculation ====
  
-In the Problem 1-2, comparison of result in the Problem 1-1 is required. Users are comparing two sets of unequal population so consider outputting result in percentages this time calculating up to the second decimal place will be enough for validity. +In Practice 1-2, the result must be compared with that of Practice 1-1. Since the two sequences differ in length, raw counts are not directly comparable, so consider outputting the results as percentages, to the second decimal place.
  
 If users are to count adenine for instance then the script should look like this. If users are to count adenine for instance then the script should look like this.
Line 82: Line 81:
   $percent=($A/​length($seq))*100;​   $percent=($A/​length($seq))*100;​
  
-length() function returns the length of a variable. If the variable was string then it returns the number of character ​so that length() function for $seq returns number of nucleotide in M. genitalium.+length() function returns the length of a variable. If the variable was string then it returns the number of characters ​so that length() function for $seq returns number of nucleotide in //M. genitalium//. 
 ==== 2.2 Output of decimals ==== ==== 2.2 Output of decimals ====
  
-Outputs in the Perl is fixable by printf() function like in the C. So let’s cut up to the second decimal places. +Output in Perl can be formatted with the printf() function, as in C. So let’s print the value to two decimal places (the %.2f format rounds to the second decimal place).
  
   printf("​A: ​  ​%.2f\n",​ $percent);   printf("​A: ​  ​%.2f\n",​ $percent);
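Putting sections 2.1 and 2.2 together, a minimal sketch for all four nucleotides could look like this. The helper name composition_percent and the ten-base test sequence are assumptions for illustration, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: return the percentage of each nucleotide in a sequence.
sub composition_percent {
    my ($seq) = @_;
    my %percent;
    my $total = length($seq);
    return %percent if $total == 0;    # avoid division by zero
    for my $base (qw(a t g c)) {
        # Count case-insensitive matches of this base.
        my $count = () = $seq =~ /$base/gi;
        $percent{$base} = ($count / $total) * 100;
    }
    return %percent;
}

my %p = composition_percent('aattgcaatt');    # made-up 10-base sequence
for my $base (qw(a t g c)) {
    printf("%s: %.2f\n", uc $base, $p{$base});
}
```

Because the values are percentages of the sequence length, results for a short sequence and for a whole genome can be compared side by side.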
Line 93: Line 93:
  
 ===== 3 Advanced: Combine two script into a single process ===== ===== 3 Advanced: Combine two script into a single process =====
-User has made two programs one from Problem 1-1 and the other from Problem 1-2. These two programs are very much similar so as an advanced problem let’s combine two into one. Some ideas to shape up implement are to ask users that sequence to compare if any and if nothing is typed then default sequence were used to analyze. +You have now made two programs, one from Practice 1-1 and the other from Practice 1-2. These two programs are very similar, so as an advanced problem, let’s combine the two into one. One idea to improve the program is to make it prompt for the input file name. In this way, the script becomes a generic program for calculating nucleotide composition.
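One possible shape for the combined script is sketched below; it is only an illustration, not the page's own solution. It prompts for a file name, falls back to a default when nothing is typed, and accepts either a plain sequence (as in 1-1) or GenBank-formatted data (as in 1-2). The helper names choose_input, to_sequence and report, and the default file name genome.gbk, are all assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: use the typed file name, or the default if nothing was typed.
sub choose_input {
    my ($answer, $default) = @_;
    $answer = '' unless defined $answer;    # undef at end of input
    $answer =~ s/^\s+|\s+$//g;              # trim whitespace and newline
    return $answer eq '' ? $default : $answer;
}

# Hypothetical helper: accept either a plain sequence or GenBank-formatted
# text and return the bare nucleotides in lower case.
sub to_sequence {
    my ($text) = @_;
    if ($text =~ m{^ORIGIN[^\n]*\n(.*?)^//}ms) {
        $text = $1;    # keep only the part between ORIGIN and "//"
    }
    $text =~ s/[^A-Za-z]//g;
    return lc $text;
}

# Hypothetical helper: print the percentage of each nucleotide.
sub report {
    my ($seq) = @_;
    my $total = length($seq) or return;
    for my $base (qw(a t g c)) {
        my $count = () = $seq =~ /$base/g;
        printf("%s: %.2f\n", uc $base, $count / $total * 100);
    }
}

my $default = 'genome.gbk';    # assumed default file name
print "Sequence file to analyze [$default]: ";
my $file = choose_input(scalar <STDIN>, $default);
if (open(my $fh, '<', $file)) {
    my $text = do { local $/; <$fh> };    # read the whole file at once
    close($fh);
    report(to_sequence(defined $text ? $text : ''));
} else {
    warn "Cannot open $file: $!\n";
}
```

Detecting the format from the content keeps a single counting routine for both practices, which is the point of combining the two programs.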
  
problem_12.txt · Last modified: 2014/01/18 07:44 (external edit)