====== Practice 1-1 - Basic DNA sequence analysis (1) ======
  
**Write a program that counts the number of each base (A, T, G and C) in a given region of a genome. First, read the question carefully and think about what it asks and how to solve it. Then design an outline of the program.**
  
  
===== 1 Programming design =====
==== 1.1 Overview ====
By dividing the program into three parts, you can clearly see what needs to be written.
  
  - Open the data file.
  - Load the data file and count the bases.
  - Analyze the data and save the results to a file.
  
  
Here, you need some basic understanding of Perl programming, such as the standard functions for data input and output. Perl's syntax differs from that of some other languages, so a reference book can help you follow the Perl programming procedures.
  
Once the steps and the logic of the program are ready, you can begin writing the script, but let's prepare a little more for better programming.
  
  
==== 1.2 Preparation ====
Now let's think about the global settings of the script. Perl is highly extensible and can handle almost any kind of processing. Therefore, a well-organized arrangement of the script makes programming easier.
  
A variable is a place where a value is stored in the computer. Let's think about variables first. Unlike other popular languages such as C or Java, Perl does not require you to declare variable "types" such as string or number. In Perl, a scalar variable is written with "$" as the prefix of its name, such as $var, and it can hold either a string or a number. In this Practice 1-1, you are going to count four types of bases, so you will need four variables to store the four counts. Another variable for holding the DNA sequence will also be helpful.
  
Altogether, five variables are essential for this program: $seq for the DNA sequence, and $A, $T, $G and $C for the four counts.
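As a small illustration (the values are arbitrary examples), the sketch below shows that the same kind of scalar variable can hold a string or a number, and that Perl converts between the two automatically depending on the operator used:

```perl
use strict;
use warnings;

# A scalar variable (prefixed with "$") can hold a string ...
my $seq = "ATGC";
# ... or a number; no type declaration is needed.
my $count = 4;

# Perl converts between strings and numbers automatically.
my $digits = "10";
my $sum    = $digits + 5;          # numeric context: 15
my $label  = "count=" . $count;    # string context: "count=4"

print "$seq $sum $label\n";
```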
  
In Perl programming, try to use "my" whenever you declare a variable. Declaration is not required in Perl, but it makes programs much more readable and, as a result, leads to a more comfortable programming experience. The scope of each variable is also an important factor: a variable declared with "my" inside a loop is local to that loop and is created anew in each iteration, so take care when declaring variables within a loop.
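The sketch below shows why declarations matter. It uses "use strict", a standard Perl pragma that is not part of the original text: under it, every variable must be declared with "my", so a misspelled variable name is reported as an error instead of silently creating a new, empty variable. The variable names are hypothetical, and the string eval is used only so that the example can show the error without stopping.

```perl
use strict;
use warnings;

# A declared variable: fine under "use strict".
my $seq = "ATGC";
print "declared: $seq\n";

# A snippet with a misspelled, undeclared variable ($sequnce).
# Compiling it under "use strict" fails, so the typo is caught at once
# instead of silently creating a new, empty variable.
my $ok    = eval 'use strict; my $sequence = "ATGC"; $sequnce = "GG"; 1';
my $error = $@;

print $ok ? "compiled\n" : "caught: $error";
```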
  
Lastly, put the following line at the top of the script, so that the perl command can be omitted in the shell once the file permission is set to 755 (executable).
  
  #!/usr/local/bin/perl
  
This shortens the command for running the script from
  
  % perl myprogram.pl
  
to the following (if the directory containing the script is not in your PATH, type ./myprogram.pl instead)
  
  % myprogram.pl
  
With the above preparations, the basic script should now look like this.
  
  #!/usr/local/bin/perl
  my($seq, $A, $T, $G, $C);
  
In the next section, let's write the actual program. This time we will skip writing subroutines, since it is a short and simple script.

===== 2 Open file =====
  
==== 2.1 Filehandle ====
  
Perl has a very powerful system for file handling. Inside the script, you declare a special variable called a filehandle, which is different from the normal variables mentioned above. By convention, filehandle names are written in all upper-case letters, such as FILEHANDLE. Within the script, you do not use the file name directly; instead, you use the filehandle for file input and output.
  
  
  
  
Use the "open" function to open a file and the "close" function to close it. The same pair of functions is used for both reading and writing files.
  
  open(FILEHANDLE, "filename");
  
  
The above is the typical way of opening a file in Perl. Give the path of the file to open, and the system will load it. If you want to save data to a file, or append data to it, put ">" (overwrite) or ">>" (append) in front of the file name. For example, to save data into "out.txt", write the script like this.
  
  open(OUT, '>out.txt');
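A minimal sketch of writing and then appending, assuming a temporary directory (created with the core module File::Temp) stands in for your working directory so that the example leaves no files behind. It uses the three-argument form of open, which separates the mode (">", ">>", "<") from the file name; the two-argument form shown above also works.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a temporary directory so the example leaves nothing behind.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/out.txt";

# ">" creates the file, or overwrites it if it already exists.
open(OUT, ">", $file) || die("ERROR: cannot write $file: $!\n");
print OUT "first line\n";
close(OUT);

# ">>" appends to the end of the existing file.
open(OUT, ">>", $file) || die("ERROR: cannot append to $file: $!\n");
print OUT "second line\n";
close(OUT);

# Read the file back to check its contents.
open(IN, "<", $file) || die("ERROR: cannot read $file: $!\n");
my @lines = <IN>;
close(IN);
print @lines;
```

Note that the name OUT is reused after the first close, as described for closed filehandles.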
  
  
Metaphorically, a filehandle is a pipe that acts as the interface between a file and the program. Once a file is associated with a filehandle, you always refer to the filehandle, so there is no need to write the file name again. Moreover, the filehandle keeps track of the current position within the file: it starts at the top line and moves forward line by line as the loop proceeds.
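To see the filehandle moving through a file, the sketch below reads lines one at a time. As an assumption made only to keep the example self-contained, it opens an in-memory string instead of a real file (Perl accepts a reference to a string in open); the special variable $. holds the line number of the last line read.

```perl
use strict;
use warnings;

# Hypothetical file contents, read through an in-memory filehandle.
my $data = "ATGC\nGGTA\nTTAC\n";
open(FILE, "<", \$data) || die("ERROR: cannot open data\n");

my $first  = <FILE>;   # reads line 1; FILE now points at line 2
my $second = <FILE>;   # reads line 2
my $lineno = $.;       # number of the last line read
my $third  = <FILE>;   # reads line 3
my $eof    = <FILE>;   # no lines left, so this is undef
close(FILE);

print "line $lineno: $second";
```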
  
  
  
  
Adding a little extra code to the script makes the program much more robust and friendlier to users.
  
  open(FILE, "my.seq") || die("ERROR: file does not exist\n");
  
  
If the program fails to open the specified file, the above script prints the error message "ERROR: file does not exist" and quits; if it succeeds, the script continues and prints "open data file my.seq". STDERR is the predefined filehandle for standard error output.
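A variant worth knowing, shown here as a sketch: the special variable $! holds the operating system's reason for a failure (for example, a missing file or missing permission), so including it in the die message makes errors easier to diagnose. The path below is hypothetical and chosen so that open fails; eval is used only to catch the error for demonstration, whereas in a real script die simply ends the program.

```perl
use strict;
use warnings;

my $path = "no_such_dir/my.seq";   # hypothetical path that should not exist

my $result = eval {
    open(FILE, "<", $path) || die("ERROR: cannot open $path: $!\n");
    print STDERR "open data file $path\n";
    close(FILE);
    1;
};
my $error = $@;   # the die message, if open failed

print "caught: $error" unless $result;
```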
  
  
  
  
The following syntax closes a file. Remember to close the filehandle when you have finished working with the file. Once a filehandle is closed, its name can be reused for other files.
  
  close(FILEHANDLE);
===== 3 Data analysis =====
  
The following are a few hints for this practice.
  
==== 3.1 Using the "while" statement ====

The logic of this program is to load the data and count the number of bases. This section gives a hint on reading a file in Perl. The "while" statement provides a way to run a basic loop that reads a file from top to bottom.
  
  
  
  
The above code is the syntax of the "while" statement in Perl. A filehandle enclosed in angle brackets "<>" reads one line of the file at a time. It returns the next line as long as lines remain, and returns undef (false) when it reaches the end of the file, which ends the "while" loop.
  
In each iteration of the "while" loop, the line just read is stored in the special variable $_, so use this variable inside the loop for the necessary processing.
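A minimal sketch of this loop, assuming an in-memory string stands in for the contents of my.seq. Note that each $_ still ends with the linefeed code "\n", which matters in the next section:

```perl
use strict;
use warnings;

my $data = "ATGC\nGGTA\nTTAC\n";   # stand-in for the contents of my.seq
open(FILE, "<", \$data) || die("ERROR: cannot open data\n");

my $lines = 0;
while (<FILE>) {
    # Each line read is stored in $_, including its trailing "\n".
    $lines++;
    print "line $lines: $_";
}
close(FILE);
```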
  
Variables declared with "my" inside the "while" block are created anew in each iteration, so their values do not carry over from one line to the next. Therefore, declare the variables that must accumulate values (such as the sequence and the counts) before the "while" statement.
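The sketch below (with hypothetical data) contrasts a variable declared before the loop, which accumulates the lines, with one declared inside the loop, which starts empty in every iteration:

```perl
use strict;
use warnings;

my $data = "ATGC\nGGTA\n";      # stand-in for the file contents (hypothetical)
open(FILE, "<", \$data) || die("ERROR: cannot open data\n");

my $seq        = "";            # declared BEFORE the loop: keeps growing
my $last_piece = "";
while (<FILE>) {
    my $piece = "";             # declared INSIDE the loop: reset every iteration
    $piece .= $_;
    $seq   .= $_;
    $last_piece = $piece;
}
close(FILE);

print "outside: $seq";          # holds both lines
print "inside:  $last_piece";   # held only the last line
```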
==== 3.2 Store the DNA sequence ====
  
Before counting the bases, read the file and store the DNA sequence in a variable. Declare a variable $seq before the "while" statement and append each line to it in every iteration. When appending a new line to the previous sequence, make sure to remove the linefeed code at the end of every line.
  
To remove the linefeed code, use the substitution operator "s///" in Perl. Substitution is one of Perl's most powerful features: it replaces the characters matched by the pattern between the first two slashes with the text between the last two slashes. For example, to replace the linefeed code with the string ENTER, the code looks like this.
  
  $seq =~ s/\n/ENTER/;
  
Add the modifier g so that the substitution replaces every occurrence of "\n", not just the first one.
  
  $seq =~ s/\n/ENTER/g;
  
The above code replaces every linefeed code with ENTER.
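A small sketch comparing the two forms on an example string, plus an empty replacement, which simply deletes every linefeed code (this is what the DNA sequence needs):

```perl
use strict;
use warnings;

my $seq1 = "ATGC\nGGTA\nTTAC\n";
my $seq2 = $seq1;
my $seq3 = $seq1;

$seq1 =~ s/\n/ENTER/;    # without g: only the first "\n" is replaced
$seq2 =~ s/\n/ENTER/g;   # with g: every "\n" is replaced
$seq3 =~ s/\n//g;        # empty replacement: every "\n" is deleted

print "$seq1\n$seq2\n$seq3\n";
```

Perl's built-in chomp function, which removes the trailing newline from a string, is a common alternative when processing one line at a time.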
  
As an advanced note, the pattern in s/// is a regular expression. For example, to remove every character that is NOT a lower-case letter, write:
  
  $seq =~ s/[^a-z]//g;
  
Regular expressions are very useful in Perl, so if you are interested, we recommend learning more about them.
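As a sketch, the first substitution below is the one from the text; the second is a variation, based on an assumption about what your data may contain, that keeps only the letters A, C, G and T in either case. It also strips spaces, digits, and the carriage return "\r" that appears at the end of lines in files saved on Windows:

```perl
use strict;
use warnings;

# Example input with spaces, digits and Windows-style "\r\n" line endings.
my $lower_only = "atg c\n12ggta\r\n";
$lower_only =~ s/[^a-z]//g;        # remove everything that is NOT a-z

my $dna_only = "ATG c\n12ggTA\r\n";
$dna_only =~ s/[^ACGTacgt]//g;     # keep only A, C, G, T (either case)

print "$lower_only\n$dna_only\n";
```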
  
The last tip in this section is joining two strings. In Perl, use the .= operator to append one string to another.
  
  $seq1 .= $seq2;
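A short sketch with example strings: .= is shorthand for concatenating with the dot operator and assigning the result back, and applying it repeatedly builds up a longer string, which is how the lines of the file can be joined into $seq:

```perl
use strict;
use warnings;

my $seq1 = "ATGC";
my $seq2 = "GGTA";
$seq1 .= $seq2;                 # same as: $seq1 = $seq1 . $seq2;

# Repeated appends build up a longer sequence.
my $seq = "";
foreach my $part ("ATG", "CGG", "TAT") {
    $seq .= $part;
}

print "$seq1\n$seq\n";
```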
  
==== 3.3 Counting the bases ====

You can count each base (A, T, G and C) in the same way as you would in the C programming language.
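One possible C-style version, given as a sketch: it walks through the sequence one character at a time with substr and increments a counter for each base. It assumes the sequence is already stored in $seq with the linefeeds removed and is written in upper case.

```perl
use strict;
use warnings;

my $seq = "ATGCGGTATTAC";           # example sequence (hypothetical)
my ($A, $T, $G, $C) = (0, 0, 0, 0);

for (my $i = 0; $i < length($seq); $i++) {
    my $base = substr($seq, $i, 1);  # one character at position $i
    if    ($base eq "A") { $A++; }
    elsif ($base eq "T") { $T++; }
    elsif ($base eq "G") { $G++; }
    elsif ($base eq "C") { $C++; }
}

print "A=$A T=$T G=$G C=$C\n";
```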
  
  
  
  
However, since you are learning Perl, let's write it in a more Perlish way!
In Perl, there is another replacement operator, "tr///", which replaces single characters (transliteration). It returns the number of characters it replaced. For example, to replace "a" with "t", the following two lines do the same thing.
  
  $seq =~ s/a/t/g;
  $seq =~ tr/a/t/;
  
But, as mentioned, tr/// returns the number of replaced characters, so assigning its result to a variable gives you the count.
  
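  $count = $seq =~ tr/a/t/;

Note that this also turns every "a" in $seq into "t", which would change the result of a later count of "t". To count a base without modifying the sequence, leave the replacement list empty, as in $count = ($seq =~ tr/a//);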
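Putting this together, a sketch for counting all four bases: it uses an empty replacement list so that $seq is left unchanged, and lists both cases (for example Aa) on the assumption that the sequence may contain upper- and lower-case letters:

```perl
use strict;
use warnings;

my $seq = "ATGCggtaTTAC";   # mixed case on purpose (example data)

# An empty replacement list counts matching characters without changing $seq.
my $A = ($seq =~ tr/Aa//);
my $T = ($seq =~ tr/Tt//);
my $G = ($seq =~ tr/Gg//);
my $C = ($seq =~ tr/Cc//);
print "A=$A T=$T G=$G C=$C\n";

# By contrast, a non-empty replacement list modifies the string.
my $copy = "aatg";
my $n = ($copy =~ tr/a/t/);   # $n is 2 and $copy becomes "tttg"
```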
  
  
The framework to solve Practice 1-1 should look like this.
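One possible framework, given as a sketch rather than the definitive answer. It assumes the input file contains only sequence lines (no header lines such as FASTA ">" lines) and writes the counts to an output file; the file names are hypothetical. The demo block at the top only creates a small sample file in a temporary directory so the sketch runs on its own. In your own script, remove it and set $infile to your data file (for example my.seq).

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# --- Demo setup only: create a small sample input file. ---
my $dir     = tempdir(CLEANUP => 1);
my $infile  = "$dir/my.seq";
my $outfile = "$dir/out.txt";
open(DEMO, ">", $infile) || die("ERROR: cannot create $infile: $!\n");
print DEMO "ATGCGG\nTATTAC\n";
close(DEMO);
# --- End of demo setup. ---

my ($seq, $A, $T, $G, $C);
$seq = "";

# 1. Open the data file.
open(FILE, "<", $infile) || die("ERROR: file does not exist\n");
print STDERR "open data file $infile\n";

# 2. Load the data and join the lines into one sequence.
while (<FILE>) {
    s/\n//g;        # remove the linefeed code from $_
    $seq .= $_;
}
close(FILE);

# 3. Count the bases (upper and lower case) with tr///.
$A = ($seq =~ tr/Aa//);
$T = ($seq =~ tr/Tt//);
$G = ($seq =~ tr/Gg//);
$C = ($seq =~ tr/Cc//);

# 4. Save the result to a file.
open(OUT, ">", $outfile) || die("ERROR: cannot write $outfile: $!\n");
print OUT "A: $A\nT: $T\nG: $G\nC: $C\n";
close(OUT);
```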
  
  
problem_11.txt · Last modified: 2014/01/18 07:44 (external edit)