Regular expressions with Perl
From Biocourse
Perl Regular Expressions
Regular expressions are useful for finding known motifs within genetic sequences. Here is a collection of regular expressions that are useful in bioinformatics.
Regular expressions can be used for string comparisons, selections, and replacements; each is shown here with actual cases.
1. String Comparisons
(Case: find AUG codon)
$string =~ m/aug/;
To match only those strings in which the sought text appears at the very beginning, anchor the pattern with ^.
(Case: find AUG codon at the very beginning of the sequence)
$string =~ m/^aug/;
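A minimal sketch of how these comparisons are typically used as true/false tests. The sequence in $rna is made up for illustration, and the /i modifier (case-insensitive matching) is an addition that the cases above do not use.

```perl
use strict;
use warnings;

# Hypothetical RNA sequence, for illustration only.
my $rna = "AUGGCCAUGUAA";

# /i makes the match case-insensitive, so "aug" also matches "AUG".
my $has_aug    = ($rna =~ m/aug/i)  ? 1 : 0;   # AUG anywhere in the sequence
my $starts_aug = ($rna =~ m/^aug/i) ? 1 : 0;   # AUG at the very beginning

# Without /i the lowercase pattern does not match the uppercase sequence.
my $case_sensitive = ($rna =~ m/aug/) ? 1 : 0;

print "has AUG: $has_aug, starts with AUG: $starts_aug, case-sensitive match: $case_sensitive\n";
```

If sequences are stored in a known case, /i can be dropped and the patterns used exactly as in the cases above.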
Special characters and quantifiers
. Match any character
\w Match "word" character (alphanumeric plus "_")
\W Match non-word character
\s Match whitespace character
\S Match non-whitespace character
\d Match digit character
\D Match non-digit character
\t Match tab
\n Match newline
\r Match return
\f Match formfeed
\a Match alarm (bell, beep, etc)
\e Match escape
\021 Match octal char (in this case 21 octal)
\xf0 Match hex char (in this case f0 hexadecimal)
* Match 0 or more times
+ Match 1 or more times
? Match 1 or 0 times
{n} Match exactly n times
{n,} Match at least n times
{n,m} Match at least n but not more than m times
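The quantifiers in this list attach to the character or character class just before them. A small sketch of a few combinations follows; the sequences, the label, and the variable names are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical data, for illustration only.
my $mrna  = "augccgu" . ("a" x 12);   # coding part followed by 12 a's
my $short = "augccgu" . ("a" x 3);    # only 3 trailing a's
my $label = "seq_042";

# a{10,}$ : at least 10 a's at the very end (a crude poly-A tail check).
my $polya_long  = ($mrna  =~ /a{10,}$/) ? 1 : 0;
my $polya_short = ($short =~ /a{10,}$/) ? 1 : 0;

# cc?g : one c, an optional second c, then g (matches "cg" or "ccg").
my $has_ccg = ($mrna =~ /cc?g/) ? 1 : 0;

# ^.{3}ccg : any three characters at the start, followed by ccg.
my $second_codon_ccg = ($mrna =~ /^.{3}ccg/) ? 1 : 0;

# ^\w+_\d{3}$ : word characters, an underscore, then three digits at the end.
my $label_ok = ($label =~ /^\w+_\d{3}$/) ? 1 : 0;

print "$polya_long $polya_short $has_ccg $second_codon_ccg $label_ok\n";
```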
2. String Selections
$line =~ /.{28}(\d\d)-(\d\d)-(\d\d).{8}(.+)$/;
my($filename) = $4;
my($yymmdd) = "$3$1$2";
$1, $2, $3 and $4 are the scalars captured by the first, second, third and fourth sets of parentheses.
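The same capture mechanism applies to sequence data. As a sketch, the following pulls an identifier and a description out of a FASTA-style header line and then reads the first two codons of a sequence. The header text, the sequence, and the variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical FASTA-style header line, for illustration only.
my $header = ">seq1 example start region";

# (\S+) captures the identifier, (.+) captures the rest of the line.
my ($id, $description);
if ($header =~ /^>(\S+)\s+(.+)$/) {
    ($id, $description) = ($1, $2);
}

# In list context a successful match returns its captures directly,
# so $1 and $2 do not have to be read separately.
my ($codon1, $codon2) = "augccguaa" =~ /^(.{3})(.{3})/;

print "$id | $description | $codon1 $codon2\n";
```

Checking that the match succeeded before using $1 and $2 matters because a failed match leaves them holding values from an earlier successful match.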
3. String Substitutions
(Case: DNA -> RNA)
$string =~ s/t/u/g;   # /g replaces every t, not only the first one
(Case: AUG codon -> start)
$string =~ s/aug/start/;
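By default s/// replaces only the first match. The /g modifier replaces every match, and the expression returns the number of replacements made. A short sketch with a made-up sequence and hypothetical variable names:

```perl
use strict;
use warnings;

my $dna = "atgttt";   # hypothetical sequence

# Without /g only the first t is replaced.
(my $first_only = $dna) =~ s/t/u/;

# With /g every t is replaced; s///g returns the number of replacements.
my $rna   = $dna;
my $count = ($rna =~ s/t/u/g);

print "$first_only $rna $count\n";
```

The `(my $copy = $original) =~ s/.../.../` form edits a copy and leaves the original string untouched.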
Translations (tr///) are a form of substitution that works letter by letter. tr takes plain character lists, not regular expressions, so square brackets and commas are treated as literal characters and should be left out.
(Case: uppercase DNA -> lowercase DNA)
$string =~ tr/A-Z/a-z/;
(Case: DNA -> RNA)
$string =~ tr/t/u/;
(Case: DNA -> complementary DNA)
$string =~ tr/atgc/tacg/;
(Case: DNA sequence -> numbers)
$string =~ tr/atgc/1234/;
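tr/// also returns the number of characters it matched, which makes it handy for counting. A sketch with a made-up sequence follows; combining tr with reverse to get the reverse complement is an addition not covered by the cases above, and the variable names are hypothetical.

```perl
use strict;
use warnings;

my $dna = "atggcc";   # hypothetical sequence

# Complement: a<->t, g<->c, one character at a time.
(my $complement = $dna) =~ tr/atgc/tacg/;

# Reverse complement: reverse the string, then complement it.
(my $revcomp = scalar reverse $dna) =~ tr/atgc/tacg/;

# Counting: tr returns the number of matched characters; with an empty
# replacement list the string itself is left unchanged.
my $gc = ($dna =~ tr/gc//);

print "$complement $revcomp $gc\n";
```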
4. Other cases (pseudocode)
(Case: check that a sequence consists of a, t, g, c only)
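One possible way to write this check, as a sketch: anchor a character class at both ends so that every character must be one of a, t, g, c. The subroutine name is_dna is hypothetical, and accepting uppercase letters via /i and rejecting empty strings are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the sequence is non-empty and
# contains only a, t, g, c (either case), otherwise 0.
# \A and \z anchor at the absolute start and end of the string;
# unlike $, \z does not tolerate a trailing newline.
sub is_dna {
    my ($seq) = @_;
    return ($seq =~ /\A[atgc]+\z/i) ? 1 : 0;
}

for my $seq ("atggcc", "ATGN", "") {
    print "'$seq' => ", is_dna($seq), "\n";
}
```

Lines read from a file usually end in a newline, so they would need chomp before being passed to this check.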
Reference: http://www.troubleshooters.com/codecorn/littperl/perlreg.htm#SimpleStringComparisons
