Tagged: perl

  • Mariam Rizkallah 5:44 pm on November 19, 2011
    Tags: bioinformatics, GC content, perl   

    Welcome back, Perl (GC content) 

    I coded in Perl for a week or two about seven months ago, then switched to Python (and PHP for a while) for the seven months since. Now I am back to Perl, somehow! I started with this GC content calculator.

    #!/usr/bin/perl

    use strict;
    use warnings;

    # Handling args
    # http://lowfatlinux.com/linux-perl-arguments.html
    #print "My name is $0 \n"; # Script name
    #print "First arg is: $ARGV[0] \n";
    #my $num = $#ARGV + 1; print "How many args? $num \n";
    #print "The full argument string was: @ARGV \n";

    # Calc GC content
    # http://www.cs.tut.fi/~jkorpela/perl/regexp.html
    # http://manuelcorpas.com/2010/02/03/a-script-to-calculate-gc-content/

    # Return the GC fraction of a sequence, formatted to three decimals.
    sub calcgc {
        my ($seq) = @_;
        return sprintf("%.3f", 0) unless length $seq; # avoid division by zero
        my $count = 0;
        $count++ while ($seq =~ m/[GC]/gi); # count every G or C, case-insensitive
        my $num = $count / length($seq);
        #my ($dec) = $num =~ /(\S{0,6})/;
        return sprintf("%.3f", $num);
    }

    #my $seq = "GCTTGTCGATATGACTACTTCGAGAAATTTATCTCGATTATCTGAGA";
    #my $gc = calcgc($seq);
    #print("$gc\n");

    print("Welcome back, Perl\n");

    my $file = shift @ARGV;
    die "usage: $0 FASTA_FILE\n" unless defined $file;
    open(my $in, '<', $file) or die "cannot open $file for reading: $!";
    while (my $line = <$in>) { # read one line from $file at a time
        if ($line =~ /^>/) { # FASTA header line: print it unchanged
            print $line;
            next; # http://stackoverflow.com/questions/4119225/perl-regular-expressions-delete-line-if-it-starts-with
        }
        $line =~ s/\s+\z//; # strip the line ending so it does not count towards the length
        next unless length $line; # skip blank lines
        my $gc = calcgc($line); # GC fraction of this line only
        print("$gc\n");
    }
    close($in);
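
The script above reports one GC value per sequence line, so a FASTA record wrapped over several lines gets several values. A minimal sketch of a per-record variant, written with awk to match the shell post further down this page. This is my own assumption about what one might want, not part of the original script, and the function name `gc_per_record` is made up for illustration.

```shell
# gc_per_record: print each FASTA header, then the GC fraction of the whole
# record (all of its sequence lines taken together), to three decimals.
# Reads the files given as arguments, or stdin when there are none.
gc_per_record() {
  awk '
    # Print the value for the record collected so far, then reset the counters.
    function report() {
      if (len > 0) printf "%.3f\n", gc / len
      gc = 0; len = 0
    }
    /^>/ { report(); print; next }      # header: finish the previous record first
    {
      line = $0
      gsub(/[^A-Za-z]/, "", line)       # drop anything that is not a letter (e.g. a trailing \r)
      len += length(line)
      gc  += gsub(/[GCgc]/, "", line)   # gsub returns the number of replacements made
    }
    END { report() }                    # do not forget the last record
  ' "$@"
}
```

It would be called as `gc_per_record sequences.fasta`, where the file name is only a placeholder.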

     
  • Mariam Rizkallah 4:27 am on August 24, 2011
    Tags: ctd, perl, shell   

    Perl vs shell scripting contest! 

    I have to write, so I won’t feel alone.

    Currently, we’re reading about the huge “Comparative Toxicogenomics Database” (CTD). Since it is an open database, all of its tables are kindly made available for download. We started parsing the comma-delimited tables, whose fields are further delimited by pipes (“|”) and whose field data may itself contain commas.
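
To make that layout concrete, here is a minimal sketch of the nested splitting: commas first, then pipes. Judging from the commands further down in this post, the interesting fields are the ones containing “^”. The function name `extract_pairs` and the sample line in the usage note are made up for illustration, not taken from CTD. It also shares the weakness described above: a comma inside a field's data still splits that field in two.

```shell
# extract_pairs: split each line on ",", keep only the fields that contain "^",
# split those fields on "|", and print every piece once (sorted).
# Reads the files given as arguments, or stdin when there are none.
extract_pairs() {
  awk -F',' '{
    for (i = 1; i <= NF; i++) {            # i <= NF so the last field is checked too
      if ($i ~ /\^/) {
        n = split($i, parts, "[|]")        # "[|]" matches a literal pipe
        for (j = 1; j <= n; j++) print parts[j]
      }
    }
  }' "$@" | sort -u                        # sort -u does the job of sort | uniq
}
```

For a hypothetical input line such as `x,alpha^1|beta^2,y` this prints `alpha^1` and `beta^2` on separate lines.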

    As an expert in his field, our PI started writing a one-line Perl program to parse it, while an eager student like myself started working on it with shell scripting. Needless to say, our PI won the contest. But I want to share my attempts with you.

    Dedicated to CTD!

    # first trial
    cut -f9 -d"," file.csv | sort | grep "\^" | cut -d"|" -f1,2 | uniq > file.txt # then I realized that there are more than 2 sub-fields delimited by "|", so "cut" is not really helping
    # successful trial (loop runs to i<=NF so that the last field is checked as well)
    awk -F"," '{for (i=1; i<=NF; i++) {if ($i ~ /\^/) print $i;} }' file.csv | sort | uniq | awk -F"|" '{for (i=1; i<=NF; i++) print $i;}' | sort | uniq | sed -n 's/\^/ /gp' > file.txt
    
    #Update Sep 2, 2011
    #Parse the generated file:
    awk -F" " '{print $2;}' file.txt | sort | uniq | wc -l
    
    # Update Sep 3, 2011
    awk -F"," '{for (i=1; i<=NF; i++) {if ($i ~ /\^/) print $i;} }' file.csv | sort | uniq | awk -F"|" '{for (i=1; i<=NF; i++) print $i;}' | sort | uniq > file.txt
    awk -F"^" '{print $2;}' file.txt | sort | uniq > file2.txt # done before replacing "^", to preserve space-separated data
    awk '{print "("$i",",$i"),";}' file2.txt > file2_python_dict.dict # Wrong: the "'" quotes are missing, and the "," in print becomes a space. ($i with i unset is the whole line, $0.)
    awk '{print "("$i",",$i"),";}' file2.txt | sed -n "s/(/(\'/gp" | sed -n "s/)/\')/gp" | sed -n "s/,/\',/gp" | sed -n "s/)',/),/gp" | sed -n "s/ / '/gp" > python_dict.dict
    # There's still a logical error: the last sed puts a ' after every space, so any line whose value contains spaces gets stray quotes inside it.
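
One way around that last error would be to let awk add the quotes itself instead of patching them in with sed afterwards. A minimal sketch, assuming the goal is one `('value', 'value'),` pair per input line and that no value contains a single quote or a backslash (neither is escaped here). The function name `quote_pairs` is made up for illustration.

```shell
# quote_pairs: wrap each whole input line in single quotes, twice,
# as a ('value', 'value'), pair. Spaces inside a value are left alone.
# Reads the files given as arguments, or stdin when there are none.
quote_pairs() {
  # q holds the single quote, so it never has to appear inside the awk program.
  awk -v q="'" '{ printf "(%s%s%s, %s%s%s),\n", q, $0, q, q, $0, q }' "$@"
}
```

With the files from above it would be run as `quote_pairs file2.txt > python_dict.dict`.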
     