Updates from February, 2012

  • Mariam Rizkallah 7:45 pm on February 24, 2012

    Is NCBI's gi_taxid mapper really enough?!

    At some point in my career I was going to have to deal with raw NCBI data, so I am glad I started early. What I wanted to do was filter the nr GIs that have a taxid in gi_taxid_prot, the GI-to-taxid mapping file NCBI provides.

    I tried this:

    # load the nr GIs from gi.list, then keep only the mapper rows whose GI is one of them
    awk -F'\t' 'BEGIN { while ((getline < "gi.list") > 0) ar[$1] } $1 in ar' gi_taxid_prot.dmp > gi_taxid_prot.filtered

    However, the numbers really concern me:

    16,828,865 nr entries, 47,308,513 gi_taxid_prot pairs, but only 16,807,310 pairs in gi_taxid_prot.filtered. How come nr has 21,555 entries with no gi_taxid mapping?
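    Rather than inferring the 21,555 by subtracting line counts, the unmapped GIs can also be counted directly, which also avoids counting a GI twice if it appears more than once in either file. This is a minimal sketch, assuming gi.list holds one nr GI per line and the mapper is tab-separated with the GI in column 1; count_mapping is a hypothetical helper name. It keeps the smaller GI list in memory and streams the much larger mapper past it.

```shell
#!/bin/sh
# Hypothetical helper: count how many GIs in a list do / do not appear in a
# gi_taxid mapping file (assumed tab-separated, GI in column 1).
count_mapping() {
    # $1 = GI list (one GI per line), $2 = gi_taxid mapping file
    awk -F'\t' '
        NR == FNR { want[$1] = 0; next }   # first file: every nr GI, not found yet
        $1 in want { want[$1] = 1 }        # second file: mark GIs that have a taxid
        END {
            for (g in want) {
                if (want[g]) { hit++ } else { miss++ }
            }
            printf "mapped\t%d\nunmapped\t%d\n", hit, miss
        }
    ' "$1" "$2"
}
```

    Usage would be count_mapping gi.list gi_taxid_prot.dmp, which prints one line with the mapped count and one with the unmapped count.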

    I am not sure why, and I don't know whether I can figure it out.

    • Mariam Rizkallah 8:52 pm on February 24, 2012

      Update:
      I filtered them the other way around: I extracted the GI entries with no match and looked a couple of them up at NCBI. It turned out that those sequences had been updated and given new GIs, which happen to be in the gi_taxid mapper. So does this mean the mapper is up to date while nr is not? I don't know.
      awk -F"\t" 'BEGIN {while ( i = getline … "filtered";} else {print $0 > "no_match";}}' gi &
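      Part of the command above was lost when the comment was rendered, so this is a hedged sketch of the same idea rather than a reconstruction of it: load the GIs that have a taxid, then send each GI from the list to one of two files. split_by_mapping is a hypothetical helper, the layout (tab-separated, GI in column 1) is an assumption, and the output names filtered and no_match are taken from the command above.

```shell
#!/bin/sh
# Hypothetical helper: write each GI of a list to one of two files depending
# on whether it appears in a gi_taxid mapping file (GI assumed in column 1).
split_by_mapping() {
    # $1 = mapping file, $2 = GI list, $3 = matched output, $4 = unmatched output
    : > "$3"; : > "$4"                     # start both outputs empty
    awk -F'\t' -v hit="$3" -v miss="$4" '
        NR == FNR { mapped[$1]; next }     # first file: GIs that have a taxid
        {
            if ($1 in mapped) { print > hit } else { print > miss }
        }
    ' "$1" "$2"
}
```

      Usage would be split_by_mapping gi_taxid_prot.dmp gi.list filtered no_match. Holding every mapper GI in an awk array keeps the GI list in its original order, but with tens of millions of mapper rows it can take a lot of memory.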

  • Mariam Rizkallah 8:20 pm on February 17, 2012
    Tags: blast, math101   

    Practicing exponents (http://www.aaamath.com/dec71ix2.htm). I am into BLAST and need to know the difference between an expectation value of 10 and one of 1E-10. I got an e-value of 1e-148: is that good or bad?

    Now I recalled what we had back in college: 0.00000304 = 3.04*10^-6 (the decimal point moves 6 places to the right).

    For now, I will read "E" in my head as "times 10 to the power of" 😉 That makes 3.04E-6 just another way of writing the same number.
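    A quick way to check the three forms against each other is to let awk do the conversion. This minimal sketch (notation_demo is a made-up name) prints 0.00000304 in E notation with two decimals and confirms that the literal 3.04E-6 is the same number.

```shell
#!/bin/sh
# Hypothetical demo: 0.00000304 = 3.04*10^-6, written 3.04e-06 in E notation.
notation_demo() {
    awk 'BEGIN {
        printf "%.2e\n", 0.00000304                 # decimal -> E notation
        print ((3.04E-6 == 0.00000304) ? "same number" : "different numbers")
    }'
}
notation_demo   # prints 3.04e-06, then: same number
```

    The %e format always writes at least two exponent digits, hence e-06 rather than e-6.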

    That leaves me sorting BLAST results like this: is 1e-148 better or worse than 8e-119, i.e. 1*10^-148 or 8*10^-119? I guess it's the exponent that matters rather than the number in front of it (that number is the coefficient; the base is the 10).
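    For sorting, GNU coreutils and BSD sort have a -g (general numeric) option that reads E notation, whereas -n does not understand the exponent part. A minimal sketch assuming one of those sort implementations; sort_evalues is a hypothetical wrapper. The extra pair 9e-150 and 1e-149 shows that, with a coefficient between 1 and 10, the exponent decides the order: 9e-150 is 0.9e-149, so it is smaller than 1e-149 despite its larger coefficient.

```shell
#!/bin/sh
# Sort e-values smallest (most significant) first.
# Assumes a sort with -g (GNU coreutils or BSD), which parses E notation.
sort_evalues() {
    sort -g
}
printf '%s\n' 8e-119 1e-5 1e-148 10 9e-150 1e-149 | sort_evalues
# prints 9e-150, 1e-149, 1e-148, 8e-119, 1e-5, 10 (one per line)
```

    For BLAST+ tabular output (-outfmt 6 with the default columns) the e-value is column 11, so something like sort -t "$(printf '\t')" -k11,11g hits.tsv should order hits the same way; hits.tsv is a placeholder name.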

    And for BLAST e-value cutoffs: 10 = 1*10^1 = 1e+1,
    whereas 1e-5 = 1*10^-5 = 0.00001.
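    Applying a cutoff is then a plain numeric comparison, since awk converts 10, 1e+1, 1e-5 and 0.00001 to numbers the same way. A minimal sketch: filter_by_evalue is a hypothetical helper that keeps tab-separated lines whose e-value, in a column you name, is at or below the cutoff.

```shell
#!/bin/sh
# Hypothetical helper: keep tab-separated lines whose e-value column is at or
# below a cutoff. $1 = cutoff (e.g. 1e-5 or 0.00001), $2 = e-value column number.
filter_by_evalue() {
    # adding 0 forces a numeric (not string) comparison on both sides
    awk -F'\t' -v cutoff="$1" -v col="$2" '($col + 0) <= (cutoff + 0)'
}
printf 'hitA\t1e-148\nhitB\t0.5\nhitC\t1e+1\n' | filter_by_evalue 1e-5 2
# prints only the hitA line
```

    With a cutoff of 10 (or 1e+1) all three example lines pass, since 1e+1 is exactly 10.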

    Never forget: the smaller the e-value, the more significant the hit.

     