Updates from April, 2012

  • Mariam Rizkallah 3:58 pm on April 16, 2012
    Tags: entrez, ncbi

    The community: BioPython overcomes limitations 

    BioPython community strikes again!

    Peter Cock (@pjacock) and the BioPython community (biopython[AT]lists[DOT]open-bio[DOT]org) helped me fix my script for retrieving records from NCBI.

    My “current” question was how to get ESTs from nucest (“The collection of short single-read transcript sequences from GenBank”). I had one thought: how do I get this (http://www.ncbi.nlm.nih.gov/nucest/?term=txid543769%5BOrganism:exp%5D) onto my HDD!? I used the example in the BioPython Tutorial to implement the “History” feature of the Entrez Utilities and download the records. I noticed that the number of sequences in the output was not consistent from run to run, and that errors came back as XML tags inside the output!

    Then I decided to send the whole list of ids via efetch, but I hit the 10,000-record limit. Efetch is amazing: it gave me very consistent results, with no XML error tags.

    I tried to split the list into lists of 10,000 ids each, but wasn’t successful: the same thing happened, and I got only the first 10,000 records.
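
    For illustration, the splitting step can be sketched without touching NCBI at all. This is only a sketch under assumptions: split_into_batches is a hypothetical helper name, and the ids are made-up placeholders standing in for real esearch results. The final check confirms that the batches together cover every id exactly once and in order, which is what a batched download needs.

```python
def split_into_batches(ids, batch_size):
    """Yield (start, end, batch) for consecutive slices of ids."""
    for start in range(0, len(ids), batch_size):
        end = min(len(ids), start + batch_size)
        yield start, end, ids[start:end]


# Placeholder ids for illustration only; real ones would come from esearch.
example_ids = [str(n) for n in range(1, 25001)]
batches = list(split_into_batches(example_ids, 10000))

for start, end, batch in batches:
    print("records %i to %i: %i ids" % (start + 1, end, len(batch)))

# Sanity check: concatenating the batches gives back the original list.
rejoined = [gi for _, _, batch in batches for gi in batch]
assert rejoined == example_ids
```

    Each batch has to be sent on its own (and the reply for that same batch used); reusing the results of an earlier request would return the same records every time.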

    I emailed the gang, and Peter reminded me of the good old days (map gi_taxid, loop over the taxids to get all child nodes, and pull the matching records from the 11G EST FASTA file).
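
    The “good old days” route can be sketched roughly as follows. It assumes the NCBI taxonomy dump layout (nodes.dmp with tax_id and parent tax_id as the first two fields, separated by tab, pipe, tab, and a gi_taxid file with one gi and one taxid per line). The function names and the tiny in-memory tables are invented for illustration; this is not Peter’s code.

```python
from collections import defaultdict


def parse_nodes(lines):
    """Build a parent -> [children] map from nodes.dmp-style lines."""
    children = defaultdict(list)
    for line in lines:
        fields = line.split("\t|\t")
        taxid, parent = int(fields[0]), int(fields[1])
        if taxid != parent:  # the root node lists itself as its parent
            children[parent].append(taxid)
    return children


def descendants(children, root):
    """Return root plus every taxid below it in the tree."""
    seen = {root}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def gis_for_taxids(gi_taxid_lines, taxids):
    """Yield the gi of every line whose taxid is in the wanted set."""
    for line in gi_taxid_lines:
        gi, taxid = line.split()
        if int(taxid) in taxids:
            yield gi


# Made-up toy data: taxid 10 has children 11 and 12; taxid 20 is unrelated.
toy_nodes = [
    "1\t|\t1\t|\tno rank\t|",
    "10\t|\t1\t|\tgenus\t|",
    "11\t|\t10\t|\tspecies\t|",
    "12\t|\t10\t|\tspecies\t|",
    "20\t|\t1\t|\tgenus\t|",
]
toy_gi_taxid = ["100\t11", "101\t12", "102\t20", "103\t10"]

wanted = descendants(parse_nodes(toy_nodes), 10)
selected = list(gis_for_taxids(toy_gi_taxid, wanted))
print(sorted(wanted), selected)
```

    The selected gis would then be used to filter the local EST FASTA file, which avoids the network entirely at the cost of downloading the full dump first.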

    Peter went through my script and had the idea of sending specific gis in batches by slicing gi_list by index. He posted each batch with epost, then caught the WebEnv variable and passed it back through efetch. It’s perfect! I really appreciate it.

    import sys
    from Bio import Entrez

    # NCBI asks for a contact email address; fill in your own.
    Entrez.email = ""
    txid = int(sys.argv[1])
    # First search: only to learn how many records match
    prim_handle = Entrez.esearch(db="nucest", term="txid%i[Organism:exp]" % txid, retmax=20)
    prim_record = Entrez.read(prim_handle)
    prim_handle.close()
    prim_count = prim_record["Count"]
    # Second search: ask for all ids at once
    # (the history option from the Biopython tutorial http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc119 is left commented out)
    search_handle = Entrez.esearch(db="nucest", term="txid%i[Organism:exp]" % txid, retmax=prim_count)  #, usehistory="y")
    search_results = Entrez.read(search_handle)
    search_handle.close()
    gi_list = search_results["IdList"]
    count = int(search_results["Count"])
    print(count)
    assert count == len(gi_list), len(gi_list)
    #webenv = search_results["WebEnv"]
    #query_key = search_results["QueryKey"]
    out_fasta = "%s_txid%i_ct%i.fasta" % (sys.argv[2], txid, count)
    out_handle = open(out_fasta, "a")
    ## Approach1: originally gave <XML> tags within the fasta file (<ERROR>Unable to obtain query #1</ERROR>); enhanced by Peter Cock
    batch_size = 1000
    for start in range(0, count, batch_size):
        end = min(count, start + batch_size)
        batch = gi_list[start:end]
        print("Going to download record %i to %i using epost+efetch" % (start + 1, end))
        # Post only this batch of gis, then fetch exactly what was posted
        post_results = Entrez.read(Entrez.epost("nucest", id=",".join(batch)))
        webenv = post_results["WebEnv"]
        query_key = post_results["QueryKey"]
        fetch_handle = Entrez.efetch(db="nucest", rettype="fasta", retmode="text", webenv=webenv, query_key=query_key)  #retstart=start, retmax=batch_size,
        data = fetch_handle.read()
        fetch_handle.close()
        # Append this batch to the output file
        out_handle.write(data)
    out_handle.close()
    ## Approach2: split list [REF needed] (kept as tried; it only returned the first 10,000 records)
    #def SplitList( list, chunk_size ) :
    #        return [list[offs:offs+chunk_size] for offs in range(0, len(list), chunk_size)]
    #z = SplitList(gi_list, 10000)
    #for i in range(0, len(z)):
    #       print len(z[i])
    #       post_handle = Entrez.epost("nucest", rettype="fasta", retmode="text", id=",".join(z[1]))
    #       webenv = search_results["WebEnv"]
    #       query_key = search_results["QueryKey"]
    #       fetch_handle = Entrez.efetch(db="nucest", rettype="fasta", retmode="text", webenv=webenv, query_key=query_key)
    #       data = fetch_handle.read()
    #       fetch_handle.close()
    #       out_handle.write(data)
    ## Approach3: with most consistent retrieval but limited to 10000
    #fetch_handle = Entrez.efetch(db="nucest", rettype="fasta", retmode="text", webenv=webenv, query_key=query_key)
    #data = fetch_handle.read()
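
    Since the trouble started with inconsistent record counts and XML error tags hiding inside the FASTA output, a small check on each downloaded chunk can help. The sketch below is an assumption-laden illustration, not part of the original script: check_fasta_text is a hypothetical helper, the sample records are made up, and only the error message text is the one seen above.

```python
import re

# Matches error elements like the one that showed up inside the FASTA output.
ERROR_TAG = re.compile(r"<ERROR>(.*?)</ERROR>", re.DOTALL)


def check_fasta_text(data):
    """Return (number of FASTA records, list of embedded error messages)."""
    n_records = sum(1 for line in data.splitlines() if line.startswith(">"))
    errors = ERROR_TAG.findall(data)
    return n_records, errors


# Made-up sample: two records with an error tag wedged between them.
sample = (
    ">gi|1|gb|XX000001.1| made-up EST one\nACGTACGT\n"
    "<ERROR>Unable to obtain query #1</ERROR>\n"
    ">gi|2|gb|XX000002.1| made-up EST two\nTTGACCA\n"
)
n_records, errors = check_fasta_text(sample)
print("%i records, %i error tags" % (n_records, len(errors)))
```

    In the download loop, the record count of each chunk could be compared with the size of the batch that was posted, and a chunk with error tags retried rather than written to the file.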

  • Mariam Rizkallah 12:14 pm on April 4, 2012
    Tags: AUC, cycle, grads, mentorship, perish, publishing, research, thesis   

    Thesis Writing Process Workshop For students in Sciences and Engineering – AUC 

    These are the notes I took at the “Thesis Writing Process Workshop” conducted by Drs. Sherif Aly and Ali Darwish at AUC.


    A) General tips:

    Why go to grad school, which program to choose, and which type of program (thesis vs. non-thesis)?

    The more you give, the more you receive

    Don’t work on your own. Try 3 topics and work in 2-researcher groups, so that whichever of you finishes first, you both publish

    Zoom out – Understand the field at the fundamental level

    Intelligent creatures communicate in one way or another (Don’t live in isolation)

    B) Selecting a topic:

    Thesis: a dissertation in which you

    • independently address an important problem,
    • provide an original contribution to knowledge
    • Re-Search: again (see what is obvious) and again (see the less obvious) and again (understand the phenomenon)…

    C) “If your mother tells you that she loves you, look for evidence!”

    Ibn-Al-Haitham, the father of scientific research: Make a hypothesis. Postulate -> test it and test the opposite -> accept the results without assumptions

    Never accept facts or assumptions without verifying them.

    –          READ READ A LOT. Take notes.

    –          Start early on the topic

    –          Talk to others. Don’t isolate yourself; verbalize thoughts, logical arguments, questions.

    D) Where to look?

    • Look in dissertations rather than papers
    • Suggestions for future prospects
    • Conference presentations
    • Big scholars (key players in your field)
    • Landmark papers (close gap in your understanding, don’t reinvent the wheel)

    E) Factors affecting topic choice:

    • Your next phase (academia vs. industry)
    • Gut instinct (100% motivation vs. 100% intelligence) (completely dedicate yourself to science)
    • Don’t try for Nobel Prize!
    • Elegant and simple (simple things last)
    • Practice enlightened trial and error (Logical moves vs. fooling around “curiosity”)
    • Lose focus sometimes
    • Advisor’s specialization
    • What’s hot is not in books
    • Discover what’s missing
    • Make analogies, borrow ideas from other domains (Cars and people)

    ** Robert Boice: “Write before you’re ready”

    G) How to pick a point:

    1- Understand existing system

    2- Decompose the problem

    3- Pick choices (mix and match, because you understand a problem at the fundamental level)

    H) Crash ideas:

    1-       Don’t shut down an idea because it’s insane (filter later)

    2-       Draw ideas -> physical form

    3-       Wish and wonder that you overcame a problem

    4-       Go to the beach.. relax

    5-       TIPS: Theory of inventive problem solving

    6-       Nature: “What kind of person would spend hours and hours thinking about the electric field?” Maxwell

    7-      Have good command of tools

    8-      Read Bios (Wikipedia, what inspired El-Khawarizmi, what was unknown to him that made him go for the polynomial equations?)

    9-      Observe: what are people using and why?

    10-   Discover a gap in people’s knowledge

    11-    Intentionally screw an equation up

    ** “Continue looking for needles in a haystack, don’t be satisfied with the first one”

    Choosing a topic is like choosing a partner (Love, commitment and dedication)!

    Proposing your topic

    1) Formulate a proposal: Means that I am working on a justifiable topic.

    –          Research is as good as its proposal (committee, grant, conference …)

    –          What do you want to accomplish.. what are you after?

    –          Why would I like to change the way things are? Fire up the audience -> killer scenario

    –          How are you going to solve the problem? Don’t worry that you don’t have a ready solution; this means that it is a “research topic”

    –          Audience: How scientific do you sound in tackling the problem? Read others’ work, do your homework: what’s been achieved? Are you focused? Very obvious things don’t need scientific research (banging a screen hard will break it #Fact)

    –          Look for gaps, break the assumptions

    2) Proposal:

    1-       Title: attractive, descriptive, suitable length

    2-       Introduction: put reader in context, funneling approach (1 paragraph for each level), give example to raise the question (data transfer), list key contributions, general methodology

    3-       Problem definition: 2 paragraphs, concise and clear

    4-       Thesis contribution statement (1-2 paragraphs, very accurate)

    5-       Literature review: recent, done homework [2012 published article = 2009 finished work], conference proceedings, look right, evaluate others’ work and classify it.

    6-       Proposed approach: solid methodology, has anyone else done this before, how are you going to evaluate yourself (comparative, e.g. automated design relative to human designers), modifiable

    7-       Conclusion

    Research cycle

    ** One common thing among all great scientists: persistence and training, not intelligence: every time they fail, they retry

    Being a disciple of good scientists -> tree of supervisors -> Master something

    Close the loops and complete the cycle:

    Crazy ideas/concepts (generated faster, but you need prioritization) -> Plan/design -> Fund [non-profit]/pay students’ tuition -> implement ideas -> analyze idea (evaluate research, assess yourself, compare to other approaches) -> dissemination [publish, conference] [rejected from a certain journal != bad] -> when polluted by the ecosystem (take a year off, study abroad) -> evaluate and recommend

    ** Egypt’s past regime implemented the policy of the “Man in the Iron Mask”: locking creative people away

    Last stage [Last month.. don’t leave the school/lab]: Play with parameters, graphs, and trial and error..

    If you’re going for a PhD or postdoc, labs prefer to take a plug-and-play researcher (one who already has the experience, has mastered the techniques, and has published) ..

    True mentor says: “I will never put my name before your name unless I did the bulk of the work”

    Create a mind map: put what is in your head on paper

    Thesis = proposal – a component (the not-sure-methodology)++

    The process of writing a thesis should be iterative, have a holistic view.. (Do your research and build on top of it a dissertation -> PhD)

    –          Arguments clear [enough information to reproduce and build on it]

    –          Tone appropriate

    –          Considerate to readers: explain every step in detail; don’t shock the reader

    –          Follow a nice logical flow, roadmap Section1 (paragraph) 1.1…etc. Topic sentence..etc.

    Publishing your work [Publish or perish]

    1)      Make sure that your work is assess-able by others, and repeat-able

    3)       If you don’t publish, your thesis is a waste of time.. Great instructors were let go when they didn’t publish: you’re so weak, your research was useless.

    4)       When: NOW.. write reviews about state-of-the-art.. mentors can say “write me something within the month and I send it out”

    5)       Where: Choosing a journal

    6)       Follow the instructions: Format…etc.

    ** Never say that the authors’ work is incomplete “and here we …etc.” -> Say instead: the authors addressed those aspects, and we are addressing the following..

    “Neat work .. don’t get lost” “what is the significance if it’s been done before”

    Plan (outline) -> very rough draft -> second draft -> send to advisor -> 3-4 iterations -> polished ->send to journal

    Tips to increase the publishability of your manuscript:

    –          Abstract: most important, written last (2 weeks vs. 2 months)

    –          Introduction: why, how and what

    –          Referencing milestones, clarity, advisor’s name/lab, sound conclusion and discussion

    –          State very clearly what’s novel

    –          Art of storytelling (consistency)

    –          Determination


    End of talk!

    What really makes me feel extremely stupid is that I get most of these tips all the time from mentors and friends.. Stupid! Why not implement them!? Plus, the speakers linked this to their own research experience, which added intimacy (and sometimes despair!) to their talk.
