Extract selected sequences from a parent SFF file into a child SFF file
After excluding certain sequences from my FASTA file and generating a new FASTA, I really needed all the information from the QUAL and FASTA files gathered back into an SFF file. As a Biopython fan, I found everything I needed in their tutorial.
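The selection works by matching read IDs, so it helps to know how an ID is taken from a FASTA header. Biopython's "fasta" parser sets `record.id` to the first whitespace-delimited word after `>`, so any description after the first space is ignored when comparing against SFF read names. Below is a minimal stdlib-only sketch of that rule, assuming ordinary `>` headers; the `fasta_ids` helper and the `read_001`-style IDs are hypothetical, for illustration only.

```python
import io
from typing import Iterator, TextIO


def fasta_ids(handle: TextIO) -> Iterator[str]:
    """Yield each FASTA record ID (hypothetical helper).

    Uses the same rule as Bio.SeqIO's "fasta" parser: the ID is the
    first whitespace-delimited word after '>' ("" for an empty header).
    """
    for line in handle:
        if line.startswith(">"):
            words = line[1:].split()
            yield words[0] if words else ""


# Hypothetical example: text after the first space does not affect the ID.
example = io.StringIO(">read_001 length=250\nACGT\n>read_002\nGGTA\n")
print(list(fasta_ids(example)))  # ['read_001', 'read_002']
```

If you edited the headers while excluding reads (for example, by adding a prefix), the first word must still equal the SFF read name; otherwise that read is silently skipped.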
If you find it useful, share it. If you have any comments or find any bugs, please contact me.
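```python
#!/usr/bin/env python3
from Bio import SeqIO
import sys

# Script to extract sequences into a child SFF from a parent SFF
# Author: Mariam Rizkallah - August 16, 2011
# sys.argv[1] -> parent SFF
# sys.argv[2] -> FASTA with the selected reads
# sys.argv[3] -> child SFF (output)
# Example: ./selectFromSff.py parent.sff new_fasta.fasta child.sff (p.s. chmod +x)

if len(sys.argv) != 4:
    sys.exit("Usage: %s parent.sff selected.fasta child.sff" % sys.argv[0])

parent_sff = sys.argv[1]
selected_fasta = sys.argv[2]
child_sff = sys.argv[3]

# Read the wanted IDs once into a set instead of re-parsing
# the FASTA file for every record in the parent SFF.
selected_ids = set(r.id for r in SeqIO.parse(selected_fasta, "fasta"))

# Keep only the parent SFF records whose ID appears in the FASTA file.
records = (record for record in SeqIO.parse(parent_sff, "sff")
           if record.id in selected_ids)

count = SeqIO.write(records, child_sff, "sff")
print("Selected %i records" % count)
```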
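The selection step in the script can also be written as a small reusable function. The sketch below uses only the standard library so it runs without Biopython: `SimpleNamespace` objects stand in for `SeqRecord`s (anything with an `.id` attribute works), and `select_records` and `missing_ids` are hypothetical helper names. `missing_ids` is an optional sanity check, assuming you want to know which FASTA IDs have no match in the parent SFF when the reported count is lower than expected.

```python
from types import SimpleNamespace
from typing import Iterable, Iterator, List, Protocol


class HasId(Protocol):
    id: str


def select_records(records: Iterable[HasId], wanted_ids: Iterable[str]) -> Iterator[HasId]:
    """Yield records whose .id is in wanted_ids, keeping the parent order."""
    wanted = set(wanted_ids)  # built once; each lookup is O(1) on average
    return (r for r in records if r.id in wanted)


def missing_ids(wanted_ids: Iterable[str], parent_ids: Iterable[str]) -> List[str]:
    """Return the wanted IDs that never appear in the parent, in input order."""
    present = set(parent_ids)
    return [i for i in wanted_ids if i not in present]


# Hypothetical stand-ins for records from SeqIO.parse(parent_sff, "sff").
parent = [SimpleNamespace(id=i) for i in ("r1", "r2", "r3", "r4")]
wanted = ["r3", "r1", "r9"]

chosen = [r.id for r in select_records(parent, wanted)]
print(chosen)  # ['r1', 'r3']
print(missing_ids(wanted, [r.id for r in parent]))  # ['r9']
```

With real files you would pass `SeqIO.parse(parent_sff, "sff")` as `records` and the FASTA IDs as `wanted_ids`. Note that the output follows the parent SFF order, not the FASTA order, and that `missing_ids` needs its own pass over the parent IDs.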