Extract certain sequences from a parent sff into a child one 

After excluding certain sequences from my fasta file and generating a new fasta, I needed to gather the matching sequence and quality information back into an sff file. As a Biopython fan, I found everything I needed in their tutorial.

If you find it useful, share it. If you have any comments or find any bugs, please contact me.


from Bio import SeqIO
import sys

#Script to extract sequences into a child sff from a parent sff
#Author: Mariam Rizkallah - August 16, 2011
#argv[1] -> parent sff
#argv[2] -> fasta
#argv[3] -> child sff
#Example: python selectFromSff.py parent.sff new_fasta.fasta child.sff
#(to run it as ./selectFromSff.py instead, add a shebang line and chmod +x)

parent_sff = sys.argv[1]
selected_fasta = sys.argv[2]
child_sff = sys.argv[3]

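#Collect the IDs to keep once, so the fasta file is parsed a single time
selected_ids = set(r.id for r in SeqIO.parse(selected_fasta, "fasta"))

#Lazily yield the parent sff records whose IDs appear in the fasta file
records = (record for record in
		SeqIO.parse(parent_sff, "sff")
		if record.id in selected_ids)
#Write the selected records and report how many were written
count = SeqIO.write(records, child_sff, "sff")
print("Selected %i records" % count)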
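
The script reports how many records were written, but not which fasta IDs were never found in the parent sff. Below is a minimal sketch of that check using only the standard library. The helper names fasta_ids and missing_ids and the example data are hypothetical, not part of Biopython or of the script above. It assumes the record ID is the first whitespace-delimited token of the fasta header line, which is how Biopython fills in record.id for fasta input.

```python
def fasta_ids(lines):
    """Return the set of record IDs found in FASTA header lines.

    Assumption: the ID is the first whitespace-delimited token after '>'.
    """
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def missing_ids(wanted, found):
    """Return, sorted, the wanted IDs that never appear in `found`."""
    return sorted(set(wanted) - set(found))


# Hypothetical example data, not taken from a real run
fasta_lines = [">read1 length=4", "ACGT", ">read2", "GGCC", ">read9", "TTAA"]
parent_ids = ["read1", "read2", "read3"]

wanted = fasta_ids(fasta_lines)
print(missing_ids(wanted, parent_ids))  # prints ['read9']
```

In a real run, the lines would come from the open fasta file and the found IDs from the records in the parent sff. An empty list means every selected sequence was located.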