bioinformatics - Querying NCBI for a sequence from NCBI via Biopython
How can I query NCBI for sequences given a chromosome's GenBank identifier and start and stop positions, using Biopython?
    CP001665 napp tile 6373 6422 . + . cluster=9;
    CP001665 napp tile 6398 6447 . + . cluster=3;
    CP001665 napp tile 6423 6472 . + . cluster=3;
    CP001665 napp tile 6448 6497 . + . cluster=3;
    CP001665 napp tile 7036 7085 . + . cluster=10;
    CP001665 napp tile 7061 7110 . + . cluster=3;
    CP001665 napp tile 7073 7122 . + . cluster=3;
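    from Bio import Entrez
    from Bio import SeqIO

    # NCBI asks for a contact address with every Entrez request.
    Entrez.email = "sample@example.org"

    # Fetch the full GenBank record as text and parse it into a SeqRecord.
    handle = Entrez.efetch(db="nuccore", id="CP001665", rettype="gb", retmode="text")
    whole_sequence = SeqIO.read(handle, "genbank")

    # A SeqRecord can be sliced like a list.
    print(whole_sequence[6373:6422])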
Once you know the ID and the database to fetch from, use Entrez.efetch to get a handle to that file. You should specify the return type (rettype="gb") and the mode (retmode="text") to get a handle to the file-like data. Then pass this handle to SeqIO, which should return a SeqRecord object. One nice feature of SeqRecords is that they can be cleanly sliced like lists. If you can retrieve the starting and ending points from somewhere, the print line above returns:
    ID: CP001665.1
    Name: CP001665
    Description: Escherichia coli 'BL21-Gold(DE3)pLysS AG', complete genome.
    Number of features: 0
    Seq('GCGCTAACCATGCGAGCGTGCCTGATGCGCTACGCTTATCAGGCCTACG', IUPACAmbiguousDNA())
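The approach above downloads the whole genome and then slices it. As a minimal sketch, assuming the tile records in the question are whitespace-separated, GFF-style fields terminated by semicolons, with 1-based inclusive start and end positions in the fourth and fifth columns, the coordinates could be parsed and turned into Python slices as shown here. The names Tile, parse_tiles, to_slice and fetch_tile are hypothetical helpers for illustration, not part of Biopython. fetch_tile relies on the seq_start and seq_stop parameters of NCBI's EFetch utility, which Entrez.efetch forwards as keyword arguments, so that only the requested range is downloaded; it needs network access and is meant as a sketch rather than a definitive implementation.

```python
from typing import List, NamedTuple


class Tile(NamedTuple):
    """One tile record (hypothetical helper type)."""
    accession: str
    start: int  # assumed 1-based, inclusive (GFF convention)
    end: int    # assumed 1-based, inclusive


def parse_tiles(text: str) -> List[Tile]:
    """Parse semicolon-terminated, whitespace-separated tile records."""
    tiles = []
    for record in text.split(";"):
        fields = record.split()
        if len(fields) < 5:
            continue  # skip empty pieces such as the one after the last ';'
        tiles.append(Tile(fields[0], int(fields[3]), int(fields[4])))
    return tiles


def to_slice(tile: Tile) -> slice:
    """Convert 1-based inclusive coordinates to a 0-based half-open slice."""
    return slice(tile.start - 1, tile.end)


def fetch_tile(tile: Tile, email: str):
    """Ask NCBI for just the tile's range instead of the whole record."""
    # Imported here so the parsing helpers work without Biopython installed.
    from Bio import Entrez, SeqIO

    Entrez.email = email
    handle = Entrez.efetch(
        db="nuccore",
        id=tile.accession,
        rettype="fasta",
        retmode="text",
        seq_start=tile.start,  # EFetch positions are 1-based
        seq_stop=tile.end,
    )
    try:
        return SeqIO.read(handle, "fasta")
    finally:
        handle.close()
```

With a record already loaded as whole_sequence, whole_sequence[to_slice(tile)] would give the tile's subsequence. Under the 1-based inclusive assumption, the tile from 6373 to 6422 maps to the slice [6372:6422] and covers 50 bases, whereas [6373:6422] covers 49; which of the two is right depends on how the coordinates were produced.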