TY-Exercise_4_(35)(Updated)
EXERCISE 4
Title- Performing Database Search using NCBI BLAST
Roll No. – 35
b) tBLASTn:
tBLASTn is used to search for proteins encoded in nucleotide sequences that have not yet
been translated. It takes a protein query and compares it against the six-frame translations
of each DNA sequence in the database. This is useful when looking for protein-coding regions
in DNA sequences that have not been fully annotated, such as ESTs (short, single-read cDNA
sequences) and HTGs (draft genome sequences). Because these sequences have no annotated
protein translations, a protein query can only be searched against them using tBLASTn.
c) BLASTx:
BLASTx compares a nucleotide query sequence, translated in all six reading frames,
against a database of known protein sequences. This tool is useful when the reading
frame of the DNA sequence is uncertain or when the sequence contains errors, such as
frameshifts, that would disrupt the protein translation. BLASTx provides combined
statistics for hits across all frames, making it helpful for the initial analysis of
new DNA sequences.
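The six-frame translation that BLASTx applies to the query (and tBLASTn applies to the database) can be sketched in a few lines of Python. This is only an illustration using the standard genetic code, not BLAST's own implementation. The function names are hypothetical, and codons containing anything other than A, C, G or T are simply reported as X.

```python
# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return the three forward (+1..+3) and three reverse (-1..-3) frames."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frame_translations("ATGGCCTAA").items():
        print(name, protein)
```

Each of the six resulting protein strings is what gets compared against the protein database, which is why an uncertain reading frame is not a problem for BLASTx.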
d) BLASTp:
BLASTp, or Protein BLAST, is used to compare protein sequences. You can
input one or more protein sequences that you want to compare against a single
protein sequence or a database of protein sequences. This is useful when you're
trying to identify a protein by finding similar sequences in existing protein
databases.
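The choice of BLAST program follows from the type of the query and the type of the database. A small lookup table makes the rule explicit. The function name and the "nucleotide"/"protein" labels below are illustrative choices, not part of any NCBI interface.

```python
# Maps (query type, database type) to the BLAST program that handles it.
# tblastx (translated nucleotide query vs. translated nucleotide database)
# is left out because it shares the ("nucleotide", "nucleotide") key with blastn.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",
    ("protein", "nucleotide"): "tblastn",
}


def choose_blast_program(query_type, database_type):
    """Return the BLAST program name for the given query/database types."""
    try:
        return BLAST_PROGRAMS[(query_type, database_type)]
    except KeyError:
        raise ValueError(
            f"unsupported combination: {query_type!r} vs {database_type!r}"
        ) from None


if __name__ == "__main__":
    # Exercise 1 uses a protein query against a protein database.
    print(choose_blast_program("protein", "protein"))
```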
Exercises:
1. Given a protein sequence, identify the protein and the organism it belongs to
using the BLAST program.
>sample sequence
MERGVRRGAALVAAWRSLWERGGLALFRPQCRTGCGACRVQGTRPFSLSAAASAVLGLGSWGGDSGKQ
KLTLQDVAELIRKKECRRVVVMAGAGIS
TPSGIPDFRSPGSGLYSNLEQYNIPYPEAIFELAYFFINPKPFFTLAKELYPGNYRPNYAHYFLRLLHDKGLLL
RLYTQNIDGLERVAGIPPDRLVEAHGT
FATATCTVCRRKFPGEDFRGDVMADKVPHCRVCTGIVKPDIVFFGEELPQRFFLHMTDFPMADLLFVIGTSL
EVEPFASLAGAVRNSVPRVLINRDLV
GPFAWQQRYNDIAQLGDVVTGVEKMVELLDWNEEMQTLIQKEKEKLDAKDK
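A FASTA record like the one above is often wrapped unevenly, so the lines must be joined into one sequence before it is pasted into BLAST or submitted programmatically. The sketch below reads a single record and prepares the form fields for a submission. The helper names are hypothetical. The field names CMD, PROGRAM, DATABASE and QUERY are assumed to match NCBI's BLAST URL API and should be checked against the NCBI documentation. The code sends nothing over the network.

```python
from urllib.parse import urlencode

# Assumed endpoint of the NCBI BLAST URL API; verify before real use.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def read_fasta_record(text):
    """Return (header, sequence) for the first FASTA record in text."""
    header = None
    parts = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; stop reading
            header = line[1:].strip()
        else:
            parts.append(line)
    return header, "".join(parts).upper()


def build_submission(sequence, program="blastp", database="nr"):
    """Encode the form fields for a search request (nothing is sent)."""
    return urlencode(
        {"CMD": "Put", "PROGRAM": program, "DATABASE": database, "QUERY": sequence}
    )


if __name__ == "__main__":
    header, seq = read_fasta_record(">sample sequence\nMERGVRRG\nAALVAAWR\n")
    print(header, len(seq))
    print(BLAST_URL + "?" + build_submission(seq))
```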
What is the total score?
The total score is the sum of the alignment scores of all aligned segments from the same database
sequence. The higher the score, the better the alignment. For this search, the total score is 714.
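The relationship between the max score and the total score can be shown with a short calculation. It is a sketch that assumes the per-segment scores for one database sequence are already available as a list. The function name is hypothetical.

```python
def summarize_segment_scores(segment_scores):
    """Return max score, total score and segment count for one database hit."""
    if not segment_scores:
        raise ValueError("at least one aligned segment is required")
    return {
        "max_score": max(segment_scores),    # best single aligned segment
        "total_score": sum(segment_scores),  # sum over all aligned segments
        "segments": len(segment_scores),
    }


if __name__ == "__main__":
    # A hit with a single aligned segment: max score equals total score,
    # as in the result reported here (714 and 714).
    print(summarize_segment_scores([714]))
```

When a hit aligns in only one segment, the two values coincide. They differ only when several separate segments of the same database sequence align to the query.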
What is percent identity?
Percent identity is the percentage of residues in the alignment that are identical between the query
and the database sequence. A query sequence can have a low percent identity and still be a real hit,
so it is essential to take the E value into account and to look for homology between conserved
regions, which is most evident at the protein level. For this search, the percent identity is 100%.
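Percent identity can be computed directly from a pairwise alignment as identical positions divided by alignment length. The sketch below assumes the two aligned strings have equal length and use "-" for gaps. The function name is hypothetical.

```python
def percent_identity(aligned_query, aligned_subject, gap="-"):
    """Percentage of alignment columns where both sequences carry the same residue."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != gap
    )
    # Gap columns count toward the alignment length but never as matches.
    return 100.0 * matches / len(aligned_query)


if __name__ == "__main__":
    print(percent_identity("MERGV", "MERGV"))  # identical sequences
    print(percent_identity("MERGV", "MERAV"))  # one mismatch in five columns
```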
Are there any homologous (orthologous/paralogous) sequences? Which are those? Mention them.
Yes, only orthologous sequences. Sequences are said to be orthologous if they were separated by a
speciation event: when a species diverges into two separate species, the divergent copies of a single
gene in the resulting species are orthologs. The five best orthologous hits for the given protein
sequence come from the following species:
a) Gallus gallus
b) Phasianus colchicus
c) Centrocercus urophasianus
d) Lagopus muta
e) Tympanuchus pallidicinctus
Which is the best match?
NP_001186422.1 (Gallus gallus)
Max score: 714
Total Score: 714
E value: 0.0
Which is the weakest match?
NWZ52525.1 (Haliaeetus albicilla)
Max score: 579
Total score: 579
E value: 0.0
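Picking the best and weakest match amounts to sorting the hit table. The sketch below uses only the two hits reported above. The dictionary layout and function name are illustrative, and ordering by E value first and max score second is an assumption about how to break ties, not a description of NCBI's own sorting.

```python
# The two hits reported above; field names are illustrative.
HITS = [
    {"accession": "NWZ52525.1", "organism": "Haliaeetus albicilla",
     "max_score": 579, "total_score": 579, "e_value": 0.0},
    {"accession": "NP_001186422.1", "organism": "Gallus gallus",
     "max_score": 714, "total_score": 714, "e_value": 0.0},
]


def rank_hits(hits):
    """Order hits from best to weakest: lowest E value, then highest max score."""
    return sorted(hits, key=lambda h: (h["e_value"], -h["max_score"]))


if __name__ == "__main__":
    ranked = rank_hits(HITS)
    print("Best match:   ", ranked[0]["accession"], ranked[0]["organism"])
    print("Weakest match:", ranked[-1]["accession"], ranked[-1]["organism"])
```

Because both hits have an E value of 0.0, the max score is what separates them here.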
2. Given a nucleotide sequence, find details about the sequence, such as the organism,
name, and accession ID, using BLAST.
>sample sequence
AGTGCCGCGCGTCGAGCGGAGCAGAGGAGGCGAGGGCGGAGGGCCAGAGAGGCAGTTGGAAGATGG
CGGACGAGGTGGCGCTCGCCCTTCAGGCCGCCGGCTCCCCTTCCGCGGCGGCCGCCATGGAGGCCGCG
TCGCAGCCGGCGGACGAGCCGCTCCGCAAGAGGCCCCGCCGAGACGGGCCTGGCCTCGGGCGCAGCC
CGGGCGAGCCGAGCGCAGCAGTGGCGCCGGCGGCCGCGGGGTGTGAGGCGGCGAGCGCCGCGGC
CCCGGCGGCGCTGTGGCGGGAGGCGGCAGGGGCGGCGGCGAGCGCGGAGCGGGAGGCCCCGGCGAC
GGCCGTGGCCGGGGACGGAGACAATGGGTCCGGCCTGCGGCGGGAGCCGAGGGCGGCTGACGACTTC
GACGACGACGAGGGCGAGGAGGAGGACGAGGCGGCGGCGGCAGCGGCGGCGGCAGCGATCGGCTAC
CGAGACAACCTCCTGTTGACCGATGGACTCCTCACTAATGGCTTTCATTCCTGTGAAAGTGATGACGAT
GACAGAACGTCACACGCCAGCTCTAGTGACTGGACTCCGCGGCCGCGGATAGGTCCATATACTTTTGT
TCAGCAACATCTCATGATTGGCACCGATCCTCGAACAATTCTTAAAGATTTATTACCAGAAACAATTCC
TCCACCTGAGCTGGATGATATGACGCTGTGGCAGATTGTTATTAATATCCTTTCAGAACCACCAAAGC
GGAAAAAAAGAAAAGATATCAATACAATTGAAGATGCTGTGAAGTTACTGCAGGAGTGTAAAAAGAT
AATAGTTCTGACTGGAGCTGGGGTTTCTGTCTCCTGTGGGATTCCTGACTTCAGATCAAGAGACGGTAT
CTATGCTCGCCTTGCGGTGGACTTCCCAGACCTCCCAGACCCTCAAGCCATGTTTGATATTGAGTATTT
TAGAAAAGACCCAAGACCATTCTTCAAGTTTGCAAAGGAAATATATCCCGGACAGTTCCAGCCGTCTC
TGTGTCACAAATTCATAGCTTTGTCAGATAAGGAAGGAAAACTACTTCGAAATTATACTCAAAATATA
GATACCTTGGAGCAGGTTGCAGGAATCCAAAGGATCCTTCAGTGTCATGGTTCCTTTGCAACAGCATC
TTGCCTGATTTGTAAATACAAAGTTGATTGTGAAGCTGTTCGTGGAGACATTTTTAATCAGGTAGTTCC
TCGGTGCCCTAGGTGCCCAGCTGATGAGCCACTTGCCATCATGAAGCCAGAGATTGTCTTCTTTGGTGA
AAACTTACCAGAACAGTTTCATAGAGCCATGAAGTATGACAAAGATGAAGTTGACCTCCTCATTGTTA
TTGGATCTTCTCTGAAAGTGAGACCAGTAGCACTAATTCCAAGTTCTATACCCCATGAAGTGCCTCAAA
TATTAATAAATAGGGAACCTTTGCCTCATCTACATTTTGATGTAGAGCTCCTTGGAGACTGCGATGTT
ATAATTAATGAGTTGTGTCATAGGCTAGGTGGTGAATATGCCAAACTTTGTTGTAACCCTGTAAAGCTT
TCAGAAATTACTGAAAAACCTCCACGCCCACAAAAGGAATTGGTTCATTTATCAGAGTTGCCACCAAC
ACCTCTTCATATTTCGGAAGACTCAAGTTCACCTGAAAGAACTGTACCACAAGACTCTTCTGTGATTGC
TACACTTGTAGACCAAGCAACAAACAACAATGTTAATGATTTAGAAGTATCTGAATCAAGTTGTGTGG
AAGAAAAACCACAAGAAGTACAGACTAGTAGGAATGTTGAGAACATTAATGTGGAAAATCCAGATTT
TAAGGCTGTTGGTTCCAGTACTGCAGACAAAAATGAAAGAACTTCAGTTGCAGAAACAGTGAGAAAA
TGCTGGCCTAATAGACTTGCAAAGGAGCAGATTAGTAAGCGGCTTGAGGGTAATCAATACCTGTTTGT
ACCACCAAATCGTTACATATTCCACGGTGCTGAGGTATACTCAGACTCTGAAGATGACGTCTTGTCCTC
TAGTTCCTGTGGCAGTAACAGTGACAGTGGCACATGCCAGAGTCCAAGTTTAGAAGAACCCTTGGAAG
ATGAAAGTGAAATTGAAGAATTCTACAATGGCTTGGAAGATGATACGGAGAGGCCCGAATGTGCTGG
AGGATCTGGATTTGGAGCTGATGGAGGGGATCAAGAGGTTGTTAATGAAGCTATAGCTACAAGACAG
GAATTGACAGATGTAAACTATCCATCAGACAAATCATAACACTATTGAAGCTGTCCGGATTCAGGAAT
TGCTCCACCAGCATTGGGAACTTTAGCATGTCAAAAAATGAATGTTTACTTGTGAACTTGAACAAGGA
AATCTGAAAGATGTATTATTTATAGACTGGAAAATAGATTGTCTTCTTGGATAATTTCTAAAGTTCCAT
CATTTCTGTTTGTACTTGTACATTCAACACTGTTGGTTGACTTCATCTTCCTTTCAAGGTTCATTTGTAT
GATACATTCGTATGTATGTATAATTTTGTTTTTTGCCTAATGAGTTTCAACCTTTTAAAGTTTTCAAAAG
CCATTGGAATGTTAATGTAAAGGGAACAGCTTATCTAGACCAAAGAATGGTATTTCACACTTTTTTGTT
TGTAACATTGAATAGTTTAAAGCCCTCAATTTCTGTTCTGCTGAACTTTTATTTTTAGGACAGTTAACTT
TTTAAACACTGGCATTTTCCAAAACTTGTGGCAGCTAACTTTTTAAAATCACAGATGACTTGTAATGTG
AGGAGTCAGCACCGTGTCTGGAGCACTCAAAACTTGGTGCTCAGTGTGTGAAGCGTACTTACTGCATC
GTTTTTGTACTTGCTGCAGACGTGGTAATGTCCAAACAGGCCCCTGAGACTAATCTGATAAATGATTTG
GAAATGTGTTTCAGTTGTTCTAGAAACAATAGTGCCTGTCTATATAGGTCCCCTTAGTTTGAATATTTG
CCATTGTTTAATTAAATACCTATCACTGTGGTAGAGCCTGCATAGATCTTCACCACAAATACTGCCAAG
ATGTGAATATGCAAAGCCTTTCTGAATCTAATAATGGTACTTCTACTGGGGAGAGTGTAATATTTTGGA
CTGCTGTTTTTCCATTAATGAGGAAAGCAATAGGCCTCTTAATTAAAGTCCCAAAGTCATAAGATAAA
TTGTAGCTCAACCAGAAAGTACACTGTTGCCTGTTGAGGATTTGGTGTAATGTATCCCAAGGTGTTAGC
CTTGTATTATGGAGATGAATACAGATCCAATAGTCAAATGAAACTAGTTCTTAGTTATTTAAAAGCTTA
GCTTGCCTTAAAACTAGGGATCAATTTTCTCAACTGCAGAAACTTTTAGCCTTTCAAACAGTTCACACC
TCAGAAAGTCAGTATTTATTTTACAGACTTCTTTGGAACATTGCCCCCAAATTTAAATATTCATGTGGG
TTTAGTATTTATTACAAAAAAATGATTTGAAATATAGCTGTTCTTTATGCATAAAATACCCAGTTAGGA
CCATTACTGCCAGAGGAGAAAAGTATTAAGTAGCTCATTTCCCTACCTAAAAGATAACTGAATTTATTT
GGCTACACTAAAGAATGCAGTATATTTAGTTTTCCATTTGCATGATGTGTTTGTGCTATAGACAATATT
TTAAATTGAAAAATTTGTTTTAAATTATTTTTACAGTGAAGACTGTTTTCAGCTCTTTTTATATTGTACA
TAGACTTTTATGTAATCTGGCATATGTTTTGTAGACCGTTTAATGACTGGATTATCTTCCTCCAACTTTT
GAAATACAAAAACAGTGTTTTATACTTGTATCTTGTTTTAAAGTCTTATATTAAAATTGTCATTTGACTT
TTTTCCCGTTAAAAAAAAAAAAAAA
It is the mRNA sequence of Mus musculus sirtuin 1 (Sirt1). Its accession ID is NM_019812.3.
What is percent identity?
Percent identity is the percentage of bases in the alignment that are identical between the query
and the database sequence. A query sequence can have a low percent identity and still be a real hit,
so it is essential to take the E value into account and to look for homology between conserved
regions, which is most evident at the protein level. For this search, the percent identity is 100%.
Are there any homologous (orthologous/paralogous) sequences? Which are those? Mention them.
Only orthologous sequences are present. Sequences are said to be orthologous if they were
separated by a speciation event: when a species diverges into two separate species, the divergent
copies of a single gene in the resulting species are orthologs. The hits come from the following
species:
a) Mus musculus
b) Mus caroli
c) Arvicanthis niloticus
d) Mastomys coucha
e) Rattus norvegicus