Lecture 5
Bioinformatics
chicken PLVSS---PLRGEAGVLPFQQEEYEKVKRGIVEQCCHNTCSLYQLENYCN
xenopus ALVSG---PQDNELDGMQLQPQEYQKMKRGIVEQCCHSTCSLFQLESYCN
human   LQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
monkey  PQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
dog     LQVRDVELAGAPGEGGLQPLALEGALQKRGIVEQCCTSICSLYQLENYCN
hamster PQVAQLELGGGPGADDLQTLALEVAQQKRGIVDQCCTSICSLYQLENYCN
We often assume that gene trees give us species trees
Human Hox genes
Concepts: Paralogy & Orthology
How to do MSA? Using Clustal
Dynamic programming: accurate, but slow
Clustal: the most popular MSA program
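The "accurate, but slow" remark about dynamic programming can be made concrete with a small sketch. The function below is a textbook Needleman-Wunsch global alignment score for two sequences only; the function name and the scoring values (match 1, mismatch -1, gap -2) are illustrative assumptions, not settings taken from the lecture or from Clustal.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score of the best global alignment of a and b.

    Scoring values are illustrative assumptions, not Clustal defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First column and first row: align a prefix against gaps only.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Each cell takes the best of: diagonal (match/mismatch), gap in b, gap in a.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]


print(global_alignment_score("GIVEQ", "GIVDQ"))
```

For two sequences the table has (len(a) + 1) x (len(b) + 1) cells. Extending the same table to k sequences needs one dimension per sequence, which is why exact dynamic programming becomes impractical for MSA and heuristic programs such as Clustal are used instead.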
ClustalW @ EBI
http://www.ebi.ac.uk/Tools/clustalw2/
Adjust parameters
MSA Output
http://www.ebi.ac.uk/Tools/clustalw/help.html
Clustal local (ClustalX)
Help file: Using ClustalX for multiple sequence alignment, by Jarno Tuimala
Step 1: input sequences
Load sequences
Profile Alignment
Step 4: save the output in a selected format
Decorate MSA output (1)
Boxshade: highlight the identical/similar sites
(http://www.ch.embnet.org/software/BOX_form.html)
Copy the output from EBI ClustalW Output page
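The idea of highlighting identical sites can be sketched in a few lines. This is a minimal illustration of the concept only, not Boxshade's actual algorithm: it marks a column with `*` when every sequence carries the same residue and no gap. The function name is an assumption; the three segments are the C-terminal ends of the human, chicken and xenopus rows from the alignment at the start of the lecture.

```python
def identity_line(aligned):
    """Return '*' under fully identical, gap-free columns and ' ' elsewhere."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*aligned):
        same = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if same else " ")
    return "".join(marks)


# Last 23 columns of three rows from the lecture's alignment.
segments = [
    "KRGIVEQCCTSICSLYQLENYCN",  # human
    "KRGIVEQCCHNTCSLYQLENYCN",  # chicken
    "KRGIVEQCCHSTCSLFQLESYCN",  # xenopus
]
for s in segments:
    print(s)
print(identity_line(segments))
```

Marking "similar" rather than identical sites would additionally need a residue similarity table, which is left out here.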
Sequence Logos
http://weblogo.berkeley.edu/logo.cgi
http://weblogo.threeplusone.com/create.cgi
http://genome.tugraz.at/Logo/
T. D. Schneider and R. M. Stephens. Sequence logos: a new way to display consensus sequences. Nucleic Acids Research, Vol. 18, No 20, p. 6097-6100.
2. Phylogenetic analysis
Analyze evolutionary relationships for genes and proteins
Construct Phylogenetic Trees
A tree showing the evolutionary relationships among various biological species or other entities that are believed to have a common ancestor.
Classic Evolutionary Biology:
Comparison: Morphology, Structure, Fossils
Molecular evolution:
Compare DNA and Protein sequences
[Figure: rooted tree with leaves A, B, C, D, E; labels: Root, Node, Branch, End node, Internal/divergence node]
End node (OTU): can be species, population, or protein, DNA, RNA molecules, etc.
Internal/divergence node (HTU): possible ancestors
Newick format: ((A, (B,C)), (D, E))
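To show how the Newick string above encodes a tree, here is a minimal parser sketch. It handles only the simple form used on the slide (leaf names, parentheses, commas, no branch lengths or internal labels); the function names `parse_newick` and `leaves` are assumptions made for this illustration.

```python
def parse_newick(text):
    """Parse a Newick string without branch lengths into nested tuples."""
    s = text.replace(" ", "").rstrip(";")
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else ""

    def parse_node():
        nonlocal pos
        if peek() == "(":
            pos += 1  # consume "("
            children = [parse_node()]
            while peek() == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            if peek() != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
            return tuple(children)  # an internal node (HTU)
        start = pos
        while pos < len(s) and s[pos] not in "(),":
            pos += 1
        return s[start:pos]  # a leaf name (OTU)

    tree = parse_node()
    if pos != len(s):
        raise ValueError("unexpected trailing text")
    return tree


def leaves(tree):
    """List leaf names from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


tree = parse_newick("((A, (B,C)), (D, E))")
print(tree)
print(leaves(tree))
```

Each pair of parentheses becomes one internal node, so the nesting of the tuples mirrors the branching order of the tree.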
Terms for a Phylogenetic Tree
A clade is a group of organisms that includes an ancestor and all descendants of that ancestor.
Branch length
Scaled branches: the length of the branch is proportional to the number of changes.
The distance between 2 species is the sum of the lengths of all branches connecting them.
[Figure: the same four taxa (A, B, C, D) drawn as three tree types]
Cladogram: branch length has no meaning
Phylogram: branch length represents genetic change
Ultrametric tree: branch length represents time
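The rule that the distance between two species is the sum of the branch lengths connecting them can be sketched as follows. The tree, its node names (`n1`, `n2`, `root`) and its branch lengths are hypothetical values chosen for the example, not read off the slide figure.

```python
# Hypothetical rooted tree: child -> (parent, branch length).
PARENT = {
    "B": ("n1", 1.0),
    "C": ("n1", 1.0),
    "n1": ("n2", 2.0),
    "A": ("n2", 3.0),
    "n2": ("root", 1.0),
    "D": ("root", 5.0),
}


def path_to_root(node):
    """Map each ancestor of node (and node itself) to its distance from node."""
    dist, out = 0.0, {node: 0.0}
    while node in PARENT:
        node, length = PARENT[node]
        dist += length
        out[node] = dist
    return out


def leaf_distance(x, y):
    """Sum of branch lengths on the path joining x and y."""
    up_x, up_y = path_to_root(x), path_to_root(y)
    # The closest shared ancestor gives the smallest combined distance.
    return min(up_x[n] + up_y[n] for n in up_x if n in up_y)


for pair in [("B", "C"), ("B", "A"), ("B", "D")]:
    print(pair, leaf_distance(*pair))
```

This only makes sense on a tree with scaled branches (a phylogram or an ultrametric tree); on a cladogram the lengths carry no meaning.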
Choose Methods (substitution model, distance)
UPGMA
Neighbor-joining (NJ)
Maximum parsimony (MP)
Minimum evolution
Maximum likelihood (ML)
Bayesian inference
Tree Construction
Tree Evaluation (statistical analysis)
Bootstrap
Likelihood Ratio Test
……
Two major ways to root trees:
By midpoint or distance
d(A,D) = 10 + 3 + 5 = 18
Midpoint = 18 / 2 = 9
By outgroup
[Figure: unrooted tree with taxa A, B, C, D and branch lengths 10, 3, 2, 2, 5]
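The midpoint arithmetic from the rooting example, d(A,D) = 10 + 3 + 5 = 18 and midpoint = 9, can be turned into a small helper that reports which branch on the path receives the root. The function name is an assumption for this sketch, and it presumes the path between the two most distant taxa has already been found.

```python
def midpoint_on_path(branch_lengths):
    """Locate the midpoint of a path given its branch lengths in order.

    Returns (index of the branch holding the midpoint,
             offset of the midpoint from the start of that branch).
    """
    half = sum(branch_lengths) / 2
    walked = 0
    for i, length in enumerate(branch_lengths):
        if walked + length >= half:
            return i, half - walked
        walked += length
    raise ValueError("path has no branches")


# Path from A to D in the lecture's example: 10 + 3 + 5 = 18.
print(midpoint_on_path([10, 3, 5]))
```

For the lecture's numbers the result points at the first branch on the path (the one of length 10 leading away from A), with the root placed 9 units from A.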
Choosing a Method for Phylogenetic Prediction
Bioinformatics: Sequence and Genome Analysis, 2nd edition, by David W. Mount. p254
http://cshprotocols.cshlp.org/cgi/content/full/2008/5/pdb.ip49
MSA is the Key step for tree construction
Homologous sequences are needed!
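One way to see why the MSA is the key step: distance-based tree methods start from pairwise distances computed column by column on the alignment. The sketch below uses the simplest measure, the uncorrected p-distance (fraction of differing residues over gap-free columns), which is an assumption chosen for illustration rather than a substitution model recommended in the lecture. The sequences are the human, monkey and dog rows from the alignment at the start of the lecture; the function names are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing residues over columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(named):
    """Pairwise p-distances for a dict of name -> aligned sequence."""
    return {(p, q): p_distance(named[p], named[q]) for p, q in combinations(named, 2)}


msa = {
    "human": "LQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN",
    "monkey": "PQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN",
    "dog": "LQVRDVELAGAPGEGGLQPLALEGALQKRGIVEQCCTSICSLYQLENYCN",
}
for pair, d in distance_matrix(msa).items():
    print(pair, round(d, 3))
```

If the alignment places non-homologous residues in the same column, every distance built on it is wrong, and so is any tree constructed from those distances.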
Tools for Visualization
TreeView software for editing and printing evolution trees
(http://taxonomy.zoology.gla.ac.uk/rod/treeview.html)
Choose “tree type” from PHYLOGENETIC TREE field at EBI ClustalW page
Input MSA output (or upload an ALN file)
Download “Phylip tree file” (ph file)
Open the above files with TreeView program
Display trees in different forms (1, 2, 3)
Software for Phylogenetic Analysis
PHYLIP http://evolution.genetics.washington.edu/phylip.html
free and integrated tool for evolutionary analysis
PAUP http://paup.csit.fsu.edu/
commercial and integrated tool for evolutionary analysis
MEGA http://www.megasoftware.net/
free and graphic integrated tool, including the ML algorithm in the latest version
PHYML http://atgc.lirmm.fr/phyml/
fastest ML tree construction software
PAML http://abacus.gene.ucl.ac.uk/software/paml.html
ML tree construction software
Tree-puzzle http://www.tree-puzzle.de/
faster ML tree construction software
MrBayes http://mrbayes.csit.fsu.edu/
Tree construction software based on Bayesian inference
More tools: http://evolution.gs.washington.edu/phylip/software.html