
Phylogeny

The document outlines the principles of molecular phylogeny and evolution, detailing the five stages involved in constructing phylogenetic trees: selecting sequences, multiple sequence alignment, models of substitution, tree-building, and tree evaluation. It discusses the historical context of molecular evolution, including the molecular clock hypothesis and the concepts of positive and negative selection, as well as the importance of distinguishing between homologous and analogous traits. Additionally, it explains tree nomenclature, including the significance of clades and the use of outgroups for rooting trees.

Molecular Phylogeny and Evolution

Bioinformatics

Goals of the lecture

Introduction to evolution and phylogeny

Nomenclature of trees

Five stages of molecular phylogeny:
[1] selecting sequences
[2] multiple sequence alignment
[3] models of substitution
[4] tree-building
[5] tree evaluation

Introduction

Charles Darwin’s 1859 book (On the Origin of Species
By Means of Natural Selection, or the Preservation
of Favoured Races in the Struggle for Life) introduced
the theory of evolution.

To Darwin, the struggle for existence induces natural
selection. Offspring are dissimilar from their parents
(that is, variability exists), and individuals that are more
fit for a given environment are selected for. In this way,
over long periods of time, species evolve. Groups of
organisms change over time so that descendants differ
structurally and functionally from their ancestors.
Page 357
Introduction

At the molecular level, evolution is a process of
mutation with selection.

Molecular evolution is the study of changes in genes
and proteins throughout different branches of the
tree of life.

Phylogeny is the inference of evolutionary relationships.

Traditionally, phylogeny relied on the comparison
of morphological features between organisms. Today,
molecular sequence data are also used for phylogenetic
analyses.

Historical background

Studies of molecular evolution began with the first
sequencing of proteins in the 1950s.

In 1953 Frederick Sanger and colleagues determined
the primary amino acid sequence of insulin.

(The accession number of human insulin is NP_000198.)
Page 358
Mature insulin consists of an A chain and B chain
heterodimer connected by disulfide bridges.

The signal peptide and C peptide are cleaved,
and their sequences display fewer functional constraints.

Note the sequence divergence in the disulfide loop region
of the A chain.
Fig. 11.1
Page 359
Historical background: insulin

By the 1950s, it became clear that amino acid
substitutions occur nonrandomly. For example, Sanger
and colleagues noted that most amino acid changes in the
insulin A chain are restricted to a disulfide loop region.
Such differences are called “neutral” changes
(Kimura, 1968; Jukes and Cantor, 1969).

Subsequent studies at the DNA level showed that the rate of
nucleotide (and of amino acid) substitution is about six-
to ten-fold higher in the C peptide, relative to the A and B
chains.
Page 358
[Figure: number of nucleotide substitutions/site/year, 1 x 10^-9 in the C peptide versus 0.1 x 10^-9 in the A and B chains]
Fig. 11.1
Page 359
Historical background: insulin

Surprisingly, insulin from the guinea pig (and from the
related coypu) evolves seven times faster than insulin
from other species. Why?

The answer is that guinea pig and coypu insulin
do not bind two zinc ions, while insulin molecules from
most other species do. There was a relaxation of the
structural constraints on these molecules, and so
the genes diverged rapidly.
Page 360
Guinea pig and coypu insulin have undergone an
extremely rapid rate of evolutionary change.

Arrows indicate positions at which guinea pig
insulin (A chain and B chain) differs
from both human and mouse.
Fig. 11.1
Page 359
Molecular clock hypothesis

In the 1960s, sequence data were accumulated for
small, abundant proteins such as globins,
cytochromes c, and fibrinopeptides. Some proteins
appeared to evolve slowly, while others evolved rapidly.

Linus Pauling, Emanuel Margoliash and others
proposed the hypothesis of a molecular clock:

For every given protein, the rate of molecular
evolution is approximately constant in all
evolutionary lineages.
Page 360
Molecular clock hypothesis

As an example, Richard Dickerson (1971) plotted data
from three protein families: cytochrome c,
hemoglobin, and fibrinopeptides.

The x-axis shows the divergence times of the species,
estimated from paleontological data. The y-axis shows
m, the corrected number of amino acid changes per
100 residues.

n is the observed number of amino acid changes per
100 residues, and it is corrected to m to account for
changes that occur but are not observed.
$$\frac{n}{100} = 1 - e^{-(m/100)}$$
Page 360

[Figure: Dickerson (1971). x-axis: millions of years since divergence; y-axis: corrected amino acid changes per 100 residues (m)]
Fig. 11.3
Page 361
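The correction from n to m can also be applied numerically. The following is a minimal sketch, assuming the relation n/100 = 1 - e^(-m/100) stated above, which rearranges to m = -100 ln(1 - n/100). The function names are illustrative and do not come from the lecture.

```python
import math


def corrected_changes(n: float) -> float:
    """Convert observed changes per 100 residues (n) into corrected changes (m).

    Rearranged from n/100 = 1 - exp(-m/100).
    """
    if not 0 <= n < 100:
        raise ValueError("n must satisfy 0 <= n < 100")
    return -100.0 * math.log(1.0 - n / 100.0)


def observed_changes(m: float) -> float:
    """Inverse direction: expected observed changes (n) for a given m."""
    return 100.0 * (1.0 - math.exp(-m / 100.0))


if __name__ == "__main__":
    for n in (5, 25, 50, 75):
        print(f"n = {n:>2} observed -> m = {corrected_changes(n):.1f} corrected")
```

For small n the two values nearly coincide; the correction grows as sequences diverge, because multiple changes at the same site become more likely.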
Molecular clock hypothesis: conclusions

Dickerson drew the following conclusions:

• For each protein, the data lie on a straight line. Thus,
the rate of amino acid substitution has remained
constant for each protein.

• The average rate of change differs for each protein.
The time for a 1% change to occur between two lines
of evolution is 20 MY (cytochrome c), 5.8 MY
(hemoglobin), and 1.1 MY (fibrinopeptides).

• The observed variations in rate of change reflect
functional constraints imposed by natural selection.
Page 361
Molecular clock hypothesis: implications

If protein sequences evolve at constant rates,
they can be used to estimate the times that
species diverged.
Page 362
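As a worked example of this implication, a divergence time can be estimated by multiplying the corrected percent change m by the time needed for a 1% change. This is a minimal sketch, assuming a strictly constant rate and the per-protein figures from Dickerson's conclusions (20, 5.8, and 1.1 MY per 1% change). The dictionary and function names are illustrative.

```python
# Millions of years (MY) for a 1% change between two lineages,
# taken from Dickerson's conclusions quoted in this lecture.
MY_PER_PERCENT_CHANGE = {
    "cytochrome c": 20.0,
    "hemoglobin": 5.8,
    "fibrinopeptides": 1.1,
}


def divergence_time_my(m: float, protein: str) -> float:
    """Estimate divergence time (MY) from m, the corrected changes per 100 residues.

    Assumes a perfect molecular clock for the given protein.
    """
    return m * MY_PER_PERCENT_CHANGE[protein]


if __name__ == "__main__":
    # Hypothetical input: two hemoglobin sequences with m = 10.
    print(divergence_time_my(10, "hemoglobin"), "MY")
```

The same m implies very different divergence times depending on the protein, which is why a clock has to be calibrated per protein family.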
Positive and negative selection
Darwin’s theory of evolution suggests that, at the
phenotypic level, traits in a population that enhance
survival are selected for, while traits that reduce fitness
are selected against.

For example, among a group of giraffes millions of
years in the past, those giraffes that had longer necks
were able to reach higher foliage and were more
reproductively successful than their shorter-necked
group members. That is, the taller giraffes were selected
for.

In the mid-20th century, a conventional view was that
molecular sequences are routinely subject to positive
(or negative) selection.

Positive and negative selection

Positive selection occurs when a sequence undergoes
significantly increased rates of substitution, while
negative selection occurs when a sequence changes
slowly. Otherwise, selection is neutral.

Negative selection (also called purifying selection) is
the selective removal of rare alleles that are
deleterious.

Homology vs. Analogy

• Homology: a morphological or
molecular characteristic that is the
same due to common ancestry

• Analogy: a morphological
characteristic that is the same
due to convergent or
parallel evolution
(environmental pressures)

Molecular phylogeny: nomenclature of trees

There are two main kinds of information inherent
to any tree: topology and branch lengths.

We will now describe the parts of a tree.
Page 366
Molecular phylogeny uses trees to depict evolutionary
relationships among organisms. These trees are based
upon DNA and protein sequence data.

[Figure: an example tree with taxa A to E and internal nodes F to I, drawn twice: once against a time axis and once with numbered branch lengths and a scale bar of one unit]
Fig. 11.4
Page 366
Tree nomenclature

[Figure: the example tree (taxa A to E, internal nodes F to I), with the label "taxon"]
Fig. 11.4
Page 366
Tree nomenclature

A taxon is also called an operational taxonomic unit (OTU),
such as a protein sequence.

[Figure: the example tree (taxa A to E, internal nodes F to I), with the labels "taxon" and "operational taxonomic unit (OTU)"]
Fig. 11.4
Page 366
Tree nomenclature

Node (intersection or terminating point
of two or more branches)

Branch (edge)

[Figure: the example tree (taxa A to E, internal nodes F to I), with the labels "node" and "branch (edge)"]
Fig. 11.4
Page 366
Tree nomenclature

Branches are unscaled: OTUs are neatly aligned,
and nodes reflect time.

Branches are scaled: branch lengths are
proportional to the number of amino acid changes.

[Figure: the example tree drawn unscaled (with a time axis) and scaled (scale bar: one unit)]
Fig. 11.4
Page 366
Tree nomenclature

[Figure: the example tree drawn twice, one with a bifurcating internal node labeled and one with a multifurcating internal node labeled]
Fig. 11.5
Page 367
Examples of multifurcation: failure to resolve the branching order
of some metazoans and protostomes

Rokas A. et al., Animal Evolution and the Molecular Signature of Radiations
Compressed in Time, Science 310:1933 (2005), Fig. 1.
Tree nomenclature: clades

Clade ABF (monophyletic group)

[Figure: the example tree against a time axis, with clade ABF highlighted]
Fig. 11.4
Page 366
Tree nomenclature

Clade CDH

[Figure: the example tree against a time axis, with clade CDH highlighted]
Fig. 11.4
Page 366
Tree nomenclature

Clade ABF/CDH/G

[Figure: the example tree against a time axis, with clade ABF/CDH/G highlighted]
Fig. 11.4
Page 366
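The clades named on these slides can be enumerated mechanically from a tree's topology. The following is a minimal sketch, assuming the topology implied by the clade labels (F joins A and B, H joins C and D, G joins F and H, and I joins G and E). The nested-dict representation and the function name are illustrative choices, and branch lengths are left out.

```python
# Internal nodes are single-key dicts {name: [children]}; leaves are strings.
# The topology is an assumption read off the clade labels ABF, CDH, ABF/CDH/G.
TREE = {"I": [{"G": [{"F": ["A", "B"]}, {"H": ["C", "D"]}]}, "E"]}


def leaves_below(node, clade_table):
    """Return the leaves under `node`, recording each internal node's clade."""
    if isinstance(node, str):  # a leaf (taxon / OTU)
        return [node]
    ((name, children),) = node.items()  # an internal node has exactly one name
    leaves = []
    for child in children:
        leaves.extend(leaves_below(child, clade_table))
    clade_table[name] = sorted(leaves)
    return leaves


clade_table = {}
leaves_below(TREE, clade_table)

if __name__ == "__main__":
    for name in sorted(clade_table):
        print(name, clade_table[name])
    # F ['A', 'B']
    # G ['A', 'B', 'C', 'D']
    # H ['C', 'D']
    # I ['A', 'B', 'C', 'D', 'E']
```

Each internal node defines one monophyletic group: the node itself plus every leaf descended from it.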
Examples of clades

Lindblad-Toh et al., Nature 438: 803 (2005), fig. 10
Tree roots

The root of a phylogenetic tree represents the
common ancestor of the sequences. Some trees
are unrooted, and thus do not specify the common
ancestor.

A tree can be rooted using an outgroup (that is, a
taxon known to be distantly related to all other
OTUs, or operational taxonomic units).
Page 368
Tree nomenclature: roots

[Figure: a rooted tree (specifies evolutionary path), drawn against a time axis running from past to present, next to an unrooted tree with numbered nodes]
Fig. 11.6
Page 368
Tree nomenclature: outgroup rooting

[Figure: rooted trees with numbered nodes, drawn against a time axis running from past (root) to present; an outgroup is used to place the root]
Fig. 11.6
Page 368
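Outgroup rooting can be sketched in a few lines: the root is placed on the branch that leads to the outgroup, so that the outgroup sits on one side and all other OTUs on the other. This is a minimal sketch under stated assumptions: the unrooted tree is stored as an adjacency dictionary, the taxa (A, B, C, Out) and internal node names (x, y) are hypothetical, and the output is a Newick string without branch lengths.

```python
def root_with_outgroup(adjacency, outgroup):
    """Return a Newick string rooted on the branch leading to `outgroup`."""
    (neighbor,) = adjacency[outgroup]  # a leaf touches exactly one node

    def subtree(node, parent):
        # Everything reachable from `node` without going back to `parent`.
        children = [n for n in adjacency[node] if n != parent]
        if not children:  # a leaf
            return node
        return "(" + ",".join(subtree(c, node) for c in children) + ")"

    # One side of the root is the ingroup, the other side is the outgroup.
    return "(" + subtree(neighbor, outgroup) + "," + outgroup + ");"


# Hypothetical unrooted tree: leaves A, B, C, Out; internal nodes x and y.
UNROOTED = {
    "A": ["x"],
    "B": ["x"],
    "C": ["y"],
    "Out": ["y"],
    "x": ["A", "B", "y"],
    "y": ["x", "C", "Out"],
}

if __name__ == "__main__":
    print(root_with_outgroup(UNROOTED, "Out"))  # (((A,B),C),Out);
```

Choosing a different outgroup changes the rooted picture but not the underlying unrooted topology.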
Species trees versus gene/protein trees

Molecular evolutionary studies can be complicated
by the fact that both species and genes evolve.
Speciation usually occurs when a species becomes
reproductively isolated. In a species tree, each
internal node represents a speciation event.
Page 370
Stage 1: Use of DNA, RNA, or protein

For some phylogenetic studies, it may be preferable
to use protein instead of DNA sequences.

We saw that in pairwise alignment and in BLAST
searching, protein is often more informative than DNA.
Proteins have 20 states (amino acids) instead of only
four for DNA, so there is a stronger phylogenetic signal.
Page 371
Stage 1: Use of DNA, RNA, or protein

For phylogeny, DNA can be more informative.

The protein-coding portion of DNA has synonymous
and nonsynonymous substitutions. Thus, some DNA
changes do not have corresponding protein changes.
Page 371
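The difference between synonymous and nonsynonymous substitutions can be illustrated by translating a codon before and after a change. This is a minimal sketch, assuming the standard genetic code and DNA codons written with T rather than U. The function name and the example codons are illustrative.

```python
# Standard genetic code, with codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(codon_before: str, codon_after: str) -> str:
    """Label a codon change as identical, synonymous, or nonsynonymous."""
    before, after = codon_before.upper(), codon_after.upper()
    if before == after:
        return "identical"
    if CODON_TABLE[before] == CODON_TABLE[after]:
        return "synonymous"  # DNA changed, encoded amino acid did not
    return "nonsynonymous"  # DNA change alters the encoded amino acid


if __name__ == "__main__":
    print(classify_substitution("CTT", "CTC"))  # synonymous (Leu -> Leu)
    print(classify_substitution("GAG", "GTG"))  # nonsynonymous (Glu -> Val)
```

A protein alignment would hide the first change entirely, which is one reason DNA can carry extra phylogenetic information.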
Stage 2: Multiple sequence alignment

The fundamental basis of a phylogenetic tree is
a multiple sequence alignment.
Page 375
Stage 2: Multiple sequence alignment

[1] Confirm that all sequences are homologous

[2] Adjust gap creation to optimize the alignment

[3] Restrict phylogenetic analysis to regions of the multiple
sequence alignment for which data are available for all
taxa (delete columns having incomplete data).

[4] Many experts recommend that you delete any
column of an alignment that contains gaps
(even if the gap occurs in only one taxon)
Page 375
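Steps [3] and [4] amount to filtering alignment columns. The following is a minimal sketch, assuming the alignment is held as a dictionary of equal-length strings and that gaps are written as "-" or "."; the function name and the toy sequences are illustrative.

```python
def drop_gap_columns(alignment, gap_chars="-."):
    """Remove every column in which at least one sequence has a gap.

    `alignment` maps a sequence name to its aligned (equal-length) string.
    """
    names = list(alignment)
    rows = [alignment[name] for name in names]
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    # zip(*rows) walks the alignment column by column.
    kept = [col for col in zip(*rows) if not any(ch in gap_chars for ch in col)]
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}


if __name__ == "__main__":
    toy = {"seq1": "MK-LV", "seq2": "MKALV", "seq3": "MRAL-"}
    print(drop_gap_columns(toy))
    # {'seq1': 'MKL', 'seq2': 'MKL', 'seq3': 'MRL'}
```

Strict column deletion is simple but discards data; the trade-off is fewer sites in exchange for sites that every taxon shares.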
Stage 3: Tree-building models: distance

The simplest approach to measuring distances
between sequences is to align pairs of sequences, and
then to count the number of differences. This count is
the Hamming distance. For an alignment of length N
with n sites at which there are differences, the degree
of divergence D is the Hamming distance divided by the
alignment length:

D = n / N
Page 378
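The calculation of D can be written directly from its definition. This is a minimal sketch, assuming the two sequences are already aligned, have equal length, and contain no gap columns; the function names and the example sequences are illustrative.

```python
def hamming_distance(seq_a: str, seq_b: str) -> int:
    """Count the aligned positions at which two sequences differ (n)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def divergence(seq_a: str, seq_b: str) -> float:
    """Degree of divergence D = n / N for two aligned sequences."""
    if not seq_a:
        raise ValueError("sequences must not be empty")
    return hamming_distance(seq_a, seq_b) / len(seq_a)


if __name__ == "__main__":
    a, b = "GATTACA", "GACTATA"
    print(hamming_distance(a, b))      # 2
    print(round(divergence(a, b), 3))  # 0.286
```

D counts only observed differences, so it underestimates the true number of changes for distantly related sequences; that is the motivation for substitution models.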
Stage 4: Tree-building methods

Distance-based methods involve a distance metric,
such as the number of amino acid changes between
the sequences, or a distance score. Examples of
distance-based algorithms are UPGMA and
neighbor-joining.

Character-based methods include maximum parsimony
and maximum likelihood. Parsimony analysis involves
the search for the tree with the fewest amino acid
(or nucleotide) changes that account for the observed
differences between taxa.
Page 377
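To make the distance-based approach concrete, here is a minimal sketch of UPGMA clustering. It assumes pairwise distances are supplied in a dictionary keyed by frozenset pairs, returns only the topology as nested tuples (branch lengths are omitted), and uses a hypothetical four-taxon distance matrix. The function name and data layout are illustrative, not taken from PAUP or PHYLIP.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster OTUs by UPGMA and return the topology as nested tuples.

    names: list of OTU labels.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    """
    clusters = {name: 1 for name in names}  # cluster -> number of OTUs inside
    d = dict(dist)
    while len(clusters) > 1:
        # 1. Find the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        # 2. Distance to the merged cluster is the size-weighted average.
        for other in clusters:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    (tree,) = clusters
    return tree


# Hypothetical distances for four OTUs.
PAIRS = {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
         ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10}
DIST = {frozenset(pair): value for pair, value in PAIRS.items()}
tree = upgma(["A", "B", "C", "D"], DIST)

if __name__ == "__main__":
    print(tree)  # ('D', ('C', ('A', 'B')))
```

UPGMA implicitly assumes a constant rate of change along all lineages (a molecular clock); neighbor-joining relaxes that assumption.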
Stage 4: Tree-building methods

We can introduce distance-based and character-based
tree-building methods by referring to a tree of 13
orthologous retinol-binding proteins, and the
multiple sequence alignment from which the tree
was generated.
Page 378
Orthologs: members of a gene (protein) family in various
organisms. This tree shows RBP orthologs.

[Figure: tree of RBP orthologs from common carp, zebrafish, rainbow trout, teleost, African clawed frog, chicken, human, mouse, horse, rat, pig, cow, and rabbit; scale bar: 10 changes]
Page 43
Fish RBP orthologs: common carp, zebrafish, rainbow trout, teleost.

Other vertebrate RBP orthologs: African clawed frog, chicken,
human, mouse, horse, rat, pig, cow, rabbit.

[Figure: the same RBP tree with the two groups labeled; scale bar: 10 changes]
Page 43
Fig. 11.13
Page 376

Distance-based tree: calculate the pairwise alignments;
if two sequences are related,
put them next to each other on the tree.
Fig. 11.13
Page 376

Character-based tree: identify
positions that best describe how
characters (amino acids) are derived
from common ancestors.
Fig. 11.13
Page 376
Stage 4: Tree-building methods

Regardless of whether you use distance- or
character-based methods for building a tree,
the starting point is a multiple sequence alignment.

ReadSeq is a convenient web-based program that
translates multiple sequence alignments into
formats compatible with most commonly used
phylogeny programs such as PAUP and PHYLIP.
Page 378
http://evolution.genetics.washington.edu/phylip/software.html

This site lists 200 phylogeny packages. Perhaps the best-known
programs are PAUP (David Swofford and colleagues)
and PHYLIP (Joe Felsenstein).

Stage 5: Evaluating trees

The main criteria by which the accuracy of a
phylogenetic tree is assessed are consistency,
efficiency, and robustness. Evaluation of accuracy
can refer to an approach (e.g. UPGMA) or
to a particular tree.
Page 386
