Biostat Lecture

Uploaded by thuynt.work1601
© All Rights Reserved

ANOVA

• One Way Analysis of Variance


The one way analysis of variance allows us to compare
several groups of observations, all of which are
independent but possibly with a different mean for each
group. A test of great importance is whether or not all the
means are equal.

72
Feeds for chickens (number of eggs laid per hen)
A 180 177 175 170 182 181 177 180 183 185
B 199 203 200 194 195 204 206 207 202 200
C 191 194 201 193 197 195 203 199 199 201 206 197

Comment?

Ma = 179, Mb = 201, Mc = 198

At the start, N could be larger than 12 for each treatment (total > 30 or 40).

How long is your experiment?

You need to prepare more samples than needed.


73
It is important to have more than 30 samples.

74
What is the best feed?
• Could you use a t-test for each pair of feeds, then compare them?

• Or use another method, based on variance?

75
Diagram: controlled factor (3 feeds) + uncontrolled factor → observed variations

76
• Hens given the same feed still lay different numbers of eggs

• Yet all environmental factors are the same: variety, age, health status…

77
Figure: effect of the controlled factor compared with the effect of the uncontrolled factor

What can you say about the effect of feeds?

Action of feeds

78
Figure: effect of the controlled factor compared with the effect of the uncontrolled factor

What can you say about the effect of feeds?

No action of feeds

79
Figure: effect of the controlled factor compared with the effect of the uncontrolled factor

What can you say about the effect of feeds?

Is it possible to quantify these variations and compare them?


80
Of course, but it is necessary to make a hypothesis.

H0: All levels of treatment belong to the same statistical population

What does this mean?

The 3 feeds have the same effect on the number of eggs laid

81
We suppose that all observations belong to the same population.

The distribution of the observations (the number of eggs) follows a normal distribution.

82
How can we measure the variation due to feed?

We must remove the genetic variation.

How?

Imagine that all the chickens receiving feed A lay the same number of eggs,

the chickens receiving feed B lay the same number of eggs,

and the chickens receiving feed C lay the same number of eggs.

83
The new data table

A 179 179 179 179 179 179 179 179 179 179
B 201 201 201 201 201 201 201 201 201 201
C 198 198 198 198 198 198 198 198 198 198 198 198

Vt = variance due to treatment

Number of treatment levels = 3 (action of feeds), so Vt has 3 − 1 = 2 degrees of freedom
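As a quick numerical check of the "new data table" idea, here is a minimal Python sketch (our own illustration, not part of the lecture) that replaces every egg count by its feed mean and measures how far these means spread around the grand mean. The data are the counts from the feed table above; the variable names are hypothetical.

```python
# Egg counts per hen for each feed (from the feed table)
data = {
    "A": [180, 177, 175, 170, 182, 181, 177, 180, 183, 185],
    "B": [199, 203, 200, 194, 195, 204, 206, 207, 202, 200],
    "C": [191, 194, 201, 193, 197, 195, 203, 199, 199, 201, 206, 197],
}

all_values = [x for values in data.values() for x in values]
grand_mean = sum(all_values) / len(all_values)

# Each observation is replaced by its group mean ("new data table"),
# so only the differences between feeds are left.
ss_treatment = 0.0
for values in data.values():
    group_mean = sum(values) / len(values)
    ss_treatment += len(values) * (group_mean - grand_mean) ** 2

df_treatment = len(data) - 1       # 3 feeds -> 2 degrees of freedom
vt = ss_treatment / df_treatment   # variance due to treatment

print(f"grand mean = {grand_mean:.1f}")
print(f"SS treatment = {ss_treatment:.1f}, DF = {df_treatment}, Vt = {vt:.1f}")
```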


84
How can we measure the genetic variation?

We must remove the treatment variation.

How?

1. Take each level of treatment.

2. Within each level of treatment, the differences between data express the variation due to genetics.

3. This variation is expressed as the “error variance”.
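The error variance can be sketched the same way: only the differences between hens fed the same feed are kept. This is again a hypothetical Python illustration on the feed data, assuming the degrees of freedom are the total number of hens minus the number of feeds.

```python
data = {
    "A": [180, 177, 175, 170, 182, 181, 177, 180, 183, 185],
    "B": [199, 203, 200, 194, 195, 204, 206, 207, 202, 200],
    "C": [191, 194, 201, 193, 197, 195, 203, 199, 199, 201, 206, 197],
}

# Within each feed, deviations from the feed mean reflect only the
# uncontrolled (here: genetic) variation.
ss_error = 0.0
n_total = 0
for values in data.values():
    group_mean = sum(values) / len(values)
    ss_error += sum((x - group_mean) ** 2 for x in values)
    n_total += len(values)

df_error = n_total - len(data)   # 32 hens - 3 feeds = 29
ve = ss_error / df_error         # "error variance"

print(f"SS error = {ss_error:.1f}, DF = {df_error}, Verror = {ve:.1f}")
```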

85
Expression of the “error”

86
The Anova table
A 180 177 175 170 182 181 177 180 183 185
B 199 203 200 194 195 204 206 207 202 200
C 191 194 201 193 197 195 203 199 199 201 206 197

Source    Sum of squares    Degrees of freedom    Variance = SS/DF    F ratio      F limit       Conclusion
VT        Σ(x − X)²
Vt                                                                    Vt/Verror    from table
Verror

87
The Anova table

Source    Sum of squares    Degrees of freedom    Variance    F ratio             F limit (α = 0.05)    Conclusion
VT        3448              31                    111.2
Vt        2900              2                     1450        Vt/Verror = 76.7    3.33                  H0 refused
Verror    548               29                    18.9
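The whole table can be assembled in one function. This is a minimal sketch, not a library API: `one_way_anova` is a hypothetical helper, and `F_LIMIT = 3.33` is assumed to be the F-table value for DF = (2, 29) at α = 0.05.

```python
def one_way_anova(groups):
    """Return the ANOVA rows (SS, DF, variance) and the F ratio Vt / Verror."""
    all_values = [x for g in groups for x in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n

    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    ss_treat = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_error = ss_total - ss_treat  # VT = Vt + Verror for the sums of squares

    table = {
        "VT": (ss_total, n - 1, ss_total / (n - 1)),
        "Vt": (ss_treat, k - 1, ss_treat / (k - 1)),
        "Verror": (ss_error, n - k, ss_error / (n - k)),
    }
    f_ratio = table["Vt"][2] / table["Verror"][2]
    return table, f_ratio


feed_a = [180, 177, 175, 170, 182, 181, 177, 180, 183, 185]
feed_b = [199, 203, 200, 194, 195, 204, 206, 207, 202, 200]
feed_c = [191, 194, 201, 193, 197, 195, 203, 199, 199, 201, 206, 197]

table, f_ratio = one_way_anova([feed_a, feed_b, feed_c])

F_LIMIT = 3.33  # assumed F-table value, DF = (2, 29), alpha = 0.05

for source, (ss, df, variance) in table.items():
    print(f"{source:<7} SS = {ss:7.1f}  DF = {df:2d}  variance = {variance:7.1f}")
print(f"F ratio = {f_ratio:.1f} -> H0 refused: {f_ratio > F_LIMIT}")
```

If SciPy is installed, `scipy.stats.f_oneway(feed_a, feed_b, feed_c)` should return the same F ratio, and `scipy.stats.f.ppf(0.95, 2, 29)` gives the F limit without a printed table.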

88
89
90
• Which feed is the best one?

91
We use the Student test, but with a different approximation of sd.

If the difference between two means lies within ±2 sd, the hypothesis that the 2 levels belong to the same statistical population is accepted (α = 5 %).
If the difference falls outside this interval, on either side, the hypothesis is refused.

92
Ma = 179, Mb = 201, Mc = 198

A vs B: Mb – Ma = 22 → significantly different
A vs C: Mc – Ma = 19 → significantly different
B vs C: Mb – Mc = 3 → not significantly different

93
Between A and B
Sd = 1.9
2 sd = 3.8

Mb – Ma = 22 > 2 sd

Conclusion: A and B are different

94
Between A and C
Sd = 1.9
2 sd = 3.8

Mc – Ma = 19 > 2 sd

Conclusion: A and C are different

95
Between B and C
Sd = 1.9
2 sd = 3.8

Mb – Mc = 3 < 2 sd
Conclusion: B and C are not significantly different

Why do B and C belong to the same population even though Mb and Mc differ slightly?
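The pairwise rule can be sketched as follows. The lecture does not spell out how sd is obtained; this sketch assumes sd = √(Verror × (1/ni + 1/nj)), built from the ANOVA error variance, which gives about 1.9 here (1.94 for A vs B, 1.86 for the pairs involving C). Names are hypothetical.

```python
import math
from itertools import combinations

groups = {
    "A": [180, 177, 175, 170, 182, 181, 177, 180, 183, 185],
    "B": [199, 203, 200, 194, 195, 204, 206, 207, 202, 200],
    "C": [191, 194, 201, 193, 197, 195, 203, 199, 199, 201, 206, 197],
}
VERROR = 548 / 29  # error variance from the ANOVA table (SS 548, DF 29)

means = {name: sum(v) / len(v) for name, v in groups.items()}
different = {}
for g1, g2 in combinations(groups, 2):
    # Assumption: sd of the difference between two means, from the error variance
    sd = math.sqrt(VERROR * (1 / len(groups[g1]) + 1 / len(groups[g2])))
    diff = abs(means[g1] - means[g2])
    different[(g1, g2)] = diff > 2 * sd  # outside +/- 2 sd -> hypothesis refused
    print(f"{g1} vs {g2}: |difference| = {diff:.0f}, 2 sd = {2 * sd:.2f}, "
          f"different: {different[(g1, g2)]}")
```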

96
Block design
Situation: an experiment with one controlled experimental factor but without control of environmental heterogeneity (uncontrolled factors such as temperature, light…).

If there is an effect of the block as well as an effect of the treatment level, we cannot interpret the results.

97
Block design
Block 1 2 3 4 5 6 7 8 9 10 m
A 180 177 175 170 182 181 177 180 183 185 179
B 199 203 200 194 195 204 206 207 202 200 201
C 191 194 201 193 197 195 203 199 199 201 197
mb 190 191 192 186 191 193 195 195 195 195 192.4
(m = mean of each treatment; mb = mean of each block)
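For the block layout above, one standard way to separate the block effect from the treatment effect is a randomized-block analysis (two-way ANOVA without replication). The sketch below is our own illustration of that technique, not a calculation taken from the slides; it also recomputes the mb row. Names are hypothetical.

```python
treatments = {
    "A": [180, 177, 175, 170, 182, 181, 177, 180, 183, 185],
    "B": [199, 203, 200, 194, 195, 204, 206, 207, 202, 200],
    "C": [191, 194, 201, 193, 197, 195, 203, 199, 199, 201],
}
rows = list(treatments.values())
k, b = len(rows), len(rows[0])  # 3 treatments, 10 blocks
all_values = [x for row in rows for x in row]
grand_mean = sum(all_values) / len(all_values)

treatment_means = [sum(row) / b for row in rows]                    # column "m"
block_means = [sum(row[j] for row in rows) / k for j in range(b)]  # row "mb"

ss_total = sum((x - grand_mean) ** 2 for x in all_values)
ss_treat = b * sum((m - grand_mean) ** 2 for m in treatment_means)
ss_block = k * sum((m - grand_mean) ** 2 for m in block_means)
ss_error = ss_total - ss_treat - ss_block  # what neither factor explains

df_treat, df_block = k - 1, b - 1
df_error = df_treat * df_block  # (3 - 1) x (10 - 1) = 18
ms_error = ss_error / df_error

print("mb =", [round(m) for m in block_means])
print(f"treatment: SS = {ss_treat:.1f}, F = {ss_treat / df_treat / ms_error:.1f}")
print(f"block:     SS = {ss_block:.1f}, F = {ss_block / df_block / ms_error:.2f}")
print(f"error:     SS = {ss_error:.1f}, DF = {df_error}")
```

Each F ratio would then be compared with the F-table limit for its own degrees of freedom.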

98
Two Way Analysis of Variance
A two-way analysis of variance is a way of studying the effects of two factors separately (their main effects) and, sometimes, together (their interaction effect).

m1 m2 m3 m4
n1 n2 n3 n4

One measurement variable and two nominal variables

100


A1 A2
B1 180 191
177 193
175 201
182 193
177 205
180 204
184 199
185 198
B2 199 219
203 221
200 229
195 221
206 233
207 232
206 227
200 226
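A minimal sketch of a two-way ANOVA with interaction for this balanced 2 × 2 table (8 replicates per cell), in plain Python. The layout (A1/A2 as columns, B1/B2 as row blocks) follows the table above; the dictionary structure and names are hypothetical. Each F ratio should then be compared with the F-table limit for its degrees of freedom.

```python
# cells[(a, b)] = replicate measurements for level a of factor A and level b of factor B
cells = {
    ("A1", "B1"): [180, 177, 175, 182, 177, 180, 184, 185],
    ("A2", "B1"): [191, 193, 201, 193, 205, 204, 199, 198],
    ("A1", "B2"): [199, 203, 200, 195, 206, 207, 206, 200],
    ("A2", "B2"): [219, 221, 229, 221, 233, 232, 227, 226],
}


def mean(xs):
    return sum(xs) / len(xs)


a_levels = sorted({a for a, _ in cells})
b_levels = sorted({b for _, b in cells})
r = len(next(iter(cells.values())))  # replicates per cell (balanced design)
grand = mean([x for values in cells.values() for x in values])

a_means = {
    a: mean([x for (ai, _), values in cells.items() if ai == a for x in values])
    for a in a_levels
}
b_means = {
    b: mean([x for (_, bi), values in cells.items() if bi == b for x in values])
    for b in b_levels
}
cell_means = {key: mean(values) for key, values in cells.items()}

ss_a = len(b_levels) * r * sum((m - grand) ** 2 for m in a_means.values())
ss_b = len(a_levels) * r * sum((m - grand) ** 2 for m in b_means.values())
# Interaction: what the cell means show beyond the two main effects
ss_ab = r * sum((cell_means[(a, b)] - a_means[a] - b_means[b] + grand) ** 2
                for a in a_levels for b in b_levels)
ss_error = sum((x - cell_means[key]) ** 2 for key, values in cells.items() for x in values)

df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
df_ab = df_a * df_b
df_error = len(cells) * (r - 1)
ms_error = ss_error / df_error

effects = [("A", ss_a, df_a), ("B", ss_b, df_b), ("A x B", ss_ab, df_ab)]
f_ratios = {name: (ss / df) / ms_error for name, ss, df in effects}
for name, ss, df in effects:
    print(f"{name:<6} SS = {ss:7.1f}  DF = {df}  F = {f_ratios[name]:.2f}")
print(f"error  SS = {ss_error:7.1f}  DF = {df_error}")
```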

101
Assumption
• Assumption #1: Your dependent variable should be measured on a continuous scale, e.g., revision time (hours), intelligence (IQ score), exam performance (0 to 100), weight (kg)

• Assumption #2: Your two independent variables should each consist of two or more categorical, independent groups, e.g., gender (2 groups: male or female), ethnicity (3 groups: Caucasian, African American and Hispanic)

• Assumption #3: You should have independence of observations, which means that there is no relationship between the observations in each group or between the groups themselves

• Assumption #4: There should be no significant outliers

• Assumption #5: Your dependent variable should be approximately normally distributed for each combination of the groups of the two independent variables

• Assumption #6: There needs to be homogeneity of variances for each combination of the groups of the two independent variables
102
Hypotheses

• H01: the means are equal for the different values of the first nominal variable;

• H02: the means are equal for the different values of the second nominal variable;

• H03: there is no interaction (the effects of one nominal variable don't depend on the value of the other nominal variable).

103
Are whales heavier early or late in the mating season, and does that depend on the gender of the whale?

• “Month in mating season” and “gender of whale” are nominal factors (independent variables)

• Dependent variable: weight

H01: The means of all month groups are equal
H1: The mean of at least one month group is different

H02: The means of the gender groups are equal
H1: The means of the gender groups are different

H03: There is no interaction between month and gender
H1: There is an interaction between month and gender

104
Before doing experiments, you need to design them and think about how you will analyze the data.

105
107
215.4
265.07

375.67

5.41
Hyp. Refused
320.5 45.79 3.44 2.5 Hyp. refused

279.5 13.3

108
109
110
111
112
enzyme activity of mannose-6-phosphate isomerase (MPI) and MPI
genotypes in the amphipod crustacean Platorchestia platensis

113
Mann-Whitney test

Non-parametric test for independent measures between two groups. It can be performed on ranked data (it is the counterpart of the parametric t-test) and on non-normally distributed data.

Sample 1: 17 18 19 20 22 23
Sample 2: 18 19 19 20 21 22

114
Do these 2 samples come from the same population, with α = 5%?

H0: there is no difference between the ranks of the 2 samples

H1: there is a difference between the ranks of the 2 samples

Sample 1: 17 18 19 20 22 23

Sample 2: 18 19 19 20 21 22

115
Mann–Whitney U test (Wilcoxon rank-sum test)
Non-parametric statistical hypothesis test for assessing whether two independent samples of observations have equally large values (n < 30).
The two samples are ranked against each other. First, U1 for sample 1: for each value of (1), count the values of (2) that are larger, and add 0.5 for each common (tied) value.
(1) 17 18 19 20 22 23
(2) 18 19 19 20 21 22
23: no value of (2) > 23 → u = 0
22: no value of (2) > 22, one common value (22) → u = 0.5
20: 2 values of (2) > 20, one common value (20) → u = 2 + 0.5 = 2.5
19: 3 values of (2) > 19, two common values (19, 19) → u = 3 + 2 × 0.5 = 4
18: 5 values of (2) > 18, one common value (18) → u = 5 + 0.5 = 5.5
17: 6 values of (2) > 17 → u = 6
U1 = 6 + 5.5 + 4 + 2.5 + 0.5 + 0 = 18.5

116
… and now U2, for the second sample:
(2) 18 19 19 20 21 22
(1) 17 18 19 20 22 23
22: 1 value of (1) > 22, one common value (22) → u = 1.5
21: 2 values of (1) > 21 → u = 2
20: 2 values of (1) > 20, one common value (20) → u = 2.5
19: 3 values of (1) > 19, one common value (19) → u = 3.5
19: 3 values of (1) > 19, one common value (19) → u = 3.5
18: 4 values of (1) > 18, one common value (18) → u = 4.5
U2 = 4.5 + 3.5 + 3.5 + 2.5 + 2 + 1.5 = 17.5

Check: U1 + U2 = 18.5 + 17.5 = 36 = n1 × n2
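The hand count can be automated: for every pair of values, one from each sample, add 1 when the other sample's value is larger and 0.5 when they are tied. This brute-force sketch (hypothetical helper `u_by_counting`) is meant for small samples like these; it gives U1 = 18.5 and U2 = 17.5, which add up to n1 × n2 = 36.

```python
def u_by_counting(sample, other):
    """For each value of `sample`, count the values of `other` that are
    larger; each tie counts 0.5."""
    u = 0.0
    for x in sample:
        for y in other:
            if y > x:
                u += 1
            elif y == x:
                u += 0.5
    return u


sample1 = [17, 18, 19, 20, 22, 23]
sample2 = [18, 19, 19, 20, 21, 22]

u1 = u_by_counting(sample1, sample2)
u2 = u_by_counting(sample2, sample1)
print(f"U1 = {u1}, U2 = {u2}, U1 + U2 = {u1 + u2}, n1 x n2 = {len(sample1) * len(sample2)}")
```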

117
The U statistic shows the degree of overlap in ranks between the 2 groups

118
Figure: overlap (U) between the ranks of Sample 1 and Sample 2

119

What are the limits of U1 and U2?
Example 2
Sample 1: 1 2 3
Sample 2: 1 2 3

U1 = 2.5 + 1.5 + 0.5 = 4.5
U2 = 2.5 + 1.5 + 0.5 = 4.5
U1 = U2 = n1 × n2 / 2 = 3 × 3 / 2 = 4.5

120
Example 3
Sample 1: 1 2 3
Sample 2: 4 5 6

U2 = 0 and U1 = n1 × n2

Figure: no overlap between Sample 1 and Sample 2 (U = 0)

121
Smaller U = bigger difference between groups

Bigger U = smaller difference between groups

122

What are the limits of U1 and U2?
Example 2
Sample 1: 1 2 3
Sample 2: 1 2 3

U1 = 2.5 + 1.5 + 0.5 = 4.5
U2 = 2.5 + 1.5 + 0.5 = 4.5
U1 = U2 = n1 × n2 / 2 = 3 × 3 / 2 = 4.5

The 2 samples belong to the same population

123
Example 3
Sample 1: 1 2 3    U1 = 3 + 3 + 3 = 9
Sample 2: 4 5 6    U2 = 0

U2 = 0 and U1 = n1 × n2

The 2 samples are different

Figure: no overlap between Sample 1 and Sample 2 (U = 0)

124
When U1 and U2 are far from n1 × n2 / 2, the two samples differ.

Using the Mann–Whitney table:

U = 0 ........ Ucrit (at risk α) ........ U = n1 × n2 / 2
Hypothesis refused (U ≤ Ucrit) | Hypothesis accepted (U > Ucrit)

126
Hypothesis: the two samples belong to the same statistical population.
If n < 20, read the limit value (α = 5 %) in the Mann–Whitney table at the intersection of n1 and n2. For n1 = 6 and n2 = 6, the limit value is 5.

U = 0 ........ 5 (limit value) ........ U = 18 (= n1 × n2 / 2)
Hypothesis refused (U ≤ 5) | Hypothesis accepted (U > 5)

Here U1 and U2 are both close to 18, far above 5: the hypothesis is accepted with a risk α of 5 %.

127


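U stat A = Sum of ranks A – nA(nA + 1)/2

U stat B = Sum of ranks B – nB(nB + 1)/2

Ranking the 12 values together (tied values share the average rank): sum of ranks A = 38.5, sum of ranks B = 39.5

U stat A = 38.5 – 6(6 + 1)/2 = 17.5

U stat B = 39.5 – 6(6 + 1)/2 = 18.5

U stat = smallest = 17.5

U crit = 5 by checking the Mann–Whitney table (n1 = n2 = 6, α = 5 %)

U stat > U crit: do not reject H0. There is no significant difference between the ranks of the 2 samples.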
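The rank-sum shortcut can be sketched the same way. The helper `average_ranks` is hypothetical; it gives tied values the average of their ranks, and `U_CRIT = 5` is assumed to be the table value for n1 = n2 = 6 at α = 5 % (two-tailed).

```python
def average_ranks(values):
    """Rank all values together; tied values share the average of their ranks."""
    order = sorted(values)
    ranks = {}
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and order[j + 1] == order[i]:
            j += 1
        ranks[order[i]] = (i + 1 + j + 1) / 2  # mean of 1-based ranks i+1 .. j+1
        i = j + 1
    return ranks


sample_a = [17, 18, 19, 20, 22, 23]
sample_b = [18, 19, 19, 20, 21, 22]
ranks = average_ranks(sample_a + sample_b)

r_a = sum(ranks[x] for x in sample_a)  # sum of ranks A
r_b = sum(ranks[x] for x in sample_b)  # sum of ranks B
u_a = r_a - len(sample_a) * (len(sample_a) + 1) / 2
u_b = r_b - len(sample_b) * (len(sample_b) + 1) / 2
u_stat = min(u_a, u_b)

U_CRIT = 5  # assumed Mann-Whitney table value, n1 = n2 = 6, alpha = 5 %
print(f"rank sums: {r_a}, {r_b}; U A = {u_a}, U B = {u_b}; U stat = {u_stat}")
print("reject H0" if u_stat <= U_CRIT else "do not reject H0")
```

Note that U A + U B always equals n1 × n2, which is a handy check on the ranking.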


128
• The Mann–Whitney U-test is limited to one nominal variable with only two values (two samples) and one measurement or ranked variable

• It is the non-parametric analogue of the two-sample t-test.

• Nominal variables: sex (male or female), genotype (AA, Aa, or aa), or ankle condition (values are normal, sprained, torn ligament, or broken).

129
Is there statistical evidence of a difference in APGAR scores in women receiving the
new and enhanced versus usual prenatal care?

U1 = 46.5
U2 = 17.5

U statA = 10.5
U statB = -18.5

Ustat = 10.5

Ucrit = 13

Reject H0, accept H1: there is a significant difference between the 2 methods in APGAR score, with a risk of 5%

U stat A = Sum of rank A – n(n+1)/2


U stat B = Sum of rank B – n(n+1)/2
130
Practical: Exercises
You analyse the data of an experiment testing the effect of different medium compositions on rice plant development.

1. Which medium is best suited for rice plant development?

2. If you test the effect of the media in different time periods, can you reach the same conclusion as before?

3. Compare the first 2 treatments.

A B C D E
121 112 117 128 123
126 103 124 130 125
141 122 123 127 115
125 105 115 126 117
118 106 120 128 121
125 112 121 129 119
131
Products produced in chains A, B and C

A B C
Defective 5 8 9
Non-defective 35 42 51

Are the proportions of defective items in the 3 chains different or not?

132
Kruskal–Wallis H

• Non-parametric method for testing the equality of population medians among groups.

• It is identical to a one-way analysis of variance with the data replaced by their ranks.

• It is an extension of the Mann–Whitney test to 3 to 7 groups, with N < 30
133
Kruskal–Wallis

• Same layout as a one-way ANOVA: one nominal variable and one measurement variable

• But the measurement variable does not meet the normality assumption

134
Data

Liver 1 Liver 2 Liver 3

18 15 15

20 16 20

22 17 21

25 21 25

135
H0: there is no difference between the ranks of the 3 samples

H1: there is a difference between the ranks of the 3 samples

136
You have to check the calcium levels (mg per liter) in three livers (12 samples).

Value   Found in            Ranks       Average rank
15      Liver 2, Liver 3    1 and 2     1.5
16      Liver 2             3           3
17      Liver 2             4           4
18      Liver 1             5           5
20      Liver 1, Liver 3    6 and 7     6.5
21      Liver 2, Liver 3    8 and 9     8.5
22      Liver 1             10          10
25      Liver 1, Liver 3    11 and 12   11.5

Sum of average ranks: Liver 1 = 33, Liver 2 = 17, Liver 3 = 28

137
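Then calculate the statistic h, where r_i is the sum of ranks of sample i, n_i its size and n the total number of observations:

$$h = \frac{12}{n(n+1)} \sum_{i} \frac{r_i^2}{n_i} - 3(n+1)$$

$$h = \frac{12}{12 \times 13}\left(\frac{33^2}{4} + \frac{17^2}{4} + \frac{28^2}{4}\right) - 3(13) = 2.58$$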

138
The limit value is given by the Kruskal–Wallis table.

Be careful: every combination of sample sizes gives its own limit value.

Number of samples: 3

Sample sizes    Limit value (α = 5 %)
…               …
4 4 3           5.598
4 4 4           5.692
5 2 1           5.00

H = 0 ........ h = 2.58 ........ limit value = 5.692
Hypothesis accepted (h < limit value) | Hypothesis refused (h ≥ limit value)

139
Conclusion: there is no significant difference between the 3 groups, with a risk of 5%
Conclusion: Accept H0

H also approximately follows a chi-square distribution with DF = number of samples − 1:

α = 0.05, DF = 2
Chi-square limit = 5.99 > 2.58
Conclusion: Accept H0

140
Is there any difference between groups A, B and C, with a risk of 5%?

$$h = \frac{12}{n(n+1)} \sum_{i} \frac{r_i^2}{n_i} - 3(n+1)$$

Group A Group B Group C

27 20 28

23 8 27

14 14 3

18 28 23

7 21 28

9 22 6

141
Is there any difference between groups A, B and C, with a risk of 5%?

Group A Group B Group C
27 20 28
23 8 27
14 14 3
18 28 23
7 21 28
9 22 6

Sum of ranks: RA = 49.5, RB = 57.5, RC = 64
H = 0.62, Hcrit = 5.8

P ≈ 0.73 > 0.05 (chi-square approximation, DF = 2)
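The same steps can check this exercise. For two degrees of freedom the chi-square upper-tail probability has the closed form exp(−H/2), so this sketch needs no statistical library; the p-value is only the chi-square approximation to the distribution of H. Names are hypothetical.

```python
import math

groups = {
    "A": [27, 23, 14, 18, 7, 9],
    "B": [20, 8, 14, 28, 21, 22],
    "C": [28, 27, 3, 23, 28, 6],
}
pooled = sorted(x for g in groups.values() for x in g)
n = len(pooled)


def avg_rank(value):
    """Average 1-based rank of `value` in the pooled, sorted data."""
    first = pooled.index(value) + 1
    return first + (pooled.count(value) - 1) / 2


rank_sums = {name: sum(avg_rank(x) for x in g) for name, g in groups.items()}
h = 12 / (n * (n + 1)) * sum(r ** 2 / len(groups[name]) for name, r in rank_sums.items()) - 3 * (n + 1)

# Chi-square with DF = 2: P(X > h) = exp(-h / 2)
p_value = math.exp(-h / 2)
print(rank_sums)
print(f"H = {h:.2f}, P = {p_value:.3f}")
```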


142
143
Check the normality of data

• Simply check based on a histogram

• Satisfactory if the data are roughly symmetric
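A text histogram is one quick way to eyeball symmetry. In this sketch, the helper name, the use of the feed A egg counts and the bin width of 5 are our own choices for illustration.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values in consecutive bins [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for x in values:
        i = int((x - start) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts


feed_a = [180, 177, 175, 170, 182, 181, 177, 180, 183, 185]
counts = histogram_counts(feed_a, start=170, width=5, n_bins=4)
for i, c in enumerate(counts):
    low = 170 + i * 5
    print(f"{low}-{low + 4}: {'*' * c}")
```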

144
• No clear evidence of asymmetry

• Difficult to determine whether the data are normal or not

• Information on the same measurements from a previous, larger-scale study may help

• Clear asymmetry, even in a small sample

145
Data not normal

• If the non-normality is due to an outlier, the outlier should be removed

• If normality is doubtful, a non-parametric test is the safe choice.

146
• Nominal scales are used for labeling variables, without any quantitative value. “Nominal” scales could simply be called “labels.”

• None of them has any numerical significance

• Ordinal scales: the order of the values is important and significant, but the differences between the values are not really known.

• Typically measures of non-numeric concepts like satisfaction, happiness, discomfort, etc.

147
• Interval scales are numeric scales in which we know both the order and the exact differences between the values

• The classic example of an interval scale is Celsius temperature, because the difference between each value is the same.

• For example, the difference between 60 and 50 degrees is a measurable 10 degrees, as is the difference between 80 and 70 degrees.

• pH, SAT score (200-800), credit score (300-850)

• Problem: interval scales don't have a “true zero.” For example, there is no such thing as “no temperature,” at least not with Celsius: zero does not mean the absence of value, but is just another number on the scale, like 0 degrees Celsius

• A ratio variable has all the properties of an interval variable, and also has a clear definition of 0.0: when the variable equals 0.0, there is none of that variable.

• Examples: enzyme activity, dose amount, reaction rate, flow rate, concentration, pulse, weight, length, temperature in Kelvin (0.0 Kelvin really does mean “no heat”), survival time.

148
149
A shoe company wants to know if three groups of workers have different salaries:

Women Men Minorities


23 45 20
41 55 30
54 60 34
66 70 40
90 72 44

150
A shoe company wants to know if three groups of workers have different salaries:

Women Men Minorities


23 45 20
41 55 30
54 60 34
66 70 40
90 72 44

H = 6.72

151
H
2. In a manufacturing unit, four teams of operators were randomly selected and sent to four different facilities for training in machining techniques. After the training, the supervisor conducted an exam and recorded the test scores. At the 95% confidence level, are the scores the same in all four facilities?

152
H
Sepal.Length in setosa and virginica species: are they the same or different? α = 0.05

setosa virginica
5.1 5.8
4.7 6.3
5 7.6
4.6 7.3
4.4 7.2
5.4 6.4
4.8 5.7
5.8 6.4
5.4 7.7
5.7 6
5.4 5.6
4.6 6.3
4.8 7.2
5 6.1
5.2 7.2
4.8 7.9
5.2 6.3
4.9 7.7
5.5 6.4
4.4 6.9
5 6.9
4.4 6.8

153
T
The students are randomly assigned to use one of three studying techniques for the next three
weeks to prepare for an exam. At the end of the three weeks, all of the students take the same
test.
The test scores for the students are shown below:

F stat = 1.91
154
A
Is there any difference between the results of treatments A and B? α = 5%

U stat A = Sum of ranks A – nA(nA + 1)/2

U stat B = Sum of ranks B – nB(nB + 1)/2

U stat = the smaller of U stat A and U stat B

155
U
Researchers want to know if a fuel treatment leads to a change in the average miles
per gallon of a car. To test this, they conduct an experiment in which they measure the
miles per gallon of 12 cars with the fuel treatment and 12 cars without it.

156
The end

Thank you for your attention

157
https://www.socscistatistics.com/tests/anova/default2.aspx

158
