Biostat Lecture
Feeds for chicken
A 180 177 175 170 182 181 177 180 183 185
B 199 203 200 194 195 204 206 207 202 200
C 191 194 201 193 197 195 203 199 199 201 206 197
Comment?
At the start, N could be greater than 12 for each treatment (total > 30, 40)
What is the best feed?
Could you use a t-test for each pair of feeds, then compare the means?
[Diagram: controlled factor (3 feeds) and uncontrolled factor, both contributing to the observed variations]
• The same feed, but the hens lay different numbers of eggs
[Diagram: effect of the controlled factor vs. effect of the uncontrolled factor] Action of feeds
[Diagram: effect of the controlled factor vs. effect of the uncontrolled factor] No action of feeds
[Diagram: effect of the controlled factor vs. effect of the uncontrolled factor]
The 3 feeds have the same effect on the number of eggs laid
We suppose that all observations belong to the same population
How can we measure the variation due to the feed?
Imagine that all the chickens receiving feed A lay the same
number of eggs (and likewise for feeds B and C)
The new data table
A 179 179 179 179 179 179 179 179 179 179
B 201 201 201 201 201 201 201 201 201 201
C 198 198 198 198 198 198 198 198 198 198 198 198
Vt = Variance of treatment
How?
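The "new data table" idea can be sketched in Python: every observation is replaced by the mean of its feed, and the spread of those values around the grand mean gives the treatment variation. The divisor k - 1 (number of feeds minus one) is the usual ANOVA convention and is an assumption here, since the slide's own formula is not shown; the function name is only illustrative.

```python
from statistics import mean

# Egg counts per feed, copied from the data table.
feeds = {
    "A": [180, 177, 175, 170, 182, 181, 177, 180, 183, 185],
    "B": [199, 203, 200, 194, 195, 204, 206, 207, 202, 200],
    "C": [191, 194, 201, 193, 197, 195, 203, 199, 199, 201, 206, 197],
}

def treatment_variance(groups):
    """Return (group means, Vt), with Vt = SS_between / (k - 1) (assumed convention)."""
    means = {name: mean(values) for name, values in groups.items()}
    grand_mean = mean([x for values in groups.values() for x in values])
    # Replacing each observation by its group mean means a group of size n
    # contributes n * (group mean - grand mean)^2 to the sum of squares.
    ss_between = sum(
        len(groups[name]) * (m - grand_mean) ** 2 for name, m in means.items()
    )
    return means, ss_between / (len(groups) - 1)

means, vt = treatment_variance(feeds)
print("group means:", means)
print("Vt:", vt)
```

The group means printed here are the values used in the new data table.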
Expression of the “error”
The Anova table
A 180 177 175 170 182 181 177 180 183 185
B 199 203 200 194 195 204 206 207 202 200
C 191 194 201 193 197 195 203 199 199 201 206 197
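A minimal sketch of how the rows of a one-way ANOVA table can be computed from these data, assuming the standard decomposition into a treatment (between-feed) part and an error (within-feed) part. The dictionary keys and the function name are illustrative, and no p-value is computed because that would need an F-distribution table or an external library.

```python
from statistics import mean

feed_a = [180, 177, 175, 170, 182, 181, 177, 180, 183, 185]
feed_b = [199, 203, 200, 194, 195, 204, 206, 207, 202, 200]
feed_c = [191, 194, 201, 193, 197, 195, 203, 199, 199, 201, 206, 197]

def one_way_anova(groups):
    """Return sums of squares, degrees of freedom, mean squares and F."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Treatment part: group means around the grand mean, weighted by group size.
    ss_treatment = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Error part: observations around their own group mean.
    ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_treatment = len(groups) - 1
    df_error = len(all_values) - len(groups)
    ms_treatment = ss_treatment / df_treatment
    ms_error = ss_error / df_error
    return {
        "ss_treatment": ss_treatment, "df_treatment": df_treatment,
        "ss_error": ss_error, "df_error": df_error,
        "ms_treatment": ms_treatment, "ms_error": ms_error,
        "F": ms_treatment / ms_error,
    }

table = one_way_anova([feed_a, feed_b, feed_c])
for key, value in table.items():
    print(f"{key:>13}: {value:.2f}")
```

The F value is then compared with the tabulated F for (df_treatment, df_error) degrees of freedom at the chosen risk.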
• Which feed is the best one?
We use Student's t-test, but with a different estimate
of the sd (based on the error variance of the ANOVA)
Ma = 179   Mb = 201   Mc = 198
Mb – Ma = 22: significantly different
Mc – Ma = 19: significantly different
Mb – Mc = 3: not significantly different
Between A and B: sd = 1.9, 2 sd = 3.8, |Ma – Mb| = 22 > 3.8
Between A and C: sd = 1.9, 2 sd = 3.8, |Ma – Mc| = 19 > 3.8
Between B and C: sd = 1.9, 2 sd = 3.8, |Mb – Mc| = 3 < 3.8
Conclusion: B and C are not significantly different
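The three comparisons can be sketched as follows. This assumes the sd of a difference between two means is estimated from the ANOVA error mean square as sqrt(MSE × (1/ni + 1/nj)), which reproduces the sd ≈ 1.9 quoted on the slides; the error mean square 548 / 29 is computed from the feed data, and the "difference larger than 2 sd" rule is the slides' own criterion. The helper name is hypothetical.

```python
from math import sqrt

means = {"A": 179, "B": 201, "C": 198}   # feed means from the slides
sizes = {"A": 10, "B": 10, "C": 12}      # number of hens per feed
ms_error = 548 / 29                      # assumption: error mean square of the one-way ANOVA

def compare(first, second):
    """Return (sd of the difference, absolute difference, significant?)."""
    sd = sqrt(ms_error * (1 / sizes[first] + 1 / sizes[second]))
    diff = abs(means[first] - means[second])
    return sd, diff, diff > 2 * sd

results = {pair: compare(*pair) for pair in [("A", "B"), ("A", "C"), ("B", "C")]}
for (first, second), (sd, diff, significant) in results.items():
    print(f"{first} vs {second}: sd = {sd:.2f}, difference = {diff}, significant = {significant}")
```

Because feed C has 12 observations, its sd is slightly smaller than for A vs B, but the conclusions do not change.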
Block design
Hypothesis: Experiment with one controlled experimental factor but without
control of environmental heterogeneity
Block    1    2    3    4    5    6    7    8    9   10      m
A      180  177  175  170  182  181  177  180  183  185    179
B      199  203  200  194  195  204  206  207  202  200    201
C      191  194  201  193  197  195  203  199  199  201    197
mb     190  191  192  186  191  193  195  195  195  195  192.4
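A sketch of how the block table can be analysed, assuming the usual randomized-block decomposition (total = treatment + block + error). The slide itself only shows the row means m and the block means mb, so the F ratio at the end is an assumption about how the table is meant to be used; all names are illustrative.

```python
from statistics import mean

rows = {
    "A": [180, 177, 175, 170, 182, 181, 177, 180, 183, 185],
    "B": [199, 203, 200, 194, 195, 204, 206, 207, 202, 200],
    "C": [191, 194, 201, 193, 197, 195, 203, 199, 199, 201],
}

names = list(rows)
k = len(names)            # number of feeds (treatments)
b = len(rows[names[0]])   # number of blocks

grand_mean = mean([x for values in rows.values() for x in values])
treatment_means = {name: mean(rows[name]) for name in names}           # column m
block_means = [mean([rows[name][j] for name in names]) for j in range(b)]  # row mb

ss_total = sum((x - grand_mean) ** 2 for values in rows.values() for x in values)
ss_treatment = b * sum((m - grand_mean) ** 2 for m in treatment_means.values())
ss_block = k * sum((m - grand_mean) ** 2 for m in block_means)
# Error: what is left after removing the treatment and block effects from each cell.
ss_error = sum(
    (rows[name][j] - treatment_means[name] - block_means[j] + grand_mean) ** 2
    for name in names for j in range(b)
)

f_treatment = (ss_treatment / (k - 1)) / (ss_error / ((k - 1) * (b - 1)))
print("block means:", [round(m) for m in block_means])
print("grand mean:", round(grand_mean, 1))
print("F (feeds):", round(f_treatment, 2))
```

Removing the block sum of squares from the error is what distinguishes this layout from the one-way analysis of the same feeds.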
Two-Way Analysis of Variance
A two-way analysis of variance is a way of studying the effects of two
factors separately (their main effects) and (sometimes) together (their
interaction effect).
[Diagram: four groups with means m1, m2, m3, m4 and sizes n1, n2, n3, n4]
Assumptions
• Assumption #1: Your dependent variable should be measured at the continuous level:
revision time (hours), intelligence (IQ score), exam performance (0 to 100), weight (kg)
• Assumption #2: Your two independent variables should each consist of two or more
categorical, independent groups: gender (2 groups: male or female), ethnicity (3 groups:
Caucasian, African American and Hispanic)
• Assumption #3: You should have independence of observations, which means that there is
no relationship between the observations in each group or between the groups themselves
• Assumption #6: There needs to be homogeneity of variances for each combination of the
groups of the two independent variables
Hypotheses
Are whales heavier in the early or late mating season, and does that
depend on the gender of the whale?
       A1    A2
B1    180   191
      177   193
      175   201
      182   193
      177   205
      180   204
      184   199
      185   198
B2    199   219
      203   221
      200   229
      195   221
      206   233
      207   232
      206   227
      200   226
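A minimal sketch of a two-way analysis of variance on the A × B table above, assuming a balanced design (the same number of observations in every cell) and the standard split into main effect A, main effect B, interaction and error. The function and key names are illustrative, not part of the lecture.

```python
from statistics import mean

cells = {
    ("A1", "B1"): [180, 177, 175, 182, 177, 180, 184, 185],
    ("A2", "B1"): [191, 193, 201, 193, 205, 204, 199, 198],
    ("A1", "B2"): [199, 203, 200, 195, 206, 207, 206, 200],
    ("A2", "B2"): [219, 221, 229, 221, 233, 232, 227, 226],
}

def two_way_anova(cells):
    """Sums of squares and F ratios for a balanced two-factor design."""
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))  # replicates per cell (balanced design assumed)
    grand = mean([x for values in cells.values() for x in values])
    cell_mean = {key: mean(values) for key, values in cells.items()}
    a_mean = {a: mean([x for (ka, _), v in cells.items() if ka == a for x in v]) for a in a_levels}
    b_mean = {b: mean([x for (_, kb), v in cells.items() if kb == b for x in v]) for b in b_levels}

    ss_a = n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum(
        (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
        for a in a_levels for b in b_levels
    )
    ss_error = sum((x - cell_mean[key]) ** 2 for key, v in cells.items() for x in v)

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_error = len(a_levels) * len(b_levels) * (n - 1)
    ms_error = ss_error / df_error
    return {
        "ss_a": ss_a, "ss_b": ss_b, "ss_ab": ss_ab, "ss_error": ss_error,
        "f_a": (ss_a / df_a) / ms_error,
        "f_b": (ss_b / df_b) / ms_error,
        "f_ab": (ss_ab / (df_a * df_b)) / ms_error,
    }

anova = two_way_anova(cells)
for key, value in anova.items():
    print(f"{key:>8}: {value:.2f}")
```

Each F ratio is compared with the tabulated F value for its own degrees of freedom to decide on the two main effects and on the interaction.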
[ANOVA table; recovered values: 215.4, 265.07, 375.67, 5.41 (hypothesis rejected); 320.5, 45.79, 3.44, 2.5 (hypothesis rejected); 279.5, 13.3]
enzyme activity of mannose-6-phosphate isomerase (MPI) and MPI
genotypes in the amphipod crustacean Platorchestia platensis
Mann-Whitney test
Do these 2 samples come from the same population, with α = 5%?
Sample 1: 17 18 19 20 22 23
Sample 2: 18 19 19 20 21 22
Mann–Whitney U test
Non-parametric statistical hypothesis test for assessing whether two
independent samples of observations have equally large values (n < 30)
The data of the two samples are ranked against each other; first U1, for
sample 1:
(1) 17 18 19 20 22 23
(2) 18 19 19 20 21 22
For each value of (1), count the data of (2) that are larger, plus 0.5 per tie:
23: no data in (2) > 23, u = 0
22: 0 data in (2) > 22, 1 tie (22), u = 0.5
20: 2 data in (2) > 20, 1 tie (20), u = 2 + 0.5 = 2.5
19: 3 data in (2) > 19, 2 ties (19, 19), u = 3 + 1 = 4
18: 5 data in (2) > 18, 1 tie (18), u = 5 + 0.5 = 5.5
17: 6 data in (2) > 17, u = 6
U1 = Σu = 6 + 5.5 + 4 + 2.5 + 0.5 + 0 = 18.5
… and now U2, for the second sample:
(2) 18 19 19 20 21 22
(1) 17 18 19 20 22 23
22: 1 data in (1) > 22, 1 tie (22), u = 1.5
21: 2 data in (1) > 21, u = 2
20: 2 data in (1) > 20, 1 tie (20), u = 2.5
19: 3 data in (1) > 19, 1 tie (19), u = 3.5
19: 3 data in (1) > 19, 1 tie (19), u = 3.5
18: 4 data in (1) > 18, 1 tie (18), u = 4.5
U2 = 1.5 + 2 + 2.5 + 3.5 + 3.5 + 4.5 = 17.5
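The counting procedure can be checked with a short Python sketch: for every value of one sample it counts the values of the other sample that are larger and adds 0.5 for each tie. The function name is illustrative; a useful sanity check, shown at the end, is that U1 + U2 must equal n1 × n2.

```python
sample_1 = [17, 18, 19, 20, 22, 23]
sample_2 = [18, 19, 19, 20, 21, 22]

def mann_whitney_counts(first, second):
    """Return (U1, U2) obtained by direct pair counting, ties counted as 0.5."""
    # U1: for each value x of the first sample, how many values y of the second are larger.
    u1 = sum(1.0 if y > x else 0.5 if y == x else 0.0 for x in first for y in second)
    # U2: the same count with the roles of the two samples exchanged.
    u2 = sum(1.0 if x > y else 0.5 if x == y else 0.0 for y in second for x in first)
    return u1, u2

u1, u2 = mann_whitney_counts(sample_1, sample_2)
print("U1 =", u1, "U2 =", u2)
print("U1 + U2 == n1 * n2:", u1 + u2 == len(sample_1) * len(sample_2))
```

The smaller of the two values is the one compared with the critical value from the table.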
The U statistic shows the degree of
overlap in ranks between the 2 groups
[Diagram: overlap U between Sample 1 and Sample 2]
Smaller U = bigger difference between groups
What are the limits of U1 and U2?
Example 2
Sample 1: 1 2 3
Sample 2: 1 2 3
Example 3
Sample 1: 1 2 3    U1 = 3 + 3 + 3 = 9
Sample 2: 4 5 6    U2 = 0
U2 = 0 and U1 = n1*n2
[Diagram: Sample 1 and Sample 2 do not overlap, U = 0]
U1 and U2 far from n1·n2/2: hypothesis rejected; close to n1·n2/2: hypothesis accepted
[Diagram: U axis from U = 0 to U = n1·n2/2 with Ucrit (α = %) marked; hypothesis rejected between 0 and Ucrit, accepted between Ucrit and n1·n2/2]
Hypothesis: the two samples belong
to the same statistical population
If n < 20, the limit value (α = 5 %) is read from a table:
n1 … 5 6 7 …
n2
… … … … …
6 - 5 6 …
… … … …
[Diagram: U axis from U = 0 to U = 18, with the limit value 5 marked]
U stat = R – n(n+1)/2, where R is the rank sum of the sample (R1 = 38.5, R2 = 39.5)
U stat A = 38.5 – 6(6+1)/2 = 17.5
U stat B = 39.5 – 6(6+1)/2 = 18.5
Is there statistical evidence of a difference in APGAR scores in women receiving the
new and enhanced versus usual prenatal care?
U1 = 46.5
U2 = 17.5
U statA = 10.5
U statB = -18.5
Ustat = 10.5
Ucrit = 13
2. If you test the effect of the media over different time periods, do you reach the same
conclusion as before?
A B C D E
121 112 117 128 123
126 103 124 130 125
141 122 123 127 115
125 105 115 126 117
118 106 120 128 121
125 112 121 129 119
Products produced in chain A, B, and C
A B C
Defective 5 8 9
Non-defective 35 42 51
Kruskal–Wallis H
Data
Liver 1  Liver 2  Liver 3
18       15       15
20       16       20
22       17       21
25       21       25
H0: there is no difference between the ranks of the 3 samples
You have to control the calcium levels (mg per liter)
in three livers (12 samples)
Data                        Rank       Average rank
Liver 1  Liver 2  Liver 3              Liver 1  Liver 2  Liver 3
-        15       15        1 and 2    -        1.5      1.5
-        16       -         3          -        3        -
-        17       -         4          -        4        -
18       -        -         5          5        -        -
20       -        20        6 and 7    6.5      -        6.5
-        21       21        8 and 9    -        8.5      8.5
22       -        -         10         10       -        -
25       -        25        11 and 12  11.5     -        11.5
Sum                                    33       17       28
Then calculate the statistic
$$h = \frac{12}{n(n+1)} \sum_i \frac{r_i^2}{n_i} - 3(n+1)$$
$$h = \frac{12}{12 \times 13} \left( \frac{33^2}{4} + \frac{17^2}{4} + \frac{28^2}{4} \right) - 3(13) = 2.58$$
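The ranking and the formula can be reproduced with a short Python sketch. It uses average ranks for tied values, as in the rank table, and the uncorrected formula shown on the slide (no tie correction is applied, which is an assumption about what the lecture intends). The function names are illustrative.

```python
liver_1 = [18, 20, 22, 25]
liver_2 = [15, 16, 17, 21]
liver_3 = [15, 20, 21, 25]

def average_ranks(values):
    """Map each distinct value to its average rank in the pooled, sorted data."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Tied values occupy positions i+1 .. j; they all get the mean position.
        ranks[ordered[i]] = (i + 1 + j) / 2
        i = j
    return ranks

def kruskal_wallis_h(groups):
    """Return (rank sums per group, h) using h = 12/(n(n+1)) * sum(r_i^2/n_i) - 3(n+1)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    rank_sums = [sum(ranks[x] for x in g) for g in groups]
    h = 12 / (n * (n + 1)) * sum(r ** 2 / len(g) for r, g in zip(rank_sums, groups)) - 3 * (n + 1)
    return rank_sums, h

rank_sums, h = kruskal_wallis_h([liver_1, liver_2, liver_3])
print("rank sums:", rank_sums)
print("h:", round(h, 2))
```

The rank sums correspond to the "Sum" row of the rank table, and h is then compared with the tabulated limit value.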
The limit value is given by the table of Kruskal–Wallis
[Diagram: h axis with the limit value marked; hypothesis accepted below the limit value, rejected above it]
Conclusion: There is no significant difference
between the 3 groups with risk = 5%
Conclusion: Accept H0
Is there any difference between groups A, B and C with a risk of 5%?
$$h = \frac{12}{n(n+1)} \sum_i \frac{r_i^2}{n_i} - 3(n+1)$$
A   B   C
27  20  28
23  8   27
14  14  3
18  28  23
7   21  28
9   22  6
Rank sums: HA = 49.5, HB = 57.5, HC = 64
H = 0.62, Hcrit = 5.8
No clear evidence of asymmetry
Data not normal
Nominal scales are used for labeling variables, without any quantitative
value. “Nominal” scales could simply be called “labels.”
Ordinal scales: the order of the values is important and significant, but the
differences between each one are not really known.
Interval scales: the problem is that they don’t have a “true zero.” For example, there is no such thing as
“no temperature,” at least not with Celsius. Zero doesn’t mean the absence
of value, but is another number used on the scale, like 0 degrees Celsius.
Ratio variable: has all the properties of an interval variable, and also has a clear
definition of 0.0. When the variable equals 0.0, there is none of that variable.
Examples:
enzyme activity, dose amount, reaction rate, flow rate, concentration,
pulse, weight, length, temperature in Kelvin (0.0 Kelvin really does mean
“no heat”), survival time.
A shoe company wants to know if three groups of workers have different salaries:
H = 7.43
H
2. In a manufacturing unit, four teams of operators were randomly selected and sent to four
different facilities for machining techniques training. After the training, the supervisor
conducted the exam and recorded the test scores. At the 95% confidence level, are the scores
the same in all four facilities?
H
Sepal.Length in setosa and virginica species
Are they the same or different? α = 0.05
setosa  virginica
5.1     5.8
4.7     6.3
5       7.6
4.6     7.3
4.4     7.2
5.4     6.4
4.8     5.7
5.8     6.4
5.4     7.7
5.7     6
5.4     5.6
4.6     6.3
4.8     7.2
5       6.1
5.2     7.2
4.8     7.9
5.2     6.3
4.9     7.7
5.5     6.4
4.4     6.9
5       6.9
4.4     6.8
T
The students are randomly assigned to use one of three studying techniques for the next three
weeks to prepare for an exam. At the end of the three weeks, all of the students take the same
test.
The test scores for the students are shown below:
F stat = 1.91
A
Is there any difference in the results of treatment A and B? α = 5%
U
Researchers want to know if a fuel treatment leads to a change in the average miles
per gallon of a car. To test this, they conduct an experiment in which they measure the
miles per gallon of 12 cars with the fuel treatment and 12 cars without it.
The end
https://www.socscistatistics.com/tests/anova/default2.aspx