Review of Hypothesis Testing and Basic Tests
© 2006 A. Karpinski
This fact presents a problem for us. Consider the speeding example. Suppose
we want to know if, on average, people in this sample are speeding. In our
sample of 25, we found the average speed to be 62.5 MPH. Although this
number is larger than 55 MPH, there are two reasons why this sample mean
could be larger than 55 MPH:
o The true value of speed in the population is greater than 55 MPH. In other
words, we sampled from a distribution that had a mean value of speed
greater than 55.
If this is the case, then we should conclude that on average this
population of drivers violates the speed limit
If we repeated the sampling process, it is likely we would again find a
sample mean greater than 55 MPH
o The true value of speed in the population is 55 MPH or less, and we
obtained a sample mean greater than 55 MPH by chance alone.
If this is the case, then we should not conclude that this population of
drivers violates the speed limit
Or more generally,

$H_0: \mu_1 = \mu_2 = \dots = \mu_m$
$H_1:$ At least one $\mu_i$ differs from the other means
o Be sure to always state the null and alternative hypotheses in terms of the
population parameters, $\mu$, $\sigma$, $\rho$, and NOT the sample statistics, $\bar{X}$, $s$, $r$
o All hypothesis testing starts with the assumption that the null hypothesis
is true. It is the hypothesis of no association, or that the groups came
from the same underlying distribution.
o There are three possible explanations for any differences that might be
observed:
1. An all systematic factors explanation
2. An all chance explanation
3. A chance + systematic factors explanation
o Suppose we flip the coin one time and get a heads. Are we confident that
the coin is biased toward heads?
No! If the null hypothesis is true, then the coin had a 50% probability
of coming up heads. If we rejected the null hypothesis then when the
null hypothesis is correct, we would be wrong 50% of the time!
o Suppose we flip the coin two times and get two heads. Are we confident
that the coin is biased toward heads?
$p(x) = \binom{N}{x} p^x (1-p)^{N-x}$
p(0;2;.5) = .25
p(1;2;.5) = .50
p(2;2;.5) = .25
If the null hypothesis is true, then the coin had a 25% probability of
coming up heads on both tosses. If we rejected the null hypothesis
then when the null hypothesis is correct, we would be wrong 25% of
the time!
o Suppose we flip the coin three times and get three heads. Are we confident
that the coin is biased toward heads?
p(0;3;.5) = .125
p(1;3;.5) = .375
p(2;3;.5) = .375
p(3;3;.5) = .125
If the null hypothesis is true, then the coin had a 12.5% chance of
coming up heads on all three tosses. If we rejected the null hypothesis
then when the null hypothesis is correct, we would be wrong 12.5% of
the time. I'd still feel a bit uneasy about rejecting the null hypothesis
in this case.
o Suppose we flip the coin four times and get four heads. Are we confident
that the coin is biased toward heads?
p(0;4;.5) = .0625
p(1;4;.5) = .25
p(2;4;.5) = .375
p(3;4;.5) = .25
p(4;4;.5) = .0625
If the null hypothesis is true, then the coin had a 6.25% chance of
coming up heads on all four tosses. If we rejected the null hypothesis
then when the null hypothesis is correct, we would be wrong 6.25% of
the time.
Perhaps that might be good enough for us to conclude that the coin is
biased. However, the scientific convention has been to reject the null
hypothesis only when, given that the null hypothesis is true, the
probability of the observed result (or a more extreme result) occurring
by chance alone is .05 or less.
o Suppose we flip the coin nine times and get eight heads. Are we confident
that the coin is biased toward heads?
If the null hypothesis is true (the coin is fair), then the probability of
observing 8 or 9 heads in 9 coin flips is:
$p(x = 8 \text{ or } x = 9) = .0176 + .0020 = .0195$
If the null hypothesis is true (the coin is fair), then the probability of
observing 7 or more heads in 9 coin flips is:
$p(x = 7 \text{ or } x = 8 \text{ or } x = 9) = .0703 + .0176 + .0020 = .0898$
For a one-tailed test, we only looked at the biased-toward-heads end of
the distribution:
$p(x \ge 8) = .0195$
$p(x \ge 7) = .0898$
For a two-tailed test, we would also need to consider the possibility
that the coin was biased toward tails:
$p(x \le 0 \text{ or } x \ge 9) = .0039$
$p(x \le 1 \text{ or } x \ge 8) = .0390$
$p(x \le 2 \text{ or } x \ge 7) = .1797$
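As a check, here is a minimal Python sketch with scipy (an assumption; Python is not part of the original SPSS/Excel workflow) that reproduces these binomial probabilities:

from scipy.stats import binom

n, p = 9, 0.5                                     # 9 flips of a fair coin

# One-tailed: P(X >= 8) under the null hypothesis
print(binom.pmf(8, n, p) + binom.pmf(9, n, p))    # 0.0195
print(binom.sf(7, n, p))                          # same value via the survival function

# Two-tailed: also count the biased-toward-tails end of the distribution
print(binom.cdf(1, n, p) + binom.sf(7, n, p))     # 0.0391 (the notes round to .0390)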
Decision
State of the world      Reject Null Hypothesis            Fail to Reject Null Hypothesis
Null Hypothesis TRUE    Type 1 Error (Probability = α)    Correct Decision (Probability = 1 - α)
Null Hypothesis FALSE   Correct Decision (Power = 1 - β)  Type 2 Error (Probability = β)
[Figure: sampling distributions of the test statistic under H0 (centered at μ0) and under H1 (centered at μ1). The critical values (±Crit Val) mark the rejection region, with α/2 in each tail of the H0 distribution (the probability of a Type 1 error); the area of the H1 distribution beyond the critical value is the power, 1 - β.]
Of course, knowing that you had low power after you ran the test does little
good . . .
Note that in this question, the sample is drawn from a known population. In
other words, before the data were collected, we already knew the distribution
of the population. In order to use a z-test:
o The population must be normally distributed
o The population parameters must be known in advance of the study
o The observations must be independent and randomly drawn from the
population
$\text{test}_{obs} = \dfrac{\text{observed} - \text{expected}}{\text{standard error}}$
o For the denominator of the test, we need to calculate the standard error of
the observed sample mean. But we have already done so! We know that
the estimated standard error of a sample mean is
$\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{N}}$
o Putting both pieces together, we arrive at the test statistic for the
one-sample z-test:

$z_{obs} = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{N}}$
Example 1a: First, let's conduct a one-tailed test that the sample of
high-school seniors has a lower score than the national average.
$z_{obs} = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{N}} = \dfrac{72.5 - 75}{16/\sqrt{120}} = \dfrac{-2.5}{1.46} = -1.71$

o Make decision
$z_{obs} = -1.71 < -1.65 = z_{crit}$ (one-tailed, $\alpha = .05$)
Reject null hypothesis
o Conclude that this sample of high school seniors has significantly lower
scores than the population of high school seniors.
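The same computation can be verified with a short Python sketch (Python and scipy are assumptions here, not part of the original notes):

from math import sqrt
from scipy.stats import norm

x_bar, mu_0, sigma, n = 72.5, 75, 16, 120

se = sigma / sqrt(n)                # standard error: 16/sqrt(120) = 1.46
z_obs = (x_bar - mu_0) / se         # -1.71
p_lower = norm.cdf(z_obs)           # one-tailed (lower) p-value = .044 < .05

print(round(z_obs, 2), round(p_lower, 3))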
For a one-tailed test in the other direction (that the sample scores higher
than the national average), the same $z_{obs} = -1.71$ does not exceed
$z_{crit} = 1.65$, so we fail to reject the null hypothesis.
o Conclude that we do not have enough evidence to claim that this sample
of high school seniors is higher than the population of high school
seniors.
Now let's conduct a two-tailed test of whether the sample of high-school
seniors differs from the national average.

$z_{obs} = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{N}} = \dfrac{72.5 - 75}{16/\sqrt{120}} = \dfrac{-2.5}{1.46} = -1.71$
o Make decision
$|z_{obs}| = 1.71 < 1.96 = z_{crit}$
Fail to reject / Retain null hypothesis
o Conclude that we do not have enough evidence to claim that this sample
of high school seniors differs from the population of high school seniors.
A review of the difference between a one-tailed and a two-tailed test:
Consider the same two-tailed test, but with a larger sample of N = 160:

$z_{obs} = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{N}} = \dfrac{72.5 - 75}{16/\sqrt{160}} = \dfrac{-2.5}{1.26} = -1.98$
o Make decision
$|z_{obs}| = 1.98 > 1.96 = z_{crit}$
Reject null hypothesis
The p-value is a function of the difference between means AND the sample
size.
Wouldn't it be nice if we had a measure that did not depend on the sample
size?
In general, $d = \dfrac{\mu_1 - \mu_2}{\sigma_{pooled}}$

For a one-sample z-test, $d = \dfrac{\bar{X} - \mu_0}{\sigma}$
o To interpret d:
d=0.20 small effect
d=0.50 medium effect
d=0.80 large effect
o Example 1d
One-sample test with N = 160
$d = \dfrac{\bar{X} - \mu_0}{\sigma} = \dfrac{72.5 - 75}{16} = -.16$
o In general, a CI is determined by: estimate ± (critical value)(standard error of the estimate)
o How did we arrive at this formula? Let's develop some intuition about
this formula
For a z-test, we have the following formula for our test statistic:
$z_{obs} = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{N}}$
At the exact point of rejection, the observed test statistic will equal the
critical value
$\text{test}_{crit} = \dfrac{\text{Estimate} - \text{Null}}{\text{Standard error of estimate}}$
But for a two-tailed test, the critical value can be positive or negative
We can combine these two formulas to obtain the general formula for a
confidence interval
$\bar{X}_{obs} \pm z_{crit}\dfrac{\sigma}{\sqrt{N}} = 72.5 \pm (1.96)(1.46) = 72.5 \pm 2.86 = (69.64, 75.36)$
OR
$(\bar{X}_{obs} - \mu_0) \pm z_{crit}\dfrac{\sigma}{\sqrt{N}} = (72.5 - 75) \pm (1.96)(1.46) = -2.5 \pm 2.86 = (-5.36, 0.36)$
Interpretation of CIs:
o Example 1d:
One-sample, two-tailed hypothesis with N = 160
$z_{crit} = 1.96$

$\bar{X}_{obs} \pm z_{crit}\dfrac{\sigma}{\sqrt{N}} = 72.5 \pm (1.96)(1.26) = 72.5 \pm 2.47 = (70.03, 74.97)$
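A minimal Python sketch of the CI formula, using the N = 160 numbers (Python is an assumption; the notes do this by hand):

from math import sqrt
from scipy.stats import norm

x_bar, mu_0, sigma, n = 72.5, 75, 16, 160
z_crit = norm.ppf(0.975)                  # 1.96 for a 95% CI

se = sigma / sqrt(n)                      # 1.26
ci_mean = (x_bar - z_crit * se, x_bar + z_crit * se)
ci_diff = (x_bar - mu_0 - z_crit * se, x_bar - mu_0 + z_crit * se)

print(ci_mean)    # approximately (70.02, 74.98); the notes round SE to 1.26
print(ci_diff)    # approximately (-4.98, -0.02)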
The z-test is used when the data are known to come from a normal
distribution with a known variance. But in practice, we often have no way of
knowing the population parameters in advance. Thus we have to rely on
estimates of the population parameters, and the t-test.
$\hat{\sigma} = s$

$\hat{\sigma}^2 = s^2 = \dfrac{\sum_i (x_i - \bar{X})^2}{N - 1}$

$\hat{\sigma}_{\bar{X}} = \dfrac{\hat{\sigma}}{\sqrt{N}} = \dfrac{s}{\sqrt{N}}$

$d = g\sqrt{\dfrac{N}{df}} = g\sqrt{\dfrac{N}{N - 1}}$
Hours of TV viewing
21.96 19.38 23.69 26.11 18.82
22.81 21.98 25.79 21.67 24.35
28.18 18.69 21.23 18.37 25.60
23.87 25.11 24.23 20.90 19.51
22.65 20.90 21.20 28.04 16.77
25.39 26.89 21.61 20.14 20.75
23.81 21.74 23.68 23.80 21.40
18.36 24.12 25.40 23.36 26.46
20.20 20.82 21.11 20.76 23.16
22.69 24.51 25.21 24.50 14.68
Based on these data, can we claim that television viewing patterns have
changed since 1995?
Always look at the data before jumping into the hypothesis test!
Assumptions we need to check include
o Are the data normally distributed (or at least symmetrical)?
o Are there any outliers?
EXAMINE VARIABLES=hours
/PLOT BOXPLOT HISTOGRAM.
[Figure: boxplot and histogram of HOURS (N = 50), with values ranging from roughly 14 to 28 hours.]
o Thus, we need to check if the data are symmetrical around the mean
Is the boxplot symmetrical?
Is the median in the center of the box?
Do the whiskers extend equally in each direction?
Does the histogram look symmetrical?
Is the mean approximately equal to the median?
Is the coefficient of skewness relatively small?
o We should also be on the lookout for outliers: observations that fall far
from the main distribution of the data. We would not like our
conclusions to be influenced by one point (or a small number of points).
We should not toss out the outliers, but we do need to keep track of them
Hypothesis: Americans watch more TV per week now than they did in 1995.

$t_{obs} = \dfrac{\bar{X} - \mu_0}{s/\sqrt{N}} = \dfrac{22.53 - 21.22}{2.85/\sqrt{50}} = \dfrac{1.31}{.403} = 3.25, \qquad df = N - 1 = 49$
o Make decision
$t_{obs} = 3.25 > 2.01 = t_{crit}$
Reject null hypothesis
$g = \dfrac{\bar{X} - \mu_0}{s} = \dfrac{22.53 - 21.22}{2.85} = .46$

$d = g\sqrt{\dfrac{N}{df}} = .46\sqrt{\dfrac{50}{49}} = .47$
$\bar{X}_{obs} \pm t_{crit}\dfrac{s}{\sqrt{N}} = 22.53 \pm 2.01\left(\dfrac{2.85}{\sqrt{50}}\right) = 22.53 \pm .81 = (21.72, 23.34)$
T-TEST /TESTVAL=21.22
/VARIABLES=hours.
One-Sample Statistics
         N     Mean      Std. Deviation   Std. Error Mean
HOURS    50    22.5272   2.84866          .40286
One-Sample Test
SPSS computes CIs around the difference between the sample mean
and the hypothesized value. If you want a CI around the sample
mean, you need to add the (null) hypothesized value to each endpoint
of the CI
(21.72, 23.34)
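These results can be reproduced from the summary statistics with a short Python sketch (an assumption; the notes use SPSS):

from math import sqrt
from scipy.stats import t

n, mean, sd, test_value = 50, 22.5272, 2.84866, 21.22

se = sd / sqrt(n)                           # .40286
t_obs = (mean - test_value) / se            # 3.245
p_two_tailed = 2 * t.sf(abs(t_obs), n - 1)  # two-tailed p-value with df = 49

print(round(t_obs, 3), round(p_two_tailed, 3))   # 3.245, p = .002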
EXAMINE VARIABLES=mph
/PLOT BOXPLOT HISTOGRAM.
[Figure: boxplot and histogram of MPH (N = 25, Mean = 62.5, Std. Dev = 11.36).]
Descriptives
       N     Mean      Std. Deviation   Std. Error Mean
MPH    25    62.5200   11.35826         2.27165
One-Sample Test (Test Value = 55)
                                          95% Confidence Interval of the Difference
       t       df   Sig. (2-tailed)   Mean Difference   Lower     Upper
MPH    3.310   24   .003              7.5200            2.8315    12.2085
$g = \dfrac{\bar{X} - \mu_0}{s} = \dfrac{62.52 - 55}{11.36} = .66 \qquad d = g\sqrt{\dfrac{N}{df}} = .66\sqrt{\dfrac{25}{24}} = .67$
$t \sim \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{Var(\bar{X}_1 - \bar{X}_2)}}$

Assuming the two groups are independent and share a common variance, $Var(X)$:

$Var(\bar{X}_1 - \bar{X}_2) = \dfrac{Var(X)}{n_1} + \dfrac{Var(X)}{n_2} = Var(X)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)$
o This equation does not look so bad. But we will have to remember these
two assumptions we made!
$Var(X) = s^2_{pooled} = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}$

$s_{pooled} = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}} = \sqrt{\dfrac{SS_1 + SS_2}{n_1 + n_2 - 2}}$
$t_{obs} = \dfrac{\text{estimate}}{\text{std error of estimate}} = \dfrac{\bar{X}_1 - \bar{X}_2}{s_{pooled}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad df = n_1 + n_2 - 2$
$g = \dfrac{2t}{\sqrt{N}} \qquad d = \dfrac{2t}{\sqrt{df}}$

$(\bar{X}_1 - \bar{X}_2) \pm t_{crit} \, s_{pooled}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$
[Figure: histograms of memory scores for the NORMAL group (Mean = 48.9, Std. Dev = 9.06, N = 12) and the NO_SLEEP group (Mean = 46.3, Std. Dev = 9.78, N = 12), and side-by-side boxplots of MEMORY by GROUP.]
Descriptives (MEMORY)
                              GROUP 1.00           GROUP 2.00
Mean (Std. Error)             48.9167 (2.61539)    46.0000 (2.89200)
95% CI for Mean               (43.1602, 54.6731)   (39.6348, 52.3652)
5% Trimmed Mean               49.1296              46.5000
Median                        49.5000              48.5000
Variance                      82.083               100.364
Std. Deviation                9.05999              10.01817
Minimum                       32.00                25.00
Maximum                       62.00                58.00
Range                         30.00                33.00
Interquartile Range           12.2500              16.0000
Skewness (Std. Error)         -.601 (.637)         -.697 (.637)
Kurtosis (Std. Error)         -.255 (1.232)        .048 (1.232)
Group Statistics
         GROUP   N    Mean      Std. Deviation   Std. Error Mean
MEMORY   1.00    12   48.9167   9.05999          2.61539
         2.00    12   46.0000   10.01817         2.89200
$t_{obs} = \dfrac{\bar{X}_C - \bar{X}_{SD}}{s_{pooled}\sqrt{\dfrac{1}{n_C} + \dfrac{1}{n_{SD}}}} = \dfrac{48.92 - 46.00}{9.5509\sqrt{\dfrac{1}{12} + \dfrac{1}{12}}} = \dfrac{2.92}{3.899} = 0.748$
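The same pooled t-test can be run from the group summary statistics in Python (an assumption; the notes use SPSS and Excel); equal_var=True requests the pooled-variance test:

from scipy.stats import ttest_ind_from_stats

result = ttest_ind_from_stats(
    mean1=48.9167, std1=9.05999, nobs1=12,    # control group
    mean2=46.0000, std2=10.01817, nobs2=12,   # sleep-deprived group
    equal_var=True)

print(round(result.statistic, 3), round(result.pvalue, 3))   # 0.748, p = .462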
If the null hypothesized value is not zero, then you will need to adjust
the observed t-statistic by hand (SPSS assumes the null hypothesis is
that the two means are equal). Alternatively, you can use EXCEL,
which allows you to enter a null value.
t-Test: Two-Sample Assuming Equal Variances
                                 Variable 1   Variable 2
Mean 48.91667 46
Variance 82.08333 100.3636
Observations 12 12
Pooled Variance 91.22348
Hypothesized Mean Difference 0
df 22
t Stat 0.748013
P(T<=t) one-tail 0.231187
t Critical one-tail 1.717144
P(T<=t) two-tail 0.462374
t Critical two-tail 2.073875
Using EXCEL also gives you the pooled variance, $s^2_{pooled}$, which we
can use to compute the effect size g.
o Make decision
$p_{obs} = .462 > .05 = \alpha$
$|t_{obs}(22)| = .748 < 2.07 = t_{crit}(22)$
Fail to reject the null hypothesis

$g = \dfrac{\bar{X}_1 - \bar{X}_2}{s_{pooled}} = \dfrac{2.92}{9.55} = .306 \qquad r = \dfrac{t}{\sqrt{t^2 + df}} = \dfrac{.748}{\sqrt{.748^2 + 22}} = .157$

Or equivalently,

$g = \dfrac{2t}{\sqrt{N}} = \dfrac{2(.748)}{\sqrt{24}} = .306$
o You have two (or three) choices of how to display the variability in the
data
The mean displayed in the center of the confidence interval
The mean + 2 standard errors of the mean
[Figure: error-bar plots of MEMORY for the Control and No Sleep groups (N = 12 each): one panel shows the mean with its 95% CI, the other the mean ± 2 standard errors.]
As the degrees of freedom get larger, for a two-tailed test with $\alpha = .05$,
$t_{crit} \to 1.96$.
Thus for large samples, the two produce nearly identical results.
[Figure: Amount of Recall by Experimental Condition; mean recall (roughly 35-60) for the Normal and Sleep Deprived groups.]
[Figure: scatterplot of individual memory scores (30.00-60.00) by group (0.00 and 1.00).]
Confidence Coverage (should be 95%)
0.00     2.00042    95.3
0.01     2.04088    91.5
0.025    2.09849    82.7
0.05     2.20703    56.3
0.10     2.29815    18.5
Group 1 Group 2
3 2
4 3
5 4
4 3
3 2
4 3
5 4
4 3
3 2
4 11
o Group 1 is always 1 unit larger than Group 2 except for one observation
[Figure: boxplot and scatterplot of DV by GROUP (N = 10 per group); the Group 2 observation at DV = 11 is an extreme outlier.]
Not only do we have a problem with an outlier, but as is often the case,
outliers lead to other problems as well
o The variances of the two groups are very different
$s_1^2 = 0.544 \qquad s_2^2 = 7.122$
o Group 2 has a strong positive skew (is non-symmetrical)
Group Statistics
     GROUP   N    Mean     Std. Deviation   Std. Error Mean
DV   1.00    10   3.9000   .73786           .23333
     2.00    10   3.7000   2.66875          .84393
$t_{obs} = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$

$df = \dfrac{(n_1 - 1)(n_2 - 1)}{(n_2 - 1)c^2 + (n_1 - 1)(1 - c^2)}$

where

$c = \dfrac{s_1^2 / n_1}{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
o Notes:
If $n_1 = n_2$, then Welch's t will equal the uncorrected t (but the degrees
of freedom of the two tests may be different)
If $n_1 = n_2$ and if $s_1 = s_2$, then Welch's t will give the exact same result as
the uncorrected t.
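Both tests can be compared in a short Python sketch on the outlier example (Python is an assumption; equal_var=False requests Welch's test):

from scipy.stats import ttest_ind

group1 = [3, 4, 5, 4, 3, 4, 5, 4, 3, 4]
group2 = [2, 3, 4, 3, 2, 3, 4, 3, 2, 11]

pooled = ttest_ind(group1, group2, equal_var=True)    # uncorrected t, df = 18
welch = ttest_ind(group1, group2, equal_var=False)    # Welch's t, reduced df

# With n1 = n2 the two t statistics are equal; only the df (and p) differ
print(round(pooled.statistic, 3), round(pooled.pvalue, 3))
print(round(welch.statistic, 3), round(welch.pvalue, 3))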
T-TEST GROUPS=group
/VARIABLES=DV.
Independent Samples Test
RANK VARIABLES=dv .

From       New
variable   variable   Label
--------   --------   -----
DV         RDV        RANK of DV
T-TEST GROUPS=group
/VARIABLES=rdv.
Group Statistics
             GROUP   N    Mean       Std. Deviation   Std. Error Mean
RANK of DV   1.00    10   12.80000   4.385582         1.386843
             2.00    10   8.20000    6.033241         1.907878
Ranks
     GROUP   N    Mean Rank   Sum of Ranks
DV   1.00    10   12.80       128.00
     2.00    10   8.20        82.00
     Total   20

Test Statistics (b)
                                   DV
Mann-Whitney U                     27.000
Wilcoxon W                         82.000
Z                                  -1.821
Asymp. Sig. (2-tailed)             .069
Exact Sig. [2*(1-tailed Sig.)]     .089 (a)
a. Not corrected for ties.
b. Grouping Variable: GROUP
U = 27.00, p = .089
These two methods give nearly identical results. However, you should
report the results of the U test and not the t-test on the ranks.
The outlier does not influence this test very much. This test would be a
reasonable option for analyzing these data without tossing the outlier.
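A Python sketch of the U test (an assumption; the notes use SPSS). Note that scipy reports U for the first group, where SPSS prints the smaller of the two U values, and the p-value is asymptotic because of the tied ranks:

from scipy.stats import mannwhitneyu

group1 = [3, 4, 5, 4, 3, 4, 5, 4, 3, 4]
group2 = [2, 3, 4, 3, 2, 3, 4, 3, 2, 11]

u1, p = mannwhitneyu(group1, group2, alternative='two-sided')
u2 = len(group1) * len(group2) - u1      # U1 + U2 = n1 * n2

print(u1, u2, round(p, 3))               # U1 = 73, U2 = 27, p ≈ .07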
Variability: IQR, Median Absolute Deviation (MAD)
Descriptives
     GROUP                        Statistic
DV   1.00   Mean                  3.9000
            Median                4.0000
            Interquartile Range   1.2500
     2.00   Mean                  3.7000
            Median                3.0000
            Interquartile Range   2.0000

M-Estimators
     GROUP   Huber's M-Estimator (a)   Tukey's Biweight (b)
DV   1.00    3.8546                    3.8675
     2.00    3.0377                    2.8810
a. The weighting constant is 1.339.
b. The weighting constant is 4.685.
To pool the variability estimates, we can adapt the formula for $s_{pooled}$:

$s^2_{pooled} = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}$
o Importantly, in none of these solutions did we toss the bad data point.
$\chi^2 = \sum_{\text{Cells}} \dfrac{(f_{observed} - f_{expected})^2}{f_{expected}}$
Season
Spring Summer Fall Winter
Observed 30 40 20 10
$f_e = \dfrac{N}{a} = \dfrac{100}{4} = 25$
Season
Spring Summer Fall Winter
Expected 25 25 25 25
Observed 30 40 20 10
$\chi^2 = \sum_{\text{Cells}} \dfrac{(f_{observed} - f_{expected})^2}{f_{expected}}, \qquad df = a - 1$
$\chi^2 = \sum_{\text{Cells}} \dfrac{(f_{observed} - f_{expected})^2}{f_{expected}} = \dfrac{(30-25)^2}{25} + \dfrac{(40-25)^2}{25} + \dfrac{(20-25)^2}{25} + \dfrac{(10-25)^2}{25} = 20$

$df = a - 1 = 3$
Note that our test is not focused enough to permit a more specific
conclusion. For example, we cannot state that admissions are greater
in the fall than in the winter.
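A minimal Python sketch of this goodness-of-fit test (Python/scipy assumed, not part of the original notes):

from scipy.stats import chisquare

observed = [30, 40, 20, 10]        # Spring, Summer, Fall, Winter
expected = [25, 25, 25, 25]        # f_e = N/a = 100/4 under the null

chi2, p = chisquare(observed, f_exp=expected)
print(chi2, round(p, 5))           # chi2 = 20.0 on df = 3, p = .00017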
Observed Counts
Belief in Afterlife
Gender Yes No
Females 435 147
Males 375 134
$f_e = \dfrac{\text{RowTotal} \times \text{ColumnTotal}}{N}$
Expected frequencies:
Belief in Afterlife
Gender Yes No
Females 432 150 582
Males 378 131 509
810 281 1091
$\chi^2 = \sum_{\text{Cells}} \dfrac{(f_{observed} - f_{expected})^2}{f_{expected}}, \qquad df = (a-1)(b-1)$

$\chi^2 = \dfrac{(435-432)^2}{432} + \dfrac{(147-150)^2}{150} + \dfrac{(375-378)^2}{378} + \dfrac{(134-131)^2}{131} = 0.173$
$H_0$: The proportion of people who state baseball is their favorite sport is equal
among men and women
$H_1$: The proportion of people who state baseball is their favorite sport is NOT
equal among men and women
Or equivalently, $H_0: \pi_{men} = \pi_{women}$ and $H_1: \pi_{men} \ne \pi_{women}$
o Step 1: Convert the data into count form for a contingency table
Observed
Baseball Favorite?
Gender Yes No
Females 75 175
Males 165 135
Yucky method
We have n=550, so we need to enter 550 rows of data
Simple method
Enter one row for each cell and the count in that cell
0 0 175
0 1 75
1 0 135
1 1 165
Column 1: 0 = Female
1 = Male
Column 2: 0 = No, baseball is not my favorite sport
1 = Yes, baseball is my favorite sport
Column 3: Count associated with the specific cell
VALUE LABELS
gender 0 'Female' 1 'Male'
/baseball 0 'No' 1 'Yes' .
EXECUTE.
WEIGHT BY count .
CROSSTABS
/TABLES=gender BY baseball
/STATISTIC=CHISQ.
Count
                  BASEBALL
                  No      Yes     Total
GENDER   Female   175     75      250
         Male     135     165     300
Total             310     240     550
Check this table to make sure you have entered the data
correctly!
Chi-Square Tests
Reject null hypothesis and conclude that the proportion of people who
rate baseball as their favorite varied by gender.
$\phi = \sqrt{\dfrac{\chi^2}{N}} = \sqrt{\dfrac{34.652}{550}} = .25$
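The same test of independence, sketched in Python (an assumption; correction=False turns off the Yates continuity correction so the result matches the Pearson chi-square above):

from math import sqrt
from scipy.stats import chi2_contingency

observed = [[175, 75],     # females: No, Yes
            [135, 165]]    # males:   No, Yes

chi2, p, df, expected = chi2_contingency(observed, correction=False)
phi = sqrt(chi2 / 550)     # effect size for a 2x2 table

print(round(chi2, 3), df, round(phi, 2))   # ≈ 34.65 (the notes report 34.652), df = 1, phi = .25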
Party Affiliation
Gender Democrat Independent Republican
Female 279 73 225
Male 165 47 191
Party Affiliation
Gender Democrat Independent Republican
Female 279 73 225 577
(261.4) (70.7) (244.9)
Male 165 47 191 403
(182.6) (49.3) (171.1)
444 120 416 980
$f_e = \dfrac{\text{RowTotal} \times \text{ColumnTotal}}{N}$

$\chi^2 = \sum \dfrac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} = 7.010$

$df = (\text{# of rows} - 1)(\text{# of columns} - 1) = (2-1)(3-1) = 2$
Chi-Square Tests
                               Value    df   Asymp. Sig. (2-sided)
Pearson Chi-Square             7.010a   2    .030
Likelihood Ratio               7.003    2    .030
Linear-by-Linear Association   6.758    1    .009
N of Valid Cases               980
a. 0 cells (.0%) have expected count less than 5. The
minimum expected count is 49.35.
o For any table larger than 2×2, use a modified $\phi$, called Cramér's phi, $\phi_C$,
to measure the effect size of $\chi^2$, where L is the smaller of the number of
rows and the number of columns:

$\phi_C = \sqrt{\dfrac{\chi^2}{N(L-1)}} = \sqrt{\dfrac{7.01}{980(2-1)}} = .08$
Party Affiliation
Gender Democrat Republican
Females 279 225
Males 165 191
o For this table, we omit the people who indicated that they were
Independent
Party Affiliation
Non-Independent
Gender Independent (Democrat + Republican)
Females 73 504
Males 47 356
Appendix A: Interesting and useful facts about the Chi-square distribution

If $Y \sim N(\mu, \sigma)$, then over repeated sampling $\dfrac{(y - \mu)^2}{\sigma^2} \sim \chi^2(1)$

Then $z_1^2 = \dfrac{(y_1 - \mu)^2}{\sigma^2}$ and $z_2^2 = \dfrac{(y_2 - \mu)^2}{\sigma^2}$

If $z_i^2 = \dfrac{(y_i - \mu)^2}{\sigma^2}$, then $\sum_{i=1}^{n} z_i^2 \sim \chi^2(n)$

If $Y_1 \sim \chi^2(\nu_1)$ and $Y_2 \sim \chi^2(\nu_2)$, then $Y_1 + Y_2 \sim \chi^2(\nu_1 + \nu_2)$

In other words, $\chi^2(\nu_1) + \chi^2(\nu_2) = \chi^2(\nu_1 + \nu_2)$
[Figure: histograms of 1000 simulated draws from chi-square distributions with 2, 5, and 10 degrees of freedom (CH2, CH5, CH10); the distributions become less skewed as the degrees of freedom increase.]
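These facts are easy to verify by simulation; here is a minimal Python sketch (an assumption, not part of the original notes):

import numpy as np

rng = np.random.default_rng(0)

# Sum of n squared standard normals ~ chi-square(n)
z = rng.standard_normal((100_000, 5))
sum_sq = (z ** 2).sum(axis=1)

# A chi-square(nu) variable has mean nu and variance 2*nu
print(round(sum_sq.mean(), 2), round(sum_sq.var(), 2))   # approximately 5 and 10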