Hypothesis Testing
Notations:
A hypothesis is denoted by H.
H0 = Null Hypothesis (a hypothesis of “no difference” or “no relationship”)
H1 or Ha = Alternative Hypothesis (Research Hypothesis)
Steps:
Formulate the null and alternative hypotheses
Establish the significance level and identify the acceptance and rejection regions
Select the test statistic and procedure
Collect the sample and compute the value of the test statistic
Make the decision (accept or reject the null hypothesis).
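The five steps above can be sketched with a simple large-sample Z test for a population mean; all figures below (the hypothesised mean, the sample statistics, and the 5% significance level) are hypothetical:

```python
import math

# Step 1: H0: mu = 50 against H1: mu != 50 (two-tailed)
# Step 2: significance level alpha = 0.05, critical Z = 1.96
# Step 3: large sample (n >= 30), so use the Z statistic
mu0 = 50.0          # hypothesised population mean (assumption for illustration)
x_bar = 52.1        # sample mean (invented)
s = 6.0             # sample standard deviation (invented)
n = 64              # sample size
z_crit = 1.96       # critical Z for a two-tailed test at alpha = 0.05

# Step 4: compute the value of the test statistic
z = (x_bar - mu0) / (s / math.sqrt(n))

# Step 5: make the decision
decision = "reject H0" if abs(z) > z_crit else "fail to reject H0"
print(round(z, 2), decision)
```

Here z = 2.8 exceeds 1.96, so the null hypothesis is rejected at the 5% level.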
Parametric tests:
To use parametric tests, the following assumptions should hold:
The observations must be independent i.e. the selection of any one case should not affect the
chances for any other case to be included in the sample
The observations should be drawn from normally distributed populations
These populations should have equal variances
The measurement scales should be at least interval so that arithmetic operations can be used
with them
Note:
Parametric tests are more powerful than non-parametric tests because their data are derived from
interval and ratio measurement.
They are more useful the more you know about your subject matter, since knowledge about
subject matter can be built into parametric models.
Non-parametric tests, by contrast, require no precise assumptions about the population from
which the sample is drawn but only general assumptions, such as a continuous and/or symmetric
population distribution.
An alternative hypothesis that specifies that the parameter can lie on either side of the value
indicated by H0 leads to a two-sided or two-tailed test.
Type I error
Committed by rejecting a true null hypothesis; its probability is the significance level (α)
Type II error
Committed by failing to reject a false null hypothesis; its probability is denoted β
SELECTING A TEST
PARAMETRIC TESTS
1. One-sample tests
Used when we have a single sample and wish to test the hypothesis that it comes from a specific
population
The following questions are encountered:
Is there a difference between observed frequencies and the frequencies we would expect,
based on some theory?
Is there a difference between observed and expected proportions?
Is it reasonable to conclude that a sample is drawn from a population with some specified
distribution? (Normal, Poisson, etc)?
Is there a significant difference between some measure of central tendency (X̄) and its
population parameter (µ)?
Hypotheses
H0: π = π0
H1: π ≠ π0 (two-tailed) or H1: π > π0 (one-tailed)

Test statistic: Z = (p − π0) / √[π0(1 − π0)/n], when n ≥ 30

Test statistic: t = (p − π0) / √[π0(1 − π0)/n], when n < 30, df = n − 1
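The one-sample proportion test above can be sketched as follows; the data (60 successes in a sample of n = 100, testing π0 = 0.5) are hypothetical:

```python
import math

# Test H0: pi = 0.5 against H1: pi != 0.5 (figures invented for illustration)
pi0 = 0.5
n = 100
p = 60 / n                             # sample proportion

se = math.sqrt(pi0 * (1 - pi0) / n)    # standard error under H0
z = (p - pi0) / se                     # n >= 30, so the Z statistic applies

print(round(z, 2))   # compare with the tabulated critical Z (e.g. 1.96 at 5%)
```

Here z = 2.0, which would just exceed the 5% two-tailed critical value of 1.96.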
H0: µ1 = µ2
H1: µ1 ≠ µ2 (two-tailed test)

H0: µ1 = µ2
H1: µ1 > µ2 (one-tailed test)

When σ1 = σ2 (equal variances assumed),

t = (X̄1 − X̄2) / √(Sp²/n1 + Sp²/n2), where Sp² = [S1²(n1 − 1) + S2²(n2 − 1)] / (n1 + n2 − 2)
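The pooled-variance t statistic can be computed as below; the two samples' means, standard deviations, and sizes are hypothetical:

```python
import math

# Two independent samples, equal variances assumed (sigma1 = sigma2).
# All figures are invented for illustration.
x1, s1, n1 = 24.0, 4.0, 16   # mean, s.d., size of sample 1
x2, s2, n2 = 21.0, 5.0, 25   # mean, s.d., size of sample 2

# Pooled variance Sp^2 = [S1^2(n1-1) + S2^2(n2-1)] / (n1 + n2 - 2)
sp2 = (s1**2 * (n1 - 1) + s2**2 * (n2 - 1)) / (n1 + n2 - 2)

# Test statistic with df = n1 + n2 - 2
t = (x1 - x2) / math.sqrt(sp2 / n1 + sp2 / n2)
df = n1 + n2 - 2
print(round(t, 3), df)   # compare t with the tabulated value at df = 39
```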
H0: µ1 = µ2
H1: µ1 < µ2 (one-tailed test)

Test statistic (paired samples):

t = d̄ / (Sd / √n)

Where Sd² = Σ(d − d̄)² / (n − 1), d̄ = Σd / n, and d = X − Y
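The paired-samples t statistic can be sketched directly from the formula; X and Y below are hypothetical "before" and "after" measurements on the same n = 5 subjects:

```python
import math

# Invented paired observations on the same five subjects
X = [12.0, 15.0, 11.0, 14.0, 13.0]
Y = [10.0, 13.0, 11.0, 12.0, 12.0]
n = len(X)

d = [x - y for x, y in zip(X, Y)]                        # d = X - Y
d_bar = sum(d) / n                                       # mean difference
sd = math.sqrt(sum((di - d_bar) ** 2 for di in d) / (n - 1))

t = d_bar / (sd / math.sqrt(n))                          # df = n - 1
print(round(t, 3))   # compare with the tabulated t at df = 4
```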
NON-PARAMETRIC TESTS
Chi-square (χ²) tests
Sign tests (one- and two-sample tests)
Mann-Whitney test – to be used with two independent groups (analogous to the independent-
groups t-test)
Wilcoxon matched-pairs test – to be used with two related groups, i.e. matched or repeated
measures (analogous to the related-samples t-test)
Kruskal-Wallis test – to be used with two or more independent groups (analogous to the one-way
ANOVA)
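The Mann-Whitney test mentioned above can be sketched for small samples; this is a minimal illustration assuming no ties, with invented data:

```python
# Minimal Mann-Whitney U statistic for two small independent groups.
# Assumes small samples; ties are scored 0.5 for completeness.
def mann_whitney_u(a, b):
    # U_a counts, over all (a, b) pairs, how often a value in `a`
    # exceeds a value in `b` (0.5 for a tie).
    u_a = sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in a for y in b
    )
    u_b = len(a) * len(b) - u_a
    return min(u_a, u_b)   # the smaller U is compared with the table value

# Hypothetical scores for two independent groups
group1 = [14, 18, 20, 25]
group2 = [10, 12, 15, 17]
print(mann_whitney_u(group1, group2))
```

A small U (here 2.0) indicates that one group's values mostly exceed the other's; significance is judged against tabulated critical values of U.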
CHI SQUARE APPLICATIONS
Chi-square Test
This is probably the most widely used non-parametric test of significance.
It is particularly useful in tests involving nominal data, but can be used for higher scales.
It is useful in cases of one-sample analysis, two independent samples, or k independent
samples
It must be calculated with actual counts rather than percentages.
Characteristics of the Chi-Square Distribution
It is positively skewed
It is non-negative
It is based on degrees of freedom
When the degrees of freedom change, a new distribution is created
Test of independence (Contingency Table Analysis)
If we classify a population into several categories with respect to two (or more) attributes, we
can use a chi-square test to determine whether the attributes are independent of each other.
Contingency table analysis is used to test whether two traits or variables are related
Each observation is classified according to two variables
Hypotheses:
H0: The two classifications are independent (i.e. no relation between classes)
H1: The classifications are not independent
The χ² value obtained from the formula is compared with the value from the χ² table for a
given significance level and the number of degrees of freedom.
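The test of independence can be sketched on a hypothetical 2×2 contingency table; each expected count is (row total × column total) / grand total:

```python
# Chi-square test of independence on an invented 2x2 table of observed counts
observed = [[30, 20],
            [20, 30]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, f0 in enumerate(row):
        fe = row_totals[i] * col_totals[j] / grand   # expected frequency
        chi2 += (f0 - fe) ** 2 / fe

df = (len(observed) - 1) * (len(observed[0]) - 1)    # (r - 1)(c - 1)
print(round(chi2, 2), df)   # compare with the tabulated chi-square value
```

Here every expected count is 25, giving χ² = 4.0 with 1 degree of freedom.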
Goodness-of-Fit Test
In addition to the use of χ² for comparing observed and expected frequencies, the χ² test can be
used to determine how well empirical distributions (i.e. those obtained from sample data) fit
theoretical distributions such as the normal, Poisson, and binomial. That is, the chi-square test of
"goodness of fit” tests the appropriateness of a distribution.
H0: There is no difference between the observed and expected frequencies
H1: There is a difference between the observed and expected frequencies
The test statistic is: χ² = Σ (f0 − fe)² / fe
The critical value is a chi-square value with (k-1) degrees of freedom, where k is the number of
categories
The purpose is to test whether the observed frequencies in a frequency distribution match the
theoretical distribution. The expected frequencies are calculated in accordance with the assumed
distribution's characteristics, based on the appropriate tables (normal, Poisson, etc.), and χ² is
calculated in the manner previously described.
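A goodness-of-fit calculation can be sketched with a uniform (fair-die) theoretical distribution; the observed counts are hypothetical:

```python
# Chi-square goodness-of-fit: do observed die-roll counts fit a fair
# (uniform) distribution? 120 invented rolls, so each of the 6 faces
# has an expected frequency of 20.
observed = [18, 22, 16, 25, 19, 20]
expected = [sum(observed) / len(observed)] * len(observed)

# chi2 = sum of (f0 - fe)^2 / fe over all categories
chi2 = sum((f0 - fe) ** 2 / fe for f0, fe in zip(observed, expected))
df = len(observed) - 1   # k - 1 categories
print(round(chi2, 2), df)   # compare with the tabulated chi-square at df = 5
```

A small χ² (here 2.5, well below the 5% critical value at 5 degrees of freedom) means the observed frequencies are consistent with the assumed distribution.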