Statistics
2
How can we make these numbers
meaningful?
3
RAW DATA
4
5
Task I. Measuring Arm Span
6
• What do these numbers represent?
• Can we get clear and precise information
immediately as we look at these numbers?
Why?
• How can we make these numbers
meaningful for anyone who does not know
about the description of these numbers?
7
RAW DATA
8
PROCESSING
9
➢ Give some examples of activities in which you
think Statistics is involved.
• What is Statistics?
10
What is Statistics?
11
Functions or Uses of Statistics
Statistics helps in …
➢ providing a better understanding and exact
description of a phenomenon of nature.
➢ proper and efficient planning of a statistical
inquiry in any field of study.
➢ collecting appropriate quantitative data.
➢ presenting complex data in a suitable tabular,
diagrammatic and graphic form for an easy and
clear comprehension of the data.
12
➢ understanding the nature and pattern of
variability of a phenomenon through quantitative
observations.
13
Nature of Statistics
• Descriptive Statistics
➢ Methods concerned with describing and
summarizing sets of data.
• Inferential Statistics
➢ Methods that make possible the estimation of a
characteristic of a population or the making of a
decision concerning a population based on
information provided by the sample.
14
Example
Types of Variable
Qualitative / Categorical
- measures a quality or characteristic
Example: Hair Color, NBA Teams, Gender, Course
Section, etc.
Quantitative / Numerical
- measures a numerical quantity or amount
- answers the questions “how much” or “how many”
Example: height of a student, weight of babies, etc.
Moderator Variable
Example: Ability Level moderates the relationship between
Method of Teaching and Academic Performance.
18
Types of Quantitative Variables
• DISCRETE
➢ Assumes only a finite or countably infinite
number of values
• Example:
– Number of meals in a day
– Number of units enrolled, etc.
• CONTINUOUS
➢ Assumes infinitely many values corresponding to
the points on a line interval
• Example
– Age
– Travel time from home to school
19
Levels of Measurement of Variables
• NOMINAL
➢ Variable whose values are simply labels or
names or categories without any implicit or
explicit ordering of the labels
➢ Lowest level of measurement
• Example:
– Gender,
– ID number, etc.
20
Levels of Measurement of Variables
• ORDINAL
➢ Variables whose values are labels or classes
with an implied ordering in these labels
➢ Ranking can be done on this data
➢ Distance between two levels cannot be
quantified
• Example:
– Job Position,
– Faculty rank, etc.
21
Levels of Measurement of Variables
• INTERVAL
➢ Variables whose values can be ordered, and in
addition, may be added or subtracted, but not
divided nor multiplied
➢ Distances between any two points are of known
size, the unit of measurement is constant (but
arbitrary), and the zero point is arbitrary (not
specified)
• Example:
– Temperature
• RATIO
➢ Variable whose values have all the properties of
the interval scale, and in addition, can be
multiplied and divided
➢ Has a true zero point
➢ Highest level of measurement
• Example:
– length,
– weight,
– Height, etc.
23
Data Collection
24
Data
➢ Measurements of variables from every individual
or object under consideration.
Kinds of Data
• PRIMARY
➢ Data obtained directly from the source of
information.
• SECONDARY
➢ Data that have been previously
collected by another person or institution for
some other purpose, usually taken from
publications or existing records.
25
Sampling
26
Population
27
Sample
28
Why do sampling?
29
Determination of sample size
• Slovin’s formula: $n = \dfrac{N}{1 + Ne^{2}}$, where N is the population size and e is the margin of error
32
• Use Slovin’s formula to find out what sample
size you need from a population of 1,000 people
for a survey on their soda preferences.
Step 1: Figure out what you want
your confidence level to be. For example, you
might want a confidence level of 95 percent
(which will give you a margin of error of 0.05), or
you might need better accuracy at the 98
percent confidence level (which produces a
margin of error of 0.02).
33
• Step 2: Plug your data into the formula. In
this example, we’ll use a margin of error of 0.02
(the 98 percent level) with a population size of 1,000.
n = N / (1 + N e^2) = 1,000 / (1 + 1,000 × 0.02^2) = 1,000 / 1.4 ≈
714.29
• Step 3: Round your answer up to a whole
number (because you can’t sample a fraction
of a person or thing)
n = 715
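As a minimal sketch, the Slovin computation can be checked in Python. The function name `slovin_sample_size` is illustrative, not part of the slides, and rounding up with `math.ceil` is assumed because a fraction of a respondent cannot be sampled.

```python
import math


def slovin_sample_size(population_size: int, margin_of_error: float) -> int:
    """Slovin's formula n = N / (1 + N * e^2), rounded up to a whole unit."""
    raw = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(raw)


# N = 1,000 with e = 0.02 reproduces the worked value of 715.
print(slovin_sample_size(1000, 0.02))
# The same population with e = 0.05 needs a much smaller sample.
print(slovin_sample_size(1000, 0.05))
```

A smaller margin of error sharply increases the required sample size, which is why the choice of e in Step 1 matters.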
34
Sampling Procedures:
Probability vs. Non-Probability
Sampling
Probability sampling.
With probability sampling, every element of the
population has a known probability of being included in
the sample.
Advantage:
35
Probability vs. Non-Probability
Sampling
Non-probability sampling.
With non-probability sampling, we cannot
specify the probability that each element will be
included in the sample.
36
Methods of Probability Sampling
37
Simple Random Sampling
39
Systematic random sampling
➢ This technique selects the desired sample
from a list. If the list is arranged randomly,
the result is a random sample. If the list is
arranged systematically or logically, for example
alphabetically or in any other acceptable order,
and elements are drawn at a fixed interval,
it becomes systematic sampling.
42
Stratified random sampling
43
44
Stratified random sampling
• Proportional Allocation
• Equal allocation
– In this procedure the sample size of each
group/stratum is determined by dividing the
sample size n by the number of strata or subgroups.
Each group/stratum will have an equal size.
45
Example
• A survey was conducted to find out if families
living in a certain community are in favor of
Charter Change. To ensure that all income
groups are represented, respondents will be
divided into high income (class A), middle
(Class B) and low-income (Class C) groups.
Below is the distribution of income:
46
Advantages and Disadvantages
• Stratified sampling offers several advantages over simple
random sampling.
• A stratified sample can provide greater precision than a
simple random sample of the same size.
• Because it provides greater precision, a stratified sample
often requires a smaller sample, which saves money.
• A stratified sample can guard against an
"unrepresentative" sample
• We can ensure that we obtain sufficient sample points to
support a separate analysis of any subgroup.
• The main disadvantage of a stratified sample is that it
may require more administrative effort than a simple
random sample.
47
Cluster Sampling
48
Steps:
• Identify and define the problem
• Determine the desired sample size
• Identify and define a logical cluster
• List all clusters (or obtain a list) that comprise the
population
• Estimate the average number of population members per
cluster
• Determine the number of clusters needed by dividing the
sample size by the estimated size of cluster
• Randomly select the needed number of clusters (using a
table of random numbers)
• Include in your study all population members in each
population cluster.
49
Let us see how a superintendent would
get a sample of teachers if cluster
sampling were used.
• The population is all 5000 teachers in the superintendent’s school
system
• The desired sample size is 500
• A logical cluster is a school
• The superintendent has a list of all schools in the district, there are
100 schools
• Although schools vary in the number of teachers per school, there
is an average of 50 teachers per school
• The number of clusters (schools) needed equals the desired
sample size, 500, divided by the average size of the cluster, 50.
Thus the number of schools needed is 500/50=10
• Therefore, 10 of the 100 schools are randomly selected
• All the teachers in each of the 10 schools are in the sample (10
schools, 50 teachers per school, equals the desired sample size,
which is 500).
50
Methods of Non-probability Sampling
• Accidental or Incidental Sampling
➢ Based exclusively on what is convenient for the
researcher, i.e. the researcher includes the most
convenient cases in his/her sample and excludes the
inconvenient cases. There are several techniques that
may be characterized under this, e.g. snowball
sampling.
• Quota Sampling
➢ Example:
• Suppose we are asked to draw a quota sample from the
students attending a university where 42% are females
and 58% are males. In this method the researcher is
given a quota with respect to locale, so that 42% of
the sample consists of females and 58% of males. Thus,
if the total sample is 200, we take 84 female
students and 116 male students.
The inadequacy of quota sampling lies in the lack of control
over factors other than those set in the quota.
51
Source: Levin and Fox, 1997
Methods of Non-probability Sampling
Example:
➢ Predicting outcome of the Tacloban City Mayoralty
Election
• A particular barangay that has traditionally voted for
the winning candidates for the mayoralty office is
taken as the sample.
52
Source: Levin and Fox, 1997
Frequency
Distribution
53
Frequency Distribution
54
Categorical Frequency
Distribution
• The categorical frequency distribution is
used for data that can be placed in
specific categories, such as nominal- or
ordinal-level data.
• For example, data such as political
affiliation, religious affiliation, or major
field of study.
55
Example
A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A
56
The frequency distribution is
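A minimal sketch of how the tally for the 25 blood types in this example could be produced in Python, using the standard-library `Counter`. The variable names and the percent column are illustrative choices, not taken from the slides.

```python
from collections import Counter

# The 25 blood types from the example, read row by row.
blood_types = (
    "A B B AB O "
    "O O B AB B "
    "B B O A O "
    "A O O O AB "
    "AB A O B A"
).split()

counts = Counter(blood_types)
total = len(blood_types)

print(f"{'Class':<6}{'Frequency':>10}{'Percent':>9}")
for blood_type in ("A", "B", "O", "AB"):
    frequency = counts[blood_type]
    percent = 100 * frequency / total
    print(f"{blood_type:<6}{frequency:>10}{percent:>8.0f}%")
```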
57
Ungrouped Frequency
Distribution
58
Example
59
60
Grouped Frequency Distribution
• When the range of the data is large, the
data must be grouped into classes that are
more than one unit in width.
• To construct a frequency distribution, follow
these rules:
1. There should be between 5 and 20
classes.
2. The class width should be an odd
number. This ensures that the midpoint of
each class has the same place value as the
data.
61
3. The classes must be mutually exclusive.
Mutually exclusive classes have
nonoverlapping class limits so that data cannot
be placed into two classes.
62
Procedure for constructing a
grouped frequency distribution
1. Find the range.
range = highest value – lowest value
2. Decide on the number of class
intervals or classes, we denote it by k.
• Sturges’ formula: $k = 1 + \log_2 N$
• another formula: $k = \sqrt{n}$
• 5 – 20 classes
63
3. Determine the class size or class
width of the interval, we denote it by c.
\[ c = \frac{\text{range}}{k} \]
(rounded to the nearest odd whole number)
UL = LL + (c – 1)
64
5. Determine the upper class intervals by
consecutively adding the class size c to
the values of LL and UL of the lowest
class interval until we get the class
interval with the highest value in the data
set.
6. Tally the data, find the frequencies.
Note: Other statistical information may be
reflected in the table, such as class boundaries,
class marks or class midpoints, less than
cumulative frequency (<cf), greater than
cumulative frequency (>cf), and the relative
frequency (rf).
65
• The class boundaries are used to
separate the classes so that there are no
gaps in the frequency distribution.
66
• The class midpoint is found by adding the
upper and lower boundaries (or limits) and
dividing by 2.
67
Example
• Distribution of scores of forty students in
a Mathematics class.
68
1. Range = 99 − 67 = 32
2. Number of class intervals: $k = \sqrt{n} = \sqrt{40} \approx 6.32$, take 7
3. Class size: $c = \dfrac{32}{7} \approx 4.57$, take 5
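The first three steps can be sketched in Python as follows. The helper names are hypothetical, and two conventions are assumed: the number of classes is rounded up to the next whole number (as in "take 7"), and the class width is rounded to the nearest odd whole number.

```python
import math


def number_of_classes(n: int) -> int:
    """Square-root rule k = sqrt(n), rounded up to a whole number of classes."""
    return math.ceil(math.sqrt(n))


def sturges_classes(n: int) -> int:
    """Sturges' formula k = 1 + log2(n), rounded up."""
    return math.ceil(1 + math.log2(n))


def odd_class_width(data_range: float, k: int) -> int:
    """Class width range / k, rounded to the nearest odd whole number."""
    raw = data_range / k
    return 2 * math.floor((raw - 1) / 2 + 0.5) + 1


data_range = 99 - 67                    # highest value minus lowest value
k = number_of_classes(40)               # 40 students
c = odd_class_width(data_range, k)
print(data_range, k, c)
```

For n = 40 both rules for k happen to give about 6.32, so either one leads to 7 classes here.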
69
Class interval   Class boundaries   Class mark (midpoint)   Frequency
95-99            94.5-99.5          97                      6
90-94            89.5-94.5          92                      0
85-89            84.5-89.5          87                      4
80-84            79.5-84.5          82                      5
75-79            74.5-79.5          77                      1
70-74            69.5-74.5          72
65-69            64.5-69.5          67
70
Why construct frequency
distribution?
71
Graphs
72
Graphical Presentations of Data
• The three most common statistical
graphs are the bar graph (histogram),
the frequency polygon, and the
cumulative frequency or the ogive.
• The purpose of graphs in statistics is to
convey the data to the viewer in pictorial
form.
• Graphs are useful in getting the
audience’s attention in a publication or a
presentation.
73
Histogram
74
Frequency Polygon
• The frequency polygon is a graph that
displays the data by using lines that
connect points plotted for the
frequencies at the midpoints of the
classes.
75
Ogive
• The ogive is the graph that represents
the cumulative frequencies for the
classes in a frequency distribution.
76
Other Types of Graphs
• Pareto Chart
A Pareto chart is used to represent
a frequency distribution for a categorical
variable. The frequencies are
displayed by the heights of vertical bars,
which are arranged in order from highest
to lowest.
77
Example of a Pareto chart
78
• Pie Chart
A pie chart is a circle that is divided
into sections according to the
percentage of frequencies in each
category of the distribution.
79
Stem-and-Leaf Plot
80
Example
81
Measures of
Central Tendency
82
MEASURE OF CENTRAL TENDENCY
83
ARITHMETIC MEAN
84
POPULATION MEAN
\[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} \]
85
SAMPLE MEAN
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
86
Example
87
Remark: The mean takes into account all
observations in the data set. Thus, it is
affected by extreme values.
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{5 + 7 + 14 + 4 + 6}{5} = \frac{36}{5} = 7.2 \text{ hours} \]
88
WEIGHTED MEAN
x̄1 = 60   n1 = 10
x̄2 = 50   n2 = 60
x̄3 = 40   n3 = 30
89
We wish to find the mean of the three
groups combined, denoted by $\bar{x}_t$:
\[ \bar{x}_t = \frac{\bar{x}_1 n_1 + \bar{x}_2 n_2 + \bar{x}_3 n_3}{n_1 + n_2 + n_3} \]
\[ \bar{x}_t = \frac{(60)(10) + (50)(60) + (40)(30)}{10 + 60 + 30} = \frac{4800}{100} = 48 \]
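A short sketch of the same combined (weighted) mean in Python. The function name `combined_mean` is illustrative only.

```python
def combined_mean(group_means, group_sizes):
    """Weighted mean: each group mean is weighted by its group size."""
    weighted_total = sum(m * n for m, n in zip(group_means, group_sizes))
    return weighted_total / sum(group_sizes)


# Three groups with means 60, 50, 40 and sizes 10, 60, 30.
overall = combined_mean([60, 50, 40], [10, 60, 30])
print(overall)
```

The result sits closest to 50 because the second group contributes 60 of the 100 observations.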
90
MEAN OF GROUPED DATA
\[ \bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} \]
where
fi = frequency of the class interval
xi = class mark of the class interval
91
Example: Given the frequency distribution
table below, find its mean.
Class Interval   Frequency
19 – 21          3
16 – 18          10
13 – 15          4
10 – 12          12
7 – 9            6
92
We solve first the class mark and the
product of the class mark and the
frequency.
93
Class Interval   Frequency (f)   Class Mark (x)   fx
19 – 21          3               20               60
16 – 18          10              17               170
13 – 15          4               14               56
10 – 12          12              11               132
7 – 9            6               8                48
                 Σf = 35                          Σfx = 466
\[ \bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{466}{35} = 13.3 \]
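As a hedged sketch, the grouped mean can be computed from the class limits and frequencies. The tuple layout `(lower limit, upper limit, frequency)` and the function name are assumptions made for illustration.

```python
# Each class is (lower limit, upper limit, frequency), as in the example table.
classes = [
    (19, 21, 3),
    (16, 18, 10),
    (13, 15, 4),
    (10, 12, 12),
    (7, 9, 6),
]


def grouped_mean(table):
    """Mean of grouped data: sum of f * class mark divided by sum of f."""
    total_f = sum(f for _, _, f in table)
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in table)
    return total_fx / total_f


mean = grouped_mean(classes)
print(round(mean, 1))
```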
94
MEDIAN
95
POPULATION MEDIAN
Given the population data arranged
from the lowest to the highest, x1, x2, …,
xN, the population median is given by
\[ Md = x_{\frac{N+1}{2}} \quad \text{if } N \text{ is odd} \]
\[ Md = \frac{x_{\frac{N}{2}} + x_{\frac{N}{2}+1}}{2} \quad \text{if } N \text{ is even} \]
96
SAMPLE MEDIAN
Given the sample data arranged from
the lowest to the highest, x1, x2, …, xn, the
sample median is given by
\[ Md = x_{\frac{n+1}{2}} \quad \text{if } n \text{ is odd} \]
\[ Md = \frac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2} \quad \text{if } n \text{ is even} \]
97
Steps in finding the median
98
Example: Below are the scores of 6
students in their Mathematics test. Find
the median.
35 20 12 30 25 50
12 20 25 30 35 50
99
Since n = 6, the median is the average
of the $\frac{n}{2} = \frac{6}{2} = 3$rd and
$\frac{n}{2} + 1 = \frac{6}{2} + 1 = 4$th observations:
Md = (25 + 30) / 2 = 27.5
100
Example: Below are the scores of 7
students in their Mathematics test. Find
the median.
35 20 12 30 25 50 26
12 20 25 26 30 35 50
101
Since n = 7, the median is the
$\frac{n+1}{2} = \frac{7+1}{2} = 4$th observation:
Md = 26
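Both median examples can be checked with Python's standard `statistics` module, which sorts the data and applies the same odd/even rule. This is only a quick verification sketch.

```python
import statistics

scores_even = [35, 20, 12, 30, 25, 50]        # n = 6
scores_odd = [35, 20, 12, 30, 25, 50, 26]     # n = 7

# n even: average of the 3rd and 4th sorted values.
median_even = statistics.median(scores_even)
# n odd: the 4th sorted value.
median_odd = statistics.median(scores_odd)

print(sorted(scores_even), median_even)
print(sorted(scores_odd), median_odd)
```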
102
Remarks:
103
MEDIAN OF GROUPED DATA
\[ Md = L + \left( \frac{\frac{N}{2} - cf}{f_i} \right) i \]
Where
Md = median
L = true lower limit of the median class (interval
containing the score where 50% of the total observations fall)
N = total number of observations
cf = cumulative frequency of the class below the median
class
fi = frequency of the median class
i = class size
104
Example: Given the frequency distribution
below, find its median.
Class Interval   Frequency
19 – 21          3
16 – 18          10
13 – 15          4
10 – 12          12
7 – 9            6
105
We first identify the median class, that is, the class
whose less-than cumulative frequency (<cf) first reaches
N/2 = 35/2 = 17.5. Here it is the class 10 – 12.
Class Interval   Frequency (f)   <cf
19 – 21          3               35
16 – 18          10              32
13 – 15          4               22
10 – 12          12              18
7 – 9            6               6
                 Σf = 35
106
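\[ Md = L + \left( \frac{\frac{N}{2} - cf}{f_i} \right) i = 9.5 + \left( \frac{\frac{35}{2} - 6}{12} \right)(3) \]
\[ Md = 9.5 + \left( \frac{17.5 - 6}{12} \right)(3) = 9.5 + \left( \frac{11.5}{12} \right)(3) \]
\[ Md \approx 9.5 + (0.96)(3) = 9.5 + 2.88 = 12.38 \approx 12.4 \]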
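A minimal sketch of the grouped-median formula in Python. It assumes integer class limits, so the true lower limit is taken as the lower limit minus 0.5, and it expects the classes in ascending order. The function name and tuple layout are illustrative.

```python
# (lower class limit, frequency), listed from the lowest class upward.
classes = [(7, 6), (10, 12), (13, 4), (16, 10), (19, 3)]


def grouped_median(table, class_size):
    """Md = L + ((N/2 - cf) / f_i) * i, using the class that contains N/2."""
    n = sum(f for _, f in table)
    cumulative = 0  # cumulative frequency below the current class
    for lower_limit, f in table:
        if cumulative + f >= n / 2:
            true_lower = lower_limit - 0.5  # assumes integer class limits
            return true_lower + (n / 2 - cumulative) / f * class_size
        cumulative += f
    raise ValueError("empty frequency table")


median = grouped_median(classes, class_size=3)
print(round(median, 1))
```

Without intermediate rounding the value is 12.375, which agrees with the worked answer of about 12.4.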
107
MODE
108
MODE OF GROUPED DATA
109
Example:
\[ Mo = \frac{10 + 12}{2} = 11 \]
110
“EXACT MODE” OF GROUPED DATA
\[ Mo = L + \left( \frac{d_1}{d_1 + d_2} \right) i \]
Where
Mo = mode
L = true lower limit of the modal class (interval
containing the highest frequency)
N = total number of observations
d1 = absolute difference between the frequency of the modal
class and the frequency of the class below it
d2 = absolute difference between the frequency of the modal
class and the frequency of the class above it
i = class size
111
Example:
Class Interval Frequency
19 – 21 3
16 – 18 10
13 – 15 4
10 – 12 12
7–9 6
\[ Mo = L + \left( \frac{d_1}{d_1 + d_2} \right) i = 9.5 + \left( \frac{6}{6 + 8} \right)(3) = 9.5 + 1.29 = 10.79 \]
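The "exact mode" computation can be sketched in Python as below. As assumptions for illustration, classes are listed in ascending order, class limits are integers (true lower limit = lower limit minus 0.5), and a missing neighbouring class is treated as having frequency 0.

```python
# (lower class limit, frequency), listed from the lowest class upward.
classes = [(7, 6), (10, 12), (13, 4), (16, 10), (19, 3)]


def grouped_mode(table, class_size):
    """Mo = L + (d1 / (d1 + d2)) * i for the class with the highest frequency."""
    frequencies = [f for _, f in table]
    m = frequencies.index(max(frequencies))           # index of the modal class
    below = frequencies[m - 1] if m > 0 else 0
    above = frequencies[m + 1] if m < len(frequencies) - 1 else 0
    d1 = abs(frequencies[m] - below)
    d2 = abs(frequencies[m] - above)
    true_lower = table[m][0] - 0.5                    # assumes integer limits
    return true_lower + d1 / (d1 + d2) * class_size


mode = grouped_mode(classes, class_size=3)
print(round(mode, 2))
```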
112
EMPIRICAL RULE
Mo ≈ 3(Median) − 2(Mean)
113
Remarks:
➢the mode is easily and readily obtained
for a person who wants a quick measure of
central tendency,
➢unlike the mean and median, the mode
does not always exist,
➢ it is the least reliable of the three
measures of central location
114
Measures of
Relative Position
115
Percentile
116
Example: Below are the scores of 40
students in their Mathematics test. Find P85.
16 26 31 32 34 37 39 43
19 29 31 33 34 37 39 44
22 30 31 33 35 37 41 45
25 30 32 33 35 38 41 47
26 31 32 34 36 38 42 47
117
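We seek the value below which
\[ \frac{85}{100}(40) = 34 \]
observations fall. Since 34 is a whole number, $P_{85}$ is the
average of the 34th and 35th observations in the ordered data:
\[ P_{85} = \frac{41 + 42}{2} = 41.5 \]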
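A hedged sketch of this percentile rule in Python. The whole-number case follows the example above; rounding the position up when it is not a whole number is a common textbook convention assumed here, not something stated in the slides. The function name is illustrative and the rule is meant for 0 < p < 100.

```python
import math

# The 40 scores as listed in the example (row by row).
scores = [
    16, 26, 31, 32, 34, 37, 39, 43,
    19, 29, 31, 33, 34, 37, 39, 44,
    22, 30, 31, 33, 35, 37, 41, 45,
    25, 30, 32, 33, 35, 38, 41, 47,
    26, 31, 32, 34, 36, 38, 42, 47,
]


def percentile(data, p):
    """Value below which p percent of the ordered observations fall."""
    ordered = sorted(data)
    position = p * len(ordered) / 100
    if position.is_integer():
        k = int(position)
        # Average of the k-th and (k+1)-th ordered values (1-based).
        return (ordered[k - 1] + ordered[k]) / 2
    # Assumed convention: otherwise round the position up.
    return ordered[math.ceil(position) - 1]


p85 = percentile(scores, 85)
print(p85)
```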
118
Decile
119
Quartile
120
COMPUTATION OF QUANTILES FOR
GROUPED DATA
\[ P_x = L + \left( \frac{x\% \cdot N - cf}{f_i} \right) i \]
Where
Px = the value below which x% of the total number of cases lie
L = true lower limit of the class interval containing Px
N = total number of cases
cf = cumulative frequency of the class below the interval
containing Px
fi = frequency of the interval containing Px
i = class size
121
Measures of
Variability
122
Consider the two sets of data below.
Set A
25, 28, 28, 30, 30, 33, 35, 40, 41, 45
Set B
10, 15, 23, 28, 28, 30, 39, 45, 52, 65
123
Range
Example:
In Set A, the range is 45 – 25 = 20.
In Set B, the range is 65 – 10 = 55.
124
125
Mean Absolute Deviation (MAD)
Ungrouped Data
\[ MAD = \frac{\sum |x_i - \bar{x}|}{N} \]
Where
xi = score
x̄ = mean of the scores
N = total number of scores
126
Mean Absolute Deviation (MAD)
Grouped Data
\[ MAD = \frac{\sum f \, |X - \bar{X}|}{N} \]
Where
X = class mark
X̄ = mean
f = frequency
N = total number of cases
127
Population Variance
Ungrouped Data
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]
128
Sample Variance
Given a random sample x1, x2, …,
xn, the sample variance is
Biased estimator:
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} \]
Unbiased estimator:
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]
129
Sample Standard Deviation
\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]
130
Computational Formula for the Sample
Variance (unbiased)
\[ s^2 = \frac{n \sum x^2 - \left( \sum x \right)^2}{n(n - 1)} \]
131
Example:
Set A
25, 28, 28, 30, 30, 33, 35, 40, 41, 45
We have x̄ = 33.5, n = 10
\[ s^2 = \frac{\sum_{i=1}^{10} (x_i - 33.5)^2}{10 - 1} = \frac{390.5}{9} \approx 43.4 \]
\[ s \approx 6.6 \]
132
Example:
Set B
10, 15, 23, 28, 28, 30, 39, 45, 52, 65
We have x̄ = 33.5, n = 10
\[ s^2 = \frac{\sum_{i=1}^{10} (x_i - 33.5)^2}{10 - 1} = \frac{2574.5}{9} \approx 286.1 \]
\[ s \approx 16.9 \]
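As a quick check, the two sets can be compared with Python's standard `statistics` module, whose `variance` and `stdev` use the unbiased (n − 1) formulas. The results agree with the worked examples up to rounding.

```python
import statistics

set_a = [25, 28, 28, 30, 30, 33, 35, 40, 41, 45]
set_b = [10, 15, 23, 28, 28, 30, 39, 45, 52, 65]

for name, data in (("Set A", set_a), ("Set B", set_b)):
    mean = statistics.mean(data)
    variance = statistics.variance(data)   # divides by n - 1
    std_dev = statistics.stdev(data)
    print(f"{name}: mean={mean}, s^2={variance:.2f}, s={std_dev:.2f}")

var_a = statistics.variance(set_a)
var_b = statistics.variance(set_b)
```

Both sets have the same mean of 33.5, yet Set B is far more spread out, which is exactly what the variance and standard deviation capture and the mean alone does not.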
133
Sample Variance from Grouped Data
\[ s^2 = \frac{\sum f \, (x - \bar{x})^2}{n} \]
where
f = frequency
x = class mark
x̄ = mean
n = total number of observations
134
Computational Formula for the Sample
Variance from Grouped Data
\[ s^2 = \frac{n \sum f x^2 - \left( \sum f x \right)^2}{n^2} \]
where
f = frequency
x = class mark
n = total number of observations
135
Measure of Relative Variation
To compare the variability of data sets
measured in different units, we use the
measure of relative variation called
coefficient of variation. This index
expresses the standard deviation as a
percentage relative to the mean. Its value
is given by
\[ C.V. = \frac{s}{\bar{x}} \times 100\% \]
136
Example:
Determine which data set is more
spread out.
137
We first compute the means and standard
deviations of the sets of data.
Data Set 1:
x̄ = 24 years and s = 3.742 years
Data Set 2:
x̄ = P 8875 and s = P 2267.984
138
So, we have
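Data Set 1:
\[ C.V. = \frac{s}{\bar{x}} \times 100\% = \frac{3.742 \text{ years}}{24 \text{ years}} \times 100\% = 15.59\% \]
Data Set 2:
\[ C.V. = \frac{s}{\bar{x}} \times 100\% = \frac{\text{P } 2267.984}{\text{P } 8875} \times 100\% = 25.55\% \]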
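A small sketch of the coefficient of variation in Python, using the means and standard deviations given above. The function name is illustrative.

```python
def coefficient_of_variation(std_dev: float, mean: float) -> float:
    """C.V. = (s / mean) * 100, expressed as a percentage."""
    return std_dev / mean * 100


cv_years = coefficient_of_variation(3.742, 24)        # Data Set 1, in years
cv_pesos = coefficient_of_variation(2267.984, 8875)   # Data Set 2, in pesos

print(f"Data Set 1: {cv_years:.2f}%")
print(f"Data Set 2: {cv_pesos:.2f}%")
```

Because the units cancel, the two percentages can be compared directly: Data Set 2 is the more spread out relative to its mean.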
140
Normal Distribution
141
For small values of σ, the distribution
tends to be leptokurtic, while for large values
of σ, the distribution tends to be platykurtic.
[Figure: a leptokurtic curve and a platykurtic curve]
142
When the distribution of the set of data
is symmetric, the three measures of central
tendency have the same values.
Symmetric Distribution
143
For a skewed distribution, the three
measures of central tendency will have different
values. When the distribution is negatively
skewed, the majority of the scores have high
values and there will be only a few extremely low
scores.
144
For a positively skewed distribution,
the majority of the scores have low values and
there will be only a few extremely high scores.
145
Properties of the Theoretical Normal
Distribution
147
Areas Under the Standard Normal Distribution
Curve
-3 -2 -1 0 +1 +2 +3
148
Standard Score or z Score
\[ z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} \]
\[ z = \frac{X - \mu}{\sigma} \]
149
Finding Areas Under the Normal
Distribution Curve
Illustration 1
Find the area to the left of z=-1.93.
Area = 0.5000 − 0.4732 = 0.0268
Illustration 2
Find the area to the right of z=1.11.
Area = 0.5000 − 0.3665 = 0.1335
151
Illustration 3
Find the area under the normal curve
between z=0 and z=2.34.
Area = 0.4904
152
Illustration 4
Find the area under the normal curve
between z=0 and z=-1.75.
Area = 0.4599
153
Illustration 5
Find the area between z=2.0 and
z=2.47.
0 2.0 2.47
154
Illustration 6
Find the area between z=-1.37 and z=1.68.
-1.37 0 1.68
155
Illustration 7
Find the area to the right of z=-1.37.
-1.37 0
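The areas in these illustrations can be checked without a printed table by using `NormalDist` from Python's standard `statistics` module (Python 3.8 or later). This is only a verification sketch. The variable names are illustrative, and the 0.1335 result labelled 1.11 is read here as the area to the right of z = 1.11.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def area_between(z_low: float, z_high: float) -> float:
    """Area under the standard normal curve between two z values."""
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)


left_of_neg_1_93 = std_normal.cdf(-1.93)
right_of_1_11 = 1 - std_normal.cdf(1.11)
from_0_to_2_34 = area_between(0, 2.34)
from_neg_1_75_to_0 = area_between(-1.75, 0)
from_2_0_to_2_47 = area_between(2.0, 2.47)
from_neg_1_37_to_1_68 = area_between(-1.37, 1.68)
right_of_neg_1_37 = 1 - std_normal.cdf(-1.37)

for label, area in [
    ("left of -1.93", left_of_neg_1_93),
    ("right of 1.11", right_of_1_11),
    ("0 to 2.34", from_0_to_2_34),
    ("-1.75 to 0", from_neg_1_75_to_0),
    ("2.0 to 2.47", from_2_0_to_2_47),
    ("-1.37 to 1.68", from_neg_1_37_to_1_68),
    ("right of -1.37", right_of_neg_1_37),
]:
    print(f"{label:>15}: {area:.4f}")
```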
156
Applications of the Normal Distribution
Illustration
Given that the scores in a test are
normally distributed with a mean of 50 and
standard deviation of 8.
-3 -2 -1 0 +1 +2 +3
26 34 42 50 58 66 74
157
Problem 1
If the scores after the test have a mean
of 100 and a standard deviation of 15, find the
percentage of scores that will fall below 112.
Solution
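\[ z = \frac{x - \mu}{\sigma} = \frac{112 - 100}{15} = 0.8 \]
The area to the left of z = 0.8 is 0.5000 + 0.2881 = 0.7881,
so about 78.81% of the scores fall below 112.
[Figure: normal curve with the area below X = 112 (z = 0.8) shaded]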
159
Solving for X, given z = 1.28, μ = 100, and σ = 20
\[ z = \frac{X - \mu}{\sigma} \]
\[ 1.28 = \frac{X - 100}{20} \]
\[ X = (1.28)(20) + 100 = 25.6 + 100 = 125.6 \]
161
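[Figure: normal curve centered at 100 with the middle 50% shaded, an area of 0.2500 on each side of the mean, bounded by X1 (z1) and X2 (z2)]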
162
Then, we solve for the two limits (upper & lower)
\[ z_1 = \frac{X_1 - \mu}{\sigma} \qquad z_2 = \frac{X_2 - \mu}{\sigma} \]
\[ -0.67 = \frac{X_1 - 100}{15} \qquad 0.67 = \frac{X_2 - 100}{15} \]
\[ X_1 = (-0.67)(15) + 100 = 89.95 \qquad X_2 = (0.67)(15) + 100 = 110.05 \]
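The three kinds of application problems above (percentage below a score, score for a given z, and limits of the middle 50%) can be sketched with `statistics.NormalDist`. The variable names are illustrative. The limits differ slightly from the hand computation because the table value z = 0.67 is itself rounded.

```python
from statistics import NormalDist

scores = NormalDist(mu=100, sigma=15)

# Percentage of scores below 112 (z = 0.8).
percent_below_112 = scores.cdf(112) * 100

# Score that corresponds to z = 1.28 when mu = 100 and sigma = 20.
x_for_z = 1.28 * 20 + 100

# Limits of the middle 50%: the 25th and 75th percentiles.
lower_limit = scores.inv_cdf(0.25)
upper_limit = scores.inv_cdf(0.75)

print(f"{percent_below_112:.2f}% of scores fall below 112")
print(f"X for z = 1.28: {x_for_z:.1f}")
print(f"middle 50%: {lower_limit:.2f} to {upper_limit:.2f}")
```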
163
HYPOTHESIS
TESTING
164
Statistical hypothesis
A statistical hypothesis is a
conjecture about a population parameter.
This conjecture may or may not be true.
165
Types of Hypothesis
166
The alternative hypothesis can be
directional or nondirectional.
167
Example 1
H0 : μ = 73
H1 : μ ≠ 73
168
Example 2
H0 : μ ≤ 36
H1 : μ > 36
169
Example 3
H0 : μ ≥ 18
H1 : μ < 18
170
A statistical test uses the data
obtained from a sample to make a
decision about whether or not the null
hypothesis should be rejected.
171
Types of Error in Decision Making
                   H0 TRUE            H0 FALSE
Reject H0          Type I ERROR       Correct Decision
Do not reject H0   Correct Decision   Type II ERROR
174
The critical or rejection region is the
range of values of the test value that
indicates that there is a significant
difference and that the null hypothesis
should be rejected.
The noncritical or nonrejection region is
the range of values of the test value
that indicates that the difference was
probably due to chance and that the
null hypothesis should not be rejected.
175
One-Tailed vs Two-Tailed Test
176
Steps in Hypothesis Testing
177
LARGE SAMPLE MEAN
TEST
178
z – Test
The z – test is a statistical test for
the mean of a population. It can be used
when n ≥ 30, or when the population is
normally distributed and σ is known.
\[ z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]
where: x̄ = sample mean
μ = hypothesized population mean
σ = population standard deviation
n = sample size
179
Example 1
180
Step 1: State the hypotheses and identify the
claim.
H0 : μ ≤ 483 H1 : μ > 483 (claim)
181
Step 4: Make a decision.
Do not reject the null hypothesis since
the test value, 0.62, falls in the noncritical
region.
182
Example 2
The average serum cholesterol level
in a certain group of patients is 240
milligrams. The standard deviation is 18
milligrams. A new medication is designed
to lower the cholesterol level if taken for 1
month. A sample of 40 people used the
medication for 30 days, after which their
average cholesterol level was 229
milligrams. At α=0.01, does the medication
lower the cholesterol level of the patients?
183
Step 1: The hypotheses are
H0 : μ = 240 H1 : μ < 240 (claim)
184
Step 4: Since the test value (-3.87) falls in the
critical region, the decision is to reject
the null hypothesis.
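A minimal sketch of the z test for the cholesterol example (Example 2), using only the standard library. The function name is illustrative. The critical value and P-value are taken from `NormalDist` rather than a printed table, which is an implementation choice made here.

```python
import math
from statistics import NormalDist


def z_test_statistic(sample_mean, hypothesized_mean, population_sd, n):
    """z = (sample mean - hypothesized mean) / (sigma / sqrt(n))."""
    return (sample_mean - hypothesized_mean) / (population_sd / math.sqrt(n))


alpha = 0.01
z = z_test_statistic(sample_mean=229, hypothesized_mean=240,
                     population_sd=18, n=40)

# Left-tailed test: reject H0 when z falls below the critical value.
critical_value = NormalDist().inv_cdf(alpha)
p_value = NormalDist().cdf(z)
reject_h0 = z < critical_value

print(f"z = {z:.2f}, critical value = {critical_value:.2f}")
print(f"P-value = {p_value:.5f}, reject H0: {reject_h0}")
```

Comparing the P-value with α leads to the same decision as comparing the test value with the critical value.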
185
Example 3
A manufacturer states that the
average lifetime of its television sets is 84
months. The standard deviation of the
population is 10 months. One hundred
sets are randomly selected and tested.
The average lifetime of the sample is 85.1
months. At α=0.01, is there enough
evidence to reject the manufacturer’s
claim?
186
Step 1: The hypotheses are
H0 : μ = 84 (claim) H1 : μ ≠ 84
\[ z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{85.1 - 84}{10 / \sqrt{100}} = \frac{1.1}{10 / 10} = \frac{1.1}{1} = 1.1 \]
187
Step 4: Since the test value, 1.1, falls in the
noncritical region, the decision is not to
reject the null hypothesis.
188
P - Value
✓ The P-value is the actual probability of
getting the sample mean value or a
more extreme sample mean value in
the direction of the alternative
hypothesis if the null hypothesis is true.
✓ The P-value is the actual area under
the distribution curve representing the
probability of a particular sample mean
or a more extreme sample mean
occurring if the null hypothesis is true.
189
✓ For example, suppose the null hypothesis
is H0: μ = 50 and the mean of the sample
is x̄ = 52. If the computer printed a
P-value of 0.0356 for a statistical test,
then the probability of getting a sample
mean of 52 or greater is 0.0356 if the true
population mean is 50.
✓ What is the relationship between the
P-value and the α value?
190
Area = 0.05
Area = 0.0356
Area = 0.01
50 52
191
SMALL SAMPLE MEAN
TEST
192
t Distribution
193
The t distribution differs from the standard
normal distribution in the following ways:
194
0
195
t – Test
The t – test is a statistical test for the
mean of a population and is used when the
population is normally or approximately
normally distributed, σ is unknown, and
n < 30.
\[ t = \frac{\bar{x} - \mu}{s / \sqrt{n}} \]
196
Example 1: Find the critical t value for
α=0.05 with d.f.=16 for a right-tailed test.
df 0.10 0.05 0.025 0.01 0.005
1
2
3
:
16 1.746
:
197
Example 2: Find the critical t value for
α=0.01 with d.f.=22 for a left-tailed test.
(The table entry is 2.508, so the left-tailed critical value is −2.508.)
df 0.10 0.05 0.025 0.01 0.005
1
2
3
:
22 2.508
:
df 0.10 0.05 0.025 0.01 0.005
1
2
3
:
18 1.734
:
200
Formula for the z Test for Proportions
\[ z = \frac{X - \mu}{\sigma} \quad \text{or} \quad z = \frac{X - np}{\sqrt{npq}} \]
201
Example 1
A recent study claimed that less
than 15% of all eighth-grade students are
overweight. In a sample of 80 students, 9
were found to be overweight. At α=0.05, is
there enough evidence to support the
claim?
202
Step 1: The hypotheses are
H0 : p = 0.15 H1 : p < 0.15 (claim)
203
Step 4: Compute the test value.
\[ z = \frac{X - \mu}{\sigma} = \frac{9 - 12}{3.19} = \frac{-3}{3.19} = -0.94 \]
204
Example 2
Experts claim that 10% of murders
are committed by women. Is there enough
evidence to support the claim if in a
sample of 75 murders, 12% were
committed by women? Use α=0.05.
205
Step 1: The hypotheses are
H0 : p = 0.10 (claim) H1 : p ≠ 0.10
206
Step 4: Compute the test value.
Note: X = (75)(0.12) = 9
\[ z = \frac{X - \mu}{\sigma} = \frac{9 - 7.5}{2.60} = \frac{1.5}{2.60} = 0.58 \]
Step 5: Do not reject the null hypothesis
since the test value, 0.58, falls in the
noncritical region.
Step 6: There is not enough evidence to reject
the claim that 10% of murders are committed
by women.
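Both proportion examples can be reproduced with a short Python sketch of z = (X − np) / √(npq). The function name is illustrative. The critical values still come from the normal table and are not computed here.

```python
import math


def proportion_z(successes: float, n: int, p: float) -> float:
    """z = (X - np) / sqrt(npq) for a one-sample test of a proportion."""
    q = 1 - p
    return (successes - n * p) / math.sqrt(n * p * q)


# Example 1: 9 overweight students out of 80, hypothesized p = 0.15.
z_overweight = proportion_z(9, 80, 0.15)

# Example 2: 12% of 75 murders, hypothesized p = 0.10.
z_murders = proportion_z(75 * 0.12, 75, 0.10)

print(f"Example 1: z = {z_overweight:.2f}")
print(f"Example 2: z = {z_murders:.2f}")
```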
207
Tests for proportions can also be
conducted by using an equivalent formula
below.
\[ z = \frac{\hat{p} - p}{\sqrt{pq / n}} \]
208
VARIANCE OR STANDARD
DEVIATION TEST
209
Chi-square Distribution
210
[Figure: chi-square (χ²) distribution curves for d.f.=1, d.f.=4, d.f.=9, and d.f.=15]
211
Formula for the Chi-Square Test for a
Single Variance
\[ \chi^2 = \frac{(n - 1)s^2}{\sigma^2} \]
212
Assumptions for the Chi-Square Test
for a Single Variance
213
Example 1
An instructor wishes to see whether
the variation in scores of 23 students in
her class is less than the variance of the
population. The variance of the class is
198. Is there enough evidence to support
the claim that the variation of the students
is less than the population variance
(σ²=225) at α=0.05? Assume that the
scores are normally distributed.
214
Step 1: The hypotheses are
H0 : σ² = 225 H1 : σ² < 225 (claim)
215
Step 3: Compute the test value.
\[ \chi^2 = \frac{(n - 1)s^2}{\sigma^2} = \frac{(23 - 1)(198)}{225} = 19.36 \]
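The chi-square test value can be verified with a one-line function. The function name is illustrative, and the critical value is assumed to be read from a chi-square table because the Python standard library does not provide that distribution.

```python
def chi_square_statistic(n: int, sample_variance: float,
                         hypothesized_variance: float) -> float:
    """Chi-square test value (n - 1) * s^2 / sigma^2, with d.f. = n - 1."""
    return (n - 1) * sample_variance / hypothesized_variance


chi_square = chi_square_statistic(n=23, sample_variance=198,
                                  hypothesized_variance=225)
degrees_of_freedom = 23 - 1
print(f"chi-square = {chi_square:.2f} with d.f. = {degrees_of_freedom}")
```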
216
Notes for the Use of Chi-Square
217
Types of Samples
Independent Samples – These are
samples that are randomly selected from
distinct populations. The sample sizes
may or may not be equal.
218
Examples of independent samples
219
Dependent Samples – These samples
usually arise in experimental designs
where the objective is to make sure that
the subjects being compared are
comparable in terms of relevant variables.
– These experimental designs are
repeated measures designs (e.g. pretest-
posttest design) and matched groups
design.
– The sample sizes of the groups are
always equal.
220
Pretest – Posttest Design
221
Matched Groups Design
222
TESTING THE DIFFERENCE
BETWEEN TWO MEANS:
Large Independent Samples
223
Assumptions for the Test to Determine the
Difference between Two Means
224
Formula for the z Test for Comparing Two
Means from Independent Samples
\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \]
225
If $\sigma_1^2$ and $\sigma_2^2$ are not known, the
formula below can be used provided that
n1 ≥ 30 and n2 ≥ 30
\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \]
226
Example 1
Psychology   Mathematics
x̄1 = 118     x̄2 = 115
σ1 = 15      σ2 = 15
n1 = 50      n2 = 50
227
Step 1: The hypotheses are
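H0 : μ1 = μ2 H1 : μ1 > μ2 (claim)
\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{(118 - 115) - 0}{\sqrt{\dfrac{15^2}{50} + \dfrac{15^2}{50}}} \]
\[ z = \frac{3 - 0}{\sqrt{\dfrac{225}{50} + \dfrac{225}{50}}} = \frac{3}{\sqrt{9}} = \frac{3}{3} = 1 \]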
228
Step 4: Do not reject the null hypothesis
since the test value, 1.0, falls in the
noncritical region.
229
Example 2
In a study of women science majors, the
following data on a self-esteem questionnaire
were obtained on two groups, those who left their
profession within a few months after graduation
(leavers) and those who remained on their
profession (stayers). At α=0.05, can it be
concluded that there is no difference in the self-
esteem scores of the two groups?
Leavers      Stayers
x̄1 = 3.05    x̄2 = 2.96
s1 = 0.75    s2 = 0.75
n1 = 103     n2 = 225
230
Step 1: The hypotheses are
H0 : μ1 = μ2 (claim) H1 : μ1 ≠ μ2
231
Step 3: Compute the test value.
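\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{(3.05 - 2.96) - 0}{\sqrt{\dfrac{(0.75)^2}{103} + \dfrac{(0.75)^2}{225}}} \]
\[ z = \frac{0.09 - 0}{\sqrt{\dfrac{0.5625}{103} + \dfrac{0.5625}{225}}} = \frac{0.09}{\sqrt{0.0055 + 0.0025}} = \frac{0.09}{\sqrt{0.008}} = \frac{0.09}{0.089} \]
\[ z = 1.01 \]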
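A short sketch of the two-sample z test value, applied to both examples in this section. The function name and its parameters are illustrative. Under H0 the hypothesized difference μ1 − μ2 is taken as 0.

```python
import math


def two_sample_z(mean1, mean2, sd1, sd2, n1, n2, hypothesized_diff=0.0):
    """z for the difference between two means from large independent samples."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return ((mean1 - mean2) - hypothesized_diff) / standard_error


# Example 1: Psychology vs. Mathematics.
z_example_1 = two_sample_z(118, 115, 15, 15, 50, 50)

# Example 2: leavers vs. stayers (sample standard deviations, n >= 30).
z_example_2 = two_sample_z(3.05, 2.96, 0.75, 0.75, 103, 225)

print(f"Example 1: z = {z_example_1:.2f}")
print(f"Example 2: z = {z_example_2:.2f}")
```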
232
Step 4: Do not reject the null hypothesis
since the test value, 1.01, falls in the
noncritical region.
233
TESTING THE DIFFERENCE
BETWEEN VARIANCES
234
Characteristics of the F Distribution
235
Assumptions for Testing the Difference
between Two Variances
236
Formula for the F Test
\[ F = \frac{s_1^2}{s_2^2} \]
237
Example 1: Find the critical value for a right-
tailed F test when α = 0.01, the degrees of
freedom for the numerator (d.f.N.) are 10, and
the degrees of freedom for the denominator
(d.f.D.) are 18.
α = 0.01
d.f.D. d.f.N.
1 2 … 10 …
1
2
:
18 3.51
:
238
Example 2: Find the critical value for a two-
tailed F test when α = 0.05, d.f.N.=17 and
d.f.D.=24. Since d.f.N.=17 is not in the table,
the closest smaller value, 15, is used.
α/2 = 0.025
d.f.D. d.f.N.
1 2 … 15 …
1
2
:
24 2.44
:
239
Example 3
240
Step 1: The hypotheses are
H0 : σ1² = σ2² (claim) H1 : σ1² ≠ σ2²
\[ F = \frac{s_1^2}{s_2^2} = \frac{192}{189} = 1.02 \]
241
Step 4: Do not reject the null hypothesis
since the test value, 1.02, is less than
the critical value, 2.53.
242
Notes for the Use of the F Test
✓ The larger variance should always be placed in
the numerator of the formula.
✓ For a two-tailed test, the α value must be
divided by 2 and the critical value be placed on
the right side of the F curve.
✓ If the standard deviations are given in the
problem, they must be squared for the formula
for the F test.
✓ When the degrees of freedom cannot be found
in the table, the closest value on the smaller
side should be used.
243
TESTING THE DIFFERENCE
BETWEEN TWO MEANS:
Small Independent Samples
244
Formulas for t–Tests
Difference between Two Means – Small
Independent Samples
\[ t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \]
245
Formulas for t–Tests
Difference between Two Means – Small
Independent Samples
\[ t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \]
246
Example 1
A researcher suggests that male
nurses earn more than female nurses. A
survey of 16 male nurses and 20 female
nurses reports the following data. Is there
enough evidence to support the claim that
male nurses earn more than female nurses?
Use α = 0.05.
Male             Female
x̄1 = $23,800     x̄2 = $23,750
s1 = $300        s2 = $250
n1 = 16          n2 = 20
247
Solution
The F test will be used to determine
whether or not the variances are equal. The null
hypothesis is that the variances are equal.
The critical value obtained from the
table for α=0.05 is 2.23, using d.f.N.=15 and
d.f.D.=19.
The test value is
\[ F = \frac{s_1^2}{s_2^2} = \frac{(300)^2}{(250)^2} = \frac{90000}{62500} = 1.44 \]
248
Since 1.44 < 2.23, the decision is do
not reject the null hypothesis and conclude
that the variances are equal. Hence, the
second formula will be used.
249
Step 1: The hypotheses are
H0 : μ1 = μ2 H1 : μ1 > μ2 (claim)
250
Step 3: Compute the test value.
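\[ t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \]
\[ t = \frac{(23{,}800 - 23{,}750) - 0}{\sqrt{\dfrac{(16 - 1)(300)^2 + (20 - 1)(250)^2}{16 + 20 - 2}} \sqrt{\dfrac{1}{16} + \dfrac{1}{20}}} \]
\[ t = \frac{50}{\sqrt{\dfrac{(15)(90000) + (19)(62500)}{34}} \sqrt{0.0625 + 0.05}} \]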
251
Step 3: Compute the test value.
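\[ t = \frac{50}{\sqrt{\dfrac{1{,}350{,}000 + 1{,}187{,}500}{34}} \sqrt{0.1125}} = \frac{50}{\sqrt{\dfrac{2{,}537{,}500}{34}} \sqrt{0.1125}} \]
\[ t = \frac{50}{\sqrt{(74{,}632.35)(0.1125)}} = \frac{50}{\sqrt{8{,}396.14}} = \frac{50}{91.63} = 0.55 \]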
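The pooled-variance t value for the nurses example can be sketched as below, together with the preliminary F value. Function names are illustrative. Critical values are assumed to come from the F and t tables, since the Python standard library does not provide those distributions.

```python
import math


def f_statistic(sd_larger: float, sd_smaller: float) -> float:
    """F = s1^2 / s2^2 with the larger variance in the numerator."""
    return sd_larger ** 2 / sd_smaller ** 2


def pooled_t(mean1, mean2, sd1, sd2, n1, n2, hypothesized_diff=0.0):
    """t for two small independent samples, assuming equal variances."""
    pooled_variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return ((mean1 - mean2) - hypothesized_diff) / standard_error


f_value = f_statistic(300, 250)
t_value = pooled_t(23_800, 23_750, 300, 250, 16, 20)
degrees_of_freedom = 16 + 20 - 2

print(f"F = {f_value:.2f}")
print(f"t = {t_value:.2f} with d.f. = {degrees_of_freedom}")
```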
252
Step 4: Do not reject the null hypothesis
since 0.55 < 1.645.
253
TESTING THE DIFFERENCE
BETWEEN TWO MEANS:
Small Dependent Samples
254
Formulas for t–Test
Difference between Two Means
Small Dependent Samples
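\[ t = \frac{\bar{D} - \mu_D}{s_D / \sqrt{n}} \]
with d.f.= n – 1 and where
\[ \bar{D} = \frac{\sum D}{n} \quad \text{and} \quad s_D = \sqrt{\frac{\sum D^2 - \dfrac{\left( \sum D \right)^2}{n}}{n - 1}} \]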
255
256
Example 1
A dietician wishes to see if a person’s
cholesterol level will change if the diet is
supplemented by a certain mineral. Six students
were pretested and then took the mineral
supplement for a six-week period. The results are
shown below. (Cholesterol level is measured in
milligrams/deciliter). Can it be concluded that the
cholesterol level has been changed at α=0.10?
Assume that the variable is approx. normally
distributed.
Subject 1 2 3 4 5 6
Before (X1) 210 235 208 190 172 244
After (X2) 190 170 210 188 173 228
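A minimal sketch of the dependent-samples t value for this cholesterol data, with D = X1 − X2 and μD = 0 under the null hypothesis. Variable names are illustrative. The result is then compared with the two-tailed critical t value for d.f. = 5 from the t table.

```python
import math
import statistics

before = [210, 235, 208, 190, 172, 244]   # X1
after = [190, 170, 210, 188, 173, 228]    # X2

differences = [x1 - x2 for x1, x2 in zip(before, after)]
n = len(differences)

mean_difference = statistics.mean(differences)   # D-bar
sd_difference = statistics.stdev(differences)    # s_D, uses n - 1
t_value = (mean_difference - 0) / (sd_difference / math.sqrt(n))
degrees_of_freedom = n - 1

print(f"D = {differences}")
print(f"D-bar = {mean_difference:.2f}, s_D = {sd_difference:.2f}")
print(f"t = {t_value:.2f} with d.f. = {degrees_of_freedom}")
```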
257
TESTING THE DIFFERENCE
BETWEEN PROPORTIONS
258
Formula for z Test for Comparing
Proportions
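\[ z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}\,\bar{q} \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} \]
where
\[ \bar{p} = \frac{X_1 + X_2}{n_1 + n_2} ; \quad \bar{q} = 1 - \bar{p} \]
\[ \hat{p}_1 = \frac{X_1}{n_1} ; \quad \hat{p}_2 = \frac{X_2}{n_2} \]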
259
Example 1
In a sample of 200 surgeons, 15%
thought the government should control
health care. In a sample of 230 general
practitioners, 20% felt this way. At α=0.10,
is there a difference between the
proportions?
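As a hedged sketch, the test value for this example can be computed from the formula for comparing two proportions. The function name is illustrative, the counts X1 and X2 are recovered from the stated percentages, and p1 − p2 is taken as 0 under the null hypothesis. The decision still requires the two-tailed critical z value for α = 0.10 from the normal table.

```python
import math


def two_proportion_z(x1: float, n1: int, x2: float, n2: int) -> float:
    """z for the difference between two sample proportions (p1 - p2 = 0)."""
    p_hat1 = x1 / n1
    p_hat2 = x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)   # pooled proportion
    q_bar = 1 - p_bar
    standard_error = math.sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))
    return (p_hat1 - p_hat2) / standard_error


surgeons_yes = 0.15 * 200        # X1 = 30
practitioners_yes = 0.20 * 230   # X2 = 46

z_value = two_proportion_z(surgeons_yes, 200, practitioners_yes, 230)
print(f"z = {z_value:.2f}")
```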
260