
Statistics

Here are some key factors to consider when determining sample size:
- Population size: larger populations require a smaller percentage of the population to be sampled, although the absolute sample size still grows (and eventually levels off) as the population grows.
- Desired level of precision: more precise estimates require larger samples.
- Confidence level: higher confidence levels require larger samples.
- Expected variation in the attribute being measured: more variation requires larger samples.
- Resources and time available: more resources allow for larger samples.
In general, larger samples are preferred because they tend to represent the population more accurately. However, practical constraints often require balancing statistical accuracy with feasibility. Common guidelines suggest a minimum sample size of 30 for most research.

Uploaded by Dawn Versatility
Copyright © All Rights Reserved

1

120, 118, 123, 124, 138, 137, 130, 119, 120, 125, 118, 118, 123, 124, 132
125, 135, 119, 115, 120, 140, 123, 125, 119, 132, 130, 130, 130, 131, 132

2
How can we make these numbers
meaningful?

3
RAW DATA

▪ data that are not yet sorted or arranged according to some criteria or systematic considerations

▪ data that are collected in their original form

4
5
Task I. Measuring Arm Span

120, 118, 123, 124, 138, 137, 130, 119, 120, 125, 118, 118, 123, 124, 132
125, 135, 119, 115, 120, 140, 123, 125, 119, 132, 130, 130, 130, 131, 132

6
• What do these numbers represent?
• Can we get clear and precise information
immediately as we look at these numbers?
Why?
• How can we make these numbers meaningful to someone who does not know what they describe?

7
RAW DATA

▪ data that are not yet sorted or arranged according to some criteria or systematic considerations

▪ data that are collected in their original form

8
PROCESSING

Statistics plays a vital role in every field of human activity. In our daily activities, we often sort and organize objects, data, or other things, just as you did in the task. These are just a few of the activities involved in the study of Statistics.

9
➢ Give some examples of activities in which you think Statistics is involved.
• What is Statistics?

➢ List some problems or questions that can be answered using Statistics.

10
What is Statistics?

➢ Statistics is the science of conducting studies to collect, organize, summarize, analyze, and draw conclusions from data. (Bluman, 2008)
➢ Statistics is a group of methods used to collect,
analyze, present, and interpret data and to make
decisions. (Mann, 2004)
➢ Statistics is the science of data. This involves
collecting, classifying, summarizing, organizing,
analyzing, and interpreting data. (Sincich, 1991)

11
Functions or Uses of Statistics

Statistics helps in …
➢ providing a better understanding and exact
description of a phenomenon of nature.
➢ proper and efficient planning of a statistical
inquiry in any field of study.
➢ collecting appropriate quantitative data.
➢ presenting complex data in a suitable tabular, diagrammatic, or graphic form for easy and clear comprehension of the data.

12
➢ understanding the nature and pattern of
variability of a phenomenon through quantitative
observations.

➢ drawing valid inferences about population parameters from sample data, along with a measure of their reliability.

13
Nature of Statistics

• Descriptive Statistics
➢ Methods concerned with describing and summarizing sets of data.
• Inferential Statistics
➢ Methods that make possible the estimation of a characteristic of a population, or the making of a decision concerning a population, based on information provided by a sample.

14
Example

1. Of 350 randomly selected people in the town of Luserna, Italy, 280 people had the last name Nicolussi. An example of descriptive statistics is the following statement:
• "80% of these people have the last name Nicolussi." (280/350 = 0.80)
2. On the last 3 Sundays, Henry D. Carsalesman sold 2, 1, and 0 new cars, respectively. An example of descriptive statistics is the following statement:
• "Henry averaged 1 new car sold for the last 3 Sundays." ((2 + 1 + 0)/3 = 1)
15
Example

3. Of 350 randomly selected people in the town of Luserna, Italy, 280 people had the last name Nicolussi. An example of inferential statistics is the following statement:
• "80% of all people living in Italy have the last name Nicolussi."
4. On the last 3 Sundays, Henry D. Carsalesman sold 2, 1, and 0 new cars, respectively. An example of inferential statistics is the following statement:
• "Henry never sells more than 2 cars on a Sunday."
16
Variable
A characteristic or information of interest
that is observable or measurable from every
individual or object under consideration.

Types of Variable

Qualitative / Categorical
- measures a quality or characteristic
Example: Hair Color, NBA Teams, Gender, Course
Section, etc.

Quantitative / Numerical
- measures a numerical quantity or amount
- answers questions “how much” or “how many”
Example: height of a student, weight of babies, etc.

Note that in research, variables are also classified as dependent, independent, extraneous, and moderator variables. Graduate students should be aware of these distinctions.
17
[Figure: Intervening Variable (Age, Motivation, Level of Work Performance); Moderator Variable (Method of Teaching, Ability Level, Academic Performance)]

18
Types of Quantitative Variables

• DISCRETE
➢ Assumes only a finite or countable number of values
• Example:
– Number of meals in a day
– Number of units enrolled, etc.

• CONTINUOUS
➢ Assumes infinitely many values corresponding to
the points on a line interval
• Example
– Age
– Travel time from home to school

19
Levels of Measurement of Variables

• NOMINAL
➢ Variable whose values are simply labels or
names or categories without any implicit or
explicit ordering of the labels
➢ Lowest level of measurement
• Example:
– Gender,
– ID number, etc.

• What are other examples that you can give?

20
Levels of Measurement of Variables

• ORDINAL
➢ Variables whose values are labels or classes
with an implied ordering in these labels
➢ Ranking can be done on this data
➢ Distance between two levels cannot be
quantified
• Example:
– Job Position,
– Faculty rank, etc.

• What are other examples that you can give?

21
Levels of Measurement of Variables

• INTERVAL
➢ Variables whose values can be ordered and, in addition, may be added or subtracted, but not multiplied or divided
➢ Distances between any two points are of known size, the unit of measurement is constant (but arbitrary), and the zero point is arbitrary (not specified)

• Example:
– Temperature (in °C or °F)

• What are other examples that you can think of?


22
Levels of Measurement of Variables

• RATIO
➢ Variable whose values have all the properties of
the interval scale, and in addition, can be
multiplied and divided
➢ Has a true zero point
➢ Highest level of measurement
• Example:
– length,
– weight,
– Height, etc.

• What are other examples that you can give?

23
Data Collection

24
Data
➢ Measurements of variables from every individual
or object under consideration.

Kinds of Data
• PRIMARY
➢ Data obtained directly from the source of
information.

• SECONDARY
➢ Data that have been previously collected by another person or institution for some other purpose, usually taken from publications or existing records.

25
Sampling

26
Population

• In statistics, population refers to the total set of observations that can be made.
• For example, if we are studying the weight of adult women, the population is the set of weights of all the women in the world.
• If we are studying the grade point average (GPA) of students at Harvard, the population is the set of GPAs of all the students at Harvard.

27
Sample

• In statistics, a sample refers to a subset of a population.

• For example, suppose we wanted to know the average height of 12-year-old boys in the Philippines. Measuring every such boy would be impractical, so we would measure a sample of them instead.

Sampling refers to the process of choosing a sample of elements from a total population of elements.

28
Why do sampling?

• The primary reason for sampling is to obtain statistical information from a subset of the population whose characteristics reflect those of the entire population.
• Often, it is necessary to use samples for research because it is impractical to study the whole population.
• Further, a sample allows us to obtain information with:
➢ Greater speed
➢ Reduced cost
➢ Greater scope
➢ Greater accuracy

29
Determination of sample size

L. R. Gay (1991) suggests that

1. For Descriptive Research, the researcher may decide to take a small percentage of the population if the population is large, or a large percentage if the population is small.
2. For Correlational Research, a sample of at least 30 may be used.
3. For Experimental/Causal-Comparative Research, at least 15 subjects per group may be used.
30
Determination of sample size

Fraenkel and Wallen suggest that

1. For Descriptive Research, a minimum of 100 respondents is essential.
2. For Correlational Research, a sample of at least 50 is deemed appropriate to establish the existence of a relationship.
3. For Experimental/Causal-Comparative Research, at least 30 subjects per group may be used.
31
Slovin’s Formula

$$n = \frac{N}{1 + Ne^2}$$

➢ where n = sample size, N = population size, and e = margin of error (allowable error)

32
• Use Slovin’s formula to find out what sample size you need from a population of 1,000 people for a survey on their soda preferences.
Step 1: Figure out what you want your confidence level to be. For example, you might want a confidence level of 95 percent (which gives a margin of error of 0.05), or you might need better accuracy at the 98 percent confidence level (which gives a margin of error of 0.02).

33
• Step 2: Plug your data into the formula. In this example, we'll use a 95 percent confidence level (e = 0.05) with a population size of 1,000:

$$n = \frac{N}{1 + Ne^2} = \frac{1000}{1 + 1000(0.05)^2} = \frac{1000}{3.5} \approx 285.71$$

• Step 3: Round your answer up to a whole number (because you can't sample a fraction of a person or thing):
n = 286
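The three steps can be sketched in Python. The function name `slovin_sample_size` is illustrative, not from the slides, and rounding the quotient to 9 decimals before taking the ceiling is an added assumption that only absorbs floating-point noise.

```python
import math

def slovin_sample_size(population: int, margin_of_error: float) -> int:
    """Sample size from Slovin's formula n = N / (1 + N e^2), rounded up."""
    n = population / (1 + population * margin_of_error ** 2)
    # Remove tiny floating-point noise, then round up: a fraction of a
    # person cannot be sampled, so 285.71 becomes 286.
    return math.ceil(round(n, 9))

print(slovin_sample_size(1000, 0.05))  # 286
print(slovin_sample_size(1000, 0.02))  # 715
```

With e = 0.02 the same population needs 715 respondents, since 1000/1.4 ≈ 714.29 is rounded up.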

34
Sampling Procedures:
Probability vs. Non-Probability
Sampling
Probability sampling.
With probability sampling, every element of the population has a known probability of being included in the sample.

Advantage:

Probability samples allow us to make probability statements about sample statistics. We can estimate the extent to which a sample statistic is likely to differ from a population parameter.

35
Probability vs. Non-Probability
Sampling

Non-probability sampling.
With non-probability sampling, we cannot
specify the probability that each element will be
included in the sample.

The main advantages of non-probability sampling are convenience and cost. However, with non-probability samples, we cannot make probability statements about our sample statistics. For example, we cannot compute a confidence interval for an estimation problem or a region of acceptance for a hypothesis test.

36
Methods of Probability Sampling

• Simple random sampling
• Systematic random sampling
• Stratified random sampling
• Cluster sampling
• Multistage sampling

37
Simple Random Sampling

Simple random sampling refers to a sampling method that has the following properties.
✓ The population consists of N objects.
✓ The sample consists of n objects.
✓ All possible samples of n objects are equally likely to occur.
✓ The main benefit of simple random sampling is that it avoids selection bias: because every possible sample is equally likely, the sample mean is an unbiased estimate of the population mean, and the size of the sampling error can be estimated. This supports valid statistical conclusions, although any single sample may still differ from the population by chance.
38
Simple Random Sampling

✓ There are many ways to obtain a simple random sample. One way is the lottery method. Each of the N population members is assigned a unique number. The numbers are placed in a bowl and thoroughly mixed. Then, a blindfolded researcher selects n numbers. Population members having the selected numbers are included in the sample.
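The lottery method amounts to drawing n distinct numbers from 1 to N without replacement. A minimal Python sketch, with the standard library's `random.sample` standing in for the bowl; the function name and the optional seed are illustrative additions.

```python
import random

def lottery_sample(population_size: int, sample_size: int, seed=None) -> list[int]:
    """Simple random sample of member numbers 1..N, like drawing from a bowl."""
    rng = random.Random(seed)                 # a seed only makes a demo reproducible
    numbers = range(1, population_size + 1)   # each member gets a unique number
    # random.sample draws without replacement, so every subset of size n
    # is equally likely: the defining property of simple random sampling.
    return sorted(rng.sample(numbers, sample_size))

print(lottery_sample(1000, 10, seed=1))
```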

39
Systematic random sampling
➢ This technique selects the desired sample from a list of the population units. If the list is arranged randomly, the result is systematic random sampling; if the list is arranged systematically or logically, such as alphabetically or in any other acceptable order, it becomes systematic sampling.

➢ Systematic random sampling is a method of selecting a sample by taking every kth unit from the ordered population.

➢ In systematic sampling, only the first unit is selected at random; the rest are selected in a systematic manner.
40
The following steps show how a systematic random sample of 5 households can be selected from 20 housing units:
• Randomly assign the housing units the numbers 1 to 20.
• Determine the sampling interval, computed as k = N/n = 20/5 = 4.
• Select a random start between 1 and k.
• Then take every kth unit thereafter.
• Alternatively, the random start for selecting 5 of the 20 housing units may be chosen using a table of random numbers or another random number generator.
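A minimal Python sketch of these steps, assuming N is an exact multiple of n (as with 20 housing units and 5 households) and that the units are already numbered in random order; the function name is illustrative.

```python
import random

def systematic_sample(population_size: int, sample_size: int, seed=None) -> list[int]:
    """Unit numbers chosen by systematic random sampling (assumes N divisible by n)."""
    k = population_size // sample_size          # sampling interval k = N/n
    start = random.Random(seed).randint(1, k)   # random start between 1 and k
    # Every kth unit after the random start: start, start + k, start + 2k, ...
    return list(range(start, population_size + 1, k))[:sample_size]

print(systematic_sample(20, 5, seed=3))
```

For N = 20 and n = 5 the interval is 4, so a start of 3, for example, yields units 3, 7, 11, 15, and 19.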
41
Advantages

• Easier to apply and less prone to mistakes

• It is possible to select a sample in the field without a sampling frame

• It can give a more precise estimate than SRS when the list is ordered in a way related to the variable being measured

42
Stratified random sampling

• refers to a sampling method that has the following properties.
• The population consists of N elements.
• The population is divided into H groups, called
strata.
• Each element of the population can be
assigned to one, and only one, stratum.
• The researcher obtains a probability sample
from each stratum.

43
44
Stratified random sampling

• Proportional Allocation
– This procedure chooses sample sizes proportional to the sizes of the different subgroups or strata.

• Equal Allocation
– In this procedure, the sample size of each group/stratum is determined by dividing n by the number of strata or subgroups, so each group/stratum has an equal sample size.

45
Example
• A survey was conducted to find out whether families living in a certain community are in favor of Charter Change. To ensure that all income groups are represented, respondents will be divided into high-income (Class A), middle-income (Class B), and low-income (Class C) groups. Below is the distribution of families by income class:

Stratum      Number of Families
Class A      1,000
Class B      2,500
Class C      1,500
Total (N)    5,000
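The example does not state the total sample size, so the sketch below assumes a hypothetical n = 300 to show how proportional and equal allocation would split it across the three classes. It uses simple rounding; with other values of n the rounded stratum sizes may not add up exactly to n. The function names are illustrative.

```python
def proportional_allocation(strata: dict[str, int], n: int) -> dict[str, int]:
    """n_h = n * N_h / N for each stratum h, rounded to whole units."""
    total = sum(strata.values())
    return {name: round(n * size / total) for name, size in strata.items()}

def equal_allocation(strata: dict[str, int], n: int) -> dict[str, int]:
    """The same sample size n / H in every stratum (H = number of strata)."""
    return {name: round(n / len(strata)) for name in strata}

families = {"Class A": 1000, "Class B": 2500, "Class C": 1500}
n = 300  # hypothetical total sample size, not given in the example

print(proportional_allocation(families, n))  # {'Class A': 60, 'Class B': 150, 'Class C': 90}
print(equal_allocation(families, n))         # {'Class A': 100, 'Class B': 100, 'Class C': 100}
```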

46
Advantages and Disadvantages
• Stratified sampling offers several advantages over simple
random sampling.
• A stratified sample can provide greater precision than a
simple random sample of the same size.
• Because it provides greater precision, a stratified sample
often requires a smaller sample, which saves money.
• A stratified sample can guard against an "unrepresentative" sample.
• We can ensure that we obtain sufficient sample points to
support a separate analysis of any subgroup.
• The main disadvantage of a stratified sample is that it
may require more administrative effort than a simple
random sample.

47
Cluster Sampling

• This is a sampling process in which groups, not individuals, are randomly selected. All the members of selected groups have similar characteristics. It results from a two-stage process in which the population is divided into clusters and a subset of the clusters is randomly selected. Clusters are commonly based on geographic areas or districts.

48
Steps:
• Identify and define the problem
• Determine the desired sample size
• Identify and define a logical cluster
• List all clusters (or obtain a list) that comprise the
population
• Estimate the average number of population members per
cluster
• Determine the number of clusters needed by dividing the
sample size by the estimated size of cluster
• Randomly select the needed number of clusters (using a
table of random numbers)
• Include in your study all population members in each selected cluster.

49
Let us see how a superintendent would get a sample of teachers if cluster sampling were used.
• The population is all 5,000 teachers in the superintendent’s school system.
• The desired sample size is 500.
• A logical cluster is a school.
• The superintendent has a list of all schools in the district; there are 100 schools.
• Although schools vary in the number of teachers per school, there is an average of 50 teachers per school.
• The number of clusters (schools) needed equals the desired sample size, 500, divided by the average size of a cluster, 50. Thus the number of schools needed is 500/50 = 10.
• Therefore, 10 of the 100 schools are randomly selected.
• All the teachers in each of the 10 schools are in the sample (10 schools × 50 teachers per school = 500, the desired sample size).
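The cluster steps can be sketched in Python. The district below is hypothetical: 100 schools with exactly 50 teachers each, so the sample comes out at exactly 500; with real schools of varying size the sample would only approximate 500. Names such as `cluster_sample` are illustrative.

```python
import math
import random

def cluster_sample(clusters: dict[str, list[str]], desired_n: int,
                   avg_cluster_size: float, seed=None) -> list[str]:
    """Randomly pick whole clusters and keep every member of each one."""
    n_clusters = math.ceil(desired_n / avg_cluster_size)   # 500 / 50 = 10
    chosen = random.Random(seed).sample(sorted(clusters), n_clusters)
    # All members of the selected clusters enter the sample.
    return [member for name in chosen for member in clusters[name]]

# Hypothetical district: 100 schools with exactly 50 teachers each.
schools = {f"S{s:03d}": [f"S{s:03d}-T{t:02d}" for t in range(1, 51)]
           for s in range(1, 101)}
sample = cluster_sample(schools, desired_n=500, avg_cluster_size=50, seed=7)
print(len(sample))  # 500
```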

50
Methods of Non-probability Sampling
• Accidental or Incidental Sampling
➢ Based exclusively on what is convenient for the
researcher, i.e. the researcher includes the most
convenient cases in his/her sample and excludes the
inconvenient cases. There are several techniques that
may be characterized under this, e.g. snowball
sampling.
• Quota Sampling
➢ Example:
• Suppose we are asked to draw a quota sample from the students attending a university, where 42% are female and 58% are male. In this method, the researcher is given quotas so that 42% of the sample consists of females and 58% of males. Thus, if the total sample is 200, we take 84 female students and 116 male students.
The main weakness of quota sampling is the lack of control over factors other than those set in the quota.
51
Source: Levin and Fox, 1997
Methods of Non-probability Sampling

• Judgment or Purposive sampling


➢ In this type of sampling, logic, common sense, or sound judgment is used to select a sample that is representative of the larger population.

Example:
➢ Predicting the outcome of the Tacloban City mayoralty election
• A particular barangay that has traditionally voted for the winning candidates for mayor may be considered as the sample.

52
Source: Levin and Fox, 1997
Frequency
Distribution

53
Frequency Distribution

• A frequency distribution is the organization of raw data in table form, using classes and frequencies.
• There are three basic types of frequency distributions: categorical, ungrouped, and grouped frequency distributions.

54
Categorical Frequency
Distribution
• The categorical frequency distribution is
used for data that can be placed in
specific categories, such as nominal- or
ordinal-level data.
• Examples include data such as political affiliation, religious affiliation, or major field of study.

55
Example

• Twenty-five army inductees were given a blood test to determine their blood type. The data set is as follows:

A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A

56
The frequency distribution is

Blood Type   Frequency   Percent
A            5           20
B            7           28
O            9           36
AB           4           16
Total        N = 25      100
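The tally above can be reproduced with Python's `collections.Counter`. A minimal sketch using the 25 blood types listed in the example:

```python
from collections import Counter

blood_types = """A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A""".split()

counts = Counter(blood_types)
n = len(blood_types)
for blood_type in ["A", "B", "O", "AB"]:
    f = counts[blood_type]
    # frequency and percent (f / n * 100) for each category
    print(f"{blood_type:<3} {f:>2} {100 * f / n:>5.0f}%")
print(f"N = {n}")
```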

57
Ungrouped Frequency
Distribution

• An ungrouped frequency distribution is used for numerical data when the range (the difference between the highest and lowest values) is small.
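As an illustration (the slides' own example is not reproduced in the text), the arm span measurements from Task I have a small range, 115 to 140, so an ungrouped frequency distribution can be tallied value by value. A minimal Python sketch:

```python
from collections import Counter

arm_spans = [120, 118, 123, 124, 138, 137, 130, 119, 120, 125, 118, 118, 123, 124, 132,
             125, 135, 119, 115, 120, 140, 123, 125, 119, 132, 130, 130, 130, 131, 132]

freq = Counter(arm_spans)
print("range =", max(arm_spans) - min(arm_spans))  # range = 25
# One row per distinct value, listed from highest to lowest.
for value in sorted(freq, reverse=True):
    print(value, freq[value])
```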

58
Example

59
60
Grouped Frequency Distribution
• When the range of the data is large, the
data must be grouped into classes that are
more than one unit in width.
• To construct a frequency distribution, follow
these rules:
1. There should be between 5 and 20
classes.
2. The class width should be an odd
number. This ensures that the midpoint of
each class has the same place value as the
data.
61
3. The classes must be mutually exclusive.
Mutually exclusive classes have
nonoverlapping class limits so that data cannot
be placed into two classes.

4. The classes must be continuous. There should be no gaps in a frequency distribution.

5. The classes must be exhaustive. There should be enough classes to accommodate all the data.

6. The classes must be equal in width. This avoids a distorted view of the data.

62
Procedure for constructing a
grouped frequency distribution
1. Find the range.
range = highest value – lowest value
2. Decide on the number of class intervals or classes, denoted by k:
• Sturges' formula: $k = 1 + \log_2 N$
• Square-root rule: $k = \sqrt{n}$
• 5 – 20 classes

63
3. Determine the class size or class width of the interval, denoted by c:

$$c = \frac{\text{range}}{k}$$

(rounded to the nearest odd whole number)

4. Determine the lower limit LL and the upper limit UL of the lowest class interval. The lowest class interval should contain the lowest value in the data set. The value of UL is determined using the equation

$$UL = LL + (c - 1)$$

64
5. Determine the upper class intervals by
consecutively adding the class size c to
the values of LL and UL of the lowest
class interval until we get the class
interval with the highest value in the data
set.
6. Tally the data, find the frequencies.
Note: Other statistical information may be reflected in the table, such as class boundaries, class marks or class midpoints, less than cumulative frequency (<cf), greater than cumulative frequency (>cf), and relative frequency (rf).

65
• The class boundaries are used to
separate the classes so that there are no
gaps in the frequency distribution.

66
• The class midpoint is found by adding the
upper and lower boundaries (or limits) and
dividing by 2.

• The cumulative frequencies are used to determine the number of cases falling below (for <cf) or above (for >cf) a particular value in a distribution.
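The construction procedure can be sketched in Python. The forty Mathematics scores of the next example are not reproduced in the text, so the sketch reuses the Task I arm span data. It assumes integer data, the square-root rule with k rounded up (as in the worked example, where 6.32 becomes 7), and the lowest value as the first lower limit; the helper names are illustrative.

```python
import math

def nearest_odd(x: float) -> int:
    """Round x to the nearest odd whole number (e.g. 4.57 -> 5)."""
    return max(1, 2 * round((x - 1) / 2) + 1)

def grouped_frequency_table(data: list[int]) -> list[dict]:
    lowest, highest = min(data), max(data)
    data_range = highest - lowest                  # step 1: range
    k = math.ceil(math.sqrt(len(data)))            # step 2: k = sqrt(n), rounded up
    c = nearest_odd(data_range / k)                # step 3: class width
    rows, ll, cum = [], lowest, 0                  # step 4: start at the lowest value
    while ll <= highest:                           # step 5: add c until max is covered
        ul = ll + (c - 1)
        f = sum(ll <= x <= ul for x in data)       # step 6: tally the class
        cum += f
        rows.append({"limits": (ll, ul),
                     "boundaries": (ll - 0.5, ul + 0.5),
                     "mark": (ll + ul) / 2,
                     "f": f,
                     "<cf": cum})                  # cases up to this class
        ll += c
    return rows

arm_spans = [120, 118, 123, 124, 138, 137, 130, 119, 120, 125, 118, 118, 123, 124, 132,
             125, 135, 119, 115, 120, 140, 123, 125, 119, 132, 130, 130, 130, 131, 132]
for row in reversed(grouped_frequency_table(arm_spans)):  # highest class first
    print(row)
```

For these 30 values the sketch uses k = 6 and c = 5, giving classes 115–119 up to 140–144.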

67
Example
• Distribution of scores of forty students in
a Mathematics class.

68
1. Find the range: range = 99 − 67 = 32
2. Number of class intervals: $k = \sqrt{n} = \sqrt{40} \approx 6.32$, take 7
3. Class size: $c = \frac{32}{7} \approx 4.57$, take 5

69
Class Interval   Class Boundaries   Class Mark (midpoint)   Frequency
95–99            94.5–99.5          97                      6
90–94            89.5–94.5          92                      0
85–89            84.5–89.5          87                      4
80–84            79.5–84.5          82                      5
75–79            74.5–79.5          77                      1
70–74            69.5–74.5          72
65–69            64.5–69.5          67
70
Why construct frequency
distribution?

71
Graphs

72
Graphical Presentations of Data
• The three most common statistical graphs are the histogram (a bar graph), the frequency polygon, and the cumulative frequency graph, or ogive.
• The purpose of graphs in statistics is to
convey the data to the viewer in pictorial
form.
• Graphs are useful in getting the
audience’s attention in a publication or a
presentation.

73
Histogram

• The histogram is a graph that displays the data by using vertical bars of various heights to represent the frequencies.

74
Frequency Polygon
• The frequency polygon is a graph that
displays the data by using lines that
connect points plotted for the
frequencies at the midpoints of the
classes.

75
Ogive
• The ogive is the graph that represents
the cumulative frequencies for the
classes in a frequency distribution.

76
Other Types of Graphs

• Pareto Chart
A Pareto chart is used to represent a frequency distribution for a categorical variable, and the frequencies are displayed by the heights of vertical bars, which are arranged in order from highest to lowest.

77
Example of a Pareto chart

78
• Pie Chart
A pie chart is a circle that is divided
into sections according to the
percentage of frequencies in each
category of the distribution.
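Each sector's central angle is its share of 360°, that is, (f / N) × 360°. A minimal Python sketch, using the blood type frequencies from the categorical frequency distribution example as illustrative input:

```python
blood_type_freq = {"A": 5, "B": 7, "O": 9, "AB": 4}
N = sum(blood_type_freq.values())

# Central angle of each sector: (f / N) * 360 degrees.
angles = {t: f * 360 / N for t, f in blood_type_freq.items()}
for t, angle in angles.items():
    print(f"{t:<3} {angle:6.1f} degrees")
```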

79
Stem-and-Leaf Plot

• A stem-and-leaf plot is a data plot that uses part of a data value as the stem and part of the data value as the leaf to form groups or classes.
• It has the advantage over a grouped frequency distribution of retaining the actual data while showing them in graphic form.
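For whole numbers, a common choice takes the last digit as the leaf and the remaining digits as the stem. The sketch below assumes that split (it is not the only possible one) and applies it to the Task I arm span data; the function name is illustrative.

```python
from collections import defaultdict

def stem_and_leaf(data: list[int]) -> dict[int, list[int]]:
    """Group values by stem (value // 10), keeping the leaves (value % 10) sorted."""
    plot = defaultdict(list)
    for value in sorted(data):
        plot[value // 10].append(value % 10)
    return dict(plot)

arm_spans = [120, 118, 123, 124, 138, 137, 130, 119, 120, 125, 118, 118, 123, 124, 132,
             125, 135, 119, 115, 120, 140, 123, 125, 119, 132, 130, 130, 130, 131, 132]
for stem, leaves in stem_and_leaf(arm_spans).items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
```

Every original measurement can be read back from the plot (stem 13 with leaf 7 is 137), which is the advantage over a grouped table noted above.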

80
Example

81
Measures of
Central Tendency

82
MEASURE OF CENTRAL TENDENCY

A measure of central tendency or measure of central location describes the “center” of a given set of data. This is a value about which observations tend to cluster.

Common measures of central tendency are the MEAN, MEDIAN, and MODE.

83
ARITHMETIC MEAN

The arithmetic mean, or simply the mean, is the average of a given set of data. It is obtained by dividing the sum of all the observations by the total number of observations.

84
POPULATION MEAN

Given the population data x1, x2, …, xN, the population mean is given by

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

85
SAMPLE MEAN

Given the sample data x1, x2, …, xn, the sample mean is given by

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

86
Example

A random sample of 5 BSED students about to take their final examination were asked how many hours they slept the night before the test. The data are 5, 7, 3, 4, and 6. The mean number of hours of sleep is

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{5 + 7 + 3 + 4 + 6}{5} = 5 \text{ hours}$$

87
Remark: The mean takes into account all
observations in the data set. Thus, it is
affected by extreme values.

Example: Using the previous data, if the student reported 14 hours of sleep instead of 3 hours, then the new mean is

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{5 + 7 + 14 + 4 + 6}{5} = 7.2 \text{ hours}$$
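A minimal Python sketch of the sample mean that reproduces both computations; the function name is illustrative.

```python
def mean(values: list[float]) -> float:
    """Arithmetic mean: sum of the observations divided by their number."""
    return sum(values) / len(values)

hours = [5, 7, 3, 4, 6]
print(mean(hours))              # 5.0
print(mean([5, 7, 14, 4, 6]))   # 7.2 (one extreme value pulls the mean up)
```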

88
WEIGHTED MEAN

Suppose that a test is given to three groups with the following results:

$\bar{x}_1 = 60$, $n_1 = 10$
$\bar{x}_2 = 50$, $n_2 = 60$
$\bar{x}_3 = 40$, $n_3 = 30$


89
We wish to find the mean of the three groups combined, denoted by $\bar{x}_t$:

$$\bar{x}_t = \frac{\bar{x}_1 n_1 + \bar{x}_2 n_2 + \bar{x}_3 n_3}{n_1 + n_2 + n_3} = \frac{60(10) + 50(60) + 40(30)}{10 + 60 + 30} = \frac{4800}{100} = 48$$
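The combined mean is a weighted mean with the group sizes as weights. A minimal Python sketch (the function name is illustrative):

```python
def combined_mean(group_means: list[float], group_sizes: list[int]) -> float:
    """Weighted mean: sum of (group mean * group size) over the total size."""
    total = sum(m * n for m, n in zip(group_means, group_sizes))
    return total / sum(group_sizes)

print(combined_mean([60, 50, 40], [10, 60, 30]))  # 48.0
```

Simply averaging the three group means would give 50; weighting by group size gives 48 because the 30 members averaging 40 outnumber the 10 averaging 60.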


90
MEAN OF GROUPED DATA

$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$

where
fi = frequency of the class interval

xi = class mark of the class interval

91
Example: Given the frequency distribution
table below, find its mean.

Class Interval Frequency

19 – 21 3
16 – 18 10
13 – 15 4
10 – 12 12
7–9 6

92
We first solve for the class mark and the product of the class mark and the frequency.

Class Interval   Frequency (f)   Class Mark (x)   fx
19 – 21          3               20               60
16 – 18          10              17               170
13 – 15          4               14               56
10 – 12          12              11               132
7 – 9            6               8                48
                 Σf = 35                          Σfx = 466

93
Class Interval   Frequency (f)   Class Mark (x)   fx
19 – 21          3               20               60
16 – 18          10              17               170
13 – 15          4               14               56
10 – 12          12              11               132
7 – 9            6               8                48
                 Σf = 35                          Σfx = 466

$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{466}{35} \approx 13.3$$
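A minimal Python sketch of the grouped-data mean, assuming integer class limits so that each class mark is (lower + upper)/2; the function name is illustrative.

```python
def grouped_mean(classes: list[tuple[int, int, int]]) -> float:
    """Mean of grouped data: sum(f * x) / sum(f), with x the class mark."""
    total_f = sum(f for _, _, f in classes)
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fx / total_f

# (lower limit, upper limit, frequency) from the example table
table = [(19, 21, 3), (16, 18, 10), (13, 15, 4), (10, 12, 12), (7, 9, 6)]
print(round(grouped_mean(table), 1))  # 13.3
```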

94
MEDIAN

The median of a set of observations arranged in increasing or decreasing order of magnitude is the middle value when the number of observations is odd, or the arithmetic mean of the two middle values when the number of observations is even.

95
POPULATION MEDIAN
Given the population data arranged from the lowest to the highest, x1, x2, …, xN, the population median is given by

$$Md = x_{\frac{N+1}{2}} \quad \text{if } N \text{ is odd}$$

$$Md = \frac{x_{\frac{N}{2}} + x_{\frac{N}{2}+1}}{2} \quad \text{if } N \text{ is even}$$

96
SAMPLE MEDIAN
Given the sample data arranged from the lowest to the highest, x1, x2, …, xn, the sample median is given by

$$Md = x_{\frac{n+1}{2}} \quad \text{if } n \text{ is odd}$$

$$Md = \frac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2} \quad \text{if } n \text{ is even}$$

97
Steps in finding the median

1. Arrange the set of scores in ascending order (from lowest to highest).

2. If n is odd, there will be a middle score. This middle score is the median. If n is even, there will be two middle scores. The median is taken as the arithmetic average of the two middle scores.

98
Example: Below are the scores of 6
students in their Mathematics test. Find
the median.

35 20 12 30 25 50

Arranging the scores in increasing order, we have

12 20 25 30 35 50

99
Since n = 6, the median is the average of the (n/2) = (6/2) = 3rd and (n/2 + 1) = (6/2 + 1) = 4th observations. That is,

$$Md = \frac{x_3 + x_4}{2} = \frac{25 + 30}{2} = 27.5$$

100
Example: Below are the scores of 7
students in their Mathematics test. Find
the median.

35 20 12 30 25 50 26

Arranging the scores in increasing order, we have

12 20 25 26 30 35 50

101
Since n = 7, the median is the ((n + 1)/2) = ((7 + 1)/2) = 4th observation. That is,

$$Md = x_4 = 26$$
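The two-case rule can be sketched in Python as follows (the function name is illustrative); Python's standard `statistics.median` follows the same convention.

```python
def median(values: list[float]) -> float:
    """Middle value of the sorted data, or the mean of the two middle values."""
    x = sorted(values)                  # step 1: arrange in ascending order
    n = len(x)
    mid = n // 2
    if n % 2 == 1:                      # odd n: the ((n + 1)/2)th value
        return x[mid]
    return (x[mid - 1] + x[mid]) / 2    # even n: average of (n/2)th and (n/2 + 1)th

print(median([35, 20, 12, 30, 25, 50]))      # 27.5
print(median([35, 20, 12, 30, 25, 50, 26]))  # 26
```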

102
Remarks:

Similar to the mean, the median is unique. However, unlike the mean, the median is not influenced by extreme values.

103
MEDIAN OF GROUPED DATA

$$Md = L + \left(\frac{\frac{N}{2} - cf}{f_i}\right) i$$

Where
Md = median
L = true lower limit of the median class (the interval containing the score below which 50% of the total observations fall)
N = total number of observations
cf = cumulative frequency of the class below the median class
fi = frequency of the median class
i = class size

104
Example: Given the frequency distribution
below, find its median.

Class Interval   Frequency
19 – 21 3
16 – 18 10
13 – 15 4
10 – 12 12
7–9 6

105
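We first identify the median class: the class whose less than cumulative frequency (<cf) first reaches N/2 = 35/2 = 17.5.

Class Interval   Frequency (f)   <cf
19 – 21          3               35
16 – 18          10              32
13 – 15          4               22
10 – 12          12              18
7 – 9            6               6
                 Σf = 35

The median class is 10 – 12, so L = 9.5, cf = 6, fi = 12, and i = 3.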

106
$$Md = L + \left(\frac{\frac{N}{2} - cf}{f_i}\right) i = 9.5 + \left(\frac{\frac{35}{2} - 6}{12}\right)(3) = 9.5 + \left(\frac{17.5 - 6}{12}\right)(3)$$

$$Md = 9.5 + \left(\frac{11.5}{12}\right)(3) \approx 9.5 + (0.96)(3) = 9.5 + 2.88 = 12.38 \approx 12.4$$
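A minimal Python sketch of the grouped-median formula, assuming integer class limits, so the true lower limit is the lower limit minus 0.5 and the class size is upper − lower + 1; the function name is illustrative.

```python
def grouped_median(classes: list[tuple[int, int, int]]) -> float:
    """Md = L + ((N/2 - cf) / f) * i for grouped data with integer class limits."""
    classes = sorted(classes)                 # lowest class first
    N = sum(f for _, _, f in classes)
    cf = 0                                    # cumulative frequency below the class
    for lower, upper, f in classes:
        if cf + f >= N / 2:                   # first class whose <cf reaches N/2
            L = lower - 0.5                   # true lower limit
            i = upper - lower + 1             # class size
            return L + (N / 2 - cf) * i / f
        cf += f
    raise ValueError("empty frequency table")

table = [(19, 21, 3), (16, 18, 10), (13, 15, 4), (10, 12, 12), (7, 9, 6)]
print(grouped_median(table))  # 12.375
```

The unrounded result is 12.375; the slide rounds 11.5/12 to 0.96 before multiplying, which gives 12.38. Both round to 12.4.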

107
MODE

The mode of a set of observations is the value that occurs most often, that is, with the greatest frequency.

Example: Given the following observations:
13 14 14 16 17 17 17 18 19

The mode of the set of data above is 17.

108
MODE OF GROUPED DATA

When data are grouped, the mode is the class mark or midpoint of the interval containing the greatest frequency (crude mode).

109
Example:

Class Interval   Frequency
19 – 21          3
16 – 18          10
13 – 15          4
10 – 12          12
7 – 9            6

The mode of the grouped data above falls in the interval 10 – 12 with frequency 12, thus

$$Mo = \frac{10 + 12}{2} = 11$$

110
“EXACT MODE” OF GROUPED DATA

$$Mo = L + \left(\frac{d_1}{d_1 + d_2}\right) i$$

Where
Mo = mode
L = true lower limit of the modal class (interval containing the highest frequency)
d1 = absolute difference between the frequency of the modal class and that of the class below it
d2 = absolute difference between the frequency of the modal class and that of the class above it
i = class size

111
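Example:
Class Interval   Frequency
19 – 21          3
16 – 18          10
13 – 15          4
10 – 12          12
7 – 9            6

The modal class is 10 – 12, so L = 9.5, d1 = 12 − 6 = 6, d2 = 12 − 4 = 8, and i = 3:

$$Mo = L + \left(\frac{d_1}{d_1 + d_2}\right) i = 9.5 + \left(\frac{6}{6 + 8}\right)(3) = 9.5 + 1.29 = 10.79$$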
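A minimal Python sketch of the ungrouped mode, the crude mode, and the "exact" mode. The helper names are illustrative; a tie for the highest class frequency is resolved by taking the first (lowest) modal class, and a missing neighbouring class at either end of the table is assumed to have frequency 0 (an assumption, since the slides do not cover that case).

```python
from collections import Counter

def modes(values: list[float]) -> list[float]:
    """Every value tied for the greatest frequency (ungrouped data)."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

def grouped_modes(classes: list[tuple[int, int, int]]) -> tuple[float, float]:
    """Crude mode (class mark) and 'exact' mode L + d1/(d1 + d2) * i."""
    classes = sorted(classes)                      # lowest class first
    freqs = [f for _, _, f in classes]
    m = freqs.index(max(freqs))                    # modal class (first if tied)
    lower, upper, f = classes[m]
    crude = (lower + upper) / 2
    below = freqs[m - 1] if m > 0 else 0           # assumed 0 outside the table
    above = freqs[m + 1] if m + 1 < len(freqs) else 0
    d1, d2 = abs(f - below), abs(f - above)
    exact = (lower - 0.5) + d1 / (d1 + d2) * (upper - lower + 1)
    return crude, exact

print(modes([13, 14, 14, 16, 17, 17, 17, 18, 19]))  # [17]
table = [(19, 21, 3), (16, 18, 10), (13, 15, 4), (10, 12, 12), (7, 9, 6)]
crude, exact = grouped_modes(table)
print(crude, round(exact, 2))                       # 11.0 10.79
```

Returning a list from `modes` lets it report several values when the data are multimodal.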

112
EMPIRICAL RULE

Mo = 3 Median− 2 Mean

113
Remarks:
➢the mode is easily and readily obtained
for a person who wants a quick measure of
central tendency,
➢unlike the mean and median, the mode
does not always exist,
➢ it is the least reliable of the three
measures of central location

114
Measures of
Relative Position

115
Percentile

Percentiles are values that divide a set of observations into 100 equal parts. These values, denoted by P1, P2, …, P99, are such that 1% of the data falls below P1, 2% falls below P2, …, and 99% falls below P99.

116
Example: Below are the scores of 40
students in their Mathematics test. Find P85.

16 26 31 32 34 37 39 43

19 29 31 33 34 37 39 44

22 30 31 33 35 37 41 45

25 30 32 33 35 38 41 47

26 31 32 34 36 38 42 47

117
We seek the value below which

$$\frac{85}{100} \times 40 = 34$$

observations fall. As seen from the table, P85 could be any value between 41 and 42. To have a unique value, we define

$$P_{85} = \frac{41 + 42}{2} = 41.5$$
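A minimal Python sketch of this rule. When (p/100)·n is a whole number k, it averages the kth and (k+1)th ordered values, as in the P85 example; when it is not, it takes the next ordered value up, which is one common textbook convention assumed here rather than taken from the slides. Other conventions (for example, linear interpolation) give slightly different values.

```python
import math

def percentile(values: list[float], p: float) -> float:
    """P_p for 0 < p < 100: value below which p% of the data fall."""
    x = sorted(values)
    pos = p * len(x) / 100                 # e.g. 85 * 40 / 100 = 34
    k = round(pos)
    if math.isclose(pos, k):               # whole number: average kth and (k+1)th
        return (x[k - 1] + x[k]) / 2
    return x[math.ceil(pos) - 1]           # otherwise: next ordered value up

scores = [16, 26, 31, 32, 34, 37, 39, 43,
          19, 29, 31, 33, 34, 37, 39, 44,
          22, 30, 31, 33, 35, 37, 41, 45,
          25, 30, 32, 33, 35, 38, 41, 47,
          26, 31, 32, 34, 36, 38, 42, 47]
print(percentile(scores, 85))  # 41.5
```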

118
Decile

Deciles are values that divide a set of observations into 10 equal parts. These values, denoted by D1, D2, …, D9, are such that 10% of the data falls below D1, 20% falls below D2, …, and 90% falls below D9.

119
Quartile

Quartiles are values that divide a set of observations into 4 equal parts. These values, denoted by Q1, Q2, and Q3, are such that 25% of the data falls below Q1, 50% falls below Q2, and 75% falls below Q3.

120
COMPUTATION OF QUANTILES FOR GROUPED DATA

$$P_x = L + \left(\frac{x\% \cdot N - cf}{f_i}\right) i$$

Where
Px = the value below which x% of the total number of cases lie
L = true lower limit of the class interval containing Px
N = total number of cases
cf = cumulative frequency of the class just below the interval containing Px (the number of cases below L)
fi = frequency of the interval containing Px
i = class size
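As an illustration of the grouped-data formula, here is a minimal Python sketch.
- It reuses the grouped table from the exact-mode example as hypothetical input and computes the median (P50).
- It assumes integer class limits listed from lowest to highest.
- The function name is our own.

```python
def grouped_percentile(classes, x):
    """P_x for grouped data: L + ((x% of N - cf) / f_i) * i.

    `classes` holds (lower_limit, upper_limit, frequency) tuples with integer
    limits, sorted from the lowest class to the highest.
    """
    N = sum(f for _, _, f in classes)
    target = x / 100 * N                 # x% of N
    cf = 0                               # cumulative frequency below the current class
    for lower, upper, f in classes:
        if f and cf + f >= target:       # this class contains P_x
            L = lower - 0.5              # true lower limit
            i = upper - lower + 1        # class size
            return L + (target - cf) / f * i
        cf += f
    raise ValueError("x must be between 0 and 100")


table = [(7, 9, 6), (10, 12, 12), (13, 15, 4), (16, 18, 10), (19, 21, 3)]
print(round(grouped_percentile(table, 50), 3))  # 12.375
```

Deciles and quartiles follow from the same function, since D_k = P_(10k) and Q1, Q2, Q3 = P25, P50, P75.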
Measures of Variability

Consider the two sets of data below.

Set A
25, 28, 28, 30, 30, 33, 35, 40, 41, 45
Set B
10, 15, 23, 28, 28, 30, 39, 45, 52, 65

They have the same mean (33.5), but Set A is more homogeneous than Set B.
Range

The range of a set of data is the difference between the largest and smallest number in the set.

Example:
In Set A, the range is 45 – 25 = 20.
In Set B, the range is 65 – 10 = 55.

Mean Absolute Deviation (MAD)
Ungrouped Data

$$MAD = \frac{\sum \lvert x_i - \bar{x} \rvert}{N}$$

Where
xi = score
x̄ = mean of the scores
N = total number of scores
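The range and the ungrouped MAD can be checked on Sets A and B with a short Python sketch. The helper names are our own. The MAD values are simply what the formula gives for these two sets; they do not appear in the notes.

```python
def data_range(xs):
    """Range: largest value minus smallest value."""
    return max(xs) - min(xs)

def mean_absolute_deviation(xs):
    """Ungrouped MAD: sum of |x_i - mean| divided by N."""
    mean = sum(xs) / len(xs)
    return sum(abs(x - mean) for x in xs) / len(xs)


set_a = [25, 28, 28, 30, 30, 33, 35, 40, 41, 45]
set_b = [10, 15, 23, 28, 28, 30, 39, 45, 52, 65]
print(data_range(set_a), data_range(set_b))            # 20 55
print(round(mean_absolute_deviation(set_a), 2),
      round(mean_absolute_deviation(set_b), 2))         # 5.4 13.4
```

Both measures agree that Set B is more spread out than Set A.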
Mean Absolute Deviation (MAD)
Grouped Data

$$MAD = \frac{\sum f\,\lvert X - \bar{X} \rvert}{N}$$

Where
X = class mark
X̄ = mean
f = frequency
N = total number of cases
Population Variance
Ungrouped Data

Given the finite population x1, x2, …, xN with mean μ, the population variance is

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
Sample Variance
Given a random sample x1, x2, …, xn, the sample variance is

Biased estimator:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$$

Unbiased estimator:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Sample Standard Deviation

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
Computational Formula for the Sample Variance (unbiased)

$$s^2 = \frac{n \sum x^2 - \left(\sum x\right)^2}{n(n - 1)}$$
Example:
Set A
25, 28, 28, 30, 30, 33, 35, 40, 41, 45
We have x̄ = 33.5, n = 10.

$$s^2 = \frac{\sum_{i=1}^{10} (x_i - 33.5)^2}{10 - 1} = \frac{(25 - 33.5)^2 + \dots + (45 - 33.5)^2}{9} = \frac{390.5}{9}$$

s² ≈ 43.39 and s ≈ 6.59
Example:
Set B
10, 15, 23, 28, 28, 30, 39, 45, 52, 65

We have x̄ = 33.5, n = 10.

$$s^2 = \frac{\sum_{i=1}^{10} (x_i - 33.5)^2}{10 - 1} = \frac{(10 - 33.5)^2 + \dots + (65 - 33.5)^2}{9} = \frac{2574.5}{9}$$

s² ≈ 286.06 and s ≈ 16.91
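A minimal Python sketch can confirm the two worked examples. It computes the unbiased sample variance both from the definition and from the computational formula, together with the standard deviation. The function names are our own.

```python
import math

def sample_variance(xs):
    """Unbiased sample variance: squared deviations from the mean over n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def sample_variance_computational(xs):
    """Same estimator via (n * sum(x^2) - (sum x)^2) / (n * (n - 1))."""
    n = len(xs)
    return (n * sum(x * x for x in xs) - sum(xs) ** 2) / (n * (n - 1))


set_a = [25, 28, 28, 30, 30, 33, 35, 40, 41, 45]
set_b = [10, 15, 23, 28, 28, 30, 39, 45, 52, 65]
for name, xs in (("Set A", set_a), ("Set B", set_b)):
    var = sample_variance(xs)
    print(name, round(var, 2), round(math.sqrt(var), 2),
          round(sample_variance_computational(xs), 2))
# Set A 43.39 6.59 43.39
# Set B 286.06 16.91 286.06
```

The computational formula avoids subtracting the mean from every score, which is why it is convenient for hand calculation. Both forms give the same value.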
Sample Variance from Grouped Data

$$s^2 = \frac{\sum f\,(x - \bar{x})^2}{n}$$

where
f = frequency
x = class mark
x̄ = mean
n = total number of observations
Computational Formula for the Sample Variance from Grouped Data

$$s^2 = \frac{n \sum f x^2 - \left(\sum f x\right)^2}{n^2}$$

where
f = frequency
x = class mark
n = total number of observations
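The two grouped-data formulas above can be checked against each other with a short Python sketch. Both use the divisor n. The class marks and frequencies are taken from the grouped table in the mode example, used here only as hypothetical input. The function name is our own.

```python
def grouped_variance(classes):
    """Grouped-data variance with divisor n, computed both ways.

    `classes` holds (class_mark, frequency) pairs.
    """
    n = sum(f for _, f in classes)
    mean = sum(f * x for x, f in classes) / n
    definitional = sum(f * (x - mean) ** 2 for x, f in classes) / n
    computational = (n * sum(f * x * x for x, f in classes)
                     - sum(f * x for x, f in classes) ** 2) / n ** 2
    return definitional, computational


marks = [(8, 6), (11, 12), (14, 4), (17, 10), (20, 3)]  # (class mark, f)
d, c = grouped_variance(marks)
print(round(d, 2), round(c, 2))  # 14.44 14.44
```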
Measure of Relative Variation
To compare the variability of data sets measured in different units, we use a measure of relative variation called the coefficient of variation. This index expresses the standard deviation as a percentage of the mean. Its value is given by

$$C.V. = \frac{s}{\bar{x}} \times 100\%$$
Example:
Determine which data set is more spread out.

We first compute the means and standard deviations of the sets of data.

Data Set 1 (years of teaching experience): x̄ = 24 years and s = 3.742 years
Data Set 2 (net take-home pay): x̄ = P 8875 and s = P 2267.984

So, we have

Data Set 1:
$$C.V. = \frac{s}{\bar{x}} \times 100\% = \frac{3.742 \text{ years}}{24 \text{ years}} \times 100\% = 15.59\%$$

Data Set 2:
$$C.V. = \frac{s}{\bar{x}} \times 100\% = \frac{P\,2267.984}{P\,8875} \times 100\% = 25.55\%$$

Therefore, net take-home pay is more scattered with respect to the mean than the years of teaching experience of teachers.
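The coefficient of variation needs only the mean and standard deviation, so the comparison above reduces to two divisions. This Python sketch uses the summary values from the example; the function name is our own.

```python
def coefficient_of_variation(s, mean):
    """C.V. = (s / mean) * 100, returned as a percentage."""
    return s / mean * 100


cv_experience = coefficient_of_variation(3.742, 24)      # Data Set 1 (years)
cv_pay = coefficient_of_variation(2267.984, 8875)        # Data Set 2 (pesos)
print(round(cv_experience, 2), round(cv_pay, 2))         # 15.59 25.55
```

Because the units cancel in s / x̄, the two percentages can be compared directly even though one set is in years and the other in pesos.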
Areas Under the Normal Curve

Normal Distribution

The normal distribution is a continuous, symmetric, bell-shaped distribution of a variable.

Geometrically, the mean μ is the point on the x-axis that is directly below the highest point of the normal curve. The standard deviation σ dictates the shape of the distribution.

For small values of σ, the curve is tall and narrow, so the distribution tends to look leptokurtic, while for large values of σ, the curve is low and wide, so the distribution tends to look platykurtic.

[Figure: a leptokurtic curve (small σ) and a platykurtic curve (large σ)]
When the distribution of a set of data is symmetric, the three measures of central tendency have the same value.

[Figure: Symmetric distribution]

For a skewed distribution, the three measures of central tendency have different values. When the distribution is negatively skewed, most of the scores are high and there are only a few extremely low scores.

[Figure: Negatively skewed distribution]

For a positively skewed distribution, most of the scores are low and there are only a few extremely high scores.

[Figure: Positively skewed distribution]
Properties of the Theoretical Normal Distribution

1. The normal distribution curve is bell-shaped.
2. The mean, median, and mode are equal and located at the center of the distribution.
3. The normal distribution curve is unimodal.
4. The curve is symmetrical about the mean.
5. The curve is continuous.
6. The curve never touches the x-axis.
7. The total area under the curve is equal to 1.00 or 100%.
8. The area under the curve that lies within one standard deviation of the mean is approximately 0.68 or 68%; within two standard deviations, about 0.95 or 95%; and within three standard deviations, about 0.997 or 99.7% (empirical rule).

[Figure: Areas under the standard normal distribution curve, z = −3 to +3]
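Property 8 (the empirical rule) can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist` (Python 3.8+) to compute the area within k standard deviations of the mean for k = 1, 2, 3.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1
for k in (1, 2, 3):
    area = std_normal.cdf(k) - std_normal.cdf(-k)
    print(k, round(area, 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The percentages 68%, 95%, and 99.7% in the notes are these areas rounded.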
Standard Score or z Score

The z score represents the number of standard deviations a data value falls above or below the mean. The formula is

$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$$

$$z = \frac{X - \mu}{\sigma}$$
Finding Areas Under the Normal Distribution Curve

Illustration 1
Find the area to the left of z = −1.93.

The area between z = 0 and z = −1.93 is 0.4732, so the area to the left of z = −1.93 is 0.5000 − 0.4732 = 0.0268 or 2.68%.

Illustration 2
Find the area to the right of z = 1.11.

The area between z = 0 and z = 1.11 is 0.3665, so the area to the right of z = 1.11 is 0.5000 − 0.3665 = 0.1335 or 13.35%.

Illustration 3
Find the area under the normal curve between z = 0 and z = 2.34.

The area is 0.4904 or 49.04%.

Illustration 4
Find the area under the normal curve between z = 0 and z = −1.75.

By symmetry, this equals the area between z = 0 and z = 1.75. The area is 0.4599 or 45.99%.

Illustration 5
Find the area between z = 2.0 and z = 2.47.

Subtract the area between z = 0 and z = 2.0 from the area between z = 0 and z = 2.47. The area between z = 2.0 and z = 2.47 is 0.4932 – 0.4772 = 0.0160 or 1.60%.

Illustration 6
Find the area between z = −1.37 and z = 1.68.

Add the areas on each side of the mean. The area between z = −1.37 and z = 1.68 is 0.4147 + 0.4535 = 0.8682 or 86.82%.

Illustration 7
Find the area to the right of z = −1.37.

The area to the right of z = −1.37 is 0.4147 + 0.5000 = 0.9147 or 91.47%.
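Instead of the z table, the seven illustrations can be reproduced with `statistics.NormalDist().cdf` (Python 3.8+). That function gives the cumulative area to the left of z. The printed areas should agree with the table-based answers to about four decimal places; small differences come from the table's rounding.

```python
from statistics import NormalDist

phi = NormalDist().cdf   # standard normal area to the left of z

areas = {
    "1: left of -1.93": phi(-1.93),
    "2: right of 1.11": 1 - phi(1.11),
    "3: between 0 and 2.34": phi(2.34) - phi(0),
    "4: between -1.75 and 0": phi(0) - phi(-1.75),
    "5: between 2.0 and 2.47": phi(2.47) - phi(2.0),
    "6: between -1.37 and 1.68": phi(1.68) - phi(-1.37),
    "7: right of -1.37": 1 - phi(-1.37),
}
for label, area in areas.items():
    print(f"Illustration {label}: {area:.4f}")
```

Every case reduces to differences of left-tail areas. The "0.5000 plus or minus table area" steps above are the same idea, expressed with a table that starts at the mean.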
Applications of the Normal Distribution
Illustration
Suppose the scores in a test are normally distributed with a mean of 50 and a standard deviation of 8. Then z = −3, −2, −1, 0, +1, +2, and +3 correspond to scores of 26, 34, 42, 50, 58, 66, and 74, respectively.
Problem 1
If the scores on a test are normally distributed with a mean of 100 and a standard deviation of 15, find the percentage of scores that will fall below 112.
Solution

$$z = \frac{x - \mu}{\sigma} = \frac{112 - 100}{15} = 0.8$$

The area below z = 0.8 is 0.7881, hence 78.81% of the scores fall below 112.
Problem 2
A special enrichment program in math is to be offered to the top 10% of students in a school district. A standardized math achievement test given to all students has a mean of 100 and a standard deviation of 20. Find the cutoff score.
Solution

The cutoff score X has an area of 0.1000 to its right, so the area between the mean and X is 0.5000 − 0.1000 = 0.4000. The z score that corresponds to an area of 0.4000 is 1.28 (closest value in the table).

Solving for X, given z = 1.28, μ = 100, and σ = 20:

$$z = \frac{X - \mu}{\sigma} \quad\Rightarrow\quad 1.28 = \frac{X - 100}{20}$$

$$X = (1.28)(20) + 100 = 25.6 + 100 = 125.6$$

Thus, the cutoff score is 126.
Problem 3

For an educational study, a volunteer must place in the middle 50% on a test. If the mean for the population is 100 and the standard deviation is 15, find the two limits (upper and lower) for the scores that would enable a volunteer to participate in the study.

The middle 50% leaves an area of 0.2500 between the mean and each limit. The z scores that correspond to an area of 0.2500 below and above the mean are z1 = −0.67 and z2 = 0.67.

Then, we solve for the two limits (lower and upper):

$$z_1 = \frac{X_1 - \mu}{\sigma} \quad\Rightarrow\quad -0.67 = \frac{X_1 - 100}{15} \quad\Rightarrow\quad X_1 = (-0.67)(15) + 100 = 89.95$$

$$z_2 = \frac{X_2 - \mu}{\sigma} \quad\Rightarrow\quad 0.67 = \frac{X_2 - 100}{15} \quad\Rightarrow\quad X_2 = (0.67)(15) + 100 = 110.05$$

Thus, the lower score is 90 and the upper score is 110.
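The three application problems can be sketched with `statistics.NormalDist` (Python 3.8+). Its `cdf` method gives left-tail areas and its `inv_cdf` method gives the score that has a given left-tail area. Because `inv_cdf` uses exact z values (about 1.2816 and 0.6745) rather than the rounded table values 1.28 and 0.67, the unrounded limits differ slightly from the hand solutions. The whole-number scores are the same.

```python
from statistics import NormalDist

# Problem 1: share of scores below 112 when mu = 100 and sigma = 15
below_112 = NormalDist(mu=100, sigma=15).cdf(112)

# Problem 2: cutoff for the top 10% when mu = 100 and sigma = 20
cutoff = NormalDist(mu=100, sigma=20).inv_cdf(0.90)

# Problem 3: limits of the middle 50% when mu = 100 and sigma = 15
scores = NormalDist(mu=100, sigma=15)
lower, upper = scores.inv_cdf(0.25), scores.inv_cdf(0.75)

print(f"{below_112:.2%}")                  # 78.81%
print(round(cutoff, 2), round(cutoff))     # 125.63 126
print(round(lower, 2), round(upper, 2))    # 89.88 110.12
```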
HYPOTHESIS TESTING

Statistical hypothesis
A statistical hypothesis is a conjecture about a population parameter. This conjecture may or may not be true.

Statistical test of a hypothesis
A procedure for determining whether a hypothesis can be rejected.
Types of Hypothesis

Null hypothesis (H0) - a statistical hypothesis that states that there is no difference between a parameter and a specific value or that there is no difference between two parameters.
Alternative hypothesis (H1) - a statistical hypothesis that states a specific difference between a parameter and a specific value or that there is a difference between two parameters.

The alternative hypothesis can be directional or nondirectional.

i. A directional research hypothesis specifies the direction of the difference or direction of relationship.
ii. A non-directional research hypothesis does not specify the direction of the difference or direction of relationship.
Example 1

A psychologist feels that playing soft music during a test will change the results of the test. In the past, the mean of the scores was 73.

H0: μ = 73
H1: μ ≠ 73

Example 2

A chemist invents an additive to increase the life of an automobile battery. If the mean lifetime of the automobile battery is 36 months, then his hypotheses are

H0: μ ≤ 36
H1: μ > 36

Example 3

An engineer hypothesizes that the mean number of defects can be decreased in a manufacturing process of compact discs by using robots instead of humans for certain tasks. The mean number of defective discs per 1000 is 18.

H0: μ ≥ 18
H1: μ < 18
A statistical test uses the data obtained from a sample to make a decision about whether or not the null hypothesis should be rejected.

The numerical value obtained from a statistical test is called the test value.
Types of Error in Decision Making

In a jury trial, there are two types of error:
1. the person is innocent, but the jury finds the person guilty;
2. the person is guilty, but the jury declares the person to be innocent.

                                 Truth: Person Is Innocent   Truth: Person Is Guilty
Jury decides person is Guilty    Type I Error                Correct Decision
Jury decides person is Innocent  Correct Decision            Type II Error

Types of Error in Decision Making

                     H0 TRUE             H0 FALSE
Reject H0            ERROR (Type I)      Correct Decision
Do not reject H0     Correct Decision    ERROR (Type II)

A type I error occurs if one rejects the null hypothesis when it is true.
A type II error occurs if one does not reject the null hypothesis when it is false.

[Figure: Type I and Type II errors]

The level of significance is the maximum probability of committing a type I error. This probability is symbolized by α (Greek letter alpha).

The critical value(s) separate the critical region from the noncritical region.
The critical or rejection region is the range of values of the test value that indicates that there is a significant difference and that the null hypothesis should be rejected.
The noncritical or nonrejection region is the range of values of the test value that indicates that the difference was probably due to chance and that the null hypothesis should not be rejected.

One-Tailed vs Two-Tailed Test

A one-tailed test indicates that the null hypothesis should be rejected when the test value is in the critical region on one side of the mean. A one-tailed test is either right-tailed or left-tailed, depending on the direction of the inequality of the alternative hypothesis.
In a two-tailed test, the null hypothesis should be rejected when the test value is in either of the two critical regions.

Steps in Hypothesis Testing

1. State the hypotheses and identify the claim.
2. Find the critical value(s).
3. Compute the test value.
4. Make the decision to reject or not reject the null hypothesis.
5. Summarize the result.

LARGE SAMPLE MEAN TEST
z – Test
The z – test is a statistical test for the mean of a population. It can be used when n ≥ 30, or when the population is normally distributed and σ is known.

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

where: x̄ = sample mean
μ = hypothesized population mean
σ = population standard deviation
n = sample size
Example 1

The average SAT score in Mathematics is 483, with a standard deviation of 100. A special preparation course states that it can increase scores. A sample of 32 students completed the course, and the average of their scores was 494. At α = 0.05, does the course do what it claims?

Step 1: State the hypotheses and identify the claim.
H0: μ ≤ 483    H1: μ > 483 (claim)

Step 2: Find the critical value.
Since α = 0.05 and the test is a right-tailed test, the critical value is +1.65.

Step 3: Compute the test value.

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{494 - 483}{100 / \sqrt{32}} = \frac{11}{100 / 5.657} = \frac{11}{17.677} = 0.62$$

Step 4: Make a decision.
Do not reject the null hypothesis since the test value, 0.62, falls in the noncritical region.

Step 5: Summarize the result.
There is not enough evidence to support the claim that the special preparation course can increase the SAT scores of students in Mathematics.
Example 2
The average serum cholesterol level in a certain group of patients is 240 milligrams. The standard deviation is 18 milligrams. A new medication is designed to lower the cholesterol level if taken for 1 month. A sample of 40 people used the medication for 30 days, after which their average cholesterol level was 229 milligrams. At α = 0.01, does the medication lower the cholesterol level of the patients?

Step 1: The hypotheses are
H0: μ = 240    H1: μ < 240 (claim)

Step 2: Since α = 0.01 and the test is a left-tailed test, the critical value is −2.33.

Step 3: Compute the test value.

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{229 - 240}{18 / \sqrt{40}} = \frac{-11}{18 / 6.325} = \frac{-11}{2.846} = -3.87$$

Step 4: Since the test value (−3.87) falls in the critical region, the decision is to reject the null hypothesis.

Step 5: There is enough evidence to support the claim that the medication lowers the cholesterol level of the patients.
Example 3
A manufacturer states that the average lifetime of its television sets is 84 months. The standard deviation of the population is 10 months. One hundred sets are randomly selected and tested. The average lifetime of the sample is 85.1 months. At α = 0.01, is there enough evidence to reject the manufacturer's claim?

Step 1: The hypotheses are
H0: μ = 84 (claim)    H1: μ ≠ 84

Step 2: Since α = 0.01 and the test is a two-tailed test, the critical values are +2.575 and −2.575.

Step 3: Compute the test value.

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{85.1 - 84}{10 / \sqrt{100}} = \frac{1.1}{10 / 10} = \frac{1.1}{1} = 1.1$$

Step 4: Since the test value, 1.1, falls in the noncritical region, the decision is not to reject the null hypothesis.

Step 5: There is not enough evidence to reject the manufacturer's claim that the average lifetime of the television sets is 84 months.
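The five-step procedure for the three z-test examples can be sketched in a few lines of Python.
- `one_sample_z` computes the test value.
- `decide` compares it with the positive table critical value for the given tail. It reflects that value for left-tailed tests and uses ±critical for two-tailed tests.
- Both names are our own. The critical values are the table values quoted in the examples.

```python
import math

def one_sample_z(xbar, mu, sigma, n):
    """Test value z = (x-bar - mu) / (sigma / sqrt(n))."""
    return (xbar - mu) / (sigma / math.sqrt(n))

def decide(z, critical, tail):
    """True if H0 is rejected; `critical` is the positive table value."""
    if tail == "right":
        return z > critical
    if tail == "left":
        return z < -critical
    return abs(z) > critical             # two-tailed


examples = [
    ("SAT course", 494, 483, 100, 32, 1.65, "right"),
    ("cholesterol", 229, 240, 18, 40, 2.33, "left"),
    ("TV lifetime", 85.1, 84, 10, 100, 2.575, "two"),
]
for name, xbar, mu, sigma, n, crit, tail in examples:
    z = one_sample_z(xbar, mu, sigma, n)
    verdict = "reject H0" if decide(z, crit, tail) else "do not reject H0"
    print(name, round(z, 2), verdict)
```

Expected decisions match the notes: do not reject, reject, do not reject.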
P - Value
✓ The P-value is the actual probability of getting the sample mean value or a more extreme sample mean value in the direction of the alternative hypothesis if the null hypothesis is true.
✓ The P-value is the actual area under the distribution curve representing the probability of a particular sample mean or a more extreme sample mean occurring if the null hypothesis is true.

✓ For example, suppose the null hypothesis is H0: μ = 50 and the mean of the sample is x̄ = 52. If the computer printed a P-value of 0.0356 for a statistical test, then the probability of getting a sample mean of 52 or greater is 0.0356 if the true population mean is 50.
✓ What is the relationship between the P-value and the α value?

[Figure: right-tail areas of 0.05, 0.0356, and 0.01 under the curve centered at 50, with the sample mean 52 marked]

✓ For P = 0.0356, the null hypothesis will be rejected at α = 0.05 but not at α = 0.01, because 0.0356 is less than 0.05 but greater than 0.01. In general, the null hypothesis is rejected when P ≤ α.
✓ When the hypothesis test is two-tailed, the area in one tail must be doubled to obtain the P-value.
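The P-value rule can be sketched with `statistics.NormalDist` (Python 3.8+) for z tests. A one-tailed P-value is a single tail area. A two-tailed P-value doubles the tail area beyond |z|. The function names are our own, and the sample calls reuse test values from the earlier examples.

```python
from statistics import NormalDist

def z_p_value(z, tail):
    """P-value of a z test value; `tail` is "right", "left" or "two"."""
    phi = NormalDist().cdf
    if tail == "right":
        return 1 - phi(z)
    if tail == "left":
        return phi(z)
    return 2 * (1 - phi(abs(z)))   # two-tailed: double the one-tail area

def reject_h0(p_value, alpha):
    """Reject H0 when the P-value is less than or equal to alpha."""
    return p_value <= alpha


print(reject_h0(0.0356, 0.05), reject_h0(0.0356, 0.01))  # True False
print(round(z_p_value(0.62, "right"), 4))   # SAT course example (right-tailed)
print(round(z_p_value(1.1, "two"), 4))      # television example (two-tailed)
```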
SMALL SAMPLE MEAN TEST

t Distribution

The t distribution is similar to the standard normal distribution in the following ways:
1. It is bell-shaped.
2. It is symmetrical about the mean.
3. The mean, median, and mode are equal to 0 and are located at the center of the distribution.
4. The curve never touches the x-axis.

The t distribution differs from the standard normal distribution in the following ways:
1. The variance is greater than 1.
2. The t distribution is a family of curves based on the degrees of freedom, which is a number related to the sample size.
3. As the sample size increases, the t distribution approaches the normal distribution.

[Figure: The t family of curves]
t – Test
The t – test is a statistical test for the mean of a population and is used when the population is normally or approximately normally distributed, σ is unknown, and n < 30.

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

The degrees of freedom are d.f. = n – 1.
Example 1: Find the critical t value for α = 0.05 with d.f. = 16 for a right-tailed test.

In the t table (columns 0.10, 0.05, 0.025, 0.01, 0.005 for the area in one tail), the entry in the row d.f. = 16 under the 0.05 column is 1.746.

The critical t value is 1.746.

Example 2: Find the critical t value for α = 0.01 with d.f. = 22 for a left-tailed test.

In the t table, the entry in the row d.f. = 22 under the 0.01 column is 2.508.

The critical t value is −2.508 since the test is a left-tailed test.

Example 3: Find the critical t values for α = 0.10 with d.f. = 18 for a two-tailed test.

For a two-tailed test, α is split equally between the two tails, so use the 0.05 column. The entry in the row d.f. = 18 is 1.734.

The critical values are ±1.734 since the test is a two-tailed test.
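Python's standard library has no t distribution, so this sketch finds critical t values from first principles.
- `t_pdf` evaluates Student's t density.
- `t_right_tail` integrates the density with Simpson's rule.
- `t_critical` uses bisection to find the t value whose right-tail area equals α.

All three names are our own, and the method is a numerical stand-in for the printed t table, not how the table was produced.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_right_tail(x, df, steps=1000):
    """Area to the right of x >= 0: 0.5 minus the Simpson's-rule area on [0, x]."""
    h = x / steps                          # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 - total * h / 3

def t_critical(alpha, df):
    """Positive t value whose right-tail area equals `alpha` (bisection on [0, 50])."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_right_tail(mid, df) > alpha:  # too much area to the right: move up
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(t_critical(0.05, 16), 3))       # Example 1: 1.746
print(-round(t_critical(0.01, 22), 3))      # Example 2 (left-tailed): -2.508
print(round(t_critical(0.10 / 2, 18), 3))   # Example 3 (two-tailed, +/-): 1.734
```

For a two-tailed test, α is halved before the lookup, exactly as in Example 3. If SciPy is available, `scipy.stats.t.ppf(1 - alpha, df)` gives the same values.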
PROPORTION TEST

Formula for the z Test for Proportions

$$z = \frac{X - \mu}{\sigma} \quad\text{or}\quad z = \frac{X - np}{\sqrt{npq}}$$

where: μ = np and σ = √(npq)
X = number of successes
n = number of trials (sample size)
p = numerical probability of success
q = numerical probability of a failure
Example 1
A recent study claimed that less than 15% of all eighth-grade students are overweight. In a sample of 80 students, 9 were found to be overweight. At α = 0.05, is there enough evidence to support the claim?

Step 1: The hypotheses are
H0: p = 0.15    H1: p < 0.15 (claim)

Step 2: Find the mean and the standard deviation.
μ = np = (80)(0.15) = 12
σ = √(npq) = √((80)(0.15)(0.85)) = 3.19

Step 3: Since α = 0.05 and the test is a left-tailed test, the critical value is −1.645.

Step 4: Compute the test value.

$$z = \frac{X - \mu}{\sigma} = \frac{9 - 12}{3.19} = \frac{-3}{3.19} = -0.94$$

Step 5: Do not reject the null hypothesis since the test value, −0.94, falls in the noncritical region.

Step 6: There is not enough evidence to support the claim that less than 15% of all eighth-grade students are overweight.
Example 2
Experts claim that 10% of murders are committed by women. Is there enough evidence to reject the claim if in a sample of 75 murders, 12% were committed by women? Use α = 0.05.

Step 1: The hypotheses are
H0: p = 0.10 (claim)    H1: p ≠ 0.10

Step 2: Find the mean and the standard deviation.
μ = np = (75)(0.10) = 7.5
σ = √(npq) = √((75)(0.10)(0.90)) = 2.60

Step 3: Since α = 0.05 and the test is a two-tailed test, the critical values are ±1.96.

Step 4: Compute the test value.
Note: X = (75)(0.12) = 9

$$z = \frac{X - \mu}{\sigma} = \frac{9 - 7.5}{2.60} = \frac{1.5}{2.60} = 0.58$$

Step 5: Do not reject the null hypothesis since the test value, 0.58, falls in the noncritical region.

Step 6: There is not enough evidence to reject the claim that 10% of murders are committed by women.
Tests for proportions can also be conducted by using the equivalent formula below.

$$z = \frac{\hat{p} - p}{\sqrt{pq / n}}$$

where: p̂ = X/n, the sample proportion
p = hypothesized population proportion
n = sample size
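To see that the two proportion formulas are equivalent, this Python sketch computes z both ways for the two examples above. The first form works with counts (X and np). The second form works with proportions (p̂ and p). The function names are our own.

```python
import math

def proportion_z_counts(X, n, p):
    """z = (X - np) / sqrt(npq)."""
    q = 1 - p
    return (X - n * p) / math.sqrt(n * p * q)

def proportion_z_phat(X, n, p):
    """Equivalent form z = (p-hat - p) / sqrt(pq / n), with p-hat = X / n."""
    q = 1 - p
    return (X / n - p) / math.sqrt(p * q / n)


# Example 1: 9 of 80 students, p = 0.15; Example 2: 9 of 75 murders, p = 0.10
print(round(proportion_z_counts(9, 80, 0.15), 2), round(proportion_z_phat(9, 80, 0.15), 2))  # -0.94 -0.94
print(round(proportion_z_counts(9, 75, 0.10), 2), round(proportion_z_phat(9, 75, 0.10), 2))  # 0.58 0.58
```

Dividing the numerator and denominator of the first form by n gives the second form, which is why the results coincide.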
VARIANCE OR STANDARD DEVIATION TEST
Chi-square Distribution

✓ The chi-square distribution is used to test a claim about a single variance or standard deviation.
✓ The chi-square variable is similar to the t variable in that its distribution is a family of curves based on the number of degrees of freedom.
✓ The symbol for chi-square is χ² (Greek letter chi).

[Figure: The chi-square family of curves for d.f. = 1, 4, 9, and 15]
Formula for the Chi-Square Test for a Single Variance

$$\chi^2 = \frac{(n - 1)s^2}{\sigma^2}$$

with d.f. = n – 1 and where
n = sample size
s² = sample variance
σ² = population variance
Assumptions for the Chi-Square Test for a Single Variance

1. The sample must be randomly selected from the population.
2. The population must be normally distributed for the variable under study.
3. The observations must be independent of each other.
Example 1
An instructor wishes to see whether the variation in scores of 23 students in her class is less than the variance of the population. The variance of the class is 198. Is there enough evidence to support the claim that the variation of the students is less than the population variance (σ² = 225) at α = 0.05? Assume that the scores are normally distributed.

Step 1: The hypotheses are
H0: σ² = 225    H1: σ² < 225 (claim)

Step 2: Since the test is a left-tailed test and α = 0.05, use the value 1 – 0.05 = 0.95. The degrees of freedom are d.f. = n – 1 = 23 – 1 = 22. Thus, the critical value is 12.338.

Step 3: Compute the test value.

$$\chi^2 = \frac{(n - 1)s^2}{\sigma^2} = \frac{(23 - 1)(198)}{225} = 19.36$$

Step 4: Do not reject the null hypothesis since the test value, 19.36, is greater than the critical value, 12.338, and so falls in the noncritical region.

Step 5: There is not enough evidence to support the claim that the variation in test scores of the students is less than the variation in scores of the population.
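The test value and decision for the chi-square example can be sketched in Python. The critical value is passed in from the chi-square table (12.338 here) rather than computed, so the sketch covers only Steps 3 and 4. The function name is our own.

```python
def chi_square_variance(n, s2, sigma2):
    """Test value chi^2 = (n - 1) * s^2 / sigma^2, with d.f. = n - 1."""
    return (n - 1) * s2 / sigma2, n - 1


chi2, df = chi_square_variance(23, 198, 225)
critical = 12.338                  # table value for a left-tailed test, alpha = 0.05, d.f. = 22
reject = chi2 < critical           # left-tailed: reject H0 only below the critical value
print(round(chi2, 2), df, reject)  # 19.36 22 False
```

For a right-tailed test the comparison flips to `chi2 > critical`. For a two-tailed test, H0 is rejected when chi2 falls below the smaller critical value or above the larger one.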
Notes for the Use of Chi-Square

✓ When a two-tailed test is conducted, the area must be split. For example, if α = 0.05, the area to the right of the larger value is 0.025 (0.05/2), and the area to the right of the smaller value is 0.975 (1.00 – 0.05/2).
✓ After the degrees of freedom reach 30, the table gives values only for multiples of 10 (40, 50, 60, etc.). When the degrees of freedom are greater than 30 and not in the table, the closest smaller value should be used. For example, if d.f. = 38, use the table value for 30 degrees of freedom.
Types of Samples
Independent Samples – These are samples that are randomly selected from distinct populations. The sample sizes may or may not be equal.

Examples of independent samples
1. sample of male students and sample of female students
2. sample of smokers and sample of nonsmokers
3. sample of teachers, sample of parents, and sample of pupils

Dependent Samples – These samples usually arise in experimental designs where the objective is to make sure that the subjects being compared are comparable in terms of relevant variables.
– These experimental designs are repeated measures designs (e.g. pretest-posttest design) and matched groups design.
– The sample sizes of the groups are always equal.

[Figure: Pretest – Posttest Design]

[Figure: Matched Groups Design]

TESTING THE DIFFERENCE BETWEEN TWO MEANS:
Large Independent Samples
Assumptions for the Test to Determine the Difference between Two Means

✓ The samples must be independent of each other.
✓ The populations from which the samples were obtained must be normally distributed, and the standard deviations of the variable must be known, or the sample sizes must each be greater than or equal to 30.

Formula for the z Test for Comparing Two Means from Independent Samples

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

where x̄1 − x̄2 is the observed difference and μ1 − μ2 is the expected difference.

If σ1² and σ2² are not known, the formula below can be used provided that n1 ≥ 30 and n2 ≥ 30:

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
Example 1

A statistician claims that the average score on a standardized test of students who major in psychology is greater than that of students who major in mathematics. The results of the test are shown below. Is there enough evidence to support the statistician's claim at α = 0.01?

Psychology       Mathematics
x̄1 = 118         x̄2 = 115
σ1 = 15          σ2 = 15
n1 = 50          n2 = 50

Step 1: The hypotheses are
H0: μ1 = μ2    H1: μ1 > μ2 (claim)

Step 2: Since α = 0.01 and the test is a right-tailed test, the critical value is +2.33.

Step 3: Compute the test value.

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{(118 - 115) - 0}{\sqrt{\dfrac{15^2}{50} + \dfrac{15^2}{50}}} = \frac{3 - 0}{\sqrt{\dfrac{225}{50} + \dfrac{225}{50}}} = \frac{3}{\sqrt{9}} = \frac{3}{3} = 1$$

Step 4: Do not reject the null hypothesis since the test value, 1.0, falls in the noncritical region.

Step 5: There is not enough evidence to support the statistician's claim that students who major in psychology have a higher average score on a standardized test than those who major in mathematics.
Example 2
In a study of women science majors, the following data on a self-esteem questionnaire were obtained on two groups: those who left their profession within a few months after graduation (leavers) and those who remained in their profession (stayers). At α = 0.05, can it be concluded that there is no difference in the self-esteem scores of the two groups?

Leavers          Stayers
x̄1 = 3.05        x̄2 = 2.96
s1 = 0.75        s2 = 0.75
n1 = 103         n2 = 225

Step 1: The hypotheses are
H0: μ1 = μ2 (claim)    H1: μ1 ≠ μ2

Step 2: Since α = 0.05 and the test is a two-tailed test, the critical values are ±1.96.

Step 3: Compute the test value.

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{(3.05 - 2.96) - 0}{\sqrt{\dfrac{(0.75)^2}{103} + \dfrac{(0.75)^2}{225}}} = \frac{0.09}{\sqrt{\dfrac{0.5625}{103} + \dfrac{0.5625}{225}}} = \frac{0.09}{\sqrt{0.0055 + 0.0025}} = \frac{0.09}{\sqrt{0.008}} = \frac{0.09}{0.089}$$

z = 1.01

Step 4: Do not reject the null hypothesis since the test value, 1.01, falls in the noncritical region.

Step 5: There is no significant difference between the self-esteem scores of the two groups.
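This Python sketch applies the two-sample z formula to both examples. The `sd1` and `sd2` arguments take σ when it is known (Example 1) or s for large samples (Example 2). The function name is our own.

```python
import math

def two_sample_z(x1, x2, sd1, sd2, n1, n2, expected_diff=0.0):
    """z = ((x1 - x2) - (mu1 - mu2)) / sqrt(sd1^2 / n1 + sd2^2 / n2)."""
    return ((x1 - x2) - expected_diff) / math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


z1 = two_sample_z(118, 115, 15, 15, 50, 50)             # psychology vs mathematics
z2 = two_sample_z(3.05, 2.96, 0.75, 0.75, 103, 225)     # leavers vs stayers
print(round(z1, 2), round(z2, 2))                        # 1.0 1.01
```

Carrying full precision in the denominator (0.0892 rather than 0.089) still rounds to z = 1.01, so the decision is unchanged.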
TESTING THE DIFFERENCE BETWEEN VARIANCES

Characteristics of the F Distribution

1. The values of F cannot be negative, because variances are always positive or zero.
2. The distribution is positively skewed.
3. The mean value of F is approximately equal to 1.
4. The F distribution is a family of curves based on the degrees of freedom of the variance of the numerator and the variance of the denominator.

Assumptions for Testing the Difference between Two Variances

✓ The populations from which the samples were obtained must be normally distributed. (Note: The F test should not be used when the distributions depart from normality.)
✓ The samples must be independent of each other.
Formula for the F Test

$$F = \frac{s_1^2}{s_2^2}$$

where s1² is the larger of the two variances.

The F test has two terms for the degrees of freedom: that of the numerator, n1 – 1, and that of the denominator, n2 – 1, where n1 is the sample size from which the larger variance was obtained.
Example 1: Find the critical value for a right-tailed F test when α = 0.01, the degrees of freedom for the numerator (d.f.N.) are 10, and the degrees of freedom for the denominator (d.f.D.) are 18.

In the F table for α = 0.01, the entry in the row d.f.D. = 18 under the column d.f.N. = 10 is 3.51.

The critical F value is 3.51.

Example 2: Find the critical value for a two-tailed F test when α = 0.05, d.f.N. = 17, and d.f.D. = 24.

Since the test is two-tailed, use the F table for α/2 = 0.025. The table has no column for d.f.N. = 17, so the closest smaller value, 15, is used. The entry in the row d.f.D. = 24 under the column d.f.N. = 15 is 2.44.

The critical F value is 2.44.
Example 3

A researcher claims that there is no difference between the variance of the IQ scores of women who major in psychology and the variance of the IQ scores of men who major in psychology. A sample of IQ scores of 18 women had a variance of 192, and a sample of 22 men had a variance of 189. At α = 0.05, can the researcher conclude that his hypothesis is correct?

Step 1: The hypotheses are
H0: σ1² = σ2² (claim)    H1: σ1² ≠ σ2²

Step 2: Use the 0.025 table since α = 0.05 and this is a two-tailed test. The degrees of freedom are d.f.N. = 18 – 1 = 17 and d.f.D. = 22 – 1 = 21. Using the closest smaller column in the table (d.f.N. = 15), the critical value is 2.53.

Step 3: Compute the test value.

$$F = \frac{s_1^2}{s_2^2} = \frac{192}{189} = 1.02$$

Step 4: Do not reject the null hypothesis since the test value, 1.02, is less than the critical value, 2.53.

Step 5: There is not enough evidence to reject the researcher's claim that there is no significant difference between the variances of the IQ scores of men and women who major in psychology.

Notes for the Use of the F Test
✓ The larger variance should always be placed in the numerator of the formula.
✓ For a two-tailed test, the α value must be divided by 2 and the critical value placed on the right side of the F curve.
✓ If the standard deviations are given in the problem, they must be squared for the formula for the F test.
✓ When the degrees of freedom cannot be found in the table, the closest value on the smaller side should be used.
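The bookkeeping rules in the notes can be sketched as a small Python helper. It puts the larger variance in the numerator, and the degrees of freedom follow the samples. The critical value still comes from the F table. The function name is our own, and the second call illustrates squaring standard deviations (s = 300 and s = 250, with samples of 16 and 20) before forming F.

```python
def f_test(var_a, n_a, var_b, n_b):
    """F = larger variance / smaller variance; returns (F, d.f.N., d.f.D.)."""
    if var_a < var_b:                      # put the larger variance in the numerator
        var_a, n_a, var_b, n_b = var_b, n_b, var_a, n_a
    return var_a / var_b, n_a - 1, n_b - 1


F, dfn, dfd = f_test(192, 18, 189, 22)        # Example 3: variances given
print(round(F, 2), dfn, dfd)                   # 1.02 17 21

F2, dfn2, dfd2 = f_test(300 ** 2, 16, 250 ** 2, 20)   # standard deviations squared first
print(round(F2, 2), dfn2, dfd2)                # 1.44 15 19
```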
TESTING THE DIFFERENCE BETWEEN TWO MEANS:
Small Independent Samples

Formulas for t–Tests
Difference between Two Means – Small Independent Samples

A. Variances are assumed to be unequal:

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

where the degrees of freedom are equal to the smaller of n1 – 1 or n2 – 1.

Formulas for t–Tests
Difference between Two Means – Small Independent Samples

B. Variances are assumed to be equal:

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\left(\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

where the degrees of freedom are equal to n1 + n2 – 2.
Example 1
A researcher suggests that male nurses earn more than female nurses. A survey of 16 male nurses and 20 female nurses reports the following data. Is there enough evidence to support the claim that male nurses earn more than female nurses? Use α = 0.05.

Male              Female
x̄1 = $23,800      x̄2 = $23,750
s1 = $300         s2 = $250
n1 = 16           n2 = 20

Solution
The F test will be used to determine whether or not the variances are equal. The null hypothesis is that the variances are equal. The critical value obtained from the table for α = 0.05 is 2.23, using d.f.N. = 15 and d.f.D. = 19.
The test value is

$$F = \frac{s_1^2}{s_2^2} = \frac{(300)^2}{(250)^2} = \frac{90000}{62500} = 1.44$$

Since 1.44 < 2.23, the decision is to not reject the null hypothesis, so the variances are treated as equal. Hence, the second formula will be used.
Step 1: The hypotheses are
H0: μ1 = μ2    H1: μ1 > μ2 (claim)

Step 2: Since α = 0.05 and the test is a right-tailed test with equal variances, the degrees of freedom are n1 + n2 – 2 = 16 + 20 – 2 = 34. The critical value is 1.691.

Step 3: Compute the test value.

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\left(\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

$$t = \frac{(23{,}800 - 23{,}750) - 0}{\sqrt{\left(\dfrac{(16 - 1)(300)^2 + (20 - 1)(250)^2}{16 + 20 - 2}\right)\left(\dfrac{1}{16} + \dfrac{1}{20}\right)}} = \frac{50}{\sqrt{\left(\dfrac{(15)(90000) + (19)(62500)}{34}\right)(0.0625 + 0.05)}}$$

$$t = \frac{50}{\sqrt{\left(\dfrac{1{,}350{,}000 + 1{,}187{,}500}{34}\right)(0.1125)}} = \frac{50}{\sqrt{\left(\dfrac{2{,}537{,}500}{34}\right)(0.1125)}} = \frac{50}{\sqrt{(74{,}632.35)(0.1125)}} = \frac{50}{\sqrt{8{,}396.14}} = \frac{50}{91.63} = 0.55$$

Step 4: Do not reject the null hypothesis since 0.55 < 1.691.

Step 5: There is not enough evidence to support the researcher's claim that male nurses earn more than female nurses.
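Both small-sample t formulas can be sketched in Python and applied to the nurses data.
- Formula B (pooled, equal variances) is the one the notes use here, because the F test did not reject equal variances.
- Formula A (unequal variances) is included only for comparison. Its result does not appear in the notes.
- Function names are our own.

```python
import math

def pooled_t(x1, x2, s1, s2, n1, n2, expected_diff=0.0):
    """Formula B (equal variances): returns (t, d.f.) with d.f. = n1 + n2 - 2."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return ((x1 - x2) - expected_diff) / se, df

def unequal_var_t(x1, x2, s1, s2, n1, n2, expected_diff=0.0):
    """Formula A (unequal variances): d.f. = smaller of n1 - 1 and n2 - 1."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return ((x1 - x2) - expected_diff) / se, min(n1 - 1, n2 - 1)


t_b, df_b = pooled_t(23800, 23750, 300, 250, 16, 20)
print(round(t_b, 2), df_b)          # 0.55 34

t_a, df_a = unequal_var_t(23800, 23750, 300, 250, 16, 20)
print(round(t_a, 2), df_a)          # 0.53 15
```

Either way, the test value is far below the one-tailed critical value, so the conclusion does not depend on the formula choice in this example.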
TESTING THE DIFFERENCE BETWEEN TWO MEANS:
Small Dependent Samples

Formulas for t–Test
Difference between Two Means
Small Dependent Samples

$$t = \frac{\bar{D} - \mu_D}{s_D / \sqrt{n}}$$

with d.f. = n – 1 and where

$$\bar{D} = \frac{\sum D}{n} \quad\text{and}\quad s_D = \sqrt{\frac{\sum D^2 - \dfrac{\left(\sum D\right)^2}{n}}{n - 1}}$$

Formulas for t–Test
Difference between Two Means
Small Dependent Samples

$$t = \frac{\bar{D} - \mu_D}{s_D / \sqrt{n}}$$

where D̄ = mean of the differences of the values of the pairs of data
μD = the expected mean difference value
sD = standard deviation of the differences
Example 1
A dietician wishes to see if a person's cholesterol level will change if the diet is supplemented by a certain mineral. Six students were pretested and then took the mineral supplement for a six-week period. The results are shown below. (Cholesterol level is measured in milligrams/deciliter.) Can it be concluded that the cholesterol level has been changed at α = 0.10? Assume that the variable is approximately normally distributed.

Subject        1     2     3     4     5     6
Before (X1)  210   235   208   190   172   244
After (X2)   190   170   210   188   173   228
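The dependent-samples formulas can be applied to the dietician's data with a short Python sketch. It takes D = X1 − X2 for each subject, then computes D̄, s_D, and the test value with d.f. = n − 1. The notes end before this example's solution, so the sketch only computes the test value. Comparing it with the table critical values for a two-tailed test at α = 0.10 is left as in the steps above. The function name is our own.

```python
import math

def paired_t(before, after, expected_mean_diff=0.0):
    """t = (D-bar - mu_D) / (s_D / sqrt(n)) with D = X1 - X2 and d.f. = n - 1."""
    d = [x1 - x2 for x1, x2 in zip(before, after)]
    n = len(d)
    d_bar = sum(d) / n
    s_d = math.sqrt((sum(x * x for x in d) - sum(d) ** 2 / n) / (n - 1))
    return (d_bar - expected_mean_diff) / (s_d / math.sqrt(n)), n - 1


before = [210, 235, 208, 190, 172, 244]
after = [190, 170, 210, 188, 173, 228]
t, df = paired_t(before, after)
print(round(t, 2), df)   # 1.61 5
```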
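TESTING THE DIFFERENCE BETWEEN PROPORTIONS

Formula for z Test for Comparing Proportions

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}\,\bar{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

where

$$\bar{p} = \frac{X_1 + X_2}{n_1 + n_2}, \quad \bar{q} = 1 - \bar{p}, \quad \hat{p}_1 = \frac{X_1}{n_1}, \quad \hat{p}_2 = \frac{X_2}{n_2}$$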
Example 1
In a sample of 200 surgeons, 15% thought the government should control health care. In a sample of 230 general practitioners, 20% felt this way. At α = 0.10, is there a difference between the proportions?