Gec3 - Module 5
5.1 Introduction
Often we wish to describe a set of data with a single number, or a small set
of numbers, in such a way that these values will yield enough information about
the content of the data that we can produce a means of generating a similar set
of data from this description.
One manner in which this can be done is by specifying values that describe the
numerical center of the set of data, which may be defined in various ways. They
are measures of the central tendency of the data. We can also describe the data
by how it is dispersed around a particular measure of central tendency. A third
manner in which we can describe data is by how it tends to accumulate with
respect to the central tendency--such as whether it tends to accumulate
immediately to the left or to the right of the numerical center.
The measures of central tendency (mean, median, and mode) will be discussed for
ungrouped (raw) and grouped data. Ungrouped data are raw data; grouped data are
raw data that have been condensed into a frequency distribution table so that
they are easier to read and understand.
5.3.1.1 Mean
The arithmetic mean, or simply the mean, is the most familiar and most widely
used measure of central tendency in daily life. It is considered reliable
because every value of the variable is taken into consideration: it is the sum
of all data values divided by the number of values in the data set. The mean of
a sample data set is denoted by $\bar{x}$ and the mean of a population data set
by the Greek letter $\mu$.
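Sample mean: $\bar{x} = \dfrac{\sum x_i}{n}$, where the $x_i$ are the
observations and $n$ is the number of observations in the sample.
Population mean: $\mu = \dfrac{\sum x_i}{N}$, where the $x_i$ are the
observations and $N$ is the number of observations in the population.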
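As a minimal sketch of the definition above, the function below adds the observations and divides by their count. The function name `mean` and the quiz scores are hypothetical and are not taken from the module's examples.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Hypothetical quiz scores, used only for illustration.
scores = [8, 7, 9, 10, 6]
print(mean(scores))  # 8.0
```

The same computation serves for a sample or a population; only the symbol ($\bar{x}$ or $\mu$) changes.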
Example 1. Find the mean score of the following sample data set:
Quiz Scores:
Solution.
Steps Actual process and result
Example 2.
What is the mean age in the following set of sample data?
Age ($x$) Frequency ($f$)
Solution.
Steps Actual process and result
Age ($x$) Frequency ($f$)
Total
3. Divide the sum of the products $\sum fx$ by the total frequency $\sum f$.
The mean age is 17.66.
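The procedure for a frequency table can be sketched as follows: multiply each value by its frequency, add the products, and divide by the total frequency. The function name and the age table are hypothetical (the module's own table is not reproduced here), so the result differs from the 17.66 obtained in Example 2.

```python
def mean_from_frequencies(table):
    """Mean of a frequency distribution given as (value, frequency) pairs."""
    total_fx = sum(x * f for x, f in table)  # sum of value * frequency
    total_f = sum(f for _, f in table)       # total frequency
    if total_f == 0:
        raise ValueError("total frequency must be positive")
    return total_fx / total_f


# Hypothetical (age, frequency) pairs, used only for illustration.
ages = [(16, 2), (17, 3), (18, 4), (19, 1)]
print(mean_from_frequencies(ages))  # 17.4
```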
5.3.1.2 Median
The median is the middle number. It is the value that separates the largest
50% of the data values from the lowest 50%. It is commonly denoted as
$\tilde{x}$. To calculate the median, arrange the data values in numerical
order and then find the middle number. If there is an odd number of values, the
number in the middle is the median. If there is an even number of values, the
average of the two numbers in the middle is the median.
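The rule above for odd and even numbers of values can be sketched as follows. The function name and both data sets are hypothetical.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: mean of the two middle values


print(median([5, 3, 9, 1, 7]))  # 5   (sorted: 1, 3, 5, 7, 9)
print(median([4, 8, 6, 2]))     # 5.0 (sorted: 2, 4, 6, 8)
```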
Example 3. Odd number of values:
Find the median of the following set of data.
Solution.
Steps Actual process and result
4. The $\left(\frac{n+1}{2}\right)$th value is the median of the data set. In this case, that value, which is 35, is the
median.
Solution.
Steps Actual process and result
1. Arrange the observations in ascending order.
3. Identify the $\frac{n}{2}$th observation and the $\left(\frac{n}{2}+1\right)$th observation. In this case, we identify the th observation,
which is , and the th observation, which is .
Example 5. Ungrouped data in frequency distribution.
Find the median age in the given frequency distribution
Age ( )
Solution.
Steps Actual process and result
Age ( )
1. Find the total frequency and the cumulative frequency.
Age ( )
4. Locate in . We know that 18
belongs to the range as
indicated by the of .
Total
Age ( )
5. Find the th observation in the
first column. In the example, the
median age is .
Total
5.3.1.3 Mode
The mode is the data value that appears most frequently in the set. A data set
may have one mode, more than one mode, or no mode at all. For example, in
the previous data:
Age ( )
3. The mean is unique, but it cannot be found for categorical data or for
open-ended frequency distributions.
4. The median does not use all the values, so it is less affected than the mean
by a few very large or very small values.
6. The mode has the advantage that it can be used to measure nominal data,
but it is not unique: there may be more than one mode or none at all.
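Because a data set may have one mode, several modes, or none, a sketch of the computation can return a list rather than a single value. The function name and data are hypothetical, and treating "no value repeats" as "no mode" is an assumed convention.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often; an empty list means no mode."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # assumed convention: no value repeats, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([17, 18, 18, 19]))        # [18]
print(modes([1, 1, 2, 2, 3]))         # [1, 2]
print(modes([1, 2, 3]))               # []
print(modes(["red", "blue", "red"]))  # ['red']
```

The last call illustrates that the mode also works for nominal data, where a mean or median would be meaningless.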
Learning Activity 1
Directions. Tell whether each of the following statements describes the mean,
median, or mode.
2. Left-Skewed. This type of distribution has a few data values that are
much lower than the majority of values in the set (the tail extends to the
left). Generally, the mean is less than the median (and mode) in a
left-skewed distribution.
3. Right-Skewed. This type of distribution has a few data values that are
much higher than the majority of values in the set (the tail extends to the
right). Generally, the mean is greater than the median (and mode) in a
right-skewed distribution.
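The comparison of mean and median described above can be sketched as a rough check. It is only a rule of thumb (the text says "generally"), not a formal skewness measure; the function name, labels, and data are hypothetical.

```python
from statistics import mean, median


def skew_hint(values):
    """Rule-of-thumb skew direction from comparing the mean with the median."""
    m, md = mean(values), median(values)
    if m > md:
        return "right-skewed"    # a few high values pull the mean up
    if m < md:
        return "left-skewed"     # a few low values pull the mean down
    return "roughly symmetric"   # mean equals median


print(skew_hint([1, 2, 3, 4, 20]))     # right-skewed (mean 6, median 3)
print(skew_hint([1, 17, 18, 19, 20]))  # left-skewed (mean 15, median 18)
print(skew_hint([1, 2, 3, 4, 5]))      # roughly symmetric (mean 3, median 3)
```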
5.3.2 Measures of Dispersion
Dispersion or variation in a data set is the amount of difference between
data values. It tells if the numbers in the data are close together or spread far
apart.
In a data set with little variation, almost all data values would be close to
one another. The histogram of such a data set would be narrow and tall. An
example of this is the set of quiz scores below.
Quiz Scores:
In a data set with a great deal of variation, the data values would be spread
widely. The histogram of this data set would be low and wide. An example is
the set of data that follows.
Quiz Scores:
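Population variance: $\sigma^2 = \dfrac{\sum (x_i - \mu)^2}{N}$, where $x_i$
represents the observations, $\mu$ the population mean, and $N$ the population
size.
Sample variance: $s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$, where $x_i$
represents the observations, $\bar{x}$ the sample mean, and $n$ the sample
size.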
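A minimal sketch of the population and sample variance follows. It assumes the usual definitions, in which the sum of squared deviations from the mean is divided by $N$ for a population and by $n - 1$ for a sample; the function names and data are hypothetical.

```python
def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one observation")
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


# Hypothetical data: mean 5, squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))        # 4.0
print(round(sample_variance(data), 3))  # 4.571
```

The standard deviation is the square root of either result, for example `population_variance(data) ** 0.5`.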
To find the variance in a set of data, the process is as follows:
Solution.
Steps Actual process and result
1. Determine the mean of the
observations.
The coefficient of variation (CV) expresses the standard deviation relative to
the mean. This makes it easier to tell whether a standard deviation is large or
small, and it allows comparison of standard deviations that come from data sets
with different means.
For population: $CV = \dfrac{\sigma}{\mu} \times 100\%$
For sample: $CV = \dfrac{s}{\bar{x}} \times 100\%$
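As a rough illustration of the coefficient of variation, the sketch below computes the sample CV as the sample standard deviation divided by the sample mean, times 100. The function name and both data sets are hypothetical; `statistics.stdev` is the standard-library sample standard deviation.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample CV in percent: (sample standard deviation / sample mean) * 100."""
    xbar = mean(values)
    if xbar == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / xbar * 100


# Two hypothetical data sets with the same standard deviation but different means.
small_values = [10, 12, 14]
large_values = [100, 102, 104]
print(coefficient_of_variation(small_values))
print(coefficient_of_variation(large_values))
```

Both sets have a standard deviation of 2, yet the CV is about 16.7% for the first and about 2.0% for the second, which shows why the CV is the fairer comparison when the means differ.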
1. The $z$-score of a value is positive if the value is above the mean and
negative if it is below the mean. The mean itself always has a $z$-score
of $0$.
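The sign behavior described above follows from the usual standard-score formula, $z = (x - \text{mean}) / \text{standard deviation}$, which is assumed here. The function name, the scores, and the section statistics are hypothetical and are not the values of any example in this module.

```python
def z_score(value, mean, sd):
    """Standard score: how many standard deviations a value lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / sd


# Hypothetical: two students from sections with different means and spreads.
print(z_score(85, mean=80, sd=5))  # 1.0
print(z_score(88, mean=82, sd=8))  # 0.75
print(z_score(80, mean=80, sd=5))  # 0.0  (the mean itself)
print(z_score(70, mean=80, sd=5))  # -2.0 (below the mean)
```

In this made-up case the first student has the higher standard score even though the raw score is lower, which is the kind of comparison a standard score is meant to support.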
Example 7.
Students were selected from two sections and their scores in a Statistics
examination were gathered. The following information was obtained:
Sample mean is .
First section
Sample standard deviation is .
Sample mean is .
Second section
Sample standard deviation is .
Linda, who is from the first section got a score of while her friend, Jessa,
who is in the second section got a score of . Who has a higher standard score?
Solution.
Linda Jessa
Percentiles divide a data set into 100 parts. A percentile can be found for any
percent from 1 to 99 and is denoted as $P_k$, where the subscript $k$ is the
percentile rank, which indicates the percent of the distribution that falls
below the percentile. For example, $P_{10}$ is the tenth percentile and is
larger than 10% of the distribution.
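Taking the description above literally, the percentile rank of a value is the percent of the data that falls below it. The sketch below implements only that reading. Some textbooks also count half of the tied values, so treat this as one convention among several; the function name and data are hypothetical.

```python
def percentile_rank(values, x):
    """Percent of the values that fall strictly below x."""
    if not values:
        raise ValueError("need at least one observation")
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)


# Hypothetical data set of ten values.
data = [2, 3, 5, 6, 8, 10, 12, 15, 18, 20]
print(percentile_rank(data, 12))  # 60.0 (six of the ten values are below 12)
```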
Example 8. Using the data below, find , and the percentile rank of .
Solution.
a) To find , we follow the steps given:
Another measure of position is the deciles. Deciles divide the data set into
tenths and can be found for $D_1$ through $D_9$. Deciles are denoted as $D$
with a subscript; for example, $D_3$ is the third decile and is the value that
is larger than three tenths of the other values.
Quartiles divide a data set into fourths and can be found for $Q_1$ to $Q_3$.
$Q_1$ is the first quartile and is the value that is larger than one fourth of
the observations in the distribution.
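Quartiles and deciles can be sketched with Python's standard-library `statistics.quantiles`, which returns the cut points that split the data into `n` equal parts. Its default interpolation rule is one convention among several and may give slightly different values from a hand method taught in class. The wrapper names and the data are hypothetical.

```python
from statistics import quantiles


def quartiles(values):
    """The three cut points Q1, Q2, Q3 (library default convention)."""
    return quantiles(values, n=4)


def deciles(values):
    """The nine cut points D1 through D9 (library default convention)."""
    return quantiles(values, n=10)


# Hypothetical data: the integers 1 to 11.
data = list(range(1, 12))
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)          # Q2 is the median of the data
print(len(deciles(data)))  # 9 cut points, D1 through D9
```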
Solution.
Steps Actual process and result
1. Arrange the data in ascending order.
Interpretation:
The stem-and-leaf plot shows that most of the students obtained the score
from to .
Example 10. Make a stem-and-leaf plot for the following numbers.
Solution.
Steps Actual process and result
1. Arrange the data in ascending order.
3. Use the first digits for the leading digit (or stem) and list all the last
digits in order for the trailing digit (or leaf):
Leading Digit (Stem)
Interpretation:
The stem-and-leaf plot shows that most of the students obtained the score
from to .
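The stem-and-leaf procedure used in the examples above (sort the data, then split each number into a leading part and a last digit) can be sketched as follows. The function name and the scores are hypothetical, and the sketch assumes non-negative integers; stems with no leaves are simply omitted.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its sorted leaves (last digits)."""
    plot = defaultdict(list)
    for v in sorted(values):       # sort first so leaves come out in order
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)


# Hypothetical scores, used only for illustration.
scores = [23, 31, 25, 38, 42, 35, 31, 27]
for stem, leaves in stem_and_leaf(scores).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
# 2 | 3 5 7
# 3 | 1 1 5 8
# 4 | 2
```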
A BOX-AND-WHISKER PLOT graphs five values of the set of data on a
number line. The five values are:
1. The lowest value in the set of data.
2. The lower hinge.
3. The median.
4. The upper hinge.
5. The highest value of the set of data.
A box is drawn from the lower hinge to the upper hinge, and lines are drawn
from the box to the highest and lowest values. The lower hinge is the median of
all the values less than or equal to the median when the data set has an odd
number of values, or the median of all values less than the median when the
data set has an even number of values. The upper hinge is the median of all
values greater than or equal to the median when the data set has an odd number
of values, or the median of all values greater than the median when the data
set has an even number of values.
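The five values and the hinge rule above can be sketched as follows. The sketch splits the sorted data by position (the middle value is shared by both halves when the count is odd), which is an assumed reading of the rule when the data contain ties. The function name and data are hypothetical.

```python
from statistics import median


def five_number_summary(values):
    """Return (lowest, lower hinge, median, upper hinge, highest)."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    if n % 2 == 1:
        # Odd count: the median value belongs to both halves.
        lower, upper = data[:half + 1], data[half:]
    else:
        # Even count: the halves do not overlap.
        lower, upper = data[:half], data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]


print(five_number_summary([1, 3, 5, 7, 9, 11, 13]))  # (1, 4.0, 7, 10.0, 13)
print(five_number_summary([2, 4, 6, 8, 10, 12]))     # (2, 4, 7.0, 10, 12)
```

These five numbers are exactly what is needed to draw the box (hinge to hinge) and the two whiskers (box to lowest and highest values).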
Example 11. A item test was given to statistics students. The result is
shown below:
Solution.
Steps Actual process and result
1. Arrange the data in ascending order.
2. Determine the five values: The lowest value in the data set is .
The highest value in the data set is .
The median is .
The lower hinge is the midpoint of the numbers
below the median which is .
The upper hinge is the midpoint of the numbers
above the median which is .
3. Set up the horizontal axis
containing the values
obtained in Step 2. In this
case, we start at and end at
with an interval of .
Interpretation:
The box-and-whisker plot shows that the data are not symmetrical and that they
are positively skewed, since the whisker is longer on the right.