Gec3 - Module 5

Module on stat

Uploaded by

Christine Guiyab
Copyright © All Rights Reserved
MODULE 5

Data Management: Measures of Central Tendency, Dispersion, and Position

5.1 Introduction
Often we wish to describe a set of data with a single number, or a small set
of numbers, in such a way that these values will yield enough information about
the content of the data that we can produce a means of generating a similar set
of data from this description.

One manner in which this can be done is by specifying values that describe the
numerical center of the set of data, which may be defined in various ways. They
are measures of the central tendency of the data. We can also describe the data
by how it is dispersed around a particular measure of central tendency. A third
manner in which we can describe data is by how it tends to accumulate with
respect to the central tendency--such as whether it tends to accumulate
immediately to the left or to the right of the numerical center.

There are three ways of describing data: measures of central tendency, measures of variation, and measures of position.
The measures of central tendency are the averages and tell about the middle of the data. The measures of variation tell whether the data are close together or spread far apart. The measures of position tell the relative position of a number in a given data set in comparison with the rest of the numbers.
Numerical measures computed from a population are called parameters, while those computed from a sample are called statistics.
5.2 Learning Outcomes

After finishing this module, you are expected to:

1. Describe the measures of central tendency;


2. Compute or obtain the different measures of central tendency; and
3. Select the proper measures of central tendency to use.

5.3 What You Need to Know


5.3.1 Measures of Central Tendency
Measures of central tendency provide us a convenient way of describing a set of data with a single number. Such a measure is a value used to represent the typical or central value of the data set. In this section, three commonly used measures of central tendency (mean, median, and mode) will be discussed for ungrouped (raw) and grouped data. Ungrouped data are raw data, and grouped data are raw data that have been condensed into a frequency distribution table for easier understanding.
5.3.1.1 Mean
The arithmetic mean, or simply the mean, is the most familiar and most widely used measure of central tendency in daily life. It takes every value of the variable into account: it is the sum of all data values divided by the number of values in the data set. The mean of a sample data set is denoted by $\bar{x}$ and the mean of a population data set by the Greek letter $\mu$.

Sample mean: $\bar{x} = \dfrac{\sum x_i}{n}$, where the $x_i$ are the observations and $n$ is the number of observations in the sample.

Population mean: $\mu = \dfrac{\sum x_i}{N}$, where the $x_i$ are the observations and $N$ is the number of observations in the population.

Example 1. Find the mean score of the following sample data set:
Quiz Scores:
Solution.
Steps:
1. Find the sum of the scores.
2. Divide the sum by the number of observations. In this case, there are 11 observations, so the sample mean is the sum divided by 11.
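The two steps above can be sketched in Python. The scores below are hypothetical values chosen only for illustration (they are not the data from Example 1), and the variable names are assumptions.

```python
from statistics import mean

# Hypothetical quiz scores (11 observations); NOT the data from Example 1.
scores = [7, 8, 9, 10, 6, 8, 9, 7, 10, 8, 6]

total = sum(scores)        # Step 1: find the sum of the observations
n = len(scores)            # number of observations in the sample
sample_mean = total / n    # Step 2: divide the sum by n

print(f"sum = {total}, n = {n}, mean = {sample_mean}")

# The standard library computes the same value.
assert mean(scores) == sample_mean
```

Computing the sum and the count separately mirrors the manual procedure; in practice `statistics.mean` does both in one call.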

The mean for ungrouped data in a frequency distribution is found by multiplying each value by its frequency, adding all the products, and dividing by the total frequency.

Sample mean: $\bar{x} = \dfrac{\sum f_i x_i}{n}$, where $f_i$ is the frequency of each value $x_i$ and $n = \sum f_i$ is the number of observations in the sample.

Population mean: $\mu = \dfrac{\sum f_i x_i}{N}$, where $f_i$ is the frequency of each value $x_i$ and $N = \sum f_i$ is the number of observations in the population.
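As a rough sketch of the frequency formula, the function below multiplies each value by its frequency, adds the products, and divides by the total frequency. The table and the name `frequency_mean` are hypothetical and are used only for illustration.

```python
# Hypothetical frequency table (value -> frequency); not the module's data.
freq_table = {16: 3, 17: 5, 18: 8, 19: 4}


def frequency_mean(table):
    """Return sum(f * x) / sum(f) for a {value: frequency} table."""
    n = sum(table.values())                              # total frequency
    total = sum(x * f for x, f in table.items())         # sum of products f * x
    return total / n


print(frequency_mean(freq_table))
```

The result is the same as the plain mean of the expanded list in which each value is repeated as many times as its frequency.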

Example 2. What is the mean age in the following set of sample data?

Age ($x$)   Frequency ($f$)

Solution.
Steps:
1. Find the product of $f$ and $x$ for each row.
2. Add the values under the column of $f$, which gives $n$, and the values under the column of $fx$, which gives $\sum fx$.
3. Divide $\sum fx$ by $n$.
The mean age is 17.66.

5.3.1.2 Median
The median is the middle number. It is the value that separates the largest 50% of the data values from the lowest 50%. To calculate the median, place the data values in numerical order and then find the middle number. If there is an odd number of values, the number in the middle is the median. If there is an even number of values, the average of the two numbers in the middle is the median.
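The odd and even cases described above can be sketched as a small function. The name `median_of` and the sample lists are illustrative assumptions, not part of the module.

```python
def median_of(values):
    """Median of raw data: middle value (odd n) or mean of the two middle values (even n)."""
    ordered = sorted(values)          # place the data values in numerical order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # the ((n + 1) / 2)th value
    return (ordered[mid - 1] + ordered[mid]) / 2   # mean of the (n/2)th and (n/2 + 1)th values


print(median_of([5, 3, 9]))       # odd number of values
print(median_of([4, 1, 3, 2]))    # even number of values
```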

Example 3. Odd number of values.
Find the median of the following set of data.

Solution.
Steps:
1. Arrange the observations in ascending order.
2. Since the number of values is odd, find the number of observations plus 1. There are 9 values. Here, $n + 1 = 9 + 1 = 10$.
3. Divide $n + 1$ by 2. The number that results, $\dfrac{n+1}{2} = \dfrac{10}{2} = 5$, tells us the place of the median in the ordered array.
4. The $\dfrac{n+1}{2}$th value is the median of the set of data. In this case, the 5th value, which is 35, is the median.

Example 4. Even number of values.
Find the median of the following set of data:

Solution.
Steps:
1. Arrange the observations in ascending order.
2. Since the number of values is even, find half the number of observations, $\dfrac{n}{2}$.
3. Identify the $\dfrac{n}{2}$th observation and the $\left(\dfrac{n}{2} + 1\right)$th observation.
4. Find the mean of the $\dfrac{n}{2}$th observation and the $\left(\dfrac{n}{2} + 1\right)$th observation. The number that results is the median of the set of data.

Example 5. Ungrouped data in a frequency distribution.
Find the median age in the given frequency distribution.

Age ($x$)

Solution.
Steps:
1. Find the total frequency $n$ and the cumulative frequency $cf$. Note: Make sure that the entries in the first column are in order.
2. Obtain the column for the cumulative frequency $cf$. To do this, copy the first entry in $f$ as the first entry in $cf$. After this, add the first entry to the second entry in $f$ to get the second entry in $cf$. Repeating the process gives the third and later entries. The last entry in $cf$ must equal $n$.
3. Since $n$ is odd, compute $\dfrac{n+1}{2}$.
4. Locate $\dfrac{n+1}{2}$ in $cf$. In this example the position is 18, so we look for the row whose cumulative frequency range contains 18.
5. Find the $\dfrac{n+1}{2}$th observation in the first column. The age in that row is the median age.
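The cumulative-frequency procedure can be sketched as follows, assuming an odd total frequency as in the steps above. The table and the function name `median_from_frequencies` are hypothetical.

```python
from itertools import accumulate


def median_from_frequencies(table):
    """Median of a {value: frequency} table whose total frequency n is odd."""
    values = sorted(table)                       # first column must be in order
    freqs = [table[v] for v in values]
    cf = list(accumulate(freqs))                 # cumulative frequency column
    n = cf[-1]                                   # last cf entry equals n
    if n % 2 == 0:
        raise ValueError("this sketch only covers an odd total frequency")
    position = (n + 1) // 2                      # place of the median
    for value, running_total in zip(values, cf):
        if running_total >= position:            # first row whose cf reaches the position
            return value


# Hypothetical ages and frequencies; not the table from Example 5.
ages = {16: 3, 17: 5, 18: 8, 19: 5}
print(median_from_frequencies(ages))
```

For an even total frequency, the same cumulative column could be used to find both middle positions and average them; that case is left out to keep the sketch aligned with the steps shown.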

5.3.1.3 Mode
The mode is the data value that appears most frequently in the set. A data set may have one mode, more than one mode, or no mode at all. For example, in the previous data the mode is 32, which appears two times.

For ungrouped data in a frequency distribution, the mode is the value that has the highest frequency.
For example, in the data below, the mode is years old.

Age ( )
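A minimal sketch of finding the mode, covering the one-mode, several-modes, and no-mode cases. The function name `modes` and the sample lists are assumptions, and treating "every value appears once" as "no mode" is a convention adopted here for illustration.

```python
from collections import Counter


def modes(values):
    """Return the list of modes; an empty list means there is no mode."""
    counts = Counter(values)                 # value -> frequency
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []                            # no value repeats: no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([30, 32, 32, 35]))     # one mode
print(modes([1, 1, 2, 2, 3]))      # two modes (bimodal)
print(modes([1, 2, 3]))            # no mode
```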

5.3.1.4 Properties of Mean, Median, and Mode


1. Mean is the most commonly used measure of central tendency.

2. One drawback of the mean is that it is heavily influenced by a few very


high or very low data values. In these cases, it is more common to use the
median.

3. The mean is unique but cannot be found for categorical data or for open-
ended frequency distributions.

4. The median does not use all the values, so it is less affected than the mean by a few extremely large or small data values.

5. The median is unique and can be found for open-ended frequency distributions.

6. The mode has the advantage that it can be used with nominal data, but it is not unique: there may be more than one mode or none at all.
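Properties 2 and 4 can be illustrated with a short sketch: adding one extreme value moves the mean a great deal but barely moves the median. The numbers are hypothetical.

```python
from statistics import mean, median

# Hypothetical data, chosen only to show the effect of one extreme value.
values = [20, 22, 23, 25, 26]
with_outlier = values + [200]

print("mean  :", mean(values), "->", mean(with_outlier))
print("median:", median(values), "->", median(with_outlier))
```

Here the median shifts by one unit while the mean more than doubles, which is why the median is usually preferred when a few values are far from the rest.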

Learning Activity 1

Directions. Tell whether each of the following statements describes the Mean, Median, or Mode.

1. The most preferred descriptive measure in a skewed distribution.
2. Will have the largest value in a negatively skewed distribution.
3. Will have the largest value in a positively skewed distribution.
4. The point above and below which half of the distribution of the data falls.
5. Will have the same value in a bimodal distribution.
6.
7. Is equivalent to the 50th percentile of a distribution.
8. The most popular score in a distribution.
9. Influenced by the specific value of every observation.
10. Is most appropriate to use when extreme scores are given.

5.3.1.5 Shapes of Data Distributions

1. Symmetric. In this case, the data distribution has approximately the same shape on either side of a central dividing line. The mean and median (and mode, if unimodal) are equal in a symmetric distribution. A symmetric distribution that is also bell-shaped is often described as normal.

2. Left-Skewed. This type of distribution has a few data values that are much lower than the majority of values in the set (the tail extends to the left). Generally, the mean is less than the median (and mode) in a left-skewed distribution.

3. Right-Skewed. This type of distribution has a few data values that are much higher than the majority of values in the set (the tail extends to the right). Generally, the mean is greater than the median (and mode) in a right-skewed distribution.

4. Uniform. This type of distribution has all data values equally represented.

5.3.2 Measures of Dispersion
Dispersion or variation in a data set is the amount of difference between
data values. It tells if the numbers in the data are close together or spread far
apart.
In a data set with little variation, almost all data values would be close to
one another. The histogram of such a data set would be narrow and tall. An
example of this is the set of quiz scores below.
Quiz Scores:
In a data set with a great deal of variation, the data values would be spread
widely. The histogram of this data set would be low and wide. An example is
the set of data that follows.
Quiz Scores:

5.3.2.1 Common Measures of Dispersion


There are three common measures of dispersion: range, variance, and standard deviation.
1. Range. It is the difference between the largest and smallest data values in a data set.

2. Variance. It is the average of the squared deviations from the mean of a set of data. It is calculated using one of two formulas, depending on whether the data set being considered is a population or a sample.

Population variance: $\sigma^2 = \dfrac{\sum (x_i - \mu)^2}{N}$, where $x_i$ represents the observations, $\mu$ is the population mean, and $N$ is the population size.

Sample variance: $s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$, where $x_i$ represents the observations, $\bar{x}$ is the sample mean, and $n$ is the sample size.

To find the variance in a set of data, the process is as follows:

Procedure for Computing a Variance

1. Determine the mean of the observations.
2. For each observation, calculate the deviation (difference) between the observation and the mean.
3. Calculate the square of each of the deviations and find the sum of these squared deviations.
4. If the data is a population, divide the sum by $N$. If the data is a sample, divide the sum by $n - 1$.

3. Standard Deviation. It is the most commonly used measure of variation. It is the square root of the variance.

Population standard deviation: $\sigma = \sqrt{\dfrac{\sum (x_i - \mu)^2}{N}}$

Sample standard deviation: $s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$

To compute the standard deviation, we simply take the square root of the variance.
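The variance procedure and the standard deviation can be sketched as below. The function names and the data are hypothetical; the `sample` flag is an assumption introduced here to switch between dividing by $n - 1$ (sample) and $N$ (population).

```python
import math


def variance(values, sample=True):
    """Variance following the four-step procedure above."""
    n = len(values)
    m = sum(values) / n                               # Step 1: mean
    deviations = [x - m for x in values]              # Step 2: deviations from the mean
    sum_sq = sum(d ** 2 for d in deviations)          # Step 3: sum of squared deviations
    return sum_sq / (n - 1) if sample else sum_sq / n  # Step 4: divide by n - 1 or N


def standard_deviation(values, sample=True):
    """Square root of the variance."""
    return math.sqrt(variance(values, sample=sample))


# Hypothetical data, not taken from the module.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data, sample=False), standard_deviation(data, sample=False))
print(variance(data, sample=True), standard_deviation(data, sample=True))
```

Python's `statistics` module provides `pvariance`, `variance`, `pstdev`, and `stdev`, which follow the same population and sample conventions.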
Example 6. What is the standard deviation in the given sample data?

Solution.
Steps:
1. Determine the mean of the observations.
2. For each observation, calculate the deviation, or difference, between the observation and the mean. Because this is sample data, we compute $x_i - \bar{x}$.
3. Calculate the square of each of the deviations and find the sum of these squared deviations. This means that we compute $(x_i - \bar{x})^2$ and $\sum (x_i - \bar{x})^2$.
4. If the data is a population, divide the sum by $N$. If the data is a sample, divide the sum by $n - 1$.
5. Find the square root of the variance to get the standard deviation.

The coefficient of variation (CV) makes it easier to tell whether a standard deviation is large or small by comparing the standard deviation to the mean. It also allows comparison of standard deviations that come from data sets with different means.

For the population: $CV = \dfrac{\sigma}{\mu} \times 100\%$

For the sample: $CV = \dfrac{s}{\bar{x}} \times 100\%$
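A small sketch of the sample coefficient of variation, using two hypothetical data sets that have the same standard deviation but very different means. The function name is an assumption.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample CV in percent: (s / x-bar) * 100."""
    return stdev(values) / mean(values) * 100


# Hypothetical samples: both have s = 2, but the means differ.
small_mean = [10, 12, 14]
large_mean = [100, 102, 104]

print(round(coefficient_of_variation(small_mean), 2))
print(round(coefficient_of_variation(large_mean), 2))
```

The same spread is relatively large for the first sample and relatively small for the second, which is the comparison the CV is meant to support.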

5.3.3 Measures of Position


Measures of position compare the location of a value in a data set in relation to other values.
The standard score (or $z$-score) of a data value is the number of standard deviations that the value lies above or below the mean. It is used to compare scores from groups of data with different means and standard deviations.

For the population: $z = \dfrac{x - \mu}{\sigma}$

For the sample: $z = \dfrac{x - \bar{x}}{s}$

1. The $z$-score of a value is positive if the value is above the mean and negative if it is below the mean. The mean itself always has a $z$-score of $0$.

2. A data value is considered to be unusual if it is more than two standard deviations from the mean.

3. A data value is unusually high if it has a $z$-score larger than $2$ and unusually low if it has a $z$-score of less than $-2$.
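The standard score and the "unusual value" rule can be sketched as follows. The function names and all numbers are hypothetical and serve only to illustrate the formulas.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def classify(z):
    """Apply the two-standard-deviation rule from the list above."""
    if z > 2:
        return "unusually high"
    if z < -2:
        return "unusually low"
    return "not unusual"


# Hypothetical scores from two groups with different means and spreads.
z_a = z_score(85, mean=75, sd=5)     # group A
z_b = z_score(88, mean=80, sd=10)    # group B
print(z_a, classify(z_a))
print(z_b, classify(z_b))
```

Although 88 is the higher raw score, 85 has the higher standard score because it lies further above its own group's mean in standard-deviation units.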

Example 7.
Students were selected from two sections and their scores in a Statistics
examination were gathered. The following information were obtained:
Sample mean is .
First section
Sample standard deviation is .
Sample mean is .
Second section
Sample standard deviation is .

Linda, who is from the first section got a score of while her friend, Jessa,
who is in the second section got a score of . Who has a higher standard score?
Solution.
Linda Jessa

Since , we conclude that Linda has a higher standard score.

5.3.3.1 Percentiles, Deciles, and Quartiles

Percentiles divide a data set into 100 parts. A percentile can be found for any percent from 1 to 99 and is denoted as $P_k$, where the subscript $k$ is the percentile rank, which indicates the percent of the distribution that falls below the percentile. For example, $P_{10}$ is the tenth percentile and is larger than 10% of the distribution.

Example 8. Using the data below, find , and the percentile rank of .

Solution.
a) To find the first percentile, we follow the steps given:

Steps:
1. Arrange the numbers in ascending order.
2. Find $\dfrac{n \cdot k}{100}$, where $n$ is the number of observations and $k$ is the percentile rank.
3. Since $\dfrac{n \cdot k}{100} = 3$ is a whole number, get the average of the 3rd and the 4th numbers in the ordered list. This will be 1.5.

Thus, the percentile is 1.5, which means that $k\%$ of the observations are less than 1.5.

b) To find the second percentile, we follow a process similar to that of the previous item.

Steps:
1. Arrange the numbers in ascending order.
2. Find $\dfrac{n \cdot k}{100}$, where $n$ is the number of observations and $k$ is the percentile rank.
3. Since $\dfrac{n \cdot k}{100}$ is not a whole number, we round up to 8. Locate the 8th observation. This will be 3.

From here, we conclude that $k\%$ of the observations are less than 3.

c) To find the percentile rank of , we use the formula given below:

Another measure of position is the decile. Deciles divide the data set into tenths and can be found for $D_1$ through $D_9$. Deciles are denoted as $D$ with a subscript $k$; for example, $D_3$ is the third decile and is the value that is larger than three tenths of the other values.

Quartiles divide a data set into fourths and can be found for $Q_1$ to $Q_3$. $Q_1$ is the first quartile and is the value that is larger than one fourth of the observations in the distribution.
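The percentile procedure used in Example 8 (compute $n \cdot k / 100$; if it is a whole number, average that observation and the next one; otherwise round up) can be sketched as below. The function names are assumptions, the data are hypothetical, and deciles and quartiles are expressed through percentiles using $D_k = P_{10k}$ and $Q_k = P_{25k}$. Other textbooks and software use slightly different percentile conventions, so results may differ from library functions.

```python
def percentile(values, k):
    """P_k by the rule above; k is a whole number from 1 to 99."""
    if not 1 <= k <= 99:
        raise ValueError("k must be between 1 and 99")
    ordered = sorted(values)                    # ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("data set is empty")
    whole, remainder = divmod(n * k, 100)       # n*k/100 as whole part and remainder
    if remainder == 0:
        # Whole number c: average the c-th and (c+1)-th observations (1-based).
        return (ordered[whole - 1] + ordered[whole]) / 2
    # Not a whole number: round up to position whole + 1 (1-based).
    return ordered[whole]


def decile(values, k):
    return percentile(values, 10 * k)           # D_k = P_(10k)


def quartile(values, k):
    return percentile(values, 25 * k)           # Q_k = P_(25k)


# Hypothetical data set of 10 observations.
data = list(range(1, 11))
print(percentile(data, 30), percentile(data, 75), quartile(data, 2))
```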

5.3.3.2 Exploratory Data Analysis


Exploratory data analysis is used to examine data to find out what can be discovered about the data. Two methods of exploratory data analysis presented here are the stem-and-leaf plot and the box plot.
A STEM-AND-LEAF PLOT uses the first digit (or digits) as the stem and the last digit as the leaf to form groups or classes.
Example 9. A 100 item test was given to 25 statistics students. The result is
shown below:

Make a stem-and-leaf plot of the above data.

Solution.
Steps:
1. Arrange the data in ascending order.
2. Separate the data according to classes, using the first digit to separate the classes.
3. Use the first digit as the leading digit (or stem) and list all the last digits in order as the trailing digits (or leaves), under the column headings Stem and Leaf.

Interpretation:
The stem-and-leaf plot shows that most of the students obtained the score
from to .
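The three steps for building a stem-and-leaf plot can be sketched as follows for non-negative whole numbers, using the tens part as the stem and the last digit as the leaf. The scores and the function name are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return {stem: [leaves]} with stems and leaves in ascending order."""
    groups = defaultdict(list)
    for v in sorted(values):              # Step 1: ascending order
        stem, leaf = divmod(v, 10)        # Step 2: leading digit(s) vs last digit
        groups[stem].append(leaf)         # Step 3: list the leaves for each stem
    return dict(groups)


# Hypothetical test scores, not the data from Example 9.
scores = [52, 47, 61, 58, 45, 63, 55, 70]
for stem, leaves in stem_and_leaf(scores).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

The stem with the most leaves shows where most of the values fall, which is the basis of the interpretation given for the plot.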
Example 10. Make a stem-and leaf plot for the following numbers.

Solution.
Steps:
1. Arrange the data in ascending order.
2. Separate the data according to classes, using the first digits to separate the classes.
3. Use the first digits as the leading digits (or stem) and list all the last digits in order as the trailing digits (or leaves).

Interpretation:
The stem-and-leaf plot shows that most of the students obtained the score
from to .
A BOX-AND-WHISKER PLOT graphs five values of the set of data on a
number line. The five values are:
1. The lowest value in the set of data.
2. The lower hinge.
3. The median.
4. The upper hinge.
5. The highest value of the set of data.

A box is drawn from the lower hinge to the upper hinge, and lines are drawn from the box to the highest and lowest values. The lower hinge is the median of all the values less than or equal to the median when the set of data has an odd number of values, or the median of all values less than the median when the set of data has an even number of values. The upper hinge is the median of all values greater than or equal to the median when the set of data has an odd number of values, or the median of all values greater than the median when the set of data has an even number of values.
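The five values of a box-and-whisker plot can be sketched as below. The hinge rule is applied by position in the ordered list (the median observation is included in both halves when the number of values is odd), which is an assumption about how ties are handled. The function names and data are hypothetical.

```python
def _median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_values(values):
    """Return (lowest, lower hinge, median, upper hinge, highest)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    if n % 2 == 1:
        lower = ordered[:half + 1]    # values up to and including the median
        upper = ordered[half:]        # values from the median upward
    else:
        lower = ordered[:half]        # values below the median
        upper = ordered[half:]        # values above the median
    return ordered[0], _median(lower), _median(ordered), _median(upper), ordered[-1]


# Hypothetical data sets with an odd and an even number of values.
print(five_values([1, 2, 3, 4, 5, 6, 7, 8, 9]))
print(five_values([1, 2, 3, 4, 5, 6, 7, 8]))
```

A longer segment on one side of the box than on the other suggests that the data are skewed toward that side.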
Example 11. A item test was given to statistics students. The result is
shown below:

Solution.
Steps:
1. Arrange the data in ascending order.
2. Determine the five values: the lowest value in the data set, the highest value in the data set, the median, the lower hinge (the midpoint of the numbers below the median), and the upper hinge (the midpoint of the numbers above the median).
3. Set up the horizontal axis containing the values obtained in Step 2, starting at or below the lowest value and ending at or above the highest value, with a fixed interval.
4. Draw the boxplot.
   a. Draw a vertical segment at the lowest value and at the highest value.
   b. Draw vertical lines at the median, the lower hinge, and the upper hinge, and form a box.
   c. Draw horizontal segments from the box to the lowest and highest values.

Interpretation:
The box-and-whisker plot shows that the data are not symmetrical and that the data are positively skewed, since the whisker is longer on the right.

5.4 Supplementary Learning Resources


Descriptive statistics calculator
https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php

Stem and leaf plotter
https://www.calculatorsoup.com/calculators/statistics/stemleaf.php

Boxplot Generator
https://www.desmos.com/calculator/h9icuu58wn
