
Stat I Chapter 3

Uploaded by Natnael Tesfaye
Copyright © All Rights Reserved

Chapter 3

Measures of Central Tendency and Dispersion
(Summarizing Data)
What is the chapter about?
• This chapter is concerned with two numerical ways of describing quantitative variables: measures of central tendency (also known as measures of location) and measures of dispersion.
• Measures of central tendency are often referred to as averages. The purpose of a measure of location is to pinpoint the center of a distribution of data.
• Measures of dispersion, often called the variation or the spread, tell us how widely the data are scattered.
• A small value for a measure of dispersion indicates that the data are clustered closely, say, around the arithmetic mean. The mean is therefore considered representative of the data.
• Conversely, a large measure of dispersion indicates that the mean is not a reliable representative of the data.
3.1. Summation Notation
There is a convenient notation for the sum of the terms of a finite sequence. It is called summation notation or sigma notation because it uses the uppercase Greek letter sigma, written as Σ.
E.g. Σa = the sum of all values of a.
Example
From the table compute:
x    y
4    11
2    5
9    8
1. Σx
2. Σ(x − 4)
3. Σx²
4. Σxy
5. (Σx)²
6. Σ2x
7. Σ(x + y)
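The seven sums can be checked with a short Python sketch. The variable names are illustrative only; each line mirrors one item of the list above, using the x and y columns of the table.

```python
# Values from the example table.
x = [4, 2, 9]
y = [11, 5, 8]

sum_x = sum(x)                                        # Σx
sum_x_minus_4 = sum(xi - 4 for xi in x)               # Σ(x − 4)
sum_x_sq = sum(xi ** 2 for xi in x)                   # Σx²: square first, then add
sum_xy = sum(xi * yi for xi, yi in zip(x, y))         # Σxy: multiply pairs, then add
sq_sum_x = sum(x) ** 2                                # (Σx)²: add first, then square
sum_2x = sum(2 * xi for xi in x)                      # Σ2x
sum_x_plus_y = sum(xi + yi for xi, yi in zip(x, y))   # Σ(x + y)

print(sum_x, sum_x_minus_4, sum_x_sq, sum_xy, sq_sum_x, sum_2x, sum_x_plus_y)
# 15 3 101 126 225 30 39
```

Comparing the results shows that Σx² (101) differs from (Σx)² (225), while Σ2x equals 2Σx and Σ(x + y) equals Σx + Σy.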
3.2. Measures of central tendency
• Types of measures of central tendency:
1. Mean
– Arithmetic
– Weighted
2. Median
3. Mode
Arithmetic mean:
• Population mean: µ = Σx/N
• Sample mean: x̄ (x-bar) = Σx/n
• Example 1:
– Find µ for the lengths (in cm) of 6 different garments: 54.5, 55.0, 55.7, 51.8, 54.2, 52.4.
– µ = (54.5 + 55.0 + 55.7 + 51.8 + 54.2 + 52.4)/6 = 323.6/6 ≈ 53.93 cm
• Example 2:
– Find µ of the profit per order for small, medium and large orders of $1, $3 and $6, respectively.
• µ = (1 + 3 + 6)/3 = 10/3 ≈ $3.33
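A minimal Python sketch of µ = Σx/N, applied to both examples above. The function name `arithmetic_mean` is our own; the standard library's `statistics.mean` should give the same results.

```python
def arithmetic_mean(values):
    """Sum of the values divided by how many there are (Σx / N)."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)

lengths_cm = [54.5, 55.0, 55.7, 51.8, 54.2, 52.4]   # Example 1
profits = [1, 3, 6]                                 # Example 2 ($ per order)

print(round(arithmetic_mean(lengths_cm), 2))  # 53.93
print(round(arithmetic_mean(profits), 2))     # 3.33
```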
Weighted mean: unlike the arithmetic mean, here the data items do not have equal importance, i.e., they do not have equal weight.
• Example: in the previous example, suppose the small, medium and large orders occurred different numbers of times.
Order size   Profit per order (p)   Number of orders (w)   pw
Small        $1                     120                    $120
Medium       $3                     60                     $180
Large        $6                     20                     $120
Total                               Σw = 200               Σ(pw) = $420

• Weighted mean = Σ(pw)/Σw = $420/200 = $2.10
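The weighted mean Σ(pw)/Σw can be sketched in Python as follows. `weighted_mean` is a hypothetical helper name, and the input lists are the p and w columns of the order table.

```python
def weighted_mean(values, weights):
    """Σ(value × weight) / Σweight."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

profit_per_order = [1, 3, 6]      # p: small, medium, large ($)
number_of_orders = [120, 60, 20]  # w

print(weighted_mean(profit_per_order, number_of_orders))  # 2.1
```

With equal weights the function reduces to the ordinary arithmetic mean ($3.33 here), which is why the two averages differ only because the order counts differ.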


Approximating the Mean for Grouped Data
• When data are grouped, their mean can be approximated by assuming that the average of the numbers in a class equals the class mark.
• Class mark: the average of the class limits.
Variable   Frequency (f)   Class mark (x)   fx
4–5        3               4.5              13.5
5–6        11              5.5              60.5
6–7        17              6.5              110.5
7–8        16              7.5              120.0
8–9        8               8.5              68.0
9–10       5               9.5              47.5
Total      60                               420.0
• µ ≈ Σ(fx)/Σf ≈ 420/60 ≈ 7.0
• The ≈ sign means "approximately equal to": the exact mean cannot be computed from a frequency table, because the individual values within each class are not known.
• Here the frequency plays the same role as the weight in the weighted mean.
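A sketch of the grouped-data approximation, assuming each class is given as a (lower limit, upper limit) pair. The class mark is their average, and the frequencies act as weights, exactly as in the table above.

```python
def grouped_mean(classes, freqs):
    """Approximate mean of grouped data: Σ(f·x) / Σf, with x the class mark."""
    marks = [(lo + hi) / 2 for lo, hi in classes]   # class mark = average of the class limits
    n = sum(freqs)
    if n == 0:
        raise ValueError("total frequency must be positive")
    return sum(f * x for f, x in zip(freqs, marks)) / n

classes = [(4, 5), (5, 6), (6, 7), (7, 8), (8, 9), (9, 10)]
freqs = [3, 11, 17, 16, 8, 5]
print(grouped_mean(classes, freqs))  # 7.0
```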
The table below shows the age distribution of 50 people. Calculate their mean age.
Class limits   Frequency (f)
42 – 48        8
49 – 55        8
56 – 62        13
63 – 69        7
70 – 76        6
77 – 83        5
84 – 90        3
The Median
• The median is a number selected to represent the middle position when the data are arranged in order of size.
• Median = the value at position (N + 1)/2
• If N is odd, there is a single data item in the middle.
• If N is even, we take the average of the two middle data items as the median.
• Example:
– Determine the middle position for the data 8, 9, 10, 15, 20.
• Median position = (5 + 1)/2 = 3 (3rd position), so the median is 10.
– Determine the middle position for the data 13, 14, 16, 20, 25, 30.
• Median position = (6 + 1)/2 = 3.5 (average of the 3rd and 4th positions), so the median is (16 + 20)/2 = 18.
– How would the median of a set of 1000 data items be selected?
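The odd/even rule can be sketched in Python as below (a minimal illustration; Python's built-in `statistics.median` follows the same rule). Python indexes from 0, so position (N + 1)/2 corresponds to index N // 2 when N is odd.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if N is even."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                   # position (N + 1)/2, counting from 1
    return (xs[mid - 1] + xs[mid]) / 2   # average of positions N/2 and N/2 + 1

print(median([8, 9, 10, 15, 20]))        # 10
print(median([13, 14, 16, 20, 25, 30]))  # 18.0
```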
Approximating the Median for Grouped Data
• To find the approximate median of grouped data,
– we first find the median class (the class that contains the median value);
– median class = the first class whose less-than cumulative frequency equals or exceeds N/2.
Find the approximate median value – Example 1
• The median class:
– N/2 = 440/2 = 220.
– The less-than CF 310 is the first to exceed 220.
– So, the third class (4.00 – 4.49) is the median class.
Hourly wage rate   No. of workers (f)   Less-than CF
3.00 – 3.49        68                   68
3.50 – 3.99        142                  210
4.00 – 4.49        100                  310
4.50 – 4.99        60                   370
5.00 – 5.49        40                   410
5.50 – 5.99        20                   430
6.00 – 6.49        10                   440
Total              N = 440
• The cumulative frequency of the class before the median class is 210.
• So we need 220 − 210 = 10 data values from the median class to reach the 220th value.
• The median class contains 100 data values (f), and we require 10 of these to get to the 220th value.
• So, we move along from the LCL of the median class, 4.00, a distance equal to 10/100 of the class interval (0.50).
• Median ≈ 4.00 + (10/100)(0.50) ≈ 4.00 + 0.05 ≈ 4.05
• Median ≈ LCLmed + (r/fmed)(cw)
• LCLmed = lower class limit of the median class
• r = the number of data items required from the median class to reach N/2
• fmed = the median class frequency
• cw = the class interval (width)
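The interpolation formula Median ≈ LCLmed + (r/fmed)(cw) can be sketched in Python. The function name and argument layout are assumptions; following the example, it uses the lower class limits (4.00, not a class boundary) and one common class width.

```python
def grouped_median(lower_limits, freqs, width):
    """Median ≈ LCLmed + (r / fmed) × cw, using the less-than cumulative frequency."""
    n = sum(freqs)
    if n <= 0:
        raise ValueError("total frequency must be positive")
    half = n / 2                      # N/2
    cum = 0                           # less-than CF of the classes before the current one
    for lcl, f in zip(lower_limits, freqs):
        if cum + f >= half:           # first class whose CF reaches N/2: the median class
            r = half - cum            # items still needed from the median class
            return lcl + (r / f) * width
        cum += f
    raise ValueError("unreachable for positive frequencies")

lower_limits = [3.00, 3.50, 4.00, 4.50, 5.00, 5.50, 6.00]
workers = [68, 142, 100, 60, 40, 20, 10]
print(round(grouped_median(lower_limits, workers, 0.50), 2))  # 4.05
```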
Find the approximate median value – Example 2

Score     Frequency
40 – 49   5
50 – 59   18
60 – 69   27
70 – 79   15
80 – 89   6
The Mode
• The mode of a set of data is the value that occurs most frequently.
• For example: 26, 28, 28, 28, 30, 30, 32.
• The mode is 28.
• A data set can be bimodal (have two modes) or
multimodal (have more than two modes).
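Because a data set may have more than one mode, a sketch that returns all of them is more faithful to the definition than one that returns a single value. This one uses `collections.Counter`; the second data set is a made-up bimodal illustration, not from the source. (The standard library's `statistics.multimode` does the same job, in first-seen order.)

```python
from collections import Counter

def modes(values):
    """Every value that occurs with the highest frequency (covers bimodal and multimodal data)."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

print(modes([26, 28, 28, 28, 30, 30, 32]))  # [28]
print(modes([1, 1, 2, 2, 3]))               # [1, 2]  (hypothetical bimodal data)
```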
Mode for Grouped Data
• For grouped data, the mode is located in the modal class, i.e., the class with the highest frequency.
• We use the frequency of the modal class and the frequencies of the two classes adjacent to it to locate the mode within the modal class.
• If the two adjacent classes have equal frequencies, we assume the mode is at the middle of the modal class.
• If one of the adjacent classes has a higher frequency than the other, we assume the mode is proportionately closer to that class.
Find the mode – Example 1
• The modal class:
– The highest frequency = 142.
– The class with the highest frequency is the modal class.
– So, the 2nd class (3.50 – 3.99) is the modal class.
– Since the 3rd class (100) has a higher frequency than the 1st class (68), we expect the mode to be proportionately closer to the 3rd class.
Hourly wage rate   No. of workers (f)
3.00 – 3.49        68
3.50 – 3.99        142
4.00 – 4.49        100
4.50 – 4.99        60
5.00 – 5.49        40
5.50 – 5.99        20
6.00 – 6.49        10
• Mode = LCLmo + [d1/(d1 + d2)](cw)
• LCLmo = lower class limit of the modal class
• d1 = modal class frequency − frequency of the previous class = 142 − 68 = 74
• d2 = modal class frequency − frequency of the next class = 142 − 100 = 42
• cw = the class interval (class width) = 0.5
• Mode = 3.50 + [74/(74 + 42)](0.5)
• Mode = 3.50 + 0.32 = 3.82
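A Python sketch of Mode ≈ LCLmo + [d1/(d1 + d2)](cw). The function name is hypothetical, and one assumption goes beyond the slides: if the modal class is the first or last class, its missing neighbour is treated as having frequency 0.

```python
def grouped_mode(lower_limits, freqs, width):
    """Mode ≈ LCLmo + [d1 / (d1 + d2)] × cw, located inside the highest-frequency class."""
    i = max(range(len(freqs)), key=lambda k: freqs[k])   # modal class index (first one on ties)
    # Assumption: a modal class at either end of the table has a neighbour frequency of 0.
    f_prev = freqs[i - 1] if i > 0 else 0
    f_next = freqs[i + 1] if i + 1 < len(freqs) else 0
    d1 = freqs[i] - f_prev    # excess over the previous class
    d2 = freqs[i] - f_next    # excess over the next class
    if d1 + d2 == 0:          # flat top: no pull either way, use the middle of the class
        return lower_limits[i] + width / 2
    return lower_limits[i] + d1 / (d1 + d2) * width

lower_limits = [3.00, 3.50, 4.00, 4.50, 5.00, 5.50, 6.00]
workers = [68, 142, 100, 60, 40, 20, 10]
print(round(grouped_mode(lower_limits, workers, 0.50), 2))  # 3.82
```

When d1 = d2 the fraction is exactly 1/2, which reproduces the rule that equal neighbouring frequencies put the mode at the middle of the modal class.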
Find the modal age for the age distribution of 228 patients – Example 2

Class     Frequency
15 – 19   6
20 – 24   19
25 – 29   50
30 – 34   57
35 – 39   48
40 – 44   27
45 – 49   21
Total     228
Compute the Mean, the Median and the Mode
Values      Frequency
141 – 150   17
151 – 160   29
161 – 170   42
171 – 180   72
181 – 190   84
191 – 200   107
201 – 210   49
211 – 220   34
221 – 230   31
231 – 240   16
241 – 250   12
Skewness

Remarks:

In a positively skewed distribution, smaller observations are more frequent than larger observations, i.e., the majority of the observations have values below the average.
In a negatively skewed distribution, smaller observations are less frequent than larger observations, i.e., the majority of the observations have values above the average.
Measures of Position
• Measures of position are measures that indicate the
location, or position, of a value relative to the entire
set of data.
• Percentile:
– The Pth percentile is a value such that
approximately P% of the observations are at or
below that number.
– Percentiles divide a large ordered data set into 100 equal parts.
– The 50th percentile is the median.
Calculating Percentile
To illustrate the computation of the Pth percentile, let us compute the 80th percentile for the salary data in the table.
Salary   Position
5710     1
5755     2
5850     3
5880     4
5880     5
5890     6
5920     7
5940     8
5950     9
6050     10
6130     11
6325     12
• Location of the Pth percentile = (P/100)(N + 1)
• Using this equation with P = 80 and N = 12, the location of the 80th percentile is
P80 = (80/100)(12 + 1) = 10.4
• The interpretation of P80 = 10.4 is that the 80th
percentile is 40% of the way between
– the value in position 10 and the value in position
11.
• In other words, the 80th percentile is
– the value in position 10 (6050) plus .4 times the
difference between the value in position 11 (6130)
and the value in position 10 (6050).
• Thus, the 80th percentile is:
• 80th percentile = 6050 + .4(6130 – 6050) =
6050 + .4(80) = 6082
• Let us now compute the 50th percentile for the
starting salary data. With P = 50 and N = 12, the
location of the 50th percentile is
• P50 = 50/100(12+1) = 6.5
• With P50 = 6.5, we see that the 50th percentile is 50% of the way between the value in position 6 (5890) and the value in position 7 (5920).
• Thus, the 50th percentile is 5890 + .5(5920 – 5890) = 5890 + .5(30) = 5905.
• Note that the 50th percentile is also the median.
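The location rule (P/100)(N + 1) followed by linear interpolation can be sketched in Python. The clamping at the ends is our own assumption for locations below 1 or above N, which the slides do not cover. Other tools may use a different convention by default; NumPy, for instance, appears to offer this one as `numpy.percentile(..., method="weibull")` (NumPy 1.22 or later), while its default method gives different values.

```python
def percentile(values, p):
    """Pth percentile: location L = (P/100)(N + 1) in the ordered data, then linear interpolation."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("percentile of an empty data set is undefined")
    loc = p / 100 * (n + 1)      # 1-based location, e.g. 10.4 for P = 80, N = 12
    if loc <= 1:                 # assumption: clamp locations that fall before the first value
        return xs[0]
    if loc >= n:                 # ...or after the last value
        return xs[-1]
    k = int(loc)                 # whole part: position k
    frac = loc - k               # fractional part: share of the gap to position k + 1
    return xs[k - 1] + frac * (xs[k] - xs[k - 1])

salaries = [5710, 5755, 5850, 5880, 5880, 5890, 5920, 5940, 5950, 6050, 6130, 6325]
for p in (25, 50, 75, 80):
    print(p, round(percentile(salaries, p), 1))
# 25 5857.5
# 50 5905.0
# 75 6025.0
# 80 6082.0
```

The same function yields the quartiles, since Q1, Q2 and Q3 are the 25th, 50th and 75th percentiles.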
• Quartiles:
• Quartiles are descriptive measures that
separate large data sets into four quarters.
• Q1 = first quartile, or 25th percentile
• Q2 = second quartile, or 50th percentile
(also the median)
• Q3 = third quartile, or 75th percentile
Calculating Quartile
• For Q1, P25 = 25/100 (12+1) = 3.25
• The first quartile, or 25th percentile, is .25 of the way
between the value in position 3 (5850) and the value in
position 4 (5880).
• Thus, Q1 = 5850 + .25(5880 – 5850) = 5850 + .25(30) =
5857.5
• For Q3, P75 = 75/100 (12+1) = 9.75
• The third quartile, or 75th percentile, is .75 of the way
between the value in position 9 (5950) and the value in
position 10 (6050).
• Thus, Q3 = 5950 + .75(6050 – 5950) = 5950 + .75(100) =
6025
• Find Q2
Measures of Dispersion/Variation
• The scatter or spread of items of a distribution
is known as dispersion or variation.
• In other words the degree to which numerical
data tend to spread about an average value is
called dispersion or variation of the data.
• Measures of dispersion are statistical measures that quantify the extent to which data are dispersed or spread out.
Example:
• Consider the following two sets of scores:
– Set 1: 40, 50, 60, 60, 40, 50; µ = 50
– Set 2: 0,100, 25, 75, 80, 20; µ = 50
– The same mean, but different dispersion.
Why study dispersion/variation?

• To judge the reliability of an average by showing how well it represents the entire data set.
• To compare two or more distributions with regard to their variability.
Measures of Variation
1. Range
2. Interquartile range
3. Mean deviation
4. Variance
5. Standard Deviation
6. Coefficient of Variation
1. Range
• The range is the maximum value minus the minimum value in a data set.
• For grouped data, the range is the difference between the upper class boundary (UCB) of the last class and the lower class boundary (LCB) of the first class.
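For ungrouped data the range is a one-liner. This sketch applies it to the two score sets from the dispersion example, which share a mean of 50; the helper name `data_range` is hypothetical.

```python
def data_range(values):
    """Range = maximum value − minimum value (ungrouped data)."""
    if not values:
        raise ValueError("range of an empty data set is undefined")
    return max(values) - min(values)

set_1 = [40, 50, 60, 60, 40, 50]
set_2 = [0, 100, 25, 75, 80, 20]
print(data_range(set_1), data_range(set_2))  # 20 100
```

The range picks up the difference in spread that the common mean hides, although it depends only on the two extreme values.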
2. Interquartile range
• The difference between the third quartile, Q3, and the
first quartile Q1, of the data set.
• Interquartile range = Q3 - Q1
• Location of Q1 = (1/4)(N + 1)
• Location of Q3 = (3/4)(N + 1)
• Example: 2710, 2755, 2850, 2880, 2880, 2890, 2920, 2940, 2950, 3050, 3150, 3325
• Location of Q3 = (3/4)(12 + 1) = 9.75.
• Q3 = 2950 + .75(3050 − 2950) = $3025
• Location of Q1 = (1/4)(12 + 1) = 3.25.
• Q1 = 2850 + .25(2880 − 2850) = $2857.5
• Q3 − Q1 = 3025 − 2857.5 = $167.5
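A sketch of the interquartile range using the same (N + 1) location rule as the example. `quartile_by_position` is a hypothetical name, and clamping to the first or last value outside positions 1 to N is an added assumption.

```python
def quartile_by_position(values, q):
    """Value at location (q/4)(N + 1) in the ordered data, interpolating between neighbours."""
    xs = sorted(values)
    if not xs:
        raise ValueError("quartile of an empty data set is undefined")
    loc = q / 4 * (len(xs) + 1)   # e.g. 3.25 for Q1 and 9.75 for Q3 when N = 12
    k = int(loc)
    if k < 1:                     # assumption: clamp before the first value
        return xs[0]
    if k >= len(xs):              # ...and after the last value
        return xs[-1]
    return xs[k - 1] + (loc - k) * (xs[k] - xs[k - 1])

values = [2710, 2755, 2850, 2880, 2880, 2890, 2920, 2940, 2950, 3050, 3150, 3325]
q1 = quartile_by_position(values, 1)
q3 = quartile_by_position(values, 3)
print(q1, q3, q3 - q1)  # 2857.5 3025.0 167.5
```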
3. Mean Deviation
• Mean deviation is the arithmetic mean of the absolute values of the deviations from the arithmetic mean.
• Example: Consider the data 3, 4, 0, 8, 2, 9, 6, 2 and 11; µ = 45/9 = 5.
• To calculate the mean deviation we proceed as follows:
– Step 1: Subtract the mean from each value to get the deviations from the mean.
– Step 2: Take the absolute value of each deviation and sum them.
– Step 3: Divide the sum of the absolute deviations by the total number of values.
Values (Xi)   Deviation from mean   |Xi − µ|
3             3 − 5 = −2            2
4             4 − 5 = −1            1
0             0 − 5 = −5            5
8             8 − 5 = 3             3
2             2 − 5 = −3            3
9             9 − 5 = 4             4
6             6 − 5 = 1             1
2             2 − 5 = −3            3
11            11 − 5 = 6            6
Total                               28
Σ|Xi − µ| = 28
Mean deviation = 28/9 ≈ 3.1
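The three steps translate directly into a short Python sketch. The function name is our own, and it divides by N, as in the example.

```python
def mean_deviation(values):
    """Mean of the absolute deviations from the arithmetic mean: Σ|x − µ| / N."""
    n = len(values)
    if n == 0:
        raise ValueError("mean deviation of an empty data set is undefined")
    mu = sum(values) / n                       # arithmetic mean
    abs_devs = [abs(x - mu) for x in values]   # Steps 1 and 2: deviations, made absolute
    return sum(abs_devs) / n                   # Step 3: divide the sum by N

data = [3, 4, 0, 8, 2, 9, 6, 2, 11]
print(round(mean_deviation(data), 2))  # 3.11
```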
4. Variance and Standard Deviation
• The variance and standard deviation are the most important and widely used measures of dispersion.
• Variance: 𝜎² = Σ(x − µ)²/N
• Standard deviation: 𝜎 = √𝜎²
Example: 24, 25, 29, 29, 30, 31. Find the variance and standard deviation.
µ = 168/6 = 28

Value (x)   x − µ            (x − µ)²
24          24 − 28 = −4     16
25          25 − 28 = −3     9
29          29 − 28 = 1      1
29          29 − 28 = 1      1
30          30 − 28 = 2      4
31          31 − 28 = 3      9
Total       0                40

𝜎² = Σ(x − µ)²/N = 40/6 ≈ 6.67
𝜎 = √6.67 ≈ 2.58
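A minimal sketch of the population formulas 𝜎² = Σ(x − µ)²/N and 𝜎 = √𝜎², cross-checked against the standard library's population versions `statistics.pvariance` and `statistics.pstdev`. The helper name `population_variance` is our own.

```python
import math
import statistics

def population_variance(values):
    """Σ(x − µ)² / N."""
    n = len(values)
    if n == 0:
        raise ValueError("variance of an empty data set is undefined")
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

data = [24, 25, 29, 29, 30, 31]
var = population_variance(data)
sd = math.sqrt(var)
print(round(var, 2), round(sd, 2))  # 6.67 2.58

# Cross-check against the standard library's population (divide-by-N) functions.
assert math.isclose(var, statistics.pvariance(data))
assert math.isclose(sd, statistics.pstdev(data))
```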
5. Coefficient of Variation
• The coefficient of variation (CV) is the ratio of the standard deviation to the mean, expressed as a percentage.
• CV = (Standard deviation / Mean) × 100%
• Interpretation of the coefficient of variation: the distribution with the smaller CV is said to be less variable, or more consistent.
– Example 4.8: Suppose that the mean weight of a group of students is 165 pounds with an S.D. of 8 pounds. If the height of the same group of students has a mean of 60 inches with an S.D. of 3 inches, compare the variability in the weight and height measurements. Which data (height or weight) are more variable and less consistent?
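Because the CV is unit-free, it allows weight in pounds to be compared with height in inches. This sketch applies CV = (S.D./mean) × 100 to the figures of Example 4.8; the helper name is hypothetical.

```python
def coefficient_of_variation(sd, mean):
    """CV = (standard deviation / mean) × 100, in percent."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return sd / mean * 100

cv_weight = coefficient_of_variation(8, 165)   # weight: S.D. 8 lb, mean 165 lb
cv_height = coefficient_of_variation(3, 60)    # height: S.D. 3 in, mean 60 in
print(round(cv_weight, 2), round(cv_height, 2))  # 4.85 5.0
```

With these figures the heights have the slightly larger CV (5.0% versus about 4.85%), so by this criterion height is the more variable, less consistent measurement.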
Exercise Questions
1. The average salary of the president and four vice presidents of a company is $50,000. If the vice presidents' average salary is $45,000, what is the president's salary?
2. Find the mean, median, mode, variance and standard deviation of the following sample data.

Class       40–44   45–49   50–54   55–59   60–64   65–69   70–74
Frequency   7       10      22      15      12      6       3
3. Compute the mean deviation of the following data.
Price/kg   Units ordered
20 – 29    2
30 – 39    12
40 – 49    15
50 – 59    20
60 – 69    18
70 – 79    10
80 – 89    9
90 – 99    4
4. Two distributions A and B have means of 80 and 20 inches and standard deviations of 10 and 15 inches, respectively. Which distribution has more dissimilar (less uniform) elements?
5. Suppose that the following data show the ratings of
hard-shell jackets based on the breathability, durability,
versatility, features, mobility, and weight of each jacket.
The ratings range from 0 (lowest) to 100 (highest).
42, 66, 67, 71, 78, 62, 61, 76, 71, 67,
61, 64, 61, 54, 83, 63, 68, 69, 81, 53.
a. Compute the mean, median, and mode.
b. Compute the first and third quartiles.
c. Compute and interpret the 90th percentile.
Correction for the first exercise (Q. 1)
• Please replace the table in Question 1 with this
table.
Class intervals   Frequency
9.6 – 14.5        10
14.6 – 19.5       20
19.6 – 24.5       30
24.6 – 29.5       25
29.6 – 34.5       15
