Stat I Chapter 3
9 – 10 5 9.5 47.5
Total 60 420.0
• µ ≈ Σ(fx)/Σf = 420/60 = 7.0
• The ≈ sign means "approximately equal to",
because the exact mean cannot be computed
from a frequency table.
• The frequency plays the same role as a weight.
The table below shows the age distribution
of 50 people. Calculate their mean age.
Class Limits Frequency (f)
42 – 48 8
49 – 55 8
56 – 62 13
63 – 69 7
70 – 76 6
77 – 83 5
84 – 90 3
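The grouped-mean formula µ ≈ Σ(fx)/Σf can be sketched in Python as follows. This is only an illustration: it assumes x is the class midpoint, (lower limit + upper limit)/2, and the names `classes` and `grouped_mean` are made up for the example.

```python
# Each tuple is (lower class limit, upper class limit, frequency),
# copied from the age table above.
classes = [(42, 48, 8), (49, 55, 8), (56, 62, 13), (63, 69, 7),
           (70, 76, 6), (77, 83, 5), (84, 90, 3)]


def grouped_mean(classes):
    """Approximate mean of grouped data: sum(f * midpoint) / sum(f)."""
    total_f = sum(f for _, _, f in classes)
    # The midpoint of each class stands in for the unknown individual values.
    total_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return total_fx / total_f


mean_age = grouped_mean(classes)
print(round(mean_age, 2))  # 3104 / 50 = 62.08
```

The result is approximate for the same reason as before: the midpoints replace the individual ages.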
The Median
• The median is the value in the middle position
when the data are arranged in order of size.
• Median = the value at position (N+1)/2
• If N is odd, there is a single data item in the
middle.
• If N is even, we take the average of the two
middle data items as the median.
• Example:
– Determine the middle position for the data 8,
9, 10, 15, 20.
• Median position = (5+1)/2 = 3 (3rd position)
– Determine the middle position for the data
13, 14, 16, 20, 25, 30.
• Median position = (6+1)/2 = 3.5 (average of the
values in the 3rd and 4th positions).
– How would the median of a set of 1000 data
items be selected?
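The position rule can be tried out with a short Python sketch. The function name `median_by_position` is hypothetical; it returns both the position (N+1)/2 and the value found there.

```python
def median_by_position(data):
    """Return (position, median) using the (N + 1) / 2 position rule."""
    values = sorted(data)          # the data must be arrayed in order of size
    n = len(values)
    if n == 0:
        raise ValueError("data must not be empty")
    position = (n + 1) / 2         # 1-based position of the median
    lower = values[int(position) - 1]
    if n % 2 == 1:                 # odd N: a single middle item
        return position, lower
    upper = values[int(position)]  # even N: average the two middle items
    return position, (lower + upper) / 2


pos_odd, med_odd = median_by_position([8, 9, 10, 15, 20])
pos_even, med_even = median_by_position([13, 14, 16, 20, 25, 30])
print(pos_odd, med_odd)    # 3.0 10
print(pos_even, med_even)  # 3.5 18.0
```

For 1000 items the same rule gives position 500.5, that is, the average of the 500th and 501st ordered values.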
Approximating the Median for Grouped Data
• To find the approximate median of grouped
data,
– We first find the median class (the class that
contains the median value)
– Median class = the first class where the less than
cumulative frequency equals or exceeds N/2.
Find the approximate median value – Example 1
• The median class:
– N/2 = 440/2 = 220.
– The less than CF 310 is the first to exceed 220.
– So, the third class is the median class.
Hourly wage rate No. of workers (f) Less than CF
3.00 – 3.49 68 68
3.50 – 3.99 142 210
4.00 – 4.49 100 310
4.50 – 4.99 60 370
5.00 – 5.49 40 410
5.50 – 5.99 20 430
6.00 – 6.49 10 440
N = 440
• The cumulative frequency of the previous class
(previous to the median class) is 210.
• So we need 220 – 210 = 10 data values from the
median class to reach the 220th value.
• The median class contains 100 data values (f).
• We require 10 of these 100 values to get to the 220th
value.
• So, starting from the LCL of the median class
(4.00), we move a distance equal to 10/100 of the
class interval (0.50).
• The median ≈ 4.00 + (10/100)(0.50) = 4.00 + 0.05 =
4.05
• The median ≈ LCLmed + (r/fmed)(cw)
• LCLmed = Lower class limit of the median class
• r = the number of data items required to reach
N/2 from the median class.
• fmed = the median class frequency
• cw = the class interval (width)
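The steps above can be put together in a short Python sketch. The function name `grouped_median` and its argument layout are assumptions made for illustration. The function follows the slide formula literally, using lower class limits and a fixed class width.

```python
def grouped_median(lower_limits, freqs, class_width):
    """Approximate median: LCLmed + (r / fmed) * cw."""
    half = sum(freqs) / 2                 # N/2
    cum = 0                               # less-than CF of the previous classes
    for lcl, f in zip(lower_limits, freqs):
        if cum + f >= half:               # first class whose CF reaches N/2
            r = half - cum                # items still needed from this class
            return lcl + (r / f) * class_width
        cum += f
    raise ValueError("no median class found (empty table?)")


# Hourly wage table from Example 1.
lower_limits = [3.00, 3.50, 4.00, 4.50, 5.00, 5.50, 6.00]
freqs = [68, 142, 100, 60, 40, 20, 10]
wage_median = grouped_median(lower_limits, freqs, 0.50)
print(round(wage_median, 2))  # 4.05
```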
Find the approximate median value – Example 2
Score Frequency
40 – 49 5
50 – 59 18
60 – 69 27
70 – 79 15
80 – 89 6
The Mode
• The mode of a set of data is the value that
occurs most frequently.
• For example: 26, 28, 28, 28, 30, 30, 32.
• The mode is 28.
• A data set can be bimodal (have two modes) or
multimodal (have more than two modes).
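For ungrouped data, Python's standard library can list every mode, which also covers the bimodal case. The second data set below is invented purely to show two modes.

```python
from statistics import multimode

data = [26, 28, 28, 28, 30, 30, 32]   # example from the slide
modes = multimode(data)               # all values with the highest count
print(modes)                          # [28]

bimodal = [1, 1, 2, 2, 3]             # hypothetical data with two modes
two_modes = multimode(bimodal)
print(two_modes)                      # [1, 2]
```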
Mode for Grouped Data
• For grouped data, the mode lies in the class with
the highest frequency (the modal class).
• We use the frequency of the modal class and of the
two adjacent classes to locate the mode within the
modal class.
• If the two adjacent classes have equal frequencies,
we assume the mode is at the middle of the modal
class.
• If one of the adjacent classes has a higher
frequency than the other, we assume the mode is
proportionately closer to that class.
Find the mode – Example 1
• The modal class:
– The highest frequency = 142
– The class with the highest frequency is the modal class.
– So, the 2nd class is the modal class.
– We expect the mode to be proportionately closer to the 3rd
class, because its frequency (100) is higher than that of the 1st (68).
Hourly wage rate No. of workers (f)
3.00 – 3.49 68
3.50 – 3.99 142
4.00 – 4.49 100
4.50 – 4.99 60
5.00 – 5.49 40
5.50 – 5.99 20
6.00 – 6.49 10
• The mode ≈ LCLmo + [d1/(d1 + d2)](cw)
• LCLmo = Lower class limit of the modal
class.
• d1 = modal class frequency – the frequency
in the previous class; (142 – 68) = 74
• d2 = modal class frequency – the frequency
in the next class; (142 – 100) = 42
• cw = the class interval (class width); 0.5
• The mode = 3.50 + [74/(74 + 42)](0.5)
• The mode = 3.50 + 0.32 = 3.82
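The same calculation can be sketched in Python. The function name `grouped_mode` is hypothetical. Two points are assumptions the slides do not cover: a missing neighbouring class (when the modal class is first or last) is treated as having frequency 0, and when d1 + d2 = 0 the mode is placed at the middle of the modal class.

```python
def grouped_mode(lower_limits, freqs, class_width):
    """Approximate mode: LCLmo + d1 / (d1 + d2) * cw."""
    i = max(range(len(freqs)), key=lambda k: freqs[k])    # modal class index
    prev_f = freqs[i - 1] if i > 0 else 0                 # assumption: 0 if absent
    next_f = freqs[i + 1] if i < len(freqs) - 1 else 0    # assumption: 0 if absent
    d1 = freqs[i] - prev_f
    d2 = freqs[i] - next_f
    if d1 + d2 == 0:                                      # flat: use class middle
        return lower_limits[i] + class_width / 2
    return lower_limits[i] + d1 / (d1 + d2) * class_width


# Hourly wage table from Example 1.
lower_limits = [3.00, 3.50, 4.00, 4.50, 5.00, 5.50, 6.00]
freqs = [68, 142, 100, 60, 40, 20, 10]
wage_mode = grouped_mode(lower_limits, freqs, 0.5)
print(round(wage_mode, 2))  # 3.82
```

When the two adjacent frequencies are equal, d1 = d2 and the fraction is 1/2, which gives the middle of the modal class.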
Find the modal age for the age distribution of 228
patients – Example 2
Class Frequency
15 – 19 6
20 – 24 19
25 – 29 50
30 – 34 57
35 – 39 48
40 – 44 27
45 – 49 21
Total 228
Compute the Mean, the Median and the Mode
Values Frequency
141 – 150 17
151 – 160 29
161 – 170 42
171 – 180 72
181 – 190 84
191 – 200 107
201 – 210 49
211 – 220 34
221 – 230 31
231 – 240 16
241 – 250 12
Skewness
Remarks:
• For the Pth percentile, let us compute the
80th percentile for the salary data below.
• Location of the Pth percentile = (P/100)(N+1)
• With P = 80 and N = 12, the
location of the 80th percentile is
• P80 = (80/100)(12+1) = 10.4
Salary Position
5710 1
5755 2
5850 3
5890 6
5920 7
5950 9
6050 10
6130 11
6325 12
• The interpretation of P80 = 10.4 is that the 80th
percentile is 40% of the way between
– the value in position 10 and the value in position
11.
• In other words, the 80th percentile is
– the value in position 10 (6050) plus .4 times the
difference between the value in position 11 (6130)
and the value in position 10 (6050).
• Thus, the 80th percentile is:
• 80th percentile = 6050 + .4(6130 – 6050) =
6050 + .4(80) = 6082
• Let us now compute the 50th percentile for the
starting salary data. With P = 50 and N = 12, the
location of the 50th percentile is
• P50 = 50/100(12+1) = 6.5
• With P50 = 6.5, we see that the 50th percentile is
• 50% of the way between the value in position 6
(5890) and the value in position 7 (5920).
• Thus, the 50th percentile is
• = 5890 + .5(5920 – 5890) = 5890 + .5(30) = 5905
• Note that the 50th percentile is also the median.
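The location rule and the interpolation step can be sketched in Python. The helper names are hypothetical, and clamping the location to the range 1..N is an added assumption for very small or very large P. The full-data call reuses the six values 13, 14, 16, 20, 25, 30 from the median example rather than the salary list.

```python
def percentile_location(p, n):
    """1-based location of the Pth percentile: (P / 100) * (N + 1)."""
    return p / 100 * (n + 1)


def percentile(data, p):
    """Value at the Pth-percentile location, interpolating between neighbours."""
    values = sorted(data)
    n = len(values)
    loc = min(max(percentile_location(p, n), 1), n)  # assumption: clamp to 1..N
    k = int(loc)                                     # whole part: position k
    frac = loc - k                                   # fractional part
    if k >= n:                                       # already at the last value
        return values[-1]
    return values[k - 1] + frac * (values[k] - values[k - 1])


loc80 = percentile_location(80, 12)       # about 10.4, as on the slide
p80_salary = 6050 + 0.4 * (6130 - 6050)   # positions 10 and 11: about 6082
p50 = percentile([13, 14, 16, 20, 25, 30], 50)
print(round(loc80, 1), round(p80_salary), p50)
```

With the six-value data set the 50th percentile comes out as 18.0, the same value the median rule gives.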
• Quartiles:
• Quartiles are descriptive measures that
separate large data sets into four quarters.
• Q1 = first quartile, or 25th percentile
• Q2 = second quartile, or 50th percentile
(also the median)
• Q3 = third quartile, or 75th percentile
Calculating Quartiles
• For Q1, P25 = 25/100 (12+1) = 3.25
• The first quartile, or 25th percentile, is .25 of the way
between the value in position 3 (5850) and the value in
position 4 (5880).
• Thus, Q1 = 5850 + .25(5880 – 5850) = 5850 + .25(30) =
5857.5
• For Q3, P75 = 75/100 (12+1) = 9.75
• The third quartile, or 75th percentile, is .75 of the way
between the value in position 9 (5950) and the value in
position 10 (6050).
• Thus, Q3 = 5950 + .75(6050 – 5950) = 5950 + .75(100) =
6025
• Find Q2
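As a cross-check on hand calculations, Python's `statistics.quantiles` with `method="exclusive"` uses the same (N+1)-based location rule. The sketch below applies it to the six values from the earlier median example (13, 14, 16, 20, 25, 30), not to the salary data.

```python
from statistics import quantiles

data = [13, 14, 16, 20, 25, 30]

# n=4 asks for the three cut points Q1, Q2, Q3.
q1, q2, q3 = quantiles(data, n=4, method="exclusive")
print(q1, q2, q3)

# Hand check for Q1: location = 25/100 * (6 + 1) = 1.75,
# so Q1 = 13 + 0.75 * (14 - 13) = 13.75.
q1_by_hand = 13 + 0.75 * (14 - 13)
```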
Measures of Dispersion/Variation
• The scatter or spread of items of a distribution
is known as dispersion or variation.
• In other words the degree to which numerical
data tend to spread about an average value is
called dispersion or variation of the data.
• Measures of dispersion are statistical measures
that quantify the extent to which data are
dispersed or spread out.
Example:
• Consider the following two sets of scores:
– Set 1: 40, 50, 60, 60, 40, 50; µ = 50
– Set 2: 0, 100, 25, 75, 80, 20; µ = 50
– The same mean, but different dispersion.
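A quick Python check makes the contrast concrete. The range (largest value minus smallest value) is used here only as one simple, assumed measure of spread.

```python
set1 = [40, 50, 60, 60, 40, 50]
set2 = [0, 100, 25, 75, 80, 20]

mean1 = sum(set1) / len(set1)
mean2 = sum(set2) / len(set2)
range1 = max(set1) - min(set1)   # largest minus smallest value
range2 = max(set2) - min(set2)

print(mean1, mean2)    # 50.0 50.0
print(range1, range2)  # 20 100
```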
Why study dispersion/variation?
Values (Xi) Absolute deviation from the mean |Xi – µ|
3 |3 – 5| = 2
4 |4 – 5| = 1
0 |0 – 5| = 5
8 |8 – 5| = 3
2 |2 – 5| = 3
9 |9 – 5| = 4
6 |6 – 5| = 1
2 |2 – 5| = 3
11 |11 – 5| = 6
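The absolute deviations tabulated above can be reproduced and averaged with a short Python sketch. It assumes the usual definition of the mean deviation, Σ|Xi – µ|/N; the variable names are illustrative.

```python
values = [3, 4, 0, 8, 2, 9, 6, 2, 11]

mu = sum(values) / len(values)                 # 45 / 9 = 5.0
abs_devs = [abs(x - mu) for x in values]       # |Xi - mu| for each value
mean_deviation = sum(abs_devs) / len(values)   # 28 / 9

print(mu)                        # 5.0
print(abs_devs)                  # [2.0, 1.0, 5.0, 3.0, 3.0, 4.0, 1.0, 3.0, 6.0]
print(round(mean_deviation, 2))  # 3.11
```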
4. Variance and Standard Deviation
• The variance and standard deviation are the
most widely used and generally preferred
measures of dispersion.
• Variance = σ² = Σ(x – µ)²/N
• Standard deviation = σ = √σ²
Example: 24, 25, 29, 29, 30, 31. Find the variance and standard deviation.
µ = 168/6 = 28
σ² = Σ(x – µ)²/N = 40/6 = 6.67
σ = √σ² = √6.67 = 2.58
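The example above can be checked step by step in Python. The sketch computes the population variance (dividing by N, as in the formula) and compares it with `statistics.pvariance` and `statistics.pstdev` from the standard library.

```python
from math import sqrt
from statistics import pstdev, pvariance

data = [24, 25, 29, 29, 30, 31]

mu = sum(data) / len(data)                  # 168 / 6 = 28.0
sum_sq = sum((x - mu) ** 2 for x in data)   # sum of squared deviations = 40.0
variance = sum_sq / len(data)               # population variance
std_dev = sqrt(variance)                    # population standard deviation

print(mu, sum_sq)                           # 28.0 40.0
print(round(variance, 2), round(std_dev, 2))  # 6.67 2.58

# The library functions use the same divide-by-N definition.
lib_variance = pvariance(data)
lib_std_dev = pstdev(data)
```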
Frequency 7 10 22 15 12 6 3
3. Compute the mean deviation of the following
data.
Price/kg Units ordered
20 – 29 2
30 – 39 12
40 – 49 15
50 – 59 20
60 – 69 18
70 – 79 10
80 – 89 9
90 – 99 4
4. Two distributions A and B have means of 80 and 20
inches and standard deviations of 10 and 15 inches,
respectively. Which distribution has the more dissimilar
(more widely dispersed) elements?
5. Suppose that the following data show the ratings of
hard-shell jackets based on the breathability, durability,
versatility, features, mobility, and weight of each jacket.
The ratings range from 0 (lowest) to 100 (highest).
42, 66, 67, 71, 78, 62, 61, 76, 71, 67,
61, 64, 61, 54, 83, 63, 68, 69, 81, 53
a. Compute the mean, median, and mode.
b. Compute the first and third quartiles.
c. Compute and interpret the 90th percentile.
Correction for the first exercise (Q. 1)
• Please replace the table in Question 1 with this
table.
Class intervals Frequency
9.6 – 14.5 10
14.6 – 19.5 20
19.6 – 24.5 30
24.6 – 29.5 25
29.6 – 34.5 15