Lesson 4
Introduction to Statistics
STAT 1101
In statistics, to describe a data set accurately, statisticians must know more than
the measures of central tendency.
Example 3-15
A testing lab wishes to test two experimental brands of outdoor paint to see how long each will last before fading. The testing lab makes 6 gallons of each paint to test. Since different chemical agents are added to each group and only six cans are involved, these two groups constitute two small populations. The results (in months) are shown in the table. Find the mean of each group.
Solution
The mean for brand A is
$\mu = \frac{\Sigma X}{N} = \frac{210}{6} = 35$ months
The mean for brand B is
$\mu = \frac{\Sigma X}{N} = \frac{210}{6} = 35$ months
Chapter 4: Data Description 4
Introduction
Even though the means are the same for both brands, the spread, or variation, is
quite different: brand B performs more consistently; it is less variable.
To measure the spread or variability of a data set, three measures are commonly used:
Range
Variance
Standard deviation
Definition
The range is the highest value minus the lowest value. The symbol R
is used for the range.
R = highest value − lowest value
Example 3-16
Find the range for the paints in Example 3-15.
Solution
The range for brand A is
R = 60 − 10 = 50 months
The range for brand B is
R = 45 − 25 = 20 months
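The range is simple enough to compute directly from its definition. A minimal Python sketch, using the brand A lifetimes listed in Example 3-18; `data_range` is a hypothetical helper name, not part of any library:

```python
def data_range(values):
    """Return R = highest value - lowest value for a non-empty data set."""
    if not values:
        raise ValueError("the range is undefined for an empty data set")
    return max(values) - min(values)


# Brand A lifetimes in months (listed in Example 3-18)
brand_a = [10, 60, 50, 30, 40, 20]
print(data_range(brand_a))  # 50
```

Because the range uses only the two extreme values, a single unusual value can change it a lot, which is one reason the variance and standard deviation are also used.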
Data variation
❑ The variance and standard deviation are based on the difference, or distance, of each data value from the mean. This difference or distance is called a deviation.
❑ If the deviations of all the data values from the mean are added (without rounding), the sum is always zero. That is, Σ(X − μ) = 0.
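The zero-sum property can be checked numerically. This minimal Python sketch uses the brand A data from Example 3-18 and keeps the mean exact with `fractions.Fraction`, matching the "without rounding" condition; `deviations` is a hypothetical helper name:

```python
from fractions import Fraction


def deviations(values):
    """Return each deviation X - mu, with the mean kept as an exact fraction."""
    mu = Fraction(sum(values), len(values))
    return [x - mu for x in values]


brand_a = [10, 60, 50, 30, 40, 20]
devs = deviations(brand_a)
print([int(d) for d in devs])  # [-25, 25, 15, -5, 5, -15]
print(sum(devs))               # 0
```

The positive and negative deviations always cancel, which is why the variance squares each deviation before adding.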
Definition
The population variance is the average of the squares of the distance
each value is from the mean. The symbol for the population variance is
σ² (σ is the Greek lowercase letter sigma).
The formula for the population variance is
$\sigma^2 = \frac{\Sigma (X - \mu)^2}{N}$
Where
X = individual value
μ = population mean
N = population size
Population Variance and Standard Deviation
Definition
The population standard deviation is the square root of the variance.
The symbol for the population standard deviation is σ.
The corresponding formula for the population standard deviation is
$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\Sigma (X - \mu)^2}{N}}$
Example 3-18
Find the variance and standard deviation for the brand A paint in Example 3-15.
10, 60, 50, 30, 40, 20
Solution
Step 1 Find the mean for the data.
$\mu = \frac{\Sigma X}{N} = \frac{10 + 60 + 50 + 30 + 40 + 20}{6} = \frac{210}{6} = 35$
Step 2 Subtract the mean from each value (X − μ).
10 − 35 = −25    60 − 35 = 25    50 − 35 = 15
30 − 35 = −5    40 − 35 = 5    20 − 35 = −15
Step 3 Square each result, (X − μ)².
(−25)² = 625    25² = 625    15² = 225
(−5)² = 25    5² = 25    (−15)² = 225
Step 4 Find the sum of the squares.
$\Sigma (X - \mu)^2 = 625 + 625 + 225 + 25 + 25 + 225 = 1750$
Step 5 Divide the sum by N to get the variance.
$\sigma^2 = \frac{\Sigma (X - \mu)^2}{N} = \frac{1750}{6} \approx 291.7$
Step 6 Take the square root of the variance to get the standard deviation.
$\sigma = \sqrt{291.7} \approx 17.1$
It is also helpful to organize these computations in a table.
Example 3-19
Find the variance and standard deviation for the brand B paint in Example 3-15.
Solution
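Step 4 Find the sum of the squares in column C.
$\Sigma (X - \mu)^2 = 0 + 100 + 25 + 0 + 25 + 100 = 250$
Step 5 Divide the sum by N to get the variance.
$\sigma^2 = \frac{\Sigma (X - \mu)^2}{N} = \frac{250}{6} \approx 41.7$
Step 6 Take the square root of the variance to get the standard deviation.
$\sigma = \sqrt{\frac{\Sigma (X - \mu)^2}{N}} = \sqrt{41.7} \approx 6.5$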
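The six steps translate directly into code. A minimal Python sketch, applied to the brand A data from Example 3-18 (the brand B values are in the missing table, so they are not used here). The helper names are hypothetical; the standard library's `statistics.pvariance` and `statistics.pstdev` are used only as a cross-check:

```python
import math
import statistics


def population_variance(values):
    """sigma^2 = sum((X - mu)^2) / N"""
    n = len(values)
    mu = sum(values) / n                          # Step 1: the mean
    squared = [(x - mu) ** 2 for x in values]     # Steps 2 and 3
    return sum(squared) / n                       # Steps 4 and 5


def population_sd(values):
    """sigma = square root of the population variance (Step 6)."""
    return math.sqrt(population_variance(values))


brand_a = [10, 60, 50, 30, 40, 20]
print(round(population_variance(brand_a), 1))  # 291.7
print(round(population_sd(brand_a), 1))        # 17.1

# Cross-check against the standard library's population versions
assert math.isclose(population_variance(brand_a), statistics.pvariance(brand_a))
assert math.isclose(population_sd(brand_a), statistics.pstdev(brand_a))
```

Dividing by N (not n − 1) is what makes these the population formulas.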
Sample Variance and Standard Deviation
Definition
The formula for the sample variance (denoted by s²) is
$s^2 = \frac{\Sigma (X - \bar{X})^2}{n - 1}$
The formula for the sample standard deviation (denoted by s) is
$s = \sqrt{\frac{\Sigma (X - \bar{X})^2}{n - 1}}$
Where
X = individual value
X̄ = sample mean
n = sample size
Example 3-20
The number of public school teacher strikes in Pennsylvania for a random sample of school years is
shown. Find the sample variance and the sample standard deviation.
Solution
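Step 4 Find the sum of the squares.
$\Sigma (X - \bar{X})^2 = 0.25 + 2.25 + 30.25 + 2.25 + 0.25 + 30.25 = 65.5$
Step 5 Divide the sum by n − 1 to get the sample variance.
$s^2 = \frac{\Sigma (X - \bar{X})^2}{n - 1} = \frac{65.5}{5} = 13.1$
Step 6 Take the square root of the variance to get the sample standard deviation.
$s = \sqrt{\frac{\Sigma (X - \bar{X})^2}{n - 1}} = \sqrt{13.1} \approx 3.6$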
Example 3-21
The number of public school teacher strikes in Pennsylvania for a random sample of school years is
shown. Find the sample variance and the sample standard deviation.
9, 10, 14, 7, 8, 3
Solution
Step 1 Find the sum of the values.
$\Sigma X = 9 + 10 + 14 + 7 + 8 + 3 = 51$
$s = \sqrt{13.1} \approx 3.6$
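Example 3-21 starts from ΣX, which suggests the computational (shortcut) form of the sample variance, s² = (nΣX² − (ΣX)²) / (n(n − 1)); this is an assumption, since the intermediate steps are not shown here. The Python sketch below computes s² both by the definitional formula and by that shortcut for the strike data, and cross-checks with the standard library's `statistics.variance`, which also divides by n − 1. The helper names are hypothetical:

```python
import math
import statistics


def sample_variance(values):
    """Definitional formula: s^2 = sum((X - Xbar)^2) / (n - 1)."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


def sample_variance_shortcut(values):
    """Computational formula: s^2 = (n * sum(X^2) - (sum X)^2) / (n * (n - 1))."""
    n = len(values)
    sum_x = sum(values)                      # 51 for the strike data
    sum_x2 = sum(x * x for x in values)      # sum of the squared values
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))


strikes = [9, 10, 14, 7, 8, 3]
s2 = sample_variance(strikes)
print(round(s2, 1), round(math.sqrt(s2), 1))   # 13.1 3.6

# Both formulas agree and match the standard library's sample variance
assert math.isclose(s2, sample_variance_shortcut(strikes))
assert math.isclose(s2, statistics.variance(strikes))
```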
To find the sample variance and standard deviation for grouped data (a frequency distribution), use the following procedure.
Step 1 Make a table as shown, find the midpoint of each class, and place the midpoints in column C.
Step 2 Multiply the frequency by the midpoint for each class, and place the product in column D.
Step 3 Multiply the frequency by the square of the midpoint, and place the products in column E.
Step 4 Find the sums of columns B, D, and E. (The sum of column B is n. The sum of column D is $\Sigma f \cdot X_m$. The sum of column E is $\Sigma f \cdot X_m^2$.)
Step 5 Substitute in the formula and solve to get the variance.
$s^2 = \frac{n(\Sigma f \cdot X_m^2) - (\Sigma f \cdot X_m)^2}{n(n - 1)}$
Step 6 Take the square root to get the standard deviation.
Example 3-22
Find the sample variance and the sample standard deviation for the frequency
distribution of the data shown. The data represent the number of miles that 20
runners ran during one week.
Solution
Step 1 Make a table as shown, and find the midpoint of each class.
Step 2 Multiply the frequency by the midpoint for each class, and place the product in column D.
Step 3 Multiply the frequency by the square of the midpoint, and place the products in column E.
Step 4 Find the sums of columns B, D, and E: n = 20, $\Sigma f \cdot X_m = 490$, and $\Sigma f \cdot X_m^2 = 13{,}310$.
Step 5 Substitute in the formula and solve to get the variance.
$s^2 = \frac{n(\Sigma f \cdot X_m^2) - (\Sigma f \cdot X_m)^2}{n(n - 1)} = \frac{20(13{,}310) - 490^2}{20(20 - 1)} = \frac{26{,}100}{380} \approx 68.7$
Step 6 Take the square root to get the standard deviation.
$s = \sqrt{68.7} \approx 8.3$
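The grouped-data procedure reduces to three column sums and one formula. A minimal Python sketch follows; the frequency table for Example 3-22 is not reproduced here, so the example call uses only the column sums reported in Step 4 (n = 20, Σf·Xm = 490, Σf·Xm² = 13,310). Both function names are hypothetical:

```python
import math


def grouped_variance_from_sums(n, sum_fx, sum_fx2):
    """s^2 = (n * sum(f*Xm^2) - (sum(f*Xm))^2) / (n * (n - 1))"""
    return (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))


def grouped_sample_variance(freqs, midpoints):
    """Build the column sums from class frequencies and midpoints, then apply Step 5."""
    n = sum(freqs)                                               # sum of column B
    sum_fx = sum(f * m for f, m in zip(freqs, midpoints))        # sum of column D
    sum_fx2 = sum(f * m * m for f, m in zip(freqs, midpoints))   # sum of column E
    return grouped_variance_from_sums(n, sum_fx, sum_fx2)


# Column sums reported in Example 3-22
s2 = grouped_variance_from_sums(20, 490, 13310)
print(round(s2, 1), round(math.sqrt(s2), 1))  # 68.7 8.3
```

Because every value in a class is represented by its midpoint, the grouped result is an approximation of the variance of the raw data.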
✓The variances and standard deviations can be used to determine the spread of
the data. If the variance or standard deviation is large, the data are more
dispersed. This information is useful in comparing two (or more) data sets to
determine which is more (most) variable.
✓The measures of variance and standard deviation are used to determine the
consistency of a variable. For example, in the manufacture of fittings, such as
nuts and bolts, the variation in the diameters must be small, or else the parts will
not fit together.
✓The variance and standard deviation are used to determine the number of data
values that fall within a specified interval in a distribution.
❑ Whenever two samples have the same units of measure, the variance and standard deviation
for each can be compared directly.
❑ A statistic that allows you to compare standard deviations when the units are different is
called the coefficient of variation.
Definition
The coefficient of variation, denoted by CVar, is the standard deviation
divided by the mean. The result is expressed as a percentage.
For samples, $\mathrm{CVar} = \frac{s}{\bar{X}} \times 100$
For populations, $\mathrm{CVar} = \frac{\sigma}{\mu} \times 100$
Coefficient of Variation
Example 3-23
The mean of the number of sales of cars over a 3-month period is 87, and the
standard deviation is 5. The mean of the commissions is $5225, and the standard
deviation is $773. Compare the variations of the two.
Solution
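The coefficients of variation are
Sales: $\mathrm{CVar} = \frac{s}{\bar{X}} \times 100 = \frac{5}{87} \times 100 \approx 5.7\%$
Commissions: $\mathrm{CVar} = \frac{s}{\bar{X}} \times 100 = \frac{773}{5225} \times 100 \approx 14.8\%$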
Since the coefficient of variation is larger for commissions, the commissions are
more variable than the sales.
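The coefficient of variation is a one-line calculation. This minimal Python sketch reproduces Example 3-23; `coefficient_of_variation` is a hypothetical helper name:

```python
def coefficient_of_variation(sd, mean):
    """CVar = (standard deviation / mean) * 100, expressed as a percentage."""
    return sd / mean * 100


sales_cv = coefficient_of_variation(5, 87)             # sales: s = 5, mean = 87
commission_cv = coefficient_of_variation(773, 5225)    # commissions in dollars
print(f"{sales_cv:.1f}% {commission_cv:.1f}%")         # 5.7% 14.8%
```

Since the units cancel in s / X̄, the two percentages can be compared even though one variable is a count of cars and the other is in dollars.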
Measures of Position
Measures of position include:
Standard scores
Percentiles
Quartiles
Example 3-32
The number of traffic violations recorded by a police department for a 10-day period
is shown. Find the data value corresponding to the 65th percentile.
22 19 25 24 18 15 9 12 16 20
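Solution
Step 1 Arrange the data in order from lowest to highest.
9 12 15 16 18 19 20 22 24 25
Step 2 Compute c = n · p / 100, where n is the number of values and p is the percentile: c = (10 · 65) / 100 = 6.5.
Step 3 Since c is not a whole number, round it up to the next whole number; in this case, c = 7.
Start at the lowest value and count over to the 7th value, which is 20.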
Example 3-33
The number of traffic violations recorded by a police department for a 10-day period
is shown. Find the data value corresponding to the 30th percentile.
22 19 25 24 18 15 9 12 16 20
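Solution
Step 1 Arrange the data in order from lowest to highest.
9 12 15 16 18 19 20 22 24 25
Step 2 Compute c = n · p / 100 = (10 · 30) / 100 = 3.
Step 3 Since c is a whole number, use the value halfway between the cth and (c + 1)th values when counting up from the lowest.
In this case, these are the third and fourth values, 15 and 16, so the 30th percentile is (15 + 16) / 2 = 15.5.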
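The counting procedure in Examples 3-32 and 3-33 can be sketched in Python. This is a minimal sketch assuming an integer percentile p strictly between 0 and 100 and the position formula c = n · p / 100; `percentile_value` is a hypothetical helper name. Library functions such as `numpy.percentile` use interpolation rules and can give different answers for the same data:

```python
import math


def percentile_value(data, p):
    """Data value at the p-th percentile (hypothetical helper; integer p, 0 < p < 100)."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    values = sorted(data)          # Step 1: arrange from lowest to highest
    n = len(values)
    if (n * p) % 100 == 0:         # c = n*p/100 is a whole number
        c = n * p // 100
        # Halfway between the c-th and (c + 1)-th values (1-based positions)
        return (values[c - 1] + values[c]) / 2
    c = math.ceil(n * p / 100)     # otherwise round c up to the next whole number
    return values[c - 1]           # the c-th value, counting from the lowest


violations = [22, 19, 25, 24, 18, 15, 9, 12, 16, 20]
print(percentile_value(violations, 65))  # 20
print(percentile_value(violations, 30))  # 15.5
```

Checking `(n * p) % 100` in integer arithmetic avoids floating-point surprises when deciding whether c is a whole number.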
Definition
Quartiles divide the distribution into four equal groups, separated by Q1, Q2, and Q3.
✓Note that Q1 is the same as the 25th percentile; Q2 is the same as the 50th
percentile, or the median; Q3 corresponds to the 75th percentile.
Step 1 Arrange the data in order from lowest to highest.
Step 2 Find the median of the data values. This is the value for Q2.
Step 3 Find the median of the data values that fall below Q2. This is the value for Q1.
Step 4 Find the median of the data values that fall above Q2. This is the value for Q3.
Example 3-34
The number of traffic violations recorded by a police department for a 10-day period
is shown. Find Q1, Q2, and Q3.
22 19 25 24 18 15 9 12 16 20
Solution
Step 1 Arrange the data in order from lowest to highest.
9 12 15 16 18 19 20 22 24 25
Step 2 Find the median, Q2.
$Q_2 = MD = \frac{18 + 19}{2} = 18.5$
Step 3 Find the median of the data values below 18.5.
9 12 15 16 18
Q1 = 15
Step 4 Find the median of the data values greater than 18.5.
19 20 22 24 25
Q3 = 22
Definition
The interquartile range (IQR) is the difference between the third
and first quartiles.
$\mathrm{IQR} = Q_3 - Q_1$
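The quartile steps and the IQR can be sketched together in Python using the traffic-violation data from Example 3-34. This minimal sketch assumes the median-of-halves rule above (for an odd number of values, the median itself is left out of both halves); the helper names are hypothetical. Spreadsheet functions such as Excel's QUARTILE.INC interpolate between values and may return slightly different quartiles:

```python
def median(sorted_values):
    """Median of an already sorted, non-empty list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(data):
    """Q1, Q2, Q3 by the median-of-halves steps (needs at least 2 values)."""
    values = sorted(data)                     # Step 1
    n = len(values)
    q2 = median(values)                       # Step 2
    lower = values[: n // 2]                  # values before the median position
    upper = values[(n + 1) // 2 :]            # values after the median position
    return median(lower), q2, median(upper)   # Steps 3 and 4


violations = [22, 19, 25, 24, 18, 15, 9, 12, 16, 20]
q1, q2, q3 = quartiles(violations)
print(q1, q2, q3)   # 15 18.5 22
print(q3 - q1)      # 7 (the IQR)
```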
Example 3-35
Definition
A boxplot is a graph of a data set obtained by
drawing a horizontal line from the minimum data
value to Q1, drawing a horizontal line from Q3 to the
maximum data value, and drawing a box whose
vertical sides pass through Q1 and Q3 with a vertical
line inside the box passing through the median or Q2.
Constructing a Boxplot
Step 1 Find the five-number summary for the data.
Step 2 Draw a horizontal axis and place the scale on the axis. The scale should start on or below
the minimum data value and end on or above the maximum data value.
Step 3 Locate the lowest data value, Q1, the median, Q3, and the highest data value; then draw a
box whose vertical sides go through Q1 and Q3, and draw a vertical line through the median.
Finally, draw a line from the minimum data value to the left side of the box, and draw a line
from the maximum data value to the right side of the box.
The Boxplots
Example 3-37
The number of meteorites found in 10 states of the United States is:
89, 47, 164, 296, 30, 215, 138, 78, 48, 39.
Construct a boxplot for the data.
Solution
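Step 1 Arrange the data in order and find the five-number summary.
30, 39, 47, 48, 78, 89, 138, 164, 215, 296
Lowest value = 30, Q1 = 47, median = (78 + 89) / 2 = 83.5, Q3 = 164, highest value = 296
Step 2 Draw a horizontal axis with a scale that starts at or below 30 and ends at or above 296.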
Step 3 Draw the box above the scale using Q1 and Q3 . Draw a vertical line through the median,
and draw lines from the lowest data value to the box and from the highest data value to
the box.
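The five-number summary that a boxplot is drawn from can be computed with a short Python sketch. It assumes the same median-of-halves quartile rule used in Example 3-34; `five_number_summary` is a hypothetical helper, and plotting packages or MegaStat may compute quartiles differently and draw whiskers to other limits:

```python
def median(sorted_values):
    """Median of an already sorted, non-empty list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(data):
    """(lowest, Q1, median, Q3, highest) using the median-of-halves quartile rule."""
    values = sorted(data)
    n = len(values)
    lower = values[: n // 2]          # values before the median position
    upper = values[(n + 1) // 2 :]    # values after the median position
    return values[0], median(lower), median(values), median(upper), values[-1]


meteorites = [89, 47, 164, 296, 30, 215, 138, 78, 48, 39]
print(five_number_summary(meteorites))  # (30, 47, 83.5, 164, 296)
```

These five numbers are exactly the positions marked in Step 3: the whisker ends, the box edges, and the median line.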
Example XL3-2
The number of public school teacher strikes in Pennsylvania for a random sample of school years is
shown. Find the sample variance and the sample standard deviation using Excel.
Excel has two built-in functions for finding the percentile rank of a value in a set of data.
1. PERCENTRANK.INC returns the percentile rank of a data value as a number from 0 to 1, inclusive.
2. PERCENTRANK.EXC returns the percentile rank of a data value as a number strictly between 0 and 1 (exclusive).
Example XL3-4
For the following data set, find the data value corresponding to the 30th percentile (page 164).
5 6 12 13 15 18 22 50
Solution
1. On an Excel worksheet enter the data in cells A2–A9. Enter a label for the variable in cell A1.
2. Label cell B1 as Percent Rank INC and cell C1 as Percent Rank EXC.
3. Select cell B2.
4. Select the Formulas tab from the toolbar and click Insert Function.
5. Select the Statistical category for statistical functions and scroll in the function list to PERCENTRANK.INC
(PERCENTRANK.EXC) and click [OK].
In the PERCENTRANK.INC (PERCENTRANK.EXC) dialog boxes:
6. Type A2:A9 for the Array.
7. Type A2 for X, then click [OK].
8. Repeat the procedure above for each data value in the set.
The function results for both PERCENTRANK.INC and PERCENTRANK.EXC are shown
below.
Note: Both functions return the Percentile Ranks as a number between 0 and 1.
You may convert these to numbers between 0 and 100 by multiplying each function
value by 100.
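For values that actually occur in the data, the two Excel functions reduce to simple counting rules: PERCENTRANK.INC is (number of values below x) / (n − 1), and PERCENTRANK.EXC is (number of values below x + 1) / (n + 1). The Python sketch below applies these rules to the Example XL3-4 data. It is a sketch under stated assumptions, not a reimplementation of Excel: the function names are hypothetical, x is assumed to be one of the data values (Excel also interpolates for values between data points), and Excel reports three digits by default, so the last digit shown here may differ from the worksheet:

```python
def percent_rank_inc(data, x):
    """Inclusive rank in [0, 1]: (values below x) / (n - 1); x assumed to be in data."""
    below = sum(1 for v in data if v < x)
    return below / (len(data) - 1)


def percent_rank_exc(data, x):
    """Exclusive rank in (0, 1): (values below x + 1) / (n + 1); x assumed to be in data."""
    below = sum(1 for v in data if v < x)
    return (below + 1) / (len(data) + 1)


data = [5, 6, 12, 13, 15, 18, 22, 50]
for x in data:
    # Multiply by 100 to express the ranks as numbers between 0 and 100
    print(x, round(percent_rank_inc(data, x), 3), round(percent_rank_exc(data, x), 3))
```

The inclusive rule gives 0 for the smallest value and 1 for the largest, while the exclusive rule never reaches either end, which is the difference described by the .INC and .EXC suffixes.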
Example XL3-6
For the following data set, construct a boxplot using Excel.
33, 38, 43, 30, 29, 40, 51, 27, 42, 23, 31
Solution
1. On an Excel worksheet enter the data in cells A1–A11.
2. Select the Add-Ins tab, then MegaStat from the toolbar.
3. Select Descriptive Statistics from the MegaStat menu.
4. Enter the cell range A1:A11 in the Input range.
5. Check Boxplot. Click [OK].