2 - Descriptive Statistics v3

Chapter 2 covers descriptive statistics, focusing on frequency distributions and their graphs, including construction and interpretation of various data displays such as histograms, stem-and-leaf plots, and pie charts. It explains how to calculate measures of central tendency, variation, and position, as well as methods for visualizing data through different types of graphs. The chapter emphasizes the importance of organizing and presenting data effectively to identify patterns and insights.

Uploaded by

PATEL Daksh
Copyright
© All Rights Reserved
Chapter 2
Descriptive Statistics

Copyright 2019, Pearson Education, 1


Chapter Outline
• 2.1 Frequency Distributions and Their Graphs

• 2.2 More Graphs and Displays

• 2.3 Measures of Central Tendency

• 2.4 Measures of Variation

• 2.5 Measures of Position



Section 2.1

Frequency Distributions
and Their Graphs



Frequency Distribution
Frequency Distribution
• A table that shows classes or intervals of data with a count of the number of entries in each class.
• The frequency, f, of a class is the number of data entries in the class.

Class      Frequency, f
1–5        5
6–10       8
11–15      6
16–20      8
21–25      5
26–30      4


Frequency Distribution
• Each class has a lower class limit, which is the least number that can belong to the class, and an upper class limit, which is the greatest number that can belong to the class.

Class      Frequency, f
1–5        5
6–10       8
11–15      6
16–20      8
21–25      5
26–30      4

Lower class limits: 1, 6, 11, 16, 21, 26
Upper class limits: 5, 10, 15, 20, 25, 30


Frequency Distribution
• The class width is the distance between lower (or upper) limits of consecutive classes. Here, class width = 6 – 1 = 5.
• The difference between the maximum (30) and minimum (1) data entries is called the range. Here, range = 30 – 1 = 29.

Class      Frequency, f
1–5        5
6–10       8
11–15      6
16–20      8
21–25      5
26–30      4


Constructing a Frequency
Distribution
1. Decide on the number of classes.
   - Usually between 5 and 20; otherwise, it may be difficult to detect any patterns.
2. Find the class width.
   - Determine the range of the data.
   - Divide the range by the number of classes.
   - Round up to the next convenient number.


Constructing a Frequency
Distribution
3. Find the class limits.
   - You can use the minimum data entry as the lower limit of the first class.
   - Find the remaining lower limits (add the class width to the lower limit of the preceding class).
   - Find the upper limit of the first class. Remember that classes cannot overlap.
   - Find the remaining upper class limits.


Example: Constructing a
Frequency Distribution
The data set lists the out-of-pocket prescription
medicine expenses (in dollars) for 30 U.S. adults in a
recent year. Construct a frequency distribution that has
seven classes. (Adapted from: Health, United States,
2015)
200 239 155 252 384 165 296 405 303 400
307 241 256 315 330 317 352 266 276 345
238 306 290 271 345 312 293 195 168 342



Solution: Constructing a
Frequency Distribution
200 239 155 252 384 165 296 405 303 400
307 241 256 315 330 317 352 266 276 345
238 306 290 271 345 312 293 195 168 342

Number of classes: 7 (given)

Class width:

\[ \frac{\text{Maximum data value} - \text{Minimum data value}}{\text{Number of classes}} = \frac{405 - 155}{7} = \frac{250}{7} \approx 35.71 \]

Round up to 36.
Solution: Constructing a
Frequency Distribution
3. Use 155 (the minimum value) as the first lower limit. Add the class width of 36 to get the lower limit of the next class:
155 + 36 = 191
Find the remaining lower limits in the same way: 155, 191, 227, 263, 299, 335, 371.


Solution: Constructing a
Frequency Distribution
The upper limit of the first class is 190 (one less than the lower limit of the second class). Add the class width of 36 to get the upper limit of the next class:
190 + 36 = 226
Find the remaining upper limits in the same way: 190, 226, 262, 298, 334, 370, 406.
Solution: Constructing a
Frequency Distribution
4. Make a tally mark for each data entry in the row of
the appropriate class.
5. Count the tally marks to find the total frequency f
for each class.
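
The construction steps above can be sketched in Python. This is a minimal illustration, not the textbook's procedure verbatim: the function name `frequency_distribution` is hypothetical, and "round up to the next convenient number" is interpreted here as "the next whole number above range / classes", which is an assumption that happens to reproduce the class width of 36.

```python
import math

# Out-of-pocket prescription expenses (in dollars) from the example.
expenses = [
    200, 239, 155, 252, 384, 165, 296, 405, 303, 400,
    307, 241, 256, 315, 330, 317, 352, 266, 276, 345,
    238, 306, 290, 271, 345, 312, 293, 195, 168, 342,
]

def frequency_distribution(data, num_classes):
    """Return (class_width, [(lower, upper, frequency), ...]) for integer data."""
    data_range = max(data) - min(data)
    # Assumed rounding rule: take the next whole number above the quotient,
    # so the maximum entry always fits in the last class.
    width = math.floor(data_range / num_classes) + 1
    classes = []
    lower = min(data)
    for _ in range(num_classes):
        upper = lower + width - 1  # one less than the next lower limit: no overlap
        freq = sum(lower <= x <= upper for x in data)  # replaces the tally marks
        classes.append((lower, upper, freq))
        lower += width
    return width, classes

width, classes = frequency_distribution(expenses, 7)
print("class width:", width)
for lower, upper, freq in classes:
    print(f"{lower}-{upper}: {freq}")
```

The frequencies printed by the loop should add up to the sample size, 30, which is a quick check that no entry was missed or counted twice.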



Breakout Session:
Frequency Distribution
Use the data set, which represents the overall average
class size for 20 national universities in the USA.



Determining the Midpoint

\[ \text{Midpoint} = \frac{(\text{Lower class limit}) + (\text{Upper class limit})}{2} \]


Determining the Relative
Frequency
Relative Frequency of a class
• Portion or percentage of the data that falls in a
particular class.
• Relative frequency is the class frequency divided by the sample size:

\[ \text{Relative frequency} = \frac{\text{Class frequency}}{\text{Sample size}} = \frac{f}{n} \]


Determining the Cumulative
Frequency
Cumulative frequency of a class
• The sum of the frequency for that class and all
previous classes.
• The cumulative frequency of the last class is equal to
the sample size n.



Solution: Finding Midpoints, Relative
and Cumulative Frequencies
• The midpoints, relative frequencies, and cumulative
frequencies of the first five classes are calculated
as follows:



Solution: Finding Midpoints, Relative
and Cumulative Frequencies
• The remaining midpoints, relative frequencies, and
cumulative frequencies are shown in the expanded
frequency distribution below.
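
As a rough sketch of how the expanded frequency distribution can be computed, the snippet below derives the midpoint, relative frequency, and cumulative frequency of each class. The class frequencies are tallied from the 30 expenses listed in the example; the variable names are illustrative only.

```python
from itertools import accumulate

# (lower limit, upper limit, frequency) for the seven classes of width 36.
classes = [(155, 190, 3), (191, 226, 2), (227, 262, 5), (263, 298, 6),
           (299, 334, 7), (335, 370, 4), (371, 406, 3)]

n = sum(f for _, _, f in classes)                        # sample size
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]     # (lower + upper) / 2
relative = [f / n for _, _, f in classes]                # f / n
cumulative = list(accumulate(f for _, _, f in classes))  # running total of f

for (lo, hi, f), m, r, c in zip(classes, midpoints, relative, cumulative):
    print(f"{lo}-{hi}  f={f}  midpoint={m}  relative={r:.2f}  cumulative={c}")
```

Two checks follow from the definitions: the relative frequencies sum to 1, and the last cumulative frequency equals n.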



Solution: Finding Midpoints, Relative
and Cumulative Frequencies

• There are several patterns in the data set:
  - For instance, the most common range for the expenses is $299 to $334.
  - Also, about half of the expenses are less than $299.


Graphs of Frequency
Distributions
Frequency Histogram
• A bar graph that represents the frequency distribution.
• The horizontal scale is quantitative and measures the
data values.
• The vertical scale measures the frequencies of the
classes.
• Consecutive bars must touch.

[Figure: histogram axes, with data values on the horizontal axis and frequency on the vertical axis]


Class Boundaries
Class boundaries
• The numbers that separate classes without forming gaps between them.
• Because consecutive bars of a histogram must touch, bars must begin and end at class boundaries instead of class limits.


Example: Constructing a
Frequency Histogram
Draw a frequency histogram for the frequency
distribution in the previous example. Describe any
patterns.



Solution: Constructing a
Frequency Histogram
• First, find the class boundaries.
• The distance from the upper limit of the first class to the lower limit of the second class is 191 – 190 = 1.
• Half this distance is 0.5.
• First class lower boundary = 155 – 0.5 = 154.5
• First class upper boundary = 190 + 0.5 = 190.5
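
The same half-gap adjustment applies to every class. A small sketch, assuming equally spaced classes with the limits from this example (the helper name `class_boundaries` is hypothetical):

```python
def class_boundaries(limits):
    """Convert [(lower_limit, upper_limit), ...] into class boundaries."""
    # Gap between one class's upper limit and the next class's lower limit.
    gap = limits[1][0] - limits[0][1]
    half = gap / 2
    return [(lo - half, hi + half) for lo, hi in limits]

# Seven classes of width 36 starting at 155-190.
limits = [(155 + 36 * i, 190 + 36 * i) for i in range(7)]
boundaries = class_boundaries(limits)
print(boundaries[0])   # (154.5, 190.5)
print(boundaries[-1])  # (370.5, 406.5)
```

Each class's upper boundary equals the next class's lower boundary, which is what lets consecutive histogram bars touch.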
Solution: Constructing a
Frequency Histogram
You can mark the horizontal scale either at the midpoints or
at the class boundaries. Both histograms are shown below.



Solution: Constructing a
Frequency Histogram

You can see that two-thirds of the adults are paying more than $262.50 for out-of-pocket prescription medicine expenses.


Graphs of Frequency
Distributions
Frequency Polygon
• A line graph that emphasizes the continuous change
in frequencies.
[Figure: frequency polygon axes, with data values on the horizontal axis and frequency on the vertical axis]


Example: Constructing a
Frequency Polygon
Draw a frequency polygon for the frequency distribution in the previous example. The class width is 36.


Solution: Constructing a
Frequency Polygon
To construct the frequency polygon, use the same horizontal and vertical scales that were used in the histogram labeled with class midpoints.
The graph should begin and end on the horizontal axis, so extend the left side to one class width before the first class midpoint and extend the right side to one class width after the last class midpoint:
172.5 – 36 = 136.5 and 388.5 + 36 = 424.5
Breakout Session 2: Polygon

Construct a frequency polygon for the frequency distribution from Breakout Session 1.


Graphs of Frequency
Distributions
Relative Frequency Histogram
• Has the same shape and the same horizontal scale as
the corresponding frequency histogram.
• The vertical scale measures the relative frequencies,
not frequencies.

[Figure: relative frequency histogram axes, with data values on the horizontal axis and relative frequency on the vertical axis]


Example: Constructing a
Relative Frequency Histogram
Construct a relative frequency histogram for the second
example.



Solution: Constructing a
Relative Frequency Histogram

From this graph, you can quickly see that 0.2, or 20%, of
the adults have expenses between $262.50 and $298.50.



Graphs of Frequency
Distributions
Cumulative Frequency Graph or Ogive
• A line graph that displays the cumulative frequency
of each class at its upper class boundary.
• The upper boundaries are marked on the horizontal
axis.
• The cumulative frequencies are marked on the
vertical axis.

[Figure: ogive axes, with data values on the horizontal axis and cumulative frequency on the vertical axis]
Example: Constructing an Ogive
Construct an ogive for the second example frequency
distribution.



Solution: Constructing an Ogive

From the ogive, you can see that 10 adults had expenses of
$262.50 or less. Also, the greatest increase in cumulative
frequency occurs between $298.50 and $334.50.
Breakout Session 3: Ogive

Include cumulative frequencies in your frequency distribution and construct the ogive for the frequency distribution.



Section 2.2

More Graphs and Displays



Graphing Quantitative Data
Sets
Stem-and-leaf plot
• Each number is separated into a stem and a leaf.
• Similar to a histogram.
• Still contains original data values.
• Provides an easy way to sort data.

Data: 21, 25, 25, 26, 27, 28, 30, 36, 36, 45

2 | 1 5 5 6 7 8
3 | 0 6 6
4 | 5

Key: 2 | 6 = 26
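
A stem-and-leaf plot for the ten data values on this slide can be sketched as follows. The helper `stem_and_leaf` is a hypothetical name, and the sketch assumes non-negative whole numbers with the rightmost digit used as the leaf.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return {stem: [leaves]} using the rightmost digit as the leaf."""
    plot = defaultdict(list)
    for value in sorted(data):          # sorting puts the leaves in order
        stem, leaf = divmod(value, 10)  # e.g. 26 -> stem 2, leaf 6
        plot[stem].append(leaf)
    return dict(plot)

data = [21, 25, 25, 26, 27, 28, 30, 36, 36, 45]
plot = stem_and_leaf(data)
# Print every stem in range, including stems that have no leaves.
for stem in range(min(plot), max(plot) + 1):
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
```

Because each leaf is kept, the original data values can be read back from the display, which is the property that distinguishes it from a histogram.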


Example: Constructing a
Stem-and-Leaf Plot
The data set lists the numbers of text messages sent in
one day by 50 cell phone users. Display the data in a
stem-and-leaf plot. Describe any patterns. (Adapted
from Pew Research)



Solution: Constructing a
Stem-and-Leaf Plot
• The data entries go from a low of 16 to a high of 149.
• Use the rightmost digit as the leaf.
  - For instance, 76 = 7 | 6 and 149 = 14 | 9.
• List the stems, 1 to 14, to the left of a vertical line.
• For each data entry, list a leaf to the right of its stem.
Solution: Constructing a
Stem-and-Leaf Plot

From the display, you can see that more than 50%
of the cell phone users sent between 20 and 50 text
messages.



Graphing Quantitative Data
Sets
Dot plot
• Each data entry is plotted, using a point, above a horizontal axis.

Data: 21, 25, 25, 26, 27, 28, 30, 36, 36, 45
[Figure: dot plot of the data above a horizontal axis running from 20 to 45]


Example: Constructing a Dot Plot
Use a dot plot to organize the data set in Example 1.
Describe any patterns.



Solution: Constructing a Dot Plot

From the dot plot, you can see that most entries occur between 20
and 80 and only 4 people sent more than 100 text messages. You
can also see that 149 is an unusual data entry.
Solution: Constructing a Dot Plot
Technology can be used to construct dot plots. For
instance, Minitab and StatCrunch dot plots for the text
messaging data are shown below.



Graphing Qualitative Data
Sets
Pie Chart
• Pie charts provide a convenient way to present
qualitative data graphically as percents of a whole.
• A circle is divided into sectors that represent categories.
• The area of each sector is proportional to the frequency
of each category.



Example: Constructing a Pie Chart
The numbers of earned degrees conferred (in thousands)
in 2014 are shown in the table. Use a pie chart to
organize the data. (Source: U.S. National Center for
Educational Statistics)

Solution: Constructing a Pie Chart

• Find the relative frequency (percent) of each category.
• Multiply each relative frequency by 360° to find the central angle of its sector.


Solution: Constructing a Pie Chart

Check using technology

From the pie chart, you can see that almost one-half of
the degrees conferred in 2014 were bachelor’s degrees.



Graphing Paired Data Sets
Paired Data Sets
• Each entry in one data set corresponds to one entry in a second data set.
• Graph using a scatter plot.
  - The ordered pairs are graphed as points in a coordinate plane.
  - Used to show the relationship between two quantitative variables.
[Figure: scatter plot with x on the horizontal axis and y on the vertical axis]


Example: Interpreting a
Scatter Plot
The British statistician Ronald Fisher introduced a
famous data set called Fisher's Iris data set. This data
set describes various physical characteristics, such as
petal length and petal width (in millimeters), for three
species of iris. The petal lengths form the first data set
and the petal widths form the second data set. (Source:
Fisher, R. A., 1936)



Example: Interpreting a
Scatter Plot
As the petal length increases, what tends to happen to
the petal width?

Each point in the scatter plot represents the petal length and petal width of one flower.

Solution: Interpreting a
Scatter Plot

From the scatter plot, you can see that as the petal length increases, the petal width also tends to increase.



Section 2.3

Measures of Central Tendency



Section 2.3 Objectives
• How to find the mean, median, and mode of a
population and of a sample
• How to find the weighted mean of a data set, and how
to estimate the sample mean of grouped data
• How to describe the shape of a distribution as
symmetric, uniform, or skewed and how to compare
the mean and median for each



Measures of Central
Tendency
Measure of central tendency
• A value that represents a typical, or central, entry of a
data set.
• Most common measures of central tendency:
  - Mean
  - Median
  - Mode


Measure of Central Tendency:
Mean
Mean (average)
• The sum of all the data entries divided by the number
of entries.
• Sigma notation: Σx = add all of the data entries (x) in the data set.
• Population mean:

\[ \mu = \frac{\sum x}{N} \]

• Sample mean:

\[ \bar{x} = \frac{\sum x}{n} \]


Example: Finding a Sample
Mean
The weights (in pounds) for a sample of adults before
starting a weight-loss study are listed. What is the mean
weight of the adults?
274 235 223 268 290 285 235

(100 pounds ≈ 45.36 kg)


Solution: Finding a Sample
Mean
274 235 223 268 290 285 235
• The sum of the weights is

Σx = 274 + 235 + 223 + 268 + 290 + 285 + 235 = 1810

• To find the mean weight, divide the sum of the weights by the number of adults in the sample (7):

\[ \bar{x} = \frac{\sum x}{n} = \frac{1810}{7} \approx 258.6 \]

The mean weight of the adults is about 258.6 pounds.
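
The sample mean calculation above can be reproduced in a few lines of Python; this is only an illustration of the formula, using the seven weights from the example.

```python
weights = [274, 235, 223, 268, 290, 285, 235]  # pounds

total = sum(weights)         # Σx
n = len(weights)             # number of adults in the sample
mean = total / n             # x̄ = Σx / n

print(total)                 # 1810
print(round(mean, 1))        # 258.6
```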
Measure of Central Tendency:
Median
Median
• The value that lies in the middle of the data when the
data set is ordered.
• Measures the center of an ordered data set by dividing
it into two equal parts.
• If the data set has an
  - odd number of entries: the median is the middle data entry.
  - even number of entries: the median is the mean of the two middle data entries.
Example: Finding the Median
Find the median of the weights listed in the first example.
274 235 223 268 290 285 235


Solution: Finding the Median

• First, order the data.

223 235 235 268 274 285 290

• Because there are seven entries (an odd number), the median is the middle, or fourth, data entry.

The median weight of the adults is 268 pounds.
Example: Finding the Median
In the previous example, the adult weighing 285 pounds
decides to not participate in the study. What is the
median weight of the remaining adults?
223 235 235 268 274 290


Solution: Finding the Median

• First, order the data.

223 235 235 268 274 290

• Because there are six entries (an even number), the median is the mean of the two middle entries:

\[ \text{Median} = \frac{235 + 268}{2} = 251.5 \]

The median weight of the remaining adults is 251.5 pounds.
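
Both median cases (odd and even number of entries) can be checked with Python's standard `statistics` module, which orders the data internally. The list names are illustrative.

```python
import statistics

weights = [274, 235, 223, 268, 290, 285, 235]   # seven entries (odd)
print(statistics.median(weights))               # 268

# The adult weighing 285 pounds leaves the study: six entries (even).
remaining = [w for w in weights if w != 285]
print(statistics.median(remaining))             # 251.5
```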


Measure of Central Tendency:
Mode
Mode
• The data entry that occurs with the greatest frequency.
• If no entry is repeated, the data set has no mode.
• If two entries occur with the same greatest frequency, each entry is a mode (bimodal).


Example: Finding the Mode
Find the mode of the weights listed in Example 1.
223 235 235 268 274 285 290


Solution: Finding the Mode

• Ordering the data helps to find the mode.

223 235 235 268 274 285 290

• The entry of 235 occurs twice, whereas the other data entries occur only once.

The mode of the weights is 235 pounds.
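
A short sketch of the mode rules from these slides follows. The helper `modes` is a hypothetical name; it returns an empty list when no entry is repeated, matching the "no mode" convention stated above (Python's built-in `statistics.multimode` would instead return every value in that case). The second and third data sets are made up purely to show the bimodal and no-mode cases.

```python
from collections import Counter

def modes(data):
    """Return the list of modes, or [] when no entry is repeated."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []  # convention used on the slides: no repeated entry, no mode
    return [value for value, count in counts.items() if count == top]

print(modes([223, 235, 235, 268, 274, 285, 290]))  # [235]
print(modes([1, 1, 2, 2, 3]))                      # [1, 2]  (bimodal, hypothetical data)
print(modes([1, 2, 3]))                            # []      (no mode, hypothetical data)
```

Because the function only counts occurrences, it also works for qualitative data such as category labels.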
Example: Finding the Mode
At a political debate a sample of audience members was
asked to name the political party to which they belong.
Their responses are shown in the table. What is the
mode of the responses?
Political Party Frequency, f
Democrat 46
Republican 34
Independent 39
Other/don’t know 5



Solution: Finding the Mode
Political Party Frequency, f
Democrat 46
Republican 34
Independent 39
Other/don’t know 5

The response occurring with the greatest frequency is Democrat. So, the mode is Democrat. In this sample, there were more Democrats than people of any other single affiliation.


Exercise 4:
Mean, Median and Mode

Calculate the mean, median, and mode for the data set:

2 10 12 13 15
16 16 18 22 23


Comparing the Mean, Median,
and Mode
• All three measures describe a typical entry of a data
set.
• Advantage of using the mean:
  - The mean is a reliable measure because it takes into account every entry of a data set.
• Disadvantage of using the mean:
  - Greatly affected by outliers (an outlier is a data entry that is far removed from the other entries in the data set).


Example: Comparing the
Mean, Median, and
Mode
The table shows the sample ages of students in a class.
Find the mean, median, and mode of the ages. Are there
any outliers? Which measure of central tendency best
describes a typical entry of this data set?
Ages in a class
20 20 20 20 20 20 21
21 21 21 22 22 22 23
23 23 23 24 24 65



Solution: Comparing the
Mean, Median, and Mode
Ages in a class
20 20 20 20 20 20 21
21 21 21 22 22 22 23
23 23 23 24 24 65

Mean:

\[ \bar{x} = \frac{\sum x}{n} = \frac{20 + 20 + \cdots + 24 + 65}{20} \approx 23.8 \text{ years} \]

Median:

\[ \frac{21 + 22}{2} = 21.5 \text{ years} \]

Mode: 20 years (the entry occurring with the greatest frequency)
Solution: Comparing the
Mean, Median, and Mode
Mean ≈ 23.8 years   Median = 21.5 years   Mode = 20 years

• The mean takes every entry into account, but is influenced by the outlier of 65.
• The median also takes every entry into account, and it is not affected by the outlier.
• In this case the mode exists, but it doesn't appear to represent a typical entry.
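
To see how strongly the single outlier pulls the mean, the sketch below recomputes the three measures with and without the age of 65. Dropping the outlier is done here only for comparison; it is not part of the slide's solution.

```python
import statistics

# Ages in the class, as listed in the example (20 entries).
ages = [20] * 6 + [21] * 4 + [22] * 3 + [23] * 4 + [24] * 2 + [65]

print(statistics.mean(ages))      # 23.75 (about 23.8)
print(statistics.median(ages))    # 21.5
print(statistics.mode(ages))      # 20

# Same measures without the outlier of 65.
trimmed = ages[:-1]
print(round(statistics.mean(trimmed), 2))  # 21.58
print(statistics.median(trimmed))          # 21
```

The mean moves by more than two years when one entry is removed, while the median moves by half a year, which is why the median describes this data set better.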


Solution: Comparing the
Mean, Median, and Mode
Sometimes a graphical comparison can help you decide
which measure of central tendency best represents a
data set.

In this case, it appears that the median best describes the data set.
Weighted Mean
Weighted Mean
• The mean of a data set whose entries have varying
weights.
• The weighted mean is given by

\[ \bar{x} = \frac{\sum (x \cdot w)}{\sum w} \]

where w is the weight of each entry x.
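
The weighted mean formula can be sketched as a small function. The function name `weighted_mean` and the grade data below are hypothetical, chosen only to illustrate the formula; they are not the values from any table in these slides.

```python
def weighted_mean(values, weights):
    """Return sum(x * w) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

# Hypothetical example: grade points weighted by credit hours.
points = [4, 3, 2, 1]    # A, B, C, D
credits = [3, 3, 4, 2]   # assumed credit hours per course
print(round(weighted_mean(points, credits), 2))  # 2.58
```

When all weights are equal, the result reduces to the ordinary mean, which is a useful sanity check.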


Example: Finding a Weighted
Mean
Your grades from last semester are in the table. The
grading system assigns points as follows: A = 4, B = 3,
C = 2, D = 1, F = 0. Determine your grade point average
(weighted mean).



Solution: Finding a
Weighted Mean

Last semester, your grade point average was 2.5.



Mean of Grouped Data
Mean of a Frequency Distribution
• Approximated by

\[ \bar{x} = \frac{\sum (x \cdot f)}{n}, \qquad n = \sum f \]

where x and f are the midpoints and frequencies of a class, respectively.


Finding the Mean of a
Frequency Distribution
1. Find the midpoint of each class. In symbols:
\[ x = \frac{(\text{Lower limit}) + (\text{Upper limit})}{2} \]
2. Find the sum of the products of the midpoints and the frequencies. In symbols:
\[ \sum (x \cdot f) \]
3. Find the sum of the frequencies. In symbols:
\[ n = \sum f \]
4. Find the mean of the frequency distribution. In symbols:
\[ \bar{x} = \frac{\sum (x \cdot f)}{n} \]


Example: Find the Mean of a
Frequency Distribution
The frequency distribution
shows the out-of-pocket
prescription medicine
expenses (in dollars) for 30
U.S. adults in a recent year.
Use the frequency distribution
to estimate the mean expense.
Using the sample mean
formula, the mean expense is
$285.50. Compare this with
the estimated mean.
Solution: Find the Mean of a
Frequency Distribution

The mean expense is $287.70. This value is an estimate because it is based on class midpoints instead of the original data set.
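
The estimate can be reproduced from the class limits and frequencies alone. In this sketch the frequencies are the tallies of the 30 expenses in the seven classes of width 36 starting at 155; variable names are illustrative.

```python
# (lower limit, upper limit, frequency) for each class.
classes = [(155, 190, 3), (191, 226, 2), (227, 262, 5), (263, 298, 6),
           (299, 334, 7), (335, 370, 4), (371, 406, 3)]

n = sum(f for _, _, f in classes)                          # n = Σf
sum_xf = sum((lo + hi) / 2 * f for lo, hi, f in classes)   # Σ(x·f), x = midpoint
grouped_mean = sum_xf / n

print(sum_xf)                   # 8631.0
print(round(grouped_mean, 2))   # 287.7
```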


The Shape of Distributions
Symmetric Distribution
• A vertical line can be drawn through the middle
of a graph of the distribution and the resulting
halves are approximately mirror images.



The Shape of Distributions
Uniform Distribution (rectangular)
• All entries or classes in the distribution have equal
or approximately equal frequencies.
• Symmetric.



The Shape of Distributions

Skewed Left Distribution (negatively skewed)


• The “tail” of the graph elongates more to the left.
• The mean is to the left of the median.



The Shape of Distributions

Skewed Right Distribution (positively skewed)


• The “tail” of the graph elongates more to the right.
• The mean is to the right of the median.




Section 2.4

Measures of Variation



Range
Range
• The difference between the maximum and minimum
data entries in the set.
• The data must be quantitative.
• Range = (Max. data entry) – (Min. data entry)



Example: Finding the Range
Two corporations each hired 10 graduates. The starting
salaries for each graduate are shown. Find the range
of the starting salaries for Corporation A.



Solution: Finding the Range
• Ordering the data helps to find the least and greatest salaries (in thousands of dollars).
37 38 39 41 41 41 42 44 45 47
(minimum: 37, maximum: 47)
Range = (Max. salary) – (Min. salary)
= 47 – 37 = 10

The range of starting salaries for Corporation A is 10, or $10,000.
Variation
• Both data sets in the last example have a mean of 41.5, or $41,500, a median of 41, or $41,000, and a mode of 41, or $41,000. And yet the two sets differ significantly.
• The difference is that the entries in the second set have greater variation. As you can see in the figures on the next slide, the starting salaries for Corporation B are more spread out than those for Corporation A.


Variation
[Figure: starting salaries for Corporation A and Corporation B]


Deviation, Variance, and
Standard Deviation
Deviation
• The difference between the data entry, x, and the
mean of the data set.
• Population data set:
  - Deviation of x = x – μ
• Sample data set:
  - Deviation of x = x – x̄


Deviation, Variance, and
Standard Deviation
Population Variance

\[ \sigma^2 = \frac{\sum (x - \mu)^2}{N} \]

Population Standard Deviation

\[ \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}} \]
Example: Finding Population
Variance and Standard Deviation



Solution: Finding Population
Standard Deviation
• Determine the deviation for each data entry.
• μ = 41.5

Salary ($1000s), x    Deviation: x – μ
41                    41 – 41.5 = –0.5
38                    38 – 41.5 = –3.5
39                    39 – 41.5 = –2.5
45                    45 – 41.5 = 3.5
47                    47 – 41.5 = 5.5
41                    41 – 41.5 = –0.5
44                    44 – 41.5 = 2.5
41                    41 – 41.5 = –0.5
37                    37 – 41.5 = –4.5
42                    42 – 41.5 = 0.5
Σx = 415              Σ(x – μ) = 0
Solution: Finding Population
Standard Deviation
• Determine SSx, the sum of squares.

Salary, x    Deviation: x – μ      Squares: (x – μ)²
41           41 – 41.5 = –0.5      (–0.5)² = 0.25
38           38 – 41.5 = –3.5      (–3.5)² = 12.25
39           39 – 41.5 = –2.5      (–2.5)² = 6.25
45           45 – 41.5 = 3.5       (3.5)² = 12.25
47           47 – 41.5 = 5.5       (5.5)² = 30.25
41           41 – 41.5 = –0.5      (–0.5)² = 0.25
44           44 – 41.5 = 2.5       (2.5)² = 6.25
41           41 – 41.5 = –0.5      (–0.5)² = 0.25
37           37 – 41.5 = –4.5      (–4.5)² = 20.25
42           42 – 41.5 = 0.5       (0.5)² = 0.25
             Σ(x – μ) = 0          SSx = 88.5
Solution: Finding Population
Standard Deviation
Population Variance

\[ \sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{88.5}{10} \approx 8.9 \]

(in squared thousands of dollars)

Population Standard Deviation

\[ \sigma = \sqrt{\sigma^2} = \sqrt{\frac{88.5}{10}} \approx 3.0 \]

(taking the square root brings the units back to thousands of dollars)

The population variance is about 8.9, and the population standard deviation is about 3.0, or $3,000.
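
The three solution steps (deviations, sum of squares, variance and standard deviation) can be traced in code. This is a plain illustration of the population formulas using the ten salaries from the table; variable names are illustrative.

```python
import math

salaries = [41, 38, 39, 45, 47, 41, 44, 41, 37, 42]  # in $1000s

N = len(salaries)
mu = sum(salaries) / N                       # population mean
deviations = [x - mu for x in salaries]      # x - mu for each entry
ss_x = sum(d ** 2 for d in deviations)       # sum of squares, SSx
variance = ss_x / N                          # population variance
std_dev = math.sqrt(variance)                # population standard deviation

print(mu)                   # 41.5
print(sum(deviations))      # 0.0
print(ss_x)                 # 88.5
print(f"{variance:.2f}")    # 8.85 (the slide rounds this to 8.9)
print(f"{std_dev:.2f}")     # 2.97 (about 3.0)
```

The deviations always sum to zero, which is why they are squared before being added.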
Sample Variance, and
Standard Deviation
Sample Variance

\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]

Dividing by n – 1 instead of n is a correction, as samples tend to underestimate variance.

Sample Standard Deviation

\[ s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]


Finding the Sample Variance
& Standard Deviation
1. Find the mean of the sample data set. In symbols:
\[ \bar{x} = \frac{\sum x}{n} \]
2. Find the deviation of each entry. In symbols:
\[ x - \bar{x} \]
3. Square each deviation. In symbols:
\[ (x - \bar{x})^2 \]
4. Add to get the sum of squares. In symbols:
\[ SS_x = \sum (x - \bar{x})^2 \]


Finding the Sample Variance & Standard
Deviation
5. Divide by n – 1 to get the sample variance. In symbols:
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]
6. Find the square root to get the sample standard deviation. In symbols:
\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]


Example: Finding Sample
Variance & Standard Deviation
In a study of high school football players that suffered
concussions, researchers placed the players in two
groups. Players that recovered from their concussions in
14 days or less were placed in Group 1. Those that took
more than 14 days were placed in Group 2. The
recovery times (in days) for Group 1 are listed below.
Find the sample variance and standard deviation of the
recovery times.
4 7 6 7 9 5 8 10 9 8 7 10


Solution: Finding Sample
Variance & Standard Deviation

• Determine the deviation for each data entry.
• x̄ = 7.5


Solution: Finding Sample Variance &
Standard Deviation

The sample variance is about 3.5, and the sample standard deviation is about 1.9 days.
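
As a cross-check of this result, Python's standard `statistics` module provides `variance` and `stdev`, which divide by n – 1, and `pvariance`, which divides by N. The sketch below uses the twelve Group 1 recovery times from the example.

```python
import statistics

recovery_days = [4, 7, 6, 7, 9, 5, 8, 10, 9, 8, 7, 10]

mean = statistics.mean(recovery_days)
ss_x = sum((x - mean) ** 2 for x in recovery_days)   # sum of squares
s2 = statistics.variance(recovery_days)              # SSx / (n - 1)
s = statistics.stdev(recovery_days)                  # square root of s2

print(mean)           # 7.5
print(ss_x)           # 39.0
print(round(s2, 2))   # 3.55 (about 3.5 on the slide)
print(round(s, 2))    # 1.88 (about 1.9)

# For comparison only: dividing by n instead of n - 1 gives a smaller value.
print(statistics.pvariance(recovery_days))  # 3.25
```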
Exercise 5:
Standard Deviation

The ages of a random sample of people surveyed in Deggendorf are:

14 23 24 16 16 22 18 19 21 25

Use the table provided to calculate the sum of squares, variance, and standard deviation of the data set:


Coefficient of Variation
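
The body of this slide is not preserved in this copy. For reference, the standard definition of the coefficient of variation, which expresses the standard deviation as a percent of the mean and which the following example appears to rely on, is:

```latex
\[
CV = \frac{\sigma}{\mu} \cdot 100\% \quad \text{(population)},
\qquad
CV = \frac{s}{\bar{x}} \cdot 100\% \quad \text{(sample)}
\]
```

Because the units cancel, the coefficient of variation allows the spread of data sets measured in different units, such as inches and pounds, to be compared.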



Example: Comparing Variation in
Different Data Sets
The table shows the population heights (in inches) and weights (in pounds) of the members of a basketball team. Find the coefficient of variation for the heights and the weights. Then compare the results.


Solution: Comparing Variation in
Different Data Sets



Solution: Comparing Variation in
Different Data Sets

The weights (9.4%) are more variable than the heights (4.5%).


Interpreting Standard
Deviation
• Standard deviation is a measure of the typical amount
an entry deviates from the mean.
• The more the entries are spread out, the greater the
standard deviation.



Example: Estimating Standard
Deviation
Without calculating, estimate the population standard
deviation of each data set.



Solution: Estimating Standard
Deviation


Interpreting Standard Deviation:
Empirical Rule (68 – 95 – 99.7 Rule)
For data with a (symmetric) bell-shaped distribution, the
standard deviation has the following characteristics:

• About 68% of the data lie within one standard deviation of the mean.
• About 95% of the data lie within two standard deviations of the mean.
• About 99.7% of the data lie within three standard deviations of the mean.


Interpreting Standard Deviation:
Empirical Rule (68 – 95 – 99.7 Rule)
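[Figure: bell-shaped curve with the horizontal axis marked at x̄ − 3s, x̄ − 2s, x̄ − s, x̄, x̄ + s, x̄ + 2s, x̄ + 3s]
• 68% within 1 standard deviation (34% on each side of the mean)
• 95% within 2 standard deviations (an additional 13.5% on each side)
• 99.7% within 3 standard deviations (an additional 2.35% on each side)
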
There are Different Forms of
Bell-Shaped Distributions



Example: Using the Empirical
Rule
In a survey conducted by the National Center for Health
Statistics, the sample mean height of women in the
United States (ages 20-29) was 64.2 inches, with a
sample standard deviation of 2.9 inches. Estimate the
percent of the women whose heights are between 58.4
inches and 64.2 inches. Assume the sample is normally
distributed.



Solution: Using the Empirical
Rule
• Because the distribution is bell-shaped, you can
use the Empirical Rule.
• The mean height is 64.2 inches.
• 64.2 − (2 × 2.9) = 58.4, so 58.4 inches is two standard
deviations below the mean.
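
The estimate follows by adding the Empirical Rule bands between the mean and two standard deviations below it. A minimal sketch, assuming the band percentages of the Empirical Rule (34%, 13.5%, 2.35% on each side); the names are illustrative.

```python
# Percent of data in each one-standard-deviation band on ONE side of the mean,
# according to the Empirical Rule.
EMPIRICAL_BANDS = {1: 34.0, 2: 13.5, 3: 2.35}

def percent_between_mean_and(k):
    """Percent of data between the mean and k standard deviations away (one side)."""
    return sum(EMPIRICAL_BANDS[i] for i in range(1, k + 1))

mean_height = 64.2   # inches
std_dev = 2.9        # inches
lower_bound = 58.4   # inches

# Number of standard deviations between the lower bound and the mean.
k = round((mean_height - lower_bound) / std_dev)
percent = percent_between_mean_and(k)

print(f"{lower_bound} in. is {k} standard deviations below the mean")
print(f"about {percent}% of heights lie between {lower_bound} and {mean_height} in.")
```

So about 13.5% + 34% = 47.5% of the women are between 58.4 and 64.2 inches tall.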



Chebychev’s Theorem
• The portion of any data set lying within k standard deviations
(k > 1) of the mean is at least:

1 − 1/k²
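
A short sketch of the bound, useful for seeing how it grows with k. The function name is illustrative.

```python
def chebychev_lower_bound(k):
    """Minimum proportion of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebychev's Theorem requires k > 1")
    return 1 - 1 / k**2

for k in (2, 3):
    print(f"k = {k}: at least {chebychev_lower_bound(k):.1%} of the data")
```

Unlike the Empirical Rule, this bound holds for any distribution, which is why it is weaker (at least 75% for k = 2 instead of about 95%).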



Example: Using Chebychev’s
Theorem
The age distributions for Georgia and Iowa are shown in the
histograms. Apply Chebychev’s Theorem to the data for Georgia
using k = 2. What can you conclude? Is an age of 100 unusual
for a Georgia resident? Explain.
(Source: Based on U.S. Census Bureau)



Solution: Using Chebychev’s
Theorem

You can say that at least 75% of the population of Georgia is
between 0 and 81.9 years old. Also, an age of 100 lies more than
two standard deviations from the mean. So, this age is unusual.

Breakout Session 6

30. The mean wages for a sample of employees in a company was $16.50 per day
with a standard deviation of $1.50 per day. Estimate the percent of wages
between $12.00 and $21.00 per day. (Assume the data set is normally distributed
/ bell-shaped).

32. The mean duration of the 135 space shuttle flights was about 9.9 days, and
the standard deviation was about 3.8 days. Determine how many of the flights
lasted between 2.3 days and 17.5 days. (Assume that the data set is not normally
distributed.)




Section 2.5

Measures of Position



Quartiles
• Fractiles are numbers that partition (divide) an
ordered data set into equal parts.
• Quartiles approximately divide an ordered data set
into four equal parts.
  – First quartile, Q1: About one quarter of the data
    fall on or below Q1.
  – Second quartile, Q2: About one half of the data
    fall on or below Q2 (median).
  – Third quartile, Q3: About three quarters of the
    data fall on or below Q3.


Quartiles

Video

https://www.youtube.com/watch?v=oPw2OpIZ4DY

Example: Finding Quartiles
Each year in the U.S., automobile commuters waste fuel due to
traffic congestion. The amounts (in gallons per year) of fuel wasted
by commuters in the 15 largest U.S. urban areas are listed. Find
the first, second, and third quartiles of the data set. What do you
observe? (Source: Based on 2015 Urban Mobility Scorecard)

20 30 29 22 25 29 25 24 35 23 25 11 33 28 35

Solution:
• Order the data set. Q2 (the median) divides it into two halves.
• Q1 is the median of the data entries to the left of Q2.
• Q3 is the median of the data entries to the right of Q2.

Data entries to the left of Q2:  11 20 22 23 24 25 25
Q2:                              25
Data entries to the right of Q2: 28 29 29 30 33 35 35

Q1 = 23, Q2 = 25, Q3 = 30
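
The quartile procedure for the fuel data can be sketched as follows. This assumes the "median of each half" convention, in which the median itself is left out of both halves when the number of entries is odd; other quartile conventions exist and can give slightly different values. The function name is illustrative.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # entries to the left of Q2
    upper = ordered[half + n % 2:]    # entries to the right of Q2 (median skipped if n is odd)
    return median(lower), median(ordered), median(upper)

# Fuel wasted (gallons per year) in the 15 largest U.S. urban areas.
fuel = [20, 30, 29, 22, 25, 29, 25, 24, 35, 23, 25, 11, 33, 28, 35]

q1, q2, q3 = quartiles(fuel)
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}")
```
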
Interquartile Range
Interquartile Range (IQR)
• A measure of variation that gives the range of the
middle portion (about half) of the data.
• The difference between the third and first
quartiles.
• IQR = Q3 – Q1

• Use the IQR to identify outliers: multiply the IQR by 1.5.
• Any data entry less than Q1 − 1.5 × IQR or greater than
Q3 + 1.5 × IQR is an outlier.
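
A minimal sketch of the outlier check, applied to the fuel data set from the quartiles example. It takes Q1 = 23 and Q3 = 30 as given for that data set; the variable names are illustrative.

```python
# Fuel wasted (gallons per year) in the 15 largest U.S. urban areas.
fuel = [20, 30, 29, 22, 25, 29, 25, 24, 35, 23, 25, 11, 33, 28, 35]

q1, q3 = 23, 30                  # quartiles of this data set
iqr = q3 - q1                    # interquartile range

lower_fence = q1 - 1.5 * iqr     # entries below this are outliers
upper_fence = q3 + 1.5 * iqr     # entries above this are outliers

outliers = [x for x in fuel if x < lower_fence or x > upper_fence]

print(f"IQR = {iqr}")
print(f"fences: {lower_fence} and {upper_fence}")
print(f"outliers: {outliers}")
```
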
Interquartile Range

Video

https://www.youtube.com/watch?v=VABsJBw1JqA


Example: Finding the
Interquartile Range
Find the interquartile range of the data set from the
first example. Are there any outliers?


Solution: Finding the
Interquartile Range



Exercise IQR



Box-and-Whisker Plot
Box-and-whisker plot
• Exploratory data analysis tool.
• Highlights important features of a data set.
• Requires (five-number summary):
1. Minimum entry
2. First quartile Q1
3. Median Q2
4. Third quartile Q3
5. Maximum entry
Drawing a Box-and-Whisker
Plot
1. Find the five-number summary of the data set.
2. Construct a horizontal scale that spans the range of
the data.
3. Plot the five numbers above the horizontal scale.
4. Draw a box above the horizontal scale from Q1 to Q3
and draw a vertical line in the box at Q2.
5. Draw whiskers from the box to the minimum and
maximum entries.

[Figure: box-and-whisker plot labeled, from left to right: minimum entry, whisker, Q1, box, median Q2, Q3, whisker, maximum entry]


Example: Drawing a Box-and-
Whisker Plot
Draw a box-and-whisker plot that represents the data set
in the first example.
Min = 11, Q1 = 23, Q2 = 25, Q3 = 30, Max = 35

Solution:

The box represents about half of the data, which are between 23 and 30.
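
The five-number summary behind the plot can be obtained with Python's standard library. This sketch uses `statistics.quantiles` with its "exclusive" method, which reproduces the quartiles above for this data set; the function name `five_number_summary` is illustrative, and other quartile conventions may differ slightly on other data.

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (minimum, Q1, Q2, Q3, maximum) for a data set."""
    q1, q2, q3 = quantiles(data, n=4, method="exclusive")
    return min(data), q1, q2, q3, max(data)

# Fuel wasted (gallons per year) in the 15 largest U.S. urban areas.
fuel = [20, 30, 29, 22, 25, 29, 25, 24, 35, 23, 25, 11, 33, 28, 35]

summary = five_number_summary(fuel)
print("min, Q1, Q2, Q3, max =", summary)
```

These five values are exactly what steps 1 to 5 of the drawing procedure place on the horizontal scale.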



Example: Drawing a Box-and-
Whisker Plot
Solution:

The left whisker represents about one-quarter of the data,
so about 25% of the data entries are less than 23. The
right whisker represents about one-quarter of the data, so
about 25% of the data entries are greater than 30. Also,
the left whisker is much longer than the right one. This
indicates that the data set has a possible outlier to the left.

Exercise:
Drawing a Box-and-Whisker Plot



The Standard Score
Standard Score (z-score)
• Represents the number of standard deviations a
given value x falls from the mean μ.

• z = (value − mean) / (standard deviation) = (x − μ) / σ


Example: Finding z-Scores
The mean speed of vehicles along a stretch of highway is
56 miles per hour with a standard deviation of 4 miles
per hour.
You measure the speeds of three cars traveling along this
stretch of highway as:
1st car: 62 miles per hour
2nd car: 47 miles per hour
3rd car: 56 miles per hour
Find the z-score that corresponds to each speed. Assume
the distribution of the speeds is approximately bell-shaped.

Solution: Finding z-Scores
The z-score that corresponds to each speed is calculated
below.
x = 62 mph: z = (62 − 56) / 4 = 1.5
x = 47 mph: z = (47 − 56) / 4 = −2.25
x = 56 mph: z = (56 − 56) / 4 = 0
62 miles per hour is 1.5 standard deviations above the
mean; 47 miles per hour is 2.25 standard deviations
below the mean; and 56 miles per hour is equal to the
mean. 47 miles per hour is unusually slow, because its
speed corresponds to a z-score of −2.25.
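
The same calculation as a small sketch, using the mean of 56 mph and standard deviation of 4 mph from the example. The function name is illustrative.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

mean_speed = 56     # miles per hour
std_dev_speed = 4   # miles per hour

for speed in (62, 47, 56):
    z = z_score(speed, mean_speed, std_dev_speed)
    print(f"x = {speed} mph -> z = {z}")
```

A positive z-score is above the mean, a negative one is below it, and zero is exactly at the mean.
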
Exercise: Find the Z-Score at 60 mph
The mean speed of vehicles along a stretch of highway is
56 miles per hour with a standard deviation of 4 miles
per hour.
You measure the speed of a fourth car traveling along this
stretch of highway as:
4th car: 60 miles per hour
Find the z-score that corresponds to the speed of 60 mph.
Assume the distribution of the speeds is approximately
bell-shaped.



Example: Comparing z-Scores
from Different Data Sets
The table shows the mean heights and standard
deviations for a population of men and a population of
women. Compare the z-scores for a 6-foot-tall (72 in. /
1.83m) man and a 6-foot-tall (72 in. / 1.83m) woman.
Assume the distributions of the heights are
approximately bell-shaped.



Solution: Comparing z-Scores
from Different Data Sets
Solution
Note that 6 feet = 72 inches. Find the z-score for each
height.
• z-score for 6-foot-tall man:
z = (x − μ) / σ = (72 − 69.9) / 3.0 = 0.7
• z-score for 6-foot-tall woman:
z = (x − μ) / σ = (72 − 64.3) / 2.6 ≈ 3.0

Solution: Comparing z-Scores
from Different Data Sets
Solution
The z-score for the 6-foot-tall man is within 1 standard
deviation of the mean (69.9 inches). This is among
the typical heights for a man.
The z-score for the 6-foot-tall woman is about 3
standard deviations from the mean (64.3 inches). This is
an unusual height for a woman.
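
The comparison can be sketched in a few lines. The means and standard deviations are the ones used in the solution; the rule that a value more than two standard deviations from the mean counts as unusual is taken from the Chebychev example earlier in these slides, and the function names are illustrative.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

def is_unusual(z, threshold=2):
    """Treat a value as unusual when it is more than `threshold` standard deviations from the mean."""
    return abs(z) > threshold

height = 72  # inches (6 feet)

z_man = z_score(height, mean=69.9, std_dev=3.0)
z_woman = z_score(height, mean=64.3, std_dev=2.6)

print(f"man:   z = {z_man:.2f}, unusual: {is_unusual(z_man)}")
print(f"woman: z = {z_woman:.2f}, unusual: {is_unusual(z_woman)}")
```

The same height gives very different z-scores because each is measured against its own population's mean and standard deviation.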
