Chapter 04
1. A data set with two values that are tied for the highest number of occurrences is called
bimodal.
True False
4. A trimmed mean may be preferable to a mean when a data set has extreme
values.
True False
5. One benefit of the box plot is that it clearly displays the standard
deviation.
True False
True False
9. When data are right-skewed, we expect the median to be greater than the
mean.
True False
True False
12. Chebyshev's Theorem says that at most 50 percent of the data lie within 2 standard deviations of the
mean.
True False
13. Chebyshev's Theorem says that at least 95 percent of the data lie within 2 standard deviations of the
mean.
True False
14. If there are 19 data values, the median will have 10 values above it and 9 below it since n is
odd.
True False
15. If there are 20 data values, the median will be halfway between two data values.
True False
16. In a left-skewed distribution, we expect that the median will be greater than the
mean.
True False
17. If the standard deviations of two samples are the same, so are their coefficients of
variation.
True False
18. A certain health maintenance organization (HMO) examined the number of office visits by its members in the last year. This data
set would probably be skewed to the left due to low outliers.
True False
19. A certain health maintenance organization (HMO) examined the number of office visits by its members in the last year. For this
data set, the mean is probably not a very good measure of a "typical" person's office visits.
True False
20. Referring to this box plot of ice cream fat content, the median seems more "typical" of fat content than the midrange as a
measure of center.
True False
21. Referring to this box plot of ice cream fat content, the mean would exceed the median.
True False
22. Referring to this box plot of ice cream fat content, the skewness would be negative.
True False
23. Referring to this graph of ice cream fat content, the second quartile is about 61.
True False
25. In calculating the sample variance, the sum of the squared deviations around the mean is divided by n - 1 to avoid
underestimating the unknown population variance.
True False
26. Outliers are data values that fall beyond ±2 standard deviations from the
mean.
True False
27. The Empirical Rule assumes that the distribution of data follows a normal curve.
True False
28. The Empirical Rule can be applied to any distribution, unlike Chebyshev's
theorem.
True False
29. When applying the Empirical Rule to a distribution of grades, if a student scored one standard deviation below the mean, then
she would be at the 25th percentile of the distribution.
True False
True False
31. A platykurtic distribution is more sharply peaked (i.e., thinner tails) than a normal
distribution.
True False
32. A leptokurtic distribution is more sharply peaked (i.e., thinner tails) than a normal
distribution.
True False
34. A sample consists of the following data: 7, 11, 12, 18, 20, 22, 43. Using the "three standard deviation" criterion, the last
observation (X = 43) would be considered an outlier.
True False
B. a unit-free statistic.
B. It is less reliable than the mode when the data are continuous.
B. n/2 if n is even.
C. n/2 if n is odd.
45. In a sample of 10,000 observations from a normal population, how many would you expect to lie beyond three standard
deviations of the mean?
A. None of them
B. About 27
C. About 100
D. About 127
46. The Excel formula for the standard deviation of a sample array named Data
is:
A. =STDEV.S(Data).
B. =STANDEV(Data).
C. =STDEV.P(Data).
D. =SUM(Data)/(COUNT(Data)-1).
48. Estimating the mean from grouped data will tend to be most accurate
when:
B. A distribution that is more peaked than a normal distribution (i.e., thinner tails) is
platykurtic.
A. In a left-skewed distribution, we expect that the median will exceed the mean.
54. Exam scores in a small class were 10, 10, 20, 20, 40, 60, 80, 80, 90, 100, 100. For this data set, which statement is incorrect
concerning measures of center?
55. Exam scores in a small class were 0, 50, 50, 70, 70, 80, 90, 90, 100, 100. For this data set, which statement is incorrect
concerning measures of center?
57. For U.S. adult males, the mean height is 178 cm with a standard deviation of 8 cm and the mean weight is 84 kg with a standard
deviation of 8 kg. Elmer is 170 cm tall and weighs 70 kg. It is most nearly correct to say that:
58. John scored 85 on Prof. Hardtack's exam (Q1 = 40 and Q3 = 60). Based on the fences, which is
correct?
B. John is an outlier.
C. John is not an outlier.
59. John scored 35 on Prof. Johnson's exam (Q1 = 70 and Q3 = 80). Based on the fences, which is correct?
B. John is an outlier.
60. A population consists of the following data: 7, 11, 12, 18, 20, 22, 25. The population variance
is:
A. 6.07.
B. 36.82.
C. 5.16.
D. 22.86.
61. Consider the following data: 6, 7, 17, 51, 3, 17, 23, and 69. The range and the median
are:
A. 69 and 17.5.
B. 66 and 17.5.
C. 66 and 17.
D. 69 and 17.
D. average of Q1 and Q3.
64. Which two statistics offer robust measures of center when outliers are present?
A. =STANDARDIZE
B. =NORM.DIST
C. =STDEV.P
D. =AVEDEV
66. Which Excel function would be least useful to calculate the quartiles for a column of
data?
A. =STANDARDIZE
B. =PERCENTILE.EXC
C. =QUARTILE.EXC
D. =RANK
67. A sample of 50 breakfast customers of McDonald's showed the spending below. Which statement is least likely to be
correct?
68. VenalCo Market Research surveyed 50 individuals who recently purchased a certain CD, revealing the age distribution shown
below. Which statement is least defensible?
70. A sample of customers from Barnsboro National Bank shows an average account balance of $315 with a standard deviation of
$87. A sample of customers from Wellington Savings and Loan shows an average account balance of $8350 with a standard
deviation of $1800. Which statement about account balances is correct?
A. box plot
B. bar chart
C. histogram
D. scatter plot
73. If the mean and median of a population are the same, then its distribution is:
A. normal.
B. skewed.
C. symmetric.
D. uniform.
74. In the following data set {7, 5, 0, 2, 7, 15, 5, 2, 7, 18, 7, 3, 0}, the value 7
is:
A. the mean.
B. the mode.
A. 800.
B. 1000.
C. 900.
D. 950.
76. The 25th percentile for waiting time in a doctor's office is 19 minutes. The 75th percentile is 31 minutes. The interquartile range is:
A. 12 minutes.
B. 16 minutes.
C. 22 minutes.
77. The 25th percentile for waiting time in a doctor's office is 19 minutes. The 75th percentile is 31 minutes. Which is incorrect
78. When using Chebyshev's Theorem, the minimum percentage of sample observations that will fall within two standard deviations
of the mean will be __________ the percentage within two standard deviations if a normal distribution is assumed (Empirical
Rule).
A. smaller than
B. greater than
C. the same as
80. Based on daily measurements, Bob's weight has a mean of 200 pounds with a standard deviation of 16 pounds, while Mary's
weight has a mean of 125 pounds with a standard deviation of 15 pounds. Who has the smaller relative variation?
A. Bob
B. Mary
81. Frieda is 67 inches tall and weighs 135 pounds. Women her age have a mean height of 65 inches with a standard deviation of
2.5 inches and a mean weight of 125 pounds with a standard deviation of 10 pounds. In relative terms, it is correct to say that:
B. The standard deviation is in the same units as the mean (e.g., kilograms).
C. The mean from a frequency tabulation may differ from the mean from raw
data.
A. box plot.
B. dot plot.
C. histogram.
D. scatter plot.
A. The median personal income of California taxpayers would probably be near the mean.
B. The interquartile range offers a measure of income inequality among California residents.
C. For income, the sum of squared deviations about the mean is negative about half the
time.
D. For personal incomes in California, outliers in either tail would be equally likely.
D. about 32 percent of the data are beyond one standard deviation from the
mean.
87. Three randomly chosen Seattle students were asked how many round trips they made to Canada last year. Their replies were 3,
4, 5. The geometric mean is:
A. 3.877.
B. 4.000.
C. 3.915.
D. 4.422.
88. Three randomly chosen California students were asked how many times they drove to Mexico last year. Their replies were 4, 5,
6. The geometric mean is:
A. 3.87.
B. 5.00.
C. 5.42.
D. 4.93.
89. Three randomly chosen Colorado students were asked how many times they went rock climbing last month. Their replies were
5, 6, 7. The standard deviation is:
A. 1.212.
B. 0.816.
C. 1.000.
D. 1.056.
90. Patient survival times after a certain type of surgery have a very right-skewed distribution due to a few high outliers.
Consequently, which statement is most likely to be correct?
A. Median > Midrange
91. So far this year, stock A has had a mean price of $6.58 per share with a standard deviation of $1.88, while stock B has had a
mean price of $10.57 per share with a standard deviation of $3.02. Which stock is more volatile?
A. Stock A
B. Stock B
A. box plot.
B. dot plot.
C. histogram.
D. Pareto chart.
B. Range
C. Coefficient of variation
D. Trimmed mean
94. Twelve randomly chosen students were asked how many times they had missed class during a certain semester, with this
result: 3, 2, 1, 2, 1, 5, 9, 1, 2, 3, 3, 10. The geometric mean is:
A.
B. 2.604
C. 1.517
D.
95. Twelve randomly chosen students were asked how many times they had missed class during a certain semester, with this
result: 3, 2, 1, 2, 1, 5, 9, 1, 2, 3, 3, 10. The median is:
A. 7.0.
B. 3.0.
C. 3.5.
D. 2.5.
98. Twelve randomly chosen students were asked how many times they had missed class during a certain semester, with this
result: 2, 1, 5, 1, 1, 3, 4, 3, 1, 1, 5, 18. For this sample, the geometric mean is:
A. 2.158.
B. 1.545.
C. 2.376.
D. 3.017.
99. Twelve randomly chosen students were asked how many times they had missed class during a certain semester, with this
result: 2, 1, 5, 1, 1, 3, 4, 3, 1, 1, 5, 18. For this sample, the median is:
A. 2.
B. 3.
C. 3.5.
D. 2.5.
100. Twelve randomly chosen students were asked how many times they had missed class during a certain semester, with this
result: 2, 1, 5, 1, 1, 3, 4, 3, 1, 1, 5, 18. For this sample, which measure of center is least representative of the "typical"
student?
A. Mean
B. Median
C. Mode
D. Midrange
101. Here are statistics on order sizes of Megalith Construction Supply's shipments of two kinds of construction materials last
year.
A. Girders
B. Rivets
102. The quartiles of a distribution are most clearly revealed in which display?
A. Box plot
B. Scatter plot
C. Histogram
D. Dot plot
C. always zero.
104. What does the graph below (profit/sales ratios for 25 Fortune 500 companies)
reveal?
105. Find the sample correlation coefficient for the following data.
A. .8911
B. .9132
C. .9822
D. .9556
106. Heights of male students in a certain statistics class range from Xmin = 61 to Xmax = 79. Applying the Empirical Rule, a reasonable
A. 2.75.
B. 3.00.
C. 3.25.
D. 3.50.
107. A reporter for the campus paper asked five randomly chosen students how many occupants, including the driver, ride to school
in their cars. The responses were 1, 1, 1, 1, 6. The coefficient of variation is:
A. 25 percent.
B. 250 percent.
C. 112 percent.
D. 100 percent.
108. A smooth distribution with one mode is negatively skewed (skewed to the left). The median of the distribution is $65. Which of
the following is a reasonable value for the distribution mean?
A. $76
B. $54
C. $81
D. $65
109. In a positively skewed distribution, the percentage of observations that fall below the median
is:
A. about 50 percent.
A. continuous data.
B. categorical data.
C. discrete data.
112. Craig operates a part-time snow-plowing business using a 2002 GMC 2500 HD extended cab short box truck. This box plot of
Craig's MPG on 195 tanks of gas does not support which statement?
113. Estimate the mean exam score for the 50 students in Prof. Axolotl's class.
A. 59.2
B. 62.0
C. 63.5
D. 64.1
114. A survey of salary increases received during a recent year by 44 working MBA students is shown. Find the approximate mean
percent raise.
A. 6.56
B. 6.74
C. 5.90
D. 6.39
115. The following frequency distribution shows the amount earned yesterday by employees of a large Las Vegas casino. Estimate
the mean daily earnings.
A. $112.50
B. $125.01
C. $105.47
D. $117.13
116. The following table is the frequency distribution of parking fees for a day:
A. $7.07.
B. $6.95.
C. $7.00.
D. $7.25.
A. 4.550
B. 3.798
C. 4.278
D. 2.997
118. The 25th percentile for waiting time in a doctor's office is 10 minutes. The 75th percentile is 30 minutes. Which is incorrect
119. Five homes were recently sold in Oxnard Acres. Four of the homes sold for $400,000, while the fifth home sold for $2.5 million.
Which measure of central tendency best represents a typical home price in Oxnard Acres?
A. The mean or median.
B. The median or mode.
C. The mean or mode.
120. In Tokyo, construction workers earn an average of ¥420,000 (yen) per month with a standard deviation of ¥20,000, while in
Hamburg, Germany, construction workers earn an average of €3,200 (euros) per month with a standard deviation of €57. Who
is earning relatively more, a worker making ¥460,000 per month in Tokyo or one earning €3,300 per month in Hamburg?
B. If the data are from a normal population, about 68 percent of the values will be within μ ± σ.
B. Standard deviation
C. Midhinge
D. Interquartile range
123. If Q1 = 150 and Q3 = 250, the upper fences (inner and outer) are:
B. Figure B.
125. Which of the following statements is likely to apply to the incomes of 50 randomly chosen taxpayers in
California?
126. A certain health maintenance organization (HMO) examined the number of office visits by each of its members in the last year.
For this data set, we would anticipate that the geometric mean would be
A. a reasonable measure of center.
127. Three randomly chosen Colorado students were asked how many times they went rock climbing last month. Their replies were
5, 6, 7. The coefficient of variation is:
A. 16.7 percent.
B. 13.6 percent.
C. 20.0 percent.
D. 35.7 percent.
128. The mean of a population is 50 and the median is 40. Which histogram is most likely for samples from this
population?
A. Sample A.
B. Sample B.
C. Sample C.
131. In Osaka, Japan, stock brokers earn ¥6000 per hour on the average, with a standard deviation of ¥1200. In Stuttgart,
Germany, stock brokers earn an average of €18 per hour with a standard deviation of €6. In which country is the variation in
wages greatest?
132. Find the coefficient of variation of these numbers: 14, 17, 17, 19, 26. Would the variability of those numbers be greater than,
less than, or the same as the variability of 24, 27, 27, 29, 36? Defend your answer.
133. Ten randomly chosen students at a certain university were asked how many times they smoked marijuana during the
preceding week. Their answers were 0, 8, 0, 0, 2, 4, 0, 0, 6, 0. A campus newspaper article appeared, with the headline
"Average Student Uses No Pot." Is this a fair assessment of central tendency? Discuss the alternatives.
134. Twelve students were asked how many credit cards they owned. The responses were 0, 0, 1, 2, 2, 3, 3, 4, 4, 5, 5, 11. (a) Find
the mean, median, and mode. (b) Which measure of center seems best in this case? (c) Find the first and third quartiles. What
do they tell you?
135. Eleven students were asked how many siblings they had. The responses were 0, 1, 2, 2, 2, 2, 2, 3, 3, 4, 5. Find the mean,
median, mode, and geometric mean. Which would you prefer in this case, and why not the others?
136. Patient waiting times in the Tardis Orthopedic Clinic have a mean of 50 minutes with a standard deviation of 25 minutes. Within
what range would approximately 95 percent of the waiting times lie if we were sampling a normal distribution? Do you think the
distribution is likely to be normal? Explain.
137. The athletic departments at 10 randomly selected U.S. universities were asked by the Equal Employment Opportunity
Commission to state what percentage of their nursing scholarships were presently held by women. The responses were 5, 4,
2, 1, 1, 2, 10, 5, 5, 5. Find the mean, median, mode, and geometric mean. Which is the most appropriate measure of central
tendency? The least appropriate? Explain your answer. Is there an outlier?
138. A survey of 10 randomly chosen drivers showed the following number of persons per car, including the driver: 1, 5, 1, 5, 2, 1, 1,
1, 2, 1. Describe the center, variability, and skewness for this sample.
139. A national survey showed that most commuter cars contain only the driver. Hungry for a story, a campus newspaper reporter
asked five randomly chosen commuter students how many occupants, including the driver, rode to school in their cars. Their
responses were 1, 1, 1, 1, and 6. The next day a story appeared in the paper headlined "University Commuters Double
National Average Ridership." Is this a reasonable assessment of central tendency? How would you characterize the variability
of the sample?
140. A 10-point quiz was given by Professor Ennuyeaux. Of the 10 students in the class, half got zero and the others got perfect
scores. List the students' scores. Then find the mean, median, mode, and geometric mean of their scores. Which is the most
appropriate measure of center? The least appropriate?
141. The owner of a chicken farm kept track of each hen's eating and egg production for many months, with the results below.
Which has more variation, feed consumption or egg output?
142. Below are the ages of 21 CEOs. Find the mean, median, and mode. Are there any outliers?
Explain.
46, 48, 49, 49, 50, 52, 54, 55, 57, 57, 58, 59, 60, 61, 62, 62, 63, 63, 65, 67, 75
143. Bob's sample of freshman GPAs showed a mean of 2.72 with a standard deviation of 0.31. (a) What range would you predict
for all the grades? For the middle 95 percent? Explain. (b) Why might your estimates be inaccurate?
144. A team of introductory statistics students went to a grocery store and recorded the total calories and fat calories for various
kinds of soup. They produced a table of statistics and two dot plots. Write a succinct summary of the center, variability, and
shape for each data set. Note: TrimMean is the 5 percent trimmed mean (removing the smallest 5 percent and the largest 5
percent of the values, rounded to the nearest integer).
145. Here are descriptive statistics from Excel for annual per-pupil expenditures in 94 Ohio cities and home sizes in a certain
neighborhood. Very briefly compare the variability and shape of the two data sets.
146. Below are shown a dot plot and summary statistics for a random sample of 34 shower heads. The measurements are
maximum flow rates (in gallons per minute) at pressure of 80 pounds per square inch. Use the data to illustrate the difference
between the two alternative definitions of "outlier," and make any other comments you feel are relevant. Note: TrimMean
removes the smallest 5 percent and the largest 5 percent of the values.
147. Briefly describe these data. Sketch its box plot and describe the sample
succinctly.
148. Craig operates a part-time snow-plowing business using a 2002 GMC 2500 HD extended cab short box truck. Describe Craig's
gasoline mileage based on this histogram of 195 tanks of gas.
149. Craig operates a part-time snow-plowing business using a 2002 GMC 2500 HD extended cab short box truck. Describe Craig's
gasoline mileage based on this box plot of 195 tanks of gas.
150. Here are advertised prices of 21 used Chevy Blazers. Describe the distribution (center, variability, shape).
151. Briefly describe this sample of departure delays on American Airlines flights out of Denver over a seven-day period, March 3-9
(n = 149 flights).
152. Six graduates from Fulsome University's Master's of Waste Management program were hired by a Saudi Arabian firm at
$110,000 each, while the other four graduates were unemployed. The university placement office bragged, "Our MWM
graduates enjoyed a median starting salary of $110,000." Is this a reasonable assessment of central tendency? What are the
alternatives?
Answer Key
True / False Questions
1. A data set with two values that are tied for the highest number of occurrences is called
bimodal.
TRUE
Extremes distort the midrange (average of highest and lowest data values).
MIDRANGE = (1+1000)/2
TRUE
The second quartile, the median, and the 50th percentile are the same thing.
4. A trimmed mean may be preferable to a mean when a data set has extreme
values.
TRUE
5. One benefit of the box plot is that it clearly displays the standard
deviation.
FALSE
TRUE
Median = 5
9. When data are right-skewed, we expect the median to be greater than the
mean.
FALSE
It's the other way around, as the mean will be pulled up by extremes.
10. The sum of the deviations around the mean is always zero.
TRUE
Outliers have little effect on the midhinge (average of the 25th and 75th percentiles).
12. Chebyshev's Theorem says that at most 50 percent of the data lie within 2 standard deviations of the
mean.
FALSE
13. Chebyshev's Theorem says that at least 95 percent of the data lie within 2 standard deviations of the
mean.
FALSE
14. If there are 19 data values, the median will have 10 values above it and 9 below it since n is odd.
FALSE
When n is odd, the median is the middle member of the sorted data set. In this case, the median is x10 and there will be 9 below
15. If there are 20 data values, the median will be halfway between two data values.
TRUE
16. In a left-skewed distribution, we expect that the median will be greater than the
mean.
TRUE
17. If the standard deviations of two samples are the same, so are their coefficients of
variation.
FALSE
18. A certain health maintenance organization (HMO) examined the number of office visits by its members in the last year. This data
set would probably be skewed to the left due to low outliers.
FALSE
Lower bound is zero, but high extremes are likely for sicker individuals.
AACSB: Analytical Thinking
Accessibility: Keyboard Navigation
Blooms: Evaluate
Difficulty: 3 Hard
Learning Objective: 04-01 Explain the concepts of center, variability, and
shape.
Topic: Measures of Center
19. A certain health maintenance organization (HMO) examined the number of office visits by its members in the last year. For this
data set, the mean is probably not a very good measure of a "typical" person's office visits.
TRUE
Lower bound is zero, but high extremes are likely for sicker individuals.
20. Referring to this box plot of ice cream fat content, the median seems more "typical" of fat content than the midrange as a
measure of center.
TRUE
Midrange (average of low and high) will be pulled down by left-tail minimum.
EXPLANATION:
When data are skewed or contain outliers, the median is a more robust measure of center than
the midrange.
The midrange is the average of the minimum and maximum values in the data set, so it can be
influenced heavily by outliers.
The median separates the lower 50% of the data set from the upper 50%, and it is not affected
by extreme values or outliers in the same way as the midrange.
In the given box plot, the data are heavily left-skewed and the distribution is far from
symmetric, so the median is a better measure of the "typical" value than the midrange.
Therefore, the statement that the median seems more "typical" of fat content than the
midrange as a measure of center is True.
AACSB: Analytical Thinking
Blooms: Apply
Difficulty: 2 Medium
Learning Objective: 04-08 Make and interpret box
plots.
Topic: Percentiles, Quartiles, and Box Plots
21. Referring to this box plot of ice cream fat content, the mean would exceed the median.
FALSE
22. Referring to this box plot of ice cream fat content, the skewness would be negative.
TRUE
Data are skewed left (negative skewness) as indicated by long left tail.
23. Referring to this graph of ice cream fat content, the second quartile is about 61.
TRUE
Range depends only on highest and lowest data values, so it is easily distorted.
25. In calculating the sample variance, the sum of the squared deviations around the mean is divided by n - 1 to avoid
underestimating the unknown population variance.
TRUE
Check the definition. You lose one piece of information because the mean is estimated.
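As an illustration of the n - 1 divisor, the sketch below computes both versions of the variance for the sample from question 34 (7, 11, 12, 18, 20, 22, 43). The variable names are illustrative only and are not part of the test bank.

```python
import statistics

# Sample from question 34 (used here only as an example).
data = [7, 11, 12, 18, 20, 22, 43]
n = len(data)

mean = sum(data) / n
# Sum of squared deviations around the sample mean.
sum_sq_dev = sum((x - mean) ** 2 for x in data)

# Dividing by n - 1 gives the sample variance.
sample_variance = sum_sq_dev / (n - 1)
# Dividing by n gives a smaller value, which tends to understate
# the unknown population variance.
divide_by_n_variance = sum_sq_dev / n

print(round(sample_variance, 2), round(divide_by_n_variance, 2))
```

The n - 1 version agrees with `statistics.variance`, while the n version agrees with `statistics.pvariance`.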
26. Outliers are data values that fall beyond ±2 standard deviations from the
mean.
FALSE
27. The Empirical Rule assumes that the distribution of data follows a normal curve.
TRUE
28. The Empirical Rule can be applied to any distribution, unlike Chebyshev's theorem.
FALSE
29. When applying the Empirical Rule to a distribution of grades, if a student scored one standard deviation below the mean, then
she would be at the 25th percentile of the distribution.
FALSE
About 15.87 percent (not 25 percent) are less than one standard deviation below the mean (in a normal distribution).
(100% - 68%)/2 = 16%
NOTE: If she were at the 25th percentile of the distribution, 25 percent of the scores would lie below hers, placing her in the top 75 percent.
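The 15.87 percent figure can be checked with a short sketch of the standard normal cumulative distribution. The helper name `normal_cdf` is an assumption made for this example; it relies only on `math.erf` from the Python standard library.

```python
import math

def normal_cdf(z):
    """Fraction of a standard normal population lying below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Percentile rank of a score one standard deviation below the mean.
below_minus_one = normal_cdf(-1.0)
print(f"{100 * below_minus_one:.2f} percent")

# The Empirical Rule shortcut (100% - 68%)/2 gives roughly the same answer.
shortcut = (100 - 68) / 2
print(shortcut)
```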
31. A platykurtic distribution is more sharply peaked (i.e., thinner tails) than a normal
distribution.
FALSE
32. A leptokurtic distribution is more sharply peaked (i.e., thinner tails) than a normal
distribution.
TRUE
The sign of Excel's kurtosis coefficient indicates the kurtosis direction relative to a normal distribution.
AACSB: Analytical Thinking
Accessibility: Keyboard Navigation
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 04-11 Assess skewness and kurtosis in a
sample.
Topic: Skewness and Kurtosis
34. A sample consists of the following data: 7, 11, 12, 18, 20, 22, 43. Using the "three standard deviation" criterion, the last
observation (X = 43) would be considered an outlier.
FALSE
43 is not more than three standard deviations above the mean for this data set.
Sample mean = 19
Sample standard deviation = 11.9
Interval = mean ± 3 standard deviations = 19 ± 3(11.9) = (-16.7, 54.7)
43 is within that interval; therefore, 43 is within 3 standard deviations of the mean. So, 43 is not an outlier.
NOTE: However, if the question asked whether 56 is an outlier, the answer would be yes, because 56 > 54.7.
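The three-standard-deviation check above can be sketched as follows. The function names are hypothetical, and the interval is computed from the sample standard deviation without rounding, so its endpoints differ slightly from the rounded values quoted in the explanation.

```python
import statistics

def three_sigma_interval(data):
    """Return (lower, upper) limits at mean ± 3 sample standard deviations."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    return mean - 3 * s, mean + 3 * s

def is_three_sigma_outlier(x, data):
    """True if x falls outside the three-standard-deviation interval of data."""
    lower, upper = three_sigma_interval(data)
    return x < lower or x > upper

sample = [7, 11, 12, 18, 20, 22, 43]
lower, upper = three_sigma_interval(sample)
print(round(lower, 1), round(upper, 1))
print(is_three_sigma_outlier(43, sample))  # 43 lies inside the interval
print(is_three_sigma_outlier(56, sample))  # 56 lies above the upper limit
```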
B. a unit-free statistic.
When the quartiles lie between two data values, the method of medians goes halfway between the values (very simple), while
Excel interpolates between them in a more complex way.
B. It is less reliable than the mode when the data are continuous.
The mean utilizes all n data values. Deviations always sum to zero around the mean. The mean works for continuous data
(unlike the mode). The mean often differs from the median in business data.
B. n/2 if n is even.
C. n/2 if n is odd.
This formula always works for the median position. For example, if n = 10 (even) the median is at position (10 + 1)/2 = 5.5, or
halfway between x5 and x6. But if n = 11 (odd) the median is at position (11 + 1)/2 = 6, which is observation x6.
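A minimal sketch of the (n + 1)/2 position rule, applied to the exam scores from questions 54 and 55. The function names are illustrative assumptions.

```python
def median_position(n):
    """Position of the median in a sorted array of n values (1-based)."""
    return (n + 1) / 2

def median_by_position(values):
    """Median found by locating position (n + 1)/2 in the sorted data."""
    data = sorted(values)
    pos = median_position(len(data))
    lower = data[int(pos) - 1]      # convert the 1-based position to a 0-based index
    if pos.is_integer():            # n is odd: the median is a data value
        return lower
    upper = data[int(pos)]          # n is even: average the two neighbors
    return (lower + upper) / 2

scores_q54 = [10, 10, 20, 20, 40, 60, 80, 80, 90, 100, 100]   # n = 11
scores_q55 = [0, 50, 50, 70, 70, 80, 90, 90, 100, 100]        # n = 10

print(median_position(11), median_by_position(scores_q54))
print(median_position(10), median_by_position(scores_q55))
```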
Although both the mean and the geometric mean are affected by high extremes in skewed data, the geometric mean tends to
reduce their influence.
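To illustrate how the geometric mean dampens a high extreme, the sketch below uses the replies 4, 5, 6 from question 88 and the twelve missed-class counts from question 98 (which include the extreme value 18). The helper `geometric_mean` is written out for clarity and is an assumption of this example, not the test bank's own method.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n positive values, computed via logarithms."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires strictly positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))

trips = [4, 5, 6]
missed = [2, 1, 5, 1, 1, 3, 4, 3, 1, 1, 5, 18]

g_trips = geometric_mean(trips)
g_missed = geometric_mean(missed)
mean_missed = sum(missed) / len(missed)

print(round(g_trips, 3))
print(round(g_missed, 3), mean_missed)
```

For the missed-class data, the geometric mean comes out well below the arithmetic mean because the single value of 18 pulls the arithmetic mean up more strongly.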
The standard deviation applies to any data measured on a ratio or interval scale. Because it is a square root, its visual
interpretation may be less clear than the MAD.
42. Chebyshev's Theorem:
The strength of Chebyshev's Theorem is that it makes no assumption about normality, while the E.R. only works for normal
populations.
Data values outside the quartiles (top or bottom 25 percent) are not very
unusual.
45. In a sample of 10,000 observations from a normal population, how many would you expect to lie beyond three standard
deviations of the mean?
A. None of them
B. About 27
C. About 100
D. About 127
46. The Excel formula for the standard deviation of a sample array named Data
is:
A. =STDEV.S(Data).
B. =STANDEV(Data).
C. =STDEV.P(Data).
D. =SUM(Data)/(COUNT(Data)-1).
AACSB: Technology
Accessibility: Keyboard Navigation
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 04-03 Calculate and interpret common measures of
variability.
Topic: Measures of Variability
48. Estimating the mean from grouped data will tend to be most accurate when:
Many bins and uniform data distribution within bins would give a result closest to the ungrouped mean
μ.
B. A distribution that is more peaked than a normal distribution (i.e., thinner tails) is
platykurtic.
Shape is hard to judge in small samples. The 50 is just a rule of thumb. Excel computes kurtosis for samples of any size, but
tables of critical values may not go down below 50.
Skewness due to extreme data values is common in business data. Right skewness is common, which increases the mean
relative to the median.
The E.R. applies only to normal populations, while Chebyshev's Theorem is general.
AACSB: Analytical Thinking
Accessibility: Keyboard Navigation
Blooms: Understand
Difficulty: 2 Medium
Learning Objective: 04-05 Apply the Empirical Rule and recognize
outliers.
Topic: Standardized Data
A. In a left-skewed distribution, we expect that the median will exceed the mean.
The mean is pulled down in left-skewed data, but deviations around it sum to zero in any data set. The median may be between
two data values and may not be in the middle of the box plot.
To find the geometric mean, multiply the data values and take the 11th root to get G = 41.02. Outliers affect both the mean and
55. Exam scores in a small class were 0, 50, 50, 70, 70, 80, 90, 90, 100, 100. For this data set, which statement is incorrect
concerning measures of center?
The median is 75 (halfway between x5 = 70 and x6 = 80 in the sorted array). The zeros render the geometric mean useless. The
56. Exam scores in a random sample of students were 0, 50, 50, 70, 70, 80, 90, 90, 90, 100. Which statement is
incorrect?
A. The standard deviation is 29.61.
57. For U.S. adult males, the mean height is 178 cm with a standard deviation of 8 cm and the mean weight is 84 kg with a standard
deviation of 8 kg. Elmer is 170 cm tall and weighs 70 kg. It is most nearly correct to say that:
Convert Elmer's height and weight to z-scores. For Elmer's weight, z = (x - μ)/σ = (70 - 84)/8 = -1.75, while for Elmer's height, z
= (x - μ)/σ = (170 - 178)/8 = -1.00. Therefore, Elmer is farther from the mean weight than from the mean height.
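The z-score comparison for Elmer can be sketched as follows; `z_score` is a hypothetical helper name used only for this example.

```python
def z_score(x, mu, sigma):
    """Standardized value: number of standard deviations x lies from the mean."""
    return (x - mu) / sigma

# Values taken from question 57.
z_weight = z_score(70, 84, 8)     # weight in kg
z_height = z_score(170, 178, 8)   # height in cm

print(z_weight, z_height)
# The larger absolute z-score marks the measurement farther from its mean.
print("weight" if abs(z_weight) > abs(z_height) else "height")
```

Because z-scores are unit-free, the weight and height comparisons can be placed on the same scale even though one is in kilograms and the other in centimeters.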
58. John scored 85 on Prof. Hardtack's exam (Q1 = 40 and Q3 = 60). Based on the fences, which is
correct?
B. John is an outlier.
C. John is not an outlier.
D. John is in the 85th percentile.
59. John scored 35 on Prof. Johnson's exam (Q1 = 70 and Q3 = 80). Based on the fences, which is correct?
B. John is an outlier.
The lower inner fence is 70 - 1.5(80 - 70) = 55 so John is an outlier. Actually, John is an extreme outlier because the lower outer
fence is 70 - 3.0(80 - 70) = 40.
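The fence calculations can be sketched as below, using the usual 1.5 × IQR inner fences and 3.0 × IQR outer fences quoted in the explanation. The function names and the dictionary keys are assumptions made for this example.

```python
def fences(q1, q3):
    """Inner (1.5 × IQR) and outer (3.0 × IQR) fences from the quartiles."""
    iqr = q3 - q1
    return {
        "lower_outer": q1 - 3.0 * iqr,
        "lower_inner": q1 - 1.5 * iqr,
        "upper_inner": q3 + 1.5 * iqr,
        "upper_outer": q3 + 3.0 * iqr,
    }

def classify(x, q1, q3):
    """Label x as an extreme outlier, an outlier, or not an outlier."""
    f = fences(q1, q3)
    if x < f["lower_outer"] or x > f["upper_outer"]:
        return "extreme outlier"
    if x < f["lower_inner"] or x > f["upper_inner"]:
        return "outlier"
    return "not an outlier"

print(fences(70, 80))          # Prof. Johnson's exam (question 59)
print(classify(35, 70, 80))    # John's score of 35
print(classify(85, 40, 60))    # John's score of 85 on Prof. Hardtack's exam (question 58)
```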
60. A population consists of the following data: 7, 11, 12, 18, 20, 22, 25. The population variance
is:
A. 6.07.
B. 36.82.
C. 5.16.
D. 22.86.
61. Consider the following data: 6, 7, 17, 51, 3, 17, 23, and 69. The range and the median
are:
A. 69 and 17.5.
B. 66 and 17.5.
C. 66 and 17.
D. 69 and 17.
D. average of Q1 and Q3.
Median position is always (n + 1)/2. It need not be halfway between the quartiles.
The range is easy to calculate but utilizes only two data values, which may be unusual.
64. Which two statistics offer robust measures of center when outliers are present?
Extremes are excluded from the trimmed mean and do not affect the median.
A. =STANDARDIZE
B. =NORM.DIST
C. =STDEV.P
D. =AVEDEV
You need the sample mean and sample standard deviation to find the z-score.
AACSB: Technology
Accessibility: Keyboard Navigation
Blooms: Remember
Difficulty: 1 Easy
Learning Objective: 04-06 Transform a data set into standardized
values.
Topic: Standardized Data
66. Which Excel function would be least useful to calculate the quartiles for a column of
data?
A. =STANDARDIZE
B. =PERCENTILE.EXC
C. =QUARTILE.EXC
D. =RANK
AACSB: Technology
Accessibility: Keyboard Navigation
Blooms: Remember
Difficulty: 2 Medium
Learning Objective: 04-07 Calculate quartiles and other
percentiles.
Topic: Percentiles, Quartiles, and Box Plots
67. A sample of 50 breakfast customers of McDonald's showed the spending below. Which statement is least likely to be correct?
68. VenalCo Market Research surveyed 50 individuals who recently purchased a certain CD, revealing the age distribution shown
below. Which statement is least defensible?
EXPLANATION:
Multiply and take the 3rd root to get the geometric mean of 4.932. With only three data values, the quartiles cannot be calculated
Calculate the coefficient of variation for each bank. For Barnsboro, CV = 100 × s/x̄ = 100 × 87/315 = 27.62, while for Wellington
CV = 100 × s/x̄ = 100 × 1800/8350 = 21.56.
A. box plot
B. bar chart
C. histogram
D. scatter plot
73. If the mean and median of a population are the same, then its distribution is:
A. normal.
B. skewed.
C. symmetric.
D. uniform.
74. In the following data set {7, 5, 0, 2, 7, 15, 5, 2, 7, 18, 7, 3, 0}, the value 7 is:
A. the mean.
B. the mode.
A. 800.
B. 1000.
C. 900.
D. 950.
76. The 25th percentile for waiting time in a doctor's office is 19 minutes. The 75th percentile is 31 minutes. The interquartile range is:
A. 12 minutes.
B. 16 minutes.
C. 22 minutes.
77. The 25th percentile for waiting time in a doctor's office is 19 minutes. The 75th percentile is 31 minutes. Which is incorrect
Apply definitions of fences. For example, the upper inner fence is 31 + 1.5(31 - 19) = 49.
78. When using Chebyshev's Theorem, the minimum percentage of sample observations that will fall within two standard deviations
of the mean will be __________ the percentage within two standard deviations if a normal distribution is assumed (Empirical
Rule).
A. smaller than
B. greater than
C. the same as
Chebyshev guarantees at least 75 percent of observations within two standard deviations, which is less than the roughly 95 percent expected under the E.R.
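The comparison between Chebyshev's bound and the normal-distribution percentage can be sketched as follows. The function name chebyshev_min_fraction is an illustrative assumption; the normal percentage is computed with NormalDist from Python's standard statistics module.

```python
from statistics import NormalDist

def chebyshev_min_fraction(k):
    """Minimum fraction of any data set within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k**2

def normal_fraction(k):
    """Fraction of a normal distribution within k standard deviations."""
    std_normal = NormalDist()
    return std_normal.cdf(k) - std_normal.cdf(-k)

cheb_2 = chebyshev_min_fraction(2)
norm_2 = normal_fraction(2)
print(f"k = 2: Chebyshev at least {cheb_2:.1%}, normal about {norm_2:.1%}")
```

Chebyshev's figure is a floor that holds for any distribution, while the Empirical Rule figure is an approximation that assumes normality.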
A few high values would skew the data badly in all but the hamburger example, because a McDonald's hamburger is a standard
menu item.
80. Based on daily measurements, Bob's weight has a mean of 200 pounds with a standard deviation of 16 pounds, while Mary's
weight has a mean of 125 pounds with a standard deviation of 15 pounds. Who has the smaller relative variation?
A. Bob
B. Mary
Calculate the coefficients of variation for Bob and Mary. Bob's CV = 100 × s/x̄ = 100 × 16/200 = 8.00, while Mary's CV = 100 × s/x̄ =
100 × 15/125 = 12.00. Therefore, Bob's weight varies less than Mary's weight in relative terms.
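The coefficient of variation calculation above can be sketched in Python. The function name coefficient_of_variation is a hypothetical helper introduced only for this illustration.

```python
def coefficient_of_variation(mean, std_dev):
    """Return the CV as a percentage: 100 * s / mean."""
    return 100 * std_dev / mean

cv_bob = coefficient_of_variation(200, 16)
cv_mary = coefficient_of_variation(125, 15)

# The smaller CV indicates less variation relative to the mean
smaller = "Bob" if cv_bob < cv_mary else "Mary"
print(f"Bob CV = {cv_bob:.2f}%, Mary CV = {cv_mary:.2f}%, smaller relative variation: {smaller}")
```

Note that Bob has the larger standard deviation (16 versus 15 pounds) yet the smaller CV, because his mean is much larger.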
81. Frieda is 67 inches tall and weighs 135 pounds. Women her age have a mean height of 65 inches with a standard deviation of
2.5 inches and a mean weight of 125 pounds with a standard deviation of 10 pounds. In relative terms, it is correct to say that:
Calculate the z-scores for Frieda's weight and Frieda's height. For Frieda's height, z = (x - μ)/σ = (67 - 65)/(2.5) = 0.80, while for
Frieda's weight, z = (x - μ)/σ = (135 - 125)/10 = 1.00. Therefore, Frieda's weight is farther from the mean than her height. For
heights, the CV = 100 × σ/μ = 100 × (2.5)/(65) = 3.8%, while for weights, CV = 100 × σ/μ = 100 × 10/125 = 8.0% (both CVs are
below 10%).
B. The standard deviation is in the same units as the mean (e.g., kilograms).
C. The mean from a frequency tabulation may differ from the mean from raw data.
Normal populations are symmetric, but a sample may differ from the population.
A. box plot.
B. dot plot.
C. histogram.
D. scatter plot.
The bin limits in a histogram may be rounded, so the values of xmin and xmax may be unclear.
C. For income, the sum of squared deviations about the mean is negative about half the time.
D. For personal incomes in California, outliers in either tail would be equally likely.
Incomes are likely to be skewed due to high extremes, while income is bounded on the low end by zero. A wider IQR would
suggest greater inequality of incomes.
D. about 32 percent of the data are beyond one standard deviation from the mean.
The E.R. says that about 68 percent of the observations are within one standard deviation of the mean. Business data often are
skewed.
87. Three randomly chosen Seattle students were asked how many round trips they made to Canada last year. Their replies were 3,
4, 5. The geometric mean is:
A. 3.877.
B. 4.000.
C. 3.915.
D. 4.422.
Multiply the three numbers and take the 3rd root of 60 to get 3.915.
88. Three randomly chosen California students were asked how many times they drove to Mexico last year. Their replies were 4, 5,
6. The geometric mean is:
A. 3.87.
B. 5.00.
C. 5.42.
D. 4.93.
Multiply the three numbers and take the 3rd root of 120 to get 4.932.
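The "multiply, then take the nth root" recipe for the geometric mean can be sketched directly in Python. The function name geometric_mean is defined here for illustration and assumes all values are positive.

```python
import math

def geometric_mean(values):
    """Multiply the n values together and take the nth root."""
    if any(v <= 0 for v in values):
        raise ValueError("this sketch assumes strictly positive data")
    return math.prod(values) ** (1 / len(values))

gm_canada = geometric_mean([3, 4, 5])   # cube root of 60
gm_mexico = geometric_mean([4, 5, 6])   # cube root of 120
print(f"{gm_canada:.3f} {gm_mexico:.3f}")
```

The positivity check reflects the point made elsewhere in these questions: a single zero makes the product, and therefore the geometric mean, zero.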
89. Three randomly chosen Colorado students were asked how many times they went rock climbing last month. Their replies were
5, 6, 7. The standard deviation is:
A. 1.212.
B. 0.816.
C. 1.000.
D. 1.056.
90. Patient survival times after a certain type of surgery have a very right-skewed distribution due to a few high outliers.
Consequently, which statement is most likely to be correct?
91. So far this year, stock A has had a mean price of $6.58 per share with a standard deviation of $1.88, while stock B has had a
mean price of $10.57 per share with a standard deviation of $3.02. Which stock is more volatile?
A. Stock A
B. Stock B
A. box plot.
B. dot plot.
C. histogram.
D. Pareto chart.
On a boxplot, outliers are identified by their distance from the quartiles. Data values outside the inner fences (Q1 - 1.5 IQR and Q3 + 1.5 IQR)
are outliers. Data values beyond the outer fences (Q1 - 3.0 IQR and Q3 + 3.0 IQR) are extreme outliers. This definition of "outlier" is not the
same as the Empirical Rule, which is based on the distance from the mean.
B. Range
C. Coefficient of variation
D. Trimmed mean
94. Twelve randomly chosen students were asked how many times they had missed class during a certain semester, with this
result: 3, 2, 1, 2, 1, 5, 9, 1, 2, 3, 3, 10. The geometric mean is:
A.
B. 2.604
C. 1.517
D.
95. Twelve randomly chosen students were asked how many times they had missed class during a certain semester, with this
result: 3, 2, 1, 2, 1, 5, 9, 1, 2, 3, 3, 10. The median is:
A. 7.0.
B. 3.0.
C. 3.5.
D. 2.5.
Although we square the deviations around the mean, we take the square root of the variance to get back to the original units of X.
However, the standard deviation is affected by outliers and its interpretation may be nonintuitive.
A. 2.158.
B. 1.545.
C. 2.376.
D. 3.017.
99. Twelve randomly chosen students were asked how many times they had missed class during a certain semester, with this
result: 2, 1, 5, 1, 1, 3, 4, 3, 1, 1, 5, 18. For this sample, the median is:
A. 2.
B. 3.
C. 3.5.
D. 2.5.
Sort and look halfway between the two middle data values.
100. Twelve randomly chosen students were asked how many times they had missed class during a certain semester, with this
result: 2, 1, 5, 1, 1, 3, 4, 3, 1, 1, 5, 18. For this sample, which measure of center is least representative of the "typical"
student?
A. Mean
B. Median
C. Mode
D. Midrange
The unusual data value pulls up the mean (3.75) but affects the midrange (1 + 18)/2 = 9.5 even more noticeably.
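The four measures of center for this sample can be compared with a short sketch using Python's standard statistics module. The midrange has no library function, so it is computed by hand here.

```python
import statistics

absences = [2, 1, 5, 1, 1, 3, 4, 3, 1, 1, 5, 18]

mean = statistics.mean(absences)
median = statistics.median(absences)             # halfway between the two middle values
mode = statistics.mode(absences)
midrange = (min(absences) + max(absences)) / 2   # uses only the two extremes

print(f"mean={mean}, median={median}, mode={mode}, midrange={midrange}")
```

Because the midrange depends only on the smallest and largest values, the single response of 18 moves it further than it moves the mean.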
101. Here are statistics on order sizes of Megalith Construction Supply's shipments of two kinds of construction materials last year.
A. Girders
B. Rivets
Calculate the coefficient of variation. For girders, the CV = 100 × s/x̄ = 100 × (48)/(160) = 30.00%, while for rivets, CV = 100 ×
s/x̄ = 100 × 702/2800 = 25.07%.
102. The quartiles of a distribution are most clearly revealed in which display?
A. Box plot
B. Scatter plot
C. Histogram
D. Dot plot
The histogram, scatter plot, or dot plot will not directly show quartiles.
C. always zero.
104. What does the graph below (profit/sales ratios for 25 Fortune 500 companies) reveal?
Box is skewed right, so mean probably exceeds the median. The IQR is about 12 - 4 = 8.
105. Find the sample correlation coefficient for the following data.
A. .8911
B. .9132
C. .9822
D. .9556
106. Heights of male students in a certain statistics class range from Xmin = 61 to Xmax = 79. Applying the Empirical Rule, a reasonable
A. 2.75.
B. 3.00.
C. 3.25.
D. 3.50.
107. A reporter for the campus paper asked five randomly chosen students how many occupants, including the driver, ride to school
in their cars. The responses were 1, 1, 1, 1, 6. The coefficient of variation is:
A. 25 percent.
B. 250 percent.
C. 112 percent.
D. 100 percent.
108. A smooth distribution with one mode is negatively skewed (skewed to the left). The median of the distribution is $65. Which of
the following is a reasonable value for the distribution mean?
A. $76
B. $54
C. $81
D. $65
109. In a positively skewed distribution, the percentage of observations that fall below the median is:
A. about 50 percent.
Mode is helpful for categorical data and is easy to calculate in small samples, but requires sorting the sample. Continuous
(decimal) data generally have no mode, or, if a mode exists, it is often not near the center.
A. continuous data.
B. categorical data.
C. discrete data.
Mode is good for discrete or categorical data but fails for continuous data.
112. Craig operates a part-time snow-plowing business using a 2002 GMC 2500 HD extended cab short box truck. This box plot of
Craig's MPG on 195 tanks of gas does not support which statement?
A. There are several outliers.
Narrow box. With outliers in both tails, it's unclear which way skewness would be.
N = 195
Group % data
X min to Q1 25%
Q1 to Q2 25%
Q2 to Q3 25%
Q3 to X max 25%
113. Estimate the mean exam score for the 50 students in Prof. Axolotl's class.
A. 59.2
B. 62.0
C. 63.5
D. 64.1
Apply the formulas for weighted average using interval midpoint multiplied by frequency.
114. A survey of salary increases received during a recent year by 44 working MBA students is shown. Find the approximate mean
percent raise.
A. 6.56
B. 6.74
C. 5.90
D. 6.39
Apply the formulas for weighted average using interval midpoint multiplied by frequency.
115. The following frequency distribution shows the amount earned yesterday by employees of a large Las Vegas casino. Estimate
the mean daily earnings.
A. $112.50
B. $125.01
C. $105.47
D. $117.13
Apply the formulas for weighted average using interval midpoint multiplied by frequency.
116. The following table is the frequency distribution of parking fees for a day:
A. $7.07.
B. $6.95.
C. $7.00.
D. $7.25.
Apply the formulas for weighted average using interval midpoint multiplied by frequency.
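The grouped-data estimate used in the preceding questions (interval midpoint multiplied by frequency, divided by the total count) can be sketched as follows. The frequency tables for those questions are not reproduced in this text, so the bins below are entirely hypothetical and serve only to show the mechanics; the function name grouped_mean is likewise an assumption.

```python
# Hypothetical frequency table (NOT from the source): (lower limit, upper limit, frequency)
bins = [(0, 10, 5), (10, 20, 10), (20, 30, 5)]

def grouped_mean(table):
    """Estimate the mean as the frequency-weighted average of interval midpoints."""
    total_n = sum(freq for _, _, freq in table)
    weighted_sum = sum(((low + high) / 2) * freq for low, high, freq in table)
    return weighted_sum / total_n

estimate = grouped_mean(bins)
print(estimate)
```

The result is only an approximation of the raw-data mean, because every observation in an interval is treated as if it sat at the midpoint.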
A. 4.550
B. 3.798
C. 4.278
D. 2.997
118. The 25th percentile for waiting time in a doctor's office is 10 minutes. The 75th percentile is 30 minutes. Which is incorrect
Add 1.5 times the interquartile range to the third quartile to get the upper inner fence. Add 3.0 times the interquartile range to
the third quartile to get the upper outer fence. An outlier is beyond the upper inner fence.
119. Five homes were recently sold in Oxnard Acres. Four of the homes sold for $400,000, while the fifth home sold for $2.5 million.
Which measure of central tendency best represents a typical home price in Oxnard Acres?
A. The mean or median.
B. The median or mode.
Summary:
5 homes
120. In Tokyo, construction workers earn an average of ¥420,000 (yen) per month with a standard deviation of ¥20,000, while in
Hamburg, Germany, construction workers earn an average of €3,200 (euros) per month with a standard deviation of €57. Who
is earning relatively more, a worker making ¥460,000 per month in Tokyo or one earning €3,300 per month in Hamburg?
Calculate and compare the z-score for each worker. For Tokyo, z = (x - μ)/σ = (460000 - 420000)/(20000) = 2.00,
while for Hamburg, z = (x - μ)/σ = (3300 - 3200)/57 = 1.75. Therefore, the Tokyo worker earns relatively more.
Summary:
a worker making ¥460,000 per month in Tokyo (x1 = 460,000 yen) -> z1 = (460,000 - 420,000)/20,000 = 2.00
a worker making €3,300 per month in Hamburg (x2 = 3,300 euros) -> z2 = (3,300 - 3,200)/57 = 1.75
B. If the data are from a normal population, about 68 percent of the values will be within μ ± σ.
Calculate the z-score to detect outliers: z = (x - μ)/σ = (81 - 52)/(15) = 1.93, which is not an outlier, while the CV is 100 × σ/μ =
100 × 128/640 = 20%.
B. Standard deviation
C. Midhinge
D. Interquartile range
123. If Q1 = 150 and Q3 = 250, the upper fences (inner and outer) are:
Add 1.5 times the interquartile range to the third quartile to get the upper inner fence. Add 3.0 times the interquartile range to
the third quartile to get the upper outer fence.
A. Figure A.
B. Figure B.
125. Which of the following statements is likely to apply to the incomes of 50 randomly chosen taxpayers in
California?
126. A certain health maintenance organization (HMO) examined the number of office visits by each of its members in the last year.
For this data set, we would anticipate that the geometric mean would be
A. a reasonable measure of center.
B. zero because some HMO members would not have an office visit.
Zeros would exist for those who had no office visits, so the geometric mean would be zero.
127. Three randomly chosen Colorado students were asked how many times they went rock climbing last month. Their replies were
5, 6, 7. The coefficient of variation is:
A. 16.7 percent.
B. 13.6 percent.
C. 20.0 percent.
D. 35.7 percent.
128. The mean of a population is 50 and the median is 40. Which histogram is most likely for samples from this population?
A. Sample A.
B. Sample B.
C. Sample C.
We have tables that show the expected range of variation for a sample skewness coefficient for various sample sizes
from a symmetric, normal population.
We have tables that show the expected range of variation for a sample kurtosis coefficient for various sample sizes
from a normal population.
131. In Osaka, Japan, stock brokers earn ¥6000 per hour on average, with a standard deviation of ¥1200. In Stuttgart,
Germany, stock brokers earn an average of €18 per hour with a standard deviation of €6. In which country is the variation in
wages greater?
Feedback: Osaka CV = 20 percent, Stuttgart CV = 33.3 percent, so variation is greater in Stuttgart. The point is to show that
you cannot assess relative variation based solely on the standard deviation when the units of measurement differ. (You have to
look also at the mean.)
132. Find the coefficient of variation of these numbers: 14, 17, 17, 19, 26. Would the variability of those numbers be greater than,
less than, or the same as the variability of 24, 27, 27, 29, 36? Defend your answer.
Relative variation is greater in the first sample.
Feedback: First sample: mean = 18.6, standard deviation = 4.5055, CV = 24.22 percent. Second sample: mean = 28.6, standard
deviation = 4.5055, CV = 15.75 percent. The standard deviations are the same, but the relative variation is greater in the first
sample because the mean is smaller.
133. Ten randomly chosen students at a certain university were asked how many times they smoked marijuana during the
preceding week. Their answers were 0, 8, 0, 0, 2, 4, 0, 0, 6, 0. A campus newspaper article appeared, with the headline
"Average Student Uses No Pot." Is this a fair assessment of central tendency? Discuss the alternatives.
Mode and median are 0, but the mean is 2. Geometric mean is zero due to zeros.
Feedback: Mode and median are 0, but the mean is 2. It is correct that 6 out of 10 students used no marijuana, but to say that
the "average" is zero ignores the four users who bring up the mean. The geometric mean is useless since it is zero whenever
the data set contains zero.
134. Twelve students were asked how many credit cards they owned. The responses were 0, 0, 1, 2, 2, 3, 3, 4, 4, 5, 5, 11. (a) Find
the mean, median, and mode. (b) Which measure of center seems best in this case? (c) Find the first and third quartiles. What
do they tell you?
(a) Mean is 3.33, median is 3, mode is not unique; (b) The mean is slightly influenced by the highest data value, but is not
greatly different than the median. (c) Quartiles depend on which method is used (e.g., Minitab gives 1.25 and 4.75).
Feedback: Mean is 3.33, median is 3. The mode is useless because 0, 2, 3, 4, and 5 each occur twice. In this case the mean
or median gives a reasonable indication of what is "typical." Using the method of medians, Q1 = 1.5 and Q3 = 4.5. The method
of medians only requires sorting the data, finding the median, and then finding the median of the observations below the
median and the median of the observations above the median. Excel and Minitab may use different methods of calculating
quartiles. Excel's =QUARTILE.INC would give 1.75 and 4.25; Minitab would give 1.25 and 4.75, while Excel's
=QUARTILE.EXC will agree with Minitab.
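The three quartile conventions discussed above can be compared in Python. This is a sketch under stated assumptions: the helper quartiles_by_medians is hypothetical, and the "exclusive" and "inclusive" methods of statistics.quantiles are assumed to correspond to the (n + 1)p and (n - 1)p position rules respectively. For these data they reproduce the values quoted above for QUARTILE.EXC/Minitab and QUARTILE.INC.

```python
import statistics

cards = sorted([0, 0, 1, 2, 2, 3, 3, 4, 4, 5, 5, 11])

def quartiles_by_medians(sorted_data):
    """Q1 and Q3 as medians of the lower and upper halves of sorted data."""
    n = len(sorted_data)
    half = n // 2
    lower = sorted_data[:half]
    # For odd n, leave the overall median out of both halves
    upper = sorted_data[half:] if n % 2 == 0 else sorted_data[half + 1:]
    return statistics.median(lower), statistics.median(upper)

q1_med, q3_med = quartiles_by_medians(cards)
q1_exc, _, q3_exc = statistics.quantiles(cards, n=4, method="exclusive")
q1_inc, _, q3_inc = statistics.quantiles(cards, n=4, method="inclusive")

print("method of medians:", q1_med, q3_med)
print("exclusive method: ", q1_exc, q3_exc)
print("inclusive method: ", q1_inc, q3_inc)
```

With only 12 observations the three conventions disagree noticeably, which is why an answer should state the method used.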
Feedback: Mean is 2.364, median is 2, mode is 2. Any of these conveys a reasonable idea of the "typical" student. The median
is representative of the data, but a good case can also be made for the mode (5 of 11 students had 2 siblings). There are no
outliers, so the mean is not badly distorted (but 7 are below it and 4 above it). Only the mean reflects the fact that an "average"
family has more than two children. The geometric mean is unhelpful because of the zero in the data set.
136. Patient waiting times in the Tardis Orthopedic Clinic have a mean of 50 minutes with a standard deviation of 25 minutes. Within
what range would approximately 95 percent of the waiting times lie if we were sampling a normal distribution? Do you think the
distribution is likely to be normal? Explain.
By the Empirical Rule, range is 0 to 100 minutes, but waiting times may be skewed by a few long waits (nonnormal).
Feedback: By the Empirical Rule, 50 ± (2)(25) gives a range of 0 to 100 minutes. However, the E.R. assumes normality, which
is unlikely for waiting times (probably right-skewed by a few unusually long waits). The large standard deviation likely is due to
outliers.
137. The athletic departments at 10 randomly selected U.S. universities were asked by the Equal Employment Opportunity
Commission to state what percentage of their nursing scholarships were presently held by women. The responses were 5, 4,
2, 1, 1, 2, 10, 5, 5, 5. Find the mean, median, mode, and geometric mean. Which is the most appropriate measure of central
tendency? The least appropriate? Explain your answer. Is there an outlier?
Mean is 4, median is 4.5, mode is 5, geometric mean is 3.1623. The boxplot shows that 10 is an outlier but not an extreme
outlier (based on the fences criterion for outliers).
Feedback: Mean is 4, median is 4.5, mode is 5, geometric mean is 3.1623. For this data set, an argument can be made for
each of these measures of central tendency. The mean or median would probably be most "typical," although the mode does
represent 4 of the 10 observations. The geometric mean downplays the outlier (10) but is not really "typical" of any university.
The boxplot shows that 10 is an outlier but not an extreme outlier (based on the fences criterion for outliers).
AACSB: Reflective Thinking
Blooms: Evaluate
Difficulty: 2 Medium
Learning Objective: 04-02 Calculate and interpret common measures of
center.
Topic: Measures of Center
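The figures quoted in the feedback for question 137 can be reproduced with a short sketch using Python's standard statistics module. The quartiles here are found by the method of medians, which is an assumption; other quartile methods give slightly different fences.

```python
import statistics

shares = [5, 4, 2, 1, 1, 2, 10, 5, 5, 5]

mean = statistics.mean(shares)
median = statistics.median(shares)
mode = statistics.mode(shares)
geo_mean = statistics.geometric_mean(shares)

# Quartiles by the method of medians (n is even, so split the sorted data in half)
ordered = sorted(shares)
half = len(ordered) // 2
q1 = statistics.median(ordered[:half])
q3 = statistics.median(ordered[half:])
iqr = q3 - q1
upper_inner = q3 + 1.5 * iqr
upper_outer = q3 + 3.0 * iqr

is_outlier = max(shares) > upper_inner
is_extreme = max(shares) > upper_outer
print(mean, median, mode, round(geo_mean, 4), upper_inner, upper_outer, is_outlier, is_extreme)
```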
138. A survey of 10 randomly chosen drivers showed the following number of persons per car, including the driver: 1, 5, 1, 5, 2, 1, 1,
1, 2, 1. Describe the center, variability, and skewness for this sample.
Feedback: Mean is 2, median is 1, mode is 1. For this sample, the mode (6 of 10) most clearly characterizes the "typical" car
occupancy, which is also true of the median. However, only the mean would indicate that more than one person is actually
traveling, on average. The geometric mean is 1.585, which is not especially helpful but does downplay the two 5's. Data are
right-skewed.
139. A national survey showed that most commuter cars contain only the driver. Hungry for a story, a campus newspaper reporter
asked five randomly chosen commuter students how many occupants, including the driver, rode to school in their cars. Their
responses were 1, 1, 1, 1, and 6. The next day a story appeared in the paper headlined "University Commuters Double
National Average Ridership." Is this a reasonable assessment of central tendency? How would you characterize the variability
of the sample?
The mean is 2, median is 1, and mode is 1. Coefficient of variation (112 percent) indicates high dispersion (standard deviation
exceeds the mean).
Feedback: The mean is 2, median is 1, and mode is 1. While technically correct, the paper's story is misleading since 80
percent of the cars contained only one occupant. Data are extremely right-skewed. The standard deviation is 2.236, so the
coefficient of variation (112 percent) indicates very high dispersion (standard deviation exceeds the mean).
140. A 10-point quiz was given by Professor Ennuyeaux. Of the 10 students in the class, half got zero and the others got perfect
scores. List the students' scores. Then find the mean, median, mode, and geometric mean of their scores. Which is the most
appropriate measure of center? The least appropriate?
Mean is 5, median is 5, bimodal (0, 10), geometric mean is 0.
Feedback: 0, 0, 0, 0, 0, 10, 10, 10, 10, 10. Mean is 5, median is 5, bimodal (0, 10). Geometric mean is zero (useless due to
zeros in the data set). There is no "typical" or correct description of central tendency since there is no centrality in the data. In
such cases, stick with the mean and median but add a verbal caveat about the extremely bimodal nature of the data.
141. The owner of a chicken farm kept track of each hen's eating and egg production for many months, with the results below.
Which has more variation, feed consumption or egg output?
Feed CV = 14.3 percent, egg CV = 25.0 percent. Egg production is more variable.
Feedback: Feed CV = 14.3 percent, egg CV = 25.0 percent. Egg production is more variable. Problem illustrates that when
units of measurement or means differ, you cannot use the standard deviation to compare variation.
142. Below are the ages of 21 CEOs. Find the mean, median, and mode. Are there any outliers? Explain.
46, 48, 49, 49, 50, 52, 54, 55, 57, 57, 58, 59, 60, 61, 62, 62, 63, 63, 65, 67, 75
Mean is 57.714, median is 58, four modes (49, 57, 62, 63). Standard deviation is s = 7.233. No outliers, but there is one
unusual data value at 75.
Feedback: Mean is 57.714, median is 58, four modes (49, 57, 62, 63). Standard deviation is s = 7.233. No outliers, but there is
one unusual data value at 75. Its standardized value is z = (75 - 57.714)/7.233 = 2.39. Using the method of medians, Q1 = 51,
143. Bob's sample of freshman GPAs showed a mean of 2.72 with a standard deviation of 0.31. (a) What range would you predict
for all the grades? For the middle 95 percent? Explain. (b) Why might your estimates be inaccurate?
By the Empirical Rule, we expect the middle 95 percent between μ - 2σ and μ + 2σ (2.10 and 3.34) and all the GPAs between
μ - 3σ and μ + 3σ (1.79 and 3.65). The E.R. is based on the normal distribution, so could be inaccurate if grades are skewed.
Feedback: By the Empirical Rule, we expect the middle 95 percent between μ - 2σ and μ + 2σ (2.10 and 3.34) and all the
GPAs between μ - 3σ and μ + 3σ (1.79 and 3.65). The E.R. is based on the normal distribution, so could be inaccurate if
grades are skewed. If there is skewness, it is more likely to be to the left since many hard-working students will earn GPAs in
the range 3.00 to 4.00, while very few will be below 2.00 (but a few really poor performers could pull the mean down, since
GPA could even be 0.00).
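The Empirical Rule intervals quoted for the GPA sample can be sketched as follows. The function name empirical_interval is a hypothetical helper, and the intervals are only meaningful if the data are roughly normal, as the feedback cautions.

```python
def empirical_interval(mean, std_dev, k):
    """Return the interval mean +/- k standard deviations."""
    return mean - k * std_dev, mean + k * std_dev

low_95, high_95 = empirical_interval(2.72, 0.31, 2)     # about 95 percent of values
low_all, high_all = empirical_interval(2.72, 0.31, 3)   # nearly all values

print(f"middle 95 percent: {low_95:.2f} to {high_95:.2f}")
print(f"nearly all:        {low_all:.2f} to {high_all:.2f}")
```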
144. A team of introductory statistics students went to a grocery store and recorded the total calories and fat calories for various
kinds of soup. They produced a table of statistics and two dot plots. Write a succinct summary of the center, variability, and
shape for each data set. Note: TrimMean is the 5 percent trimmed mean (removing the smallest 5 percent and the largest 5
percent of the values, rounded to the nearest integer).
Both are right-skewed (mean > median) though not greatly so, judging from the dot plots. Trimmed mean is only slightly less
than the mean, suggesting that we don't have too many extreme values. However, on the Calories dot plot there is one outlier
because z = (180 - 96.63)/26.91 = 3.10.
Feedback: Both are right-skewed (mean > median) though not greatly so, judging from the dot plots. In each case, the trimmed
mean is only slightly less than the mean, suggesting that we don't have too many extreme values. However, on the Calories
dot plot there is one extreme value, which turns out to be an outlier since its standardized score is z = (180 - 96.63)/26.91 =
3.10. Better students will notice more details and aspects of the data and discuss them.
145. Here are descriptive statistics from Excel for annual per-pupil expenditures in 94 Ohio cities and home sizes in a certain
neighborhood. Very briefly compare the variability and shape of the two data sets.
Expenditure per pupil is right-skewed (mean > median), skewness coefficient is also high; home size is practically symmetric
(mean ≅ median) and has skewness near zero. Expenditure per pupil has at least one severe outlier z = 7.76, while home size
has no outliers but one unusual value at z = 2.71.
Feedback: Expenditure per pupil is right-skewed (mean > median), and the skewness coefficient is also high. Home size is
practically symmetric (mean ≅ median) and has skewness near zero, though many students will say it's right-skewed. (It is
important to realize that skewness is a matter of degree, not a "yes-no" decision.) The modes are unhelpful since both data
sets are continuous measurements. The CVs indicate that expenditure per pupil has much greater dispersion (40.2 percent)
than home size (11.2 percent). Expenditure per pupil has at least one severe outlier at z = (11,226 - 2724.61)/1095.22 = 7.76,
while home size has no outliers but one possibly unusual value at z = (2908 - 2231.41)/249.32 = 2.71. Better student answers
will notice and discuss more of the data features, perhaps attempting to draw a histogram.
146. Below are shown a dot plot and summary statistics for a random sample of 34 shower heads. The measurements are
maximum flow rates (in gallons per minute) at pressure of 80 pounds per square inch. Use the data to illustrate the difference
between the two alternative definitions of "outlier," and make any other comments you feel are relevant. Note: TrimMean
removes the smallest 5 percent and the largest 5 percent of the values.
Upper inner fence is 3.5, upper outer fence is 4.1, so by these definitions, three (maybe four) data points are "unusual" (above
the upper inner fence) and three are outliers (beyond the upper outer fence).
Feedback: Requires definitions of fences. The upper inner fence is Q3 + 1.5(Q3 - Q1) = 2.9 + 1.5(2.9 - 2.5) = 3.5, while the upper
outer fence is Q3 + 3.0(Q3 - Q1) = 2.9 + 3.0(2.9 - 2.5) = 4.1. By these definitions, three (maybe four) data points are "unusual"
(above the upper inner fence) and three are outliers (beyond the upper outer fence). Using the standardized variable definition,
the cutoff for an "unusual" data point is x̄ + 2s = 2.882 + 2(0.750) = 4.382 (which includes 3 data points), while the cutoff for an
"outlier" is x̄ + 3s = 2.882 + 3(0.750) = 5.132 (which includes 1 data point). Therefore, the definitions generally agree on what is
"unusual" but not on what constitutes an "outlier."
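The two alternative cutoffs for the shower head data can be computed side by side. This is a minimal sketch using only the summary statistics quoted in the feedback (Q1 = 2.5, Q3 = 2.9, mean 2.882, standard deviation 0.750); the variable names are illustrative, and the raw data are not reproduced here.

```python
# Summary statistics quoted in the feedback
q1, q3 = 2.5, 2.9
mean, std_dev = 2.882, 0.750

# Definition 1: fences based on the quartiles
iqr = q3 - q1
upper_inner_fence = q3 + 1.5 * iqr
upper_outer_fence = q3 + 3.0 * iqr

# Definition 2: standardized values based on the mean and standard deviation
unusual_cutoff = mean + 2 * std_dev
outlier_cutoff = mean + 3 * std_dev

print(f"fences:   inner {upper_inner_fence:.1f}, outer {upper_outer_fence:.1f}")
print(f"z-scores: unusual above {unusual_cutoff:.3f}, outlier above {outlier_cutoff:.3f}")
```

The fence cutoffs sit well below the z-score cutoffs here because the box is narrow while the standard deviation is inflated by the high values themselves.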
147. Briefly describe these data. Sketch its box plot and describe the sample succinctly.
Skewed right (mean > median), at least one outlier at z = 3.22, box plot will be skewed right and asymmetric.
Feedback: Skewed right (mean > median) as reflected also in the trimmed mean (below the mean). There is at least one
outlier, whose standardized score is z = (49 - 12.89)/11.23 = 3.22. Box plot will be skewed right (long right whisker) and has
asymmetric "box" whose upper half (Q2 to Q3) is wider than its lower half (Q1 to Q2). The picture is that in most Rose Bowl
games, the winning margin tends to be small, but in a few games there was a "blowout" that raises the mean. Astute students
may notice the 0 and ask how the winning margin can be zero. (In 1922, Washington and Jefferson played California to a
scoreless tie, this being before the "sudden death" overtime had been established.)
AACSB: Reflective Thinking
Blooms: Evaluate
Difficulty: 2 Medium
Learning Objective: 04-08 Make and interpret box
plots.
Topic: Percentiles, Quartiles, and Box Plots
148. Craig operates a part-time snow-plowing business using a 2002 GMC 2500 HD extended cab short box truck. Describe Craig's
gasoline mileage based on this histogram of 195 tanks of gas.
Fairly symmetric, yet a few high values will draw up the mean.
Feedback: Fairly symmetric. A few high values exist (they could be outliers, but we would need standard deviation or quartiles
to say for sure). Astute students could apply the Empirical Rule to estimate σ = (Xmax - Xmin)/6, or σ = (Xmax - Xmin)/4 and try to check
for outliers, but this would not be expected. Some will suggest that the data are normal but there were data measurement errors
(e.g., three tanks erred on the high side, one on the low side).
149. Craig operates a part-time snow-plowing business using a 2002 GMC 2500 HD extended cab short box truck. Describe Craig's
gasoline mileage based on this box plot of 195 tanks of gas.
Range is from just under 9.0 to just over 21.0; typical gas mileage is concentrated near 13 mpg, with the middle 50 percent
between about 12.5 and 13.5 (middle of the "box"); two unusual data values on low end and three on high end (beyond inner
fences).
Feedback: Range is from just under 9.0 to just over 21.0. Typical gas mileage is concentrated near 13 mpg, with the middle 50
percent between about 12.5 and 13.5 (middle of the "box"). Symmetric except for one data point in right tail. Two unusual data
values on low end and three on high end (beyond inner fences). On the high end, two are outliers (beyond outer fence).
Requires knowing definitions of fences.
150. Here are advertised prices of 21 used Chevy Blazers. Describe the distribution (center, variability, shape).
Range is from 7,000 to almost 18,000; median is around 11,500; interquartile range is about 11,000 to 14,000, with right-skewness.
Feedback: Range is from 7,000 to almost 18,000. Median is around 11,500 with interquartile range about 11,000 to 14,000.
Right-skewed, based on the extremely asymmetric box, but whiskers are roughly symmetric. Mean would probably be well
above the median, based on skewness. Requires knowing how to read quartiles from a box plot.
151. Briefly describe this sample of departure delays on American Airlines flights out of Denver over a seven-day period, March 3-9
(n = 149 flights).
Short left whisker, skewed right. Most data are packed into a very narrow range, but there are 14 outliers (above the upper
outer fence) and 3 or 4 more above the upper inner fence.
Feedback: An early departure ("pushback after doors closed") can occur once a plane is fully loaded. In this data set, flights
departed up to 6 minutes early. The short left whisker and narrow box show that most data values are packed into a very
narrow range. The quartiles Q1, Q2, and Q3 are near -5 (i.e., flights typically push back about 5 minutes early). Only 9 flights
departed more than 20 minutes late. There are 14 outliers (above the upper outer fence) and 3 or 4 more above the upper
inner fence. Data are extremely right-skewed. Factors such as weather can cause long departure delays, but for most flights
an early or on-time departure is the norm.
152. Six graduates from Fulsome University's Master's of Waste Management program were hired by a Saudi Arabian firm at
$110,000 each, while the other four graduates were unemployed. The university placement office bragged, "Our MWM
graduates enjoyed a median starting salary of $110,000." Is this a reasonable assessment of central tendency? What are the
alternatives?
Can't use geometric mean due to zeros, but none of the measures is typical of anyone.
Feedback: The median and mode are $110,000, but the mean is only $66,000. We can't use geometric mean due to zeros. Sample is
small, so no measure is very reliable, but an honest placement service would note that 40 percent of the graduates were
unemployed and that the salary was only for those who actually found jobs.
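The placement office's claim can be checked with a short sketch using Python's standard statistics module. Treating the four unemployed graduates as having a salary of zero is the assumption implied by the feedback.

```python
import statistics

# Six graduates hired at $110,000; four unemployed, counted as $0 (assumption from the feedback)
salaries = [110_000] * 6 + [0] * 4

mean = statistics.mean(salaries)
median = statistics.median(salaries)
mode = statistics.mode(salaries)
share_unemployed = salaries.count(0) / len(salaries)

print(f"mean=${mean:,.0f}, median=${median:,.0f}, mode=${mode:,.0f}, unemployed={share_unemployed:.0%}")
```

The median and mode both land on $110,000 because six of the ten values are identical, so neither reveals that 40 percent of graduates earned nothing.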