Statistics Two Workbook V2
Peter Dalley
Table of Contents
Module 9: Hypothesis Testing – Single Population Mean and Proportion
Module 10: Hypothesis Testing – Two population means and proportions
Module 11: Hypothesis Testing – Variances
Module 12: Multiple Proportions, Independence and Goodness of Fit
Module 13: Analysis of Variance
Module 14: Simple Linear Regression
Module 15: Multiple Regression
Module 9: Hypothesis Testing – Single Population Mean and Proportion
9-1A Single population mean, one-tail test, σ known
A local craft brewery claims the amount of beer in its bottles is 12oz (340ml). It knows that
making false claims on its labels would result in serious penalties if it overstated the true volume.
Every Monday morning, a sample of 25 bottles is taken to test the accuracy of their filling
machines. Over the past few years of weekly sampling, they have calculated the standard
deviation of the population to be σ = 2.1oz. This week, the sample resulted in a mean filling
volume of x̄ = 11.4oz. Are they at risk of facing any penalties? Use α = 0.05.
a) Formulate the null and alternative hypothesis. Justify your formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.
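One way to check parts (b) to (d) numerically is sketched below. It assumes a lower-tail formulation (H0: μ ≥ 12 against Ha: μ < 12), since penalties arise only if the bottles hold less than the label states. The function name z_test_lower_tail is hypothetical, and only Python's standard library is used.

```python
from math import sqrt
from statistics import NormalDist


def z_test_lower_tail(x_bar, mu0, sigma, n, alpha=0.05):
    """Lower-tail z test of H0: mu >= mu0 against Ha: mu < mu0 (sigma known)."""
    z = (x_bar - mu0) / (sigma / sqrt(n))   # test statistic
    p_value = NormalDist().cdf(z)           # area to the left of z
    z_crit = NormalDist().inv_cdf(alpha)    # reject H0 when z <= z_crit
    return z, p_value, z_crit, p_value <= alpha


# Values taken from problem 9-1A.
z, p_value, z_crit, reject = z_test_lower_tail(11.4, 12, 2.1, 25)
print(f"z = {z:.3f}, p-value = {p_value:.4f}, critical value = {z_crit:.3f}")
print("Reject H0" if reject else "Do not reject H0")
```

The p-value rule (p ≤ α) and the critical value rule (z ≤ z_crit) always agree, which is the point of part (d).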
9-1B Single population mean, one tail test, σ known
An instructor in the TRU School of Business and Economics has produced a series of problems
and accompanying video walkthroughs in hopes of improving his statistics students’
understanding of course content. Having taught the course many times, he determines the
historical average to be 76% and the population standard deviation to be σ = 17.3. At the end of the
following semester, his class of 45 students, who had access to the video walkthroughs, had an
average grade of x̄ = 81.3%. Using a level of significance of α = 0.05, test to determine
whether this shows an improvement over the historical average.
a) Formulate the null and alternative hypothesis. Justify your formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.
9-1C Single population mean, one tail test, σ known
Assume car salespeople historically sell an average of 96 cars per year. In an effort to increase
sales, James Barnett, the regional manager, proposed a new commission structure in hopes of
increasing the incentives for salespeople to sell more cars. After the first year of the new
commission structure, a sample of 26 salespeople from within his region had average car sales of
x̄ = 98. Assume the population variance is known to be σ² = 49. Do we have evidence to show
that the new commission structure has increased the average number of cars sold? Perform the
test at α = 0.05.
a) Formulate the null and alternative hypothesis. Justify your formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.
9-1D Single population mean, two-tail test, σ known
A water bottle manufacturing facility produces bottles that are designed to hold 24oz (750ml).
The bottles are produced by pouring liquid plastic into a mold. Once the plastic hardens, it is
removed from the mold, polished, and has labels printed on it to mark various volumes: 8oz,
16oz and 24oz. The company’s quality assurance team periodically takes a sample of 30 bottles
and fills them to the 24oz line with water. The water from each bottle is then measured to
determine the actual volume of water contained in the bottle when filled to the 24oz line. Over
the years, they have determined the population standard deviation of filling volume to be σ = 1.4.
The most recent sample had an average volume of water at the 24oz line of 23.6oz. Use a 10%
level of significance to test the accuracy of the labels.
a) Formulate a test to determine if the 24oz label on the bottle is accurate. Justify your
formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusions.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.
f) Verify your findings using the confidence interval approach.
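The three approaches in parts (c), (d) and (f) can be compared side by side with a short calculation. This is a minimal sketch that assumes the two-tail formulation H0: μ = 24 against Ha: μ ≠ 24; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from problem 9-1D.
x_bar, mu0, sigma, n, alpha = 23.6, 24.0, 1.4, 30, 0.10

std_error = sigma / sqrt(n)
z = (x_bar - mu0) / std_error

# p-value approach: both tails count as evidence against H0.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Critical value approach: reject H0 when |z| >= z_crit.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Confidence interval approach: reject H0 when mu0 falls outside the interval.
ci = (x_bar - z_crit * std_error, x_bar + z_crit * std_error)

print(f"z = {z:.3f}, p-value = {p_value:.4f}, critical values = ±{z_crit:.3f}")
print(f"{1 - alpha:.0%} confidence interval: ({ci[0]:.3f}, {ci[1]:.3f})")
```

All three rules lead to the same decision because they are algebraic rearrangements of the same inequality.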
9-1E Single population mean, two-tail test, σ known
A local farmer produces hay for nearby cattle ranchers. The hay is rolled into 50lb (22kg) bales
and is sold by quantity. Therefore, when a rancher buys 100 bales of hay, they can expect to
receive 5,000lbs of hay. To ensure the cattle ranchers are getting what they expect, the farmer
periodically samples batches of 40 hay bales to test that they are averaging 50lbs. The most
recent batch provided a sample mean weight of x̄ = 51.2lbs. The population standard deviation is
known to be σ = 3.73. Using a level of significance of α = 0.05, test to determine whether the
farmer is producing what the ranchers are expecting.
a) Formulate the null and alternative hypotheses. Justify your formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Verify your conclusion using the critical value approach.
e) Confirm your results with a confidence interval.
f) Interpret your results.
9-2A Probability of Type Two Error – One Tailed Test
A local craft brewery claims the amount of beer in its bottles is 12oz (340ml). It knows that
making false claims on its labels would result in serious penalties if it overstated the true volume.
Every Monday morning, a sample of 25 bottles is taken to test the accuracy of their filling
machines. Over the past few years of weekly sampling, they have calculated the standard
deviation of the population to be σ = 2.1oz. This week, the sample resulted in a mean filling
volume of x̄ = 11.4oz. Are they at risk of facing any penalties? Use α = 0.05.
a) Formulate the null and alternative hypothesis. Justify your formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.
f) If the actual population mean is µa = 11oz, what is the probability of committing a Type II error?
Interpret this value. What if it is µa = 11.5oz? What if it is µa = 12.5oz?
g) If the manager states that she is willing to risk a β = 0.01 probability of not rejecting the null if
the average volume is within 1oz of specification, how large should the sample size be?
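Parts (f) and (g) can be checked with the sketch below. It assumes the lower-tail test H0: μ ≥ 12 against Ha: μ < 12, and it reads "within 1oz of specification" as a true mean of 11oz; both are interpretations rather than part of the problem statement. The function names type_ii_error and required_n are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

std_normal = NormalDist()

# Values taken from problem 9-2A.
mu0, sigma, n, alpha = 12.0, 2.1, 25, 0.05


def type_ii_error(mu_a):
    """P(do not reject H0) when the true mean is mu_a, for Ha: mu < mu0."""
    std_error = sigma / sqrt(n)
    # H0 is rejected when the sample mean falls below this cutoff.
    cutoff = mu0 + std_normal.inv_cdf(alpha) * std_error
    return 1 - std_normal.cdf((cutoff - mu_a) / std_error)


def required_n(mu_a, beta):
    """Smallest n giving Type II error probability beta at true mean mu_a."""
    z_alpha = std_normal.inv_cdf(1 - alpha)
    z_beta = std_normal.inv_cdf(1 - beta)
    return ceil((z_alpha + z_beta) ** 2 * sigma ** 2 / (mu0 - mu_a) ** 2)


beta_11 = type_ii_error(11.0)
beta_11_5 = type_ii_error(11.5)
n_needed = required_n(11.0, 0.01)

print(f"beta at mu_a = 11.0: {beta_11:.4f}")
print(f"beta at mu_a = 11.5: {beta_11_5:.4f}")
print(f"sample size for beta = 0.01 at mu_a = 11.0: {n_needed}")
```

A true mean of 12.5oz is not passed to type_ii_error because it lies inside H0 under this formulation: when H0 is true, failing to reject it is the correct decision, so a Type II error cannot occur.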
9-2B Probability of Type Two Error– One Tailed Test
Assume car salespeople historically sell an average of 96 cars per year. In an effort to increase
sales, James Barnett, the regional manager, proposed a new commission structure in hopes of
increasing the incentives for salespeople to sell more cars. After the first year of the new
commission structure, a sample of 26 salespeople from within his region had average car sales of
x̄ = 98. Assume the population variance is known to be σ² = 49. Do we have evidence to show
that the new commission structure has increased the average number of cars sold? Perform the
test at α = 0.05.
a) Formulate the null and alternative hypothesis. Justify your formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.
f) If the actual population mean is µa = 100 cars, what is the probability of committing a Type II
error? Interpret this value. What if it is µa = 99? What if it is µa = 95?
g) If the regional manager states that he is willing to risk a β = 0.10 probability of not rejecting the
null if the average number of cars sold is 99, how large should the sample size be?
9-2C Probability of Type Two Error – Two Tailed Test
A water bottle manufacturing facility produces bottles that are designed to hold 24oz (750ml).
The bottles are produced by pouring liquid plastic into a mold. Once the plastic hardens, it is
removed from the mold, polished, and has labels printed on it to mark various volumes: 8oz,
16oz and 24oz. The company’s quality assurance team periodically takes a sample of 30 bottles
and fills them to the 24oz line with water. The water from each bottle is then measured to
determine the actual volume of water contained in the bottle when filled to the 24oz line. Over
the years, they have determined the population standard deviation of filling volume to be σ = 1.4.
The most recent sample had an average volume of water at the 24oz line of 23.6oz. Use a 10%
level of significance to test the accuracy of the labels.
a) Formulate a test to determine if the 24oz label on the bottle is accurate. Justify your
formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusions.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.
f) Verify your findings using the confidence interval approach.
g) If the actual population mean is µa = 23.2oz, what is the probability of committing a Type II
error? Interpret this value.
h) If the manager states that she is willing to risk a β = 0.10 probability of not rejecting the null if
the average volume is within 0.5oz of specification, how large should the sample size be?
9-3A Single population mean, one-tail test, σ unknown
Red light cameras are often effective at reducing the number of people who run red
lights at intersections. However, they can be expensive to install and maintain. For these
reasons, they are only installed at intersections with the most accidents caused
by drivers running red lights. Let’s assume the threshold number of accidents is 12 per year; any
more than this warrants a camera. At one location, researchers found the average number of
accidents over a ten-year period was 12.6. Based on this data, they immediately submitted a
recommendation to install a red light camera at this intersection. After all, 12.6 is greater than
12.
a) Was this the correct decision?
b) What is the proper approach to making this decision? Show each step. Use α = 0.05.
(Hint: Assume you have access to their data and were able to calculate the sample standard
deviation to be s = 1.35)
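The steps in part (b) might look like the sketch below. It assumes an upper-tail test (H0: μ ≤ 12 against Ha: μ > 12) with n = 10 yearly observations. The critical value is copied from a standard t table because Python's standard library has no t distribution.

```python
from math import sqrt

# Values taken from problem 9-3A.
x_bar, mu0, s, n = 12.6, 12.0, 1.35, 10

t = (x_bar - mu0) / (s / sqrt(n))   # test statistic
df = n - 1                          # degrees of freedom

# Upper-tail critical value for alpha = 0.05 and df = 9, from a t table.
t_crit = 1.833
reject = t > t_crit

print(f"t = {t:.3f} with {df} degrees of freedom, critical value = {t_crit}")
print("Reject H0" if reject else "Do not reject H0")
```

The comparison shows why "12.6 is greater than 12" is not enough on its own: the decision depends on how large the difference is relative to the standard error.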
9-3B Single population mean, one-tail test, σ unknown
9-3C Single population mean, one-tail test, σ unknown
A local fire department has a goal of responding to house fires in under 14 minutes. In order to
determine whether they are achieving their goal, samples of 20 response times are tested every
week. The most recent sample resulted in a mean response time of x̄ = 13.2 minutes with a
sample standard deviation of s = 2.3. Use 𝛼 = 0.05.
a) Formulate the appropriate null and alternative hypothesis. Justify your formulation.
b) Calculate your test statistic and discuss your results. Show all your work.
c) What would it mean to commit a Type I or Type II error in this scenario?
9-3D Single population mean, two-tail test, σ unknown
It is common among some universities to target a specific average grade in certain courses. At
the end of each semester, instructors faced with this constraint are required to determine whether
their class average is statistically different from the targeted average for the course. If they are
either above or below the target, adjustments must be made. Let us assume that in a particular
course, instructors are expected to have an average grade of 70%. This semester, the class
average was 73% with 44 students. The sample standard deviation is s = 0.13, or 13 percentage
points. Use α = 0.05.
a) Formulate the appropriate null and alternative hypothesis. Justify your formulation.
b) Calculate your test statistic and discuss your results. Show all your work.
c) Produce a 95% confidence interval estimate consistent with this test.
9-4A Single population proportion, one-tail test
Election years always bring us reports full of statistics on who’s winning in the latest polls. One
recent pollster argued that the conservative candidate has support from more than half of
registered voters. After digging a little deeper into the article, you find more details on the
pollster’s findings. In a footnote at the bottom of the page, you find the following information:
Out of a sample of 175 registered voters, 95 (or 54.29%) stated that they support the conservative
candidate.
a) On what information is the pollster’s statement based? Is this fair and accurate?
b) Formulate the appropriate null and alternative hypothesis for a proper test.
c) Calculate your test statistic, p-value and conclusion.
d) State and interpret your conclusion.
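A minimal sketch of the calculation in part (c) follows. It assumes the upper-tail formulation H0: p ≤ 0.50 against Ha: p > 0.50 and a significance level of α = 0.05, which the problem does not state.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from problem 9-4A; alpha is an assumption.
successes, n, p0, alpha = 95, 175, 0.50, 0.05

p_hat = successes / n
# The standard error uses the hypothesized proportion p0, not p_hat.
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 1 - NormalDist().cdf(z)   # upper-tail area

print(f"p_hat = {p_hat:.4f}, z = {z:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value <= alpha else "Do not reject H0")
```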
9-4B Single population proportion, one-tail test
According to the Institution of Consumer Goods Waste, 60% of products that are returned to a
store for refund are not in resaleable condition and are simply thrown away. A local
department store sampled 90 returns from the previous month and discovered 46 of them were in
such condition and had to be thrown away. The manager is hopeful that her department store
does a better job of accepting returns and issuing refunds only for goods that can be resold.
a) Formulate the appropriate null and alternative hypothesis. Justify your formulation.
b) Calculate your test statistic and discuss your results. Show all your work.
c) Interpret your results.
9-4C Single population proportion, two-tail test
It has been argued that drunk drivers cause 50 percent of fatal accidents on the nation’s
highways. In order to test this claim, you obtain the following information: out of the last 120
accidents in your state, you find that 72 of them were caused by drunk driving.
a) Formulate the appropriate null and alternative hypothesis. Justify your formulation.
b) Calculate your test statistic and p-value.
c) State your conclusion and interpret its meaning.
9-4D Single population proportion, two-tail test
While at a dinner party and having a lively debate with an old uncle about labour force
unionization, he claims that labour markets in the US are more competitive than in Canada. He
supports this claim by stating that in Canada roughly 30% of all workers are unionized, while in
the US it’s closer to 11%. You can’t believe unionization rates are so low in the US, so you do
some research. Suppose in one article you find that out of a sample of 225 workers, just 32 of
them are unionized. Do these numbers support your uncle’s claims about US unionization rates?
a) Formulate the appropriate null and alternative hypothesis. Justify your formulation.
b) Calculate your test statistic and p-value.
c) State your conclusion and interpret its meaning.
Module 10: Hypothesis Testing – Two population means and proportions
10-1A Two population means, one-tail test, σ known
A friend of yours has claimed that Subaru owners are, on average, faster drivers than Mazda
owners. Although you may not disagree with him, being the young statistician that you are, you
decide to gather some data and perform a test. You set up a radar on the highway and begin
collecting data. After one week, you’ve found the average speed of 47 Mazdas was 63.3mph and
the average speed of 51 Subarus was 65.7mph. Assume that we know the population variance of
the Mazda to be σM² = 26 and the Subaru to be σS² = 29.
a) Formulate the null and alternative hypothesis. Use 5% level of significance. Justify your
formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Interpret your conclusion.
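Parts (b) and (c) can be sketched as below, assuming the upper-tail formulation H0: μS − μM ≤ 0 against Ha: μS − μM > 0, where S denotes Subaru and M denotes Mazda. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from problem 10-1A (variances, not standard deviations).
n_m, mean_m, var_m = 47, 63.3, 26   # Mazda
n_s, mean_s, var_s = 51, 65.7, 29   # Subaru
alpha = 0.05

# Standard error of the difference between two independent sample means.
std_error = sqrt(var_s / n_s + var_m / n_m)
z = (mean_s - mean_m) / std_error
p_value = 1 - NormalDist().cdf(z)   # upper-tail area

print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value <= alpha else "Do not reject H0")
```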
10-1B Two population means, One tail test, σ known
Many students tend to choose courses based on which professor gives the easiest marks. One
student seems very certain that Prof. Dalley is much easier than Prof. Fraser. In order to test this
claim, you talk with other students who have taken courses with each of them. After talking with
41 of Prof. Dalley’s past students, you find the average grade was 73.2%. You talk with 49 of
Prof. Fraser’s past students and calculate their average grade to be 66.4%. Assume that we know
the population standard deviation of Prof. Dalley’s grades to be σD = 0.14 and Prof. Fraser’s grades
to be σF = 0.18 (that is, 14 and 18 percentage points).
a) Formulate a hypothesis test for the student’s claim. Use a 3% level of significance. Justify
your formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.
10-1C Two population means, one tail test, σ known
Suppose your cousin, who just recently discovered you’re taking a statistics course, thought
they’d be a nuisance, just for the fun of it. They claim, without any hesitation, that the average
weight of a white onion is at least 50 grams more than the average weight of an ambrosia apple.
And of course, they ask you to prove them wrong. As the mature student that you are, you
decide to turn this into a teachable moment for your cousin. You both go to the grocery store and
select a random sample of white onions and apples. The following table contains your sample
data:
                            White Onions   Ambrosia Apples
Count                       28             25
Average Weight (grams)      300            256
Population Std. deviation   9.5            7
10-1D Two population means, Two tail test, σ known
Anybody who has siblings knows how competitive they can become. Imagine, two brothers
playing catch with a baseball. It may start off as innocent play, but it won’t be long before it
becomes competitive with one bragging about their ability to throw farther than the other. In
order to settle the argument, Dad comes out to take measurements. After each brother throws the
ball 50 times, Dad calculates that Peter had an average distance of 44 feet and David had an average
distance of 46 feet. As one would expect, David begins bragging as soon as he hears this news.
Dad suggests that, on average, the two are throwing the ball the same distance. Assume that we know
the population standard deviation of Peter’s throws to be σP = 7.8 and David’s to be σD = 8.2.
a) Formulate a hypothesis test for Dad’s claim. Use α = 0.05. Justify your formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.
f) Confirm your findings with a confidence interval estimate.
10-1E Two population means, Two tail test, σ known
Imagine the following results come from two independent random samples taken from two
populations.
                           Sample A   Sample B
Count                      35         30
Mean (lbs)                 113        105
Population Std deviation   5.1        5.4
a) Formulate a hypothesis test to determine whether the difference between the two means is 5
pounds. Use α = 0.03.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Interpret your conclusion.
e) Confirm your results using a confidence interval estimate.
10-2A Two population means, One tail test, σ unknown but assumed equal
A friend once told you that Golden Retrievers are a much faster breed of dog than Border Collies.
As a dog lover, you become interested in determining whether or not the data would support
such a claim. Assume you manage to gather 29 Golden Retrievers and 31 Border Collies for a
massive 100-meter dog race! After the race, you gather all their times. You find the average
time for the Golden Retrievers to be 7.2 seconds and for the Border Collies, 8.1 seconds. You
calculate the sample standard deviations to be s = 1.83 and s = 1.57 seconds for the Retrievers
and Border Collies, respectively. Use α = 0.05.
a) Formulate a hypothesis to test your friend’s claim. Justify your formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.
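The pooled-variance calculation in parts (b) and (d) can be sketched as follows. It assumes a lower-tail test on race times (H0: μG − μB ≥ 0 against Ha: μG − μB < 0), because a faster breed has a smaller average time. The critical value is an approximate figure from a t table for 58 degrees of freedom.

```python
from math import sqrt

# Values taken from problem 10-2A (times in seconds).
n_g, mean_g, s_g = 29, 7.2, 1.83   # Golden Retrievers
n_b, mean_b, s_b = 31, 8.1, 1.57   # Border Collies

df = n_g + n_b - 2
# Pooled variance: a weighted average of the two sample variances.
pooled_var = ((n_g - 1) * s_g ** 2 + (n_b - 1) * s_b ** 2) / df
std_error = sqrt(pooled_var * (1 / n_g + 1 / n_b))
t = (mean_g - mean_b) / std_error

# Lower-tail critical value for alpha = 0.05 and df = 58 (approximate table value).
t_crit = -1.672
reject = t < t_crit

print(f"pooled variance = {pooled_var:.4f}, t = {t:.3f}, df = {df}")
print("Reject H0" if reject else "Do not reject H0")
```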
10-2B Two population means, One tail test, σ unknown but assumed equal
(This question is intentionally similar to the previous one)
A friend once told you that Golden Retrievers are a much faster breed of dog than Border Collies.
As a dog lover, you become interested in determining whether or not the data would support
such a claim. Assume you manage to gather 29 Golden Retrievers and 31 Border Collies for a
massive 100-meter dog race! After the race, you gather all their times and calculate their speeds.
You find the average speed for the Golden Retrievers to be 50km/h and for the Border Collies,
44.4km/h. You calculate the sample standard deviations to be s = 12.71 km/h and s = 8.62
km/h for the Retrievers and Border Collies, respectively. Use α = 0.05.
a) Formulate a hypothesis to test your friend’s claim. Justify your formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.
10-2C Two population means, Two tail test, σ unknown but assumed equal
University classes are becoming increasingly diverse, with students moving from all parts of the
planet to study in different countries. Imagine your statistics instructor gives you the following
assignment:
Measure the heights of the students in your classes and sort them by continent of origin. Perform
a hypothesis test to determine if the average height of students from North America is different
than the average height of students from Europe.
As the good student that you are, you awkwardly go around asking all of your classmates in your
classes how tall they are. You obtain the following data:
a) Formulate the appropriate hypothesis test. Use α = 0.05. Justify your formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.
10-2D Two population means, Two tail test, σ unknown but assumed unequal.
The TRU School of Business and Economics recently did a survey of alumni salaries from the
past 5 years. The initial survey included only students in the finance major and the economics
major. Test for any difference between the two using a 10% level of significance. (See Problem
13-1C to see how we compare 3 or more majors)
                     Economics   Finance
Count                54          63
Mean ($)             96,480      94,315
Standard deviation   6340        6001
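When the population variances are not assumed equal, the test statistic and its degrees of freedom can be computed as in the sketch below. It assumes a two-tail test (H0: μE − μF = 0 against Ha: μE − μF ≠ 0) and uses the Welch-Satterthwaite approximation for the degrees of freedom. The critical value is an approximate t-table figure for about 110 degrees of freedom.

```python
from math import sqrt

# Values taken from problem 10-2D (salaries in dollars).
n_e, mean_e, s_e = 54, 96_480, 6340   # Economics
n_f, mean_f, s_f = 63, 94_315, 6001   # Finance

# Per-sample variance of the mean.
v_e = s_e ** 2 / n_e
v_f = s_f ** 2 / n_f

t = (mean_e - mean_f) / sqrt(v_e + v_f)

# Welch-Satterthwaite degrees of freedom (usually not a whole number).
df = (v_e + v_f) ** 2 / (v_e ** 2 / (n_e - 1) + v_f ** 2 / (n_f - 1))

# Two-tail critical value for alpha = 0.10 (approximate table value).
t_crit = 1.659
reject = abs(t) > t_crit

print(f"t = {t:.3f}, df = {df:.1f}, critical values = ±{t_crit}")
print("Reject H0" if reject else "Do not reject H0")
```

Many textbooks round the degrees of freedom down to the nearest whole number before consulting a table, which makes the test slightly more conservative.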
10-3A Two population means, One tail test, Matched Sample
As a bilingual country, there are benefits to speaking both official languages in Canada. This is
especially true for employees of the Canadian Public Service. In order to promote bilingualism,
public servants have the opportunity to take courses in their second language at no cost to
themselves. As a taxpayer funded training program, it’s important to verify that it is effective.
In order to test this, students are given an entry exam when they begin language training and an
exit exam once they are finished. The difference between their grades on these two exams is
used to determine if they have improved in their language proficiency. The following data is
from the most recent data collection:
a) Formulate the appropriate hypothesis test using a 5% level of significance. Justify your
formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.
10-3B Two population means, One tail test, Matched Sample
My fiancée is currently in the market for a new single serving blender for making healthy
smoothies. One manufacturer produces two different models branded as basic consumer and
professional athlete. According to the company website, the suggested retail price of the
professional athlete model is $20 more than the basic consumer model. You gather the following
sample of prices from various retailers:
a) Formulate the appropriate hypothesis test to determine if the price difference between the two
is no more than $20. Justify your formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.
10-3C Two population means, Two-tail test, Matched Sample
Retail gasoline outlets frequently advertise the benefits of their fuel additives in maintaining a
clean and smooth-running engine. However, there is some disagreement on whether or not it
affects fuel efficiency. In order to determine if the additive affects fuel efficiency, one chain of
gas stations measured fuel efficiency of 5 cars without the additive, then again with the additive.
The following table contains the data they collected:
a) Formulate the appropriate hypothesis test at the 0.05 level of significance. Justify your
formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Interpret your conclusion.
10-4A Two population proportion, One tail test
Tourism plays an important role in the local economies of many small towns. Because tourism
can be a difficult industry to define, due to the broad spectrum of activities involved, hotel
occupancy rates are frequently used as a rough measure of the industry’s overall performance.
The following table provides hotel occupancy data for your hometown, during the month of
August for two consecutive years.
                 This Year   Last Year
Occupied Rooms   1535        1332
Total Rooms      1840        1650
10-4B Two population proportion, One tail test
In problem 9-4B, we learned that according to the Institution of Consumer Goods Waste, 60% of
products that are returned to a store for refund are not in resaleable condition and are simply
thrown away. A local department store sampled 90 returns from the previous month and
discovered 46 of them were in such condition and had to be thrown away. In problem 9-4B, we
found that this represented a statistically significant difference; meaning the store was successful
at throwing away a smaller proportion of their refunded products. Since then, the manager has
set a goal to reduce this proportion every month. This month, from a sample of 105 returns, they
found that 45 had to be thrown away. Were they successful at reducing the proportion of returns
that had to be thrown away?
a) Formulate a test to determine if the proportion of returns that had to be thrown away
decreased.
b) Calculate the test statistic and corresponding p-value.
c) Draw your conclusion at the α=0.05 level of significance.
d) Interpret your conclusion.
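A minimal sketch of parts (b) and (c) follows. It assumes a lower-tail test of H0: p₂ − p₁ ≥ 0 against Ha: p₂ − p₁ < 0, where p₁ is the previous month's proportion and p₂ is this month's, and it uses the pooled proportion in the standard error because H0 treats the two proportions as equal.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from problem 10-4B.
x1, n1 = 46, 90     # previous month: discarded returns, sample size
x2, n2 = 45, 105    # this month
alpha = 0.05

p1, p2 = x1 / n1, x2 / n2
p_pooled = (x1 + x2) / (n1 + n2)

std_error = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = (p2 - p1) / std_error
p_value = NormalDist().cdf(z)   # lower-tail area

print(f"p1 = {p1:.4f}, p2 = {p2:.4f}, pooled = {p_pooled:.4f}")
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value <= alpha else "Do not reject H0")
```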
10-4C Two population proportion, two tail test
Many employers are beginning to recognize the importance of maintaining a happy and healthy
workforce. Some employers even provide their workers with arcades, child-care facilities,
libraries and all kinds of sporting activities such as tennis courts, pickle ball courts and more.
However, some people argue that these resources are not enjoyed equally by men and women.
Therefore, it is important to ensure that the resources being offered are benefitting both genders
equally. In order to ensure this to be true, one employer periodically samples its employees and
asks whether or not they are satisfied with their current working environment. The following
table provides the data collected.
                            Women   Men
Satisfied respondents       51      48
Total number of employees   59      63
a) Formulate a test to determine whether both genders are equally satisfied with their workplace
environment.
b) Draw your conclusion at the α=0.05 level of significance.
c) Interpret your conclusion.
Module 11: Hypothesis Testing – Variances
11-1A Hypothesis Testing: Single population variance – One tailed test.
A local craft brewery claims the amount of beer in its bottles is 12oz (340ml). It knows that
making false claims on its labels would result in serious penalties if it overstated the true volume.
Every Monday morning, a sample of 25 bottles is taken to test the accuracy of their filling
machines. Over the past few years of weekly sampling, they have calculated the standard
deviation of the population to be σ = 2.1oz. They would like this to be the maximum level of
variation in their filling process so, periodically, this is tested as well. This week, the sample
resulted in a mean filling volume of x̄ = 11.4oz with a sample standard deviation of s = 2.55.
Perform a test to determine if they are exceeding their maximum desired variance. Use α =
0.05.
a) Formulate the null and alternative hypothesis. Justify your formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.
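The test statistic in part (b) and the comparison in part (d) can be sketched as below, assuming the upper-tail formulation H0: σ² ≤ 2.1² against Ha: σ² > 2.1². The critical value comes from a chi-square table because Python's standard library has no chi-square distribution.

```python
# Values taken from problem 11-1A.
n, s, sigma0 = 25, 2.55, 2.1

df = n - 1
chi_sq = df * s ** 2 / sigma0 ** 2   # (n - 1) s^2 / sigma0^2

# Upper-tail critical value for alpha = 0.05 and df = 24, from a chi-square table.
chi_sq_crit = 36.415
reject = chi_sq > chi_sq_crit

print(f"chi-square = {chi_sq:.3f} with {df} degrees of freedom, critical value = {chi_sq_crit}")
print("Reject H0" if reject else "Do not reject H0")
```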
11-1B Hypothesis Testing: Single population variance – One tailed test.
A local farmer produces hay for nearby cattle ranchers. The hay is rolled into 50lb (22kg) bales
and is sold by quantity. Therefore, when a rancher buys 100 bales of hay, they can expect to
receive 5,000lbs of hay. To ensure the cattle ranchers are getting what they expect, the farmer
periodically samples batches of 40 hay bales to test that they are averaging 50lbs. It is important
that the bales be relatively consistent in size as well. The historical population standard deviation
is known to be σ = 3.73 but this is considered to be too high. In response to this, an effort has
been made to improve quality controls and reduce the variation in weights. The most recent
batch provided a sample mean weight of x̄ = 51.2lbs and a sample standard deviation of s = 2.99.
Using a level of significance of α = 0.05 test to determine whether the variance in the size of hay
bales has been successfully reduced.
11-1C Hypothesis Testing: Single population variance – two tailed test.
A water bottle manufacturing facility produces bottles that are designed to hold 24oz (750ml).
The bottles are produced by pouring liquid plastic into a mold. Once the plastic hardens, it is
removed from the mold, polished, and has labels printed on it to mark various volumes: 8oz,
16oz and 24oz. The company’s quality assurance team periodically takes a sample of 30 bottles
and fills them to the 24oz line with water. The water from each bottle is then measured to
determine the actual volume of water contained in the bottle when filled to the 24oz line. Over
the years, they have determined the population standard deviation of filling volume to be σ = 1.4.
This has been the industry standard for the variation in filling volumes for decades and it is the
goal of the facility to meet this standard. The most recent sample had an average volume of water
at the 24oz line of 23.6oz and a standard deviation of s = 1.82. Use a 1% level of significance.
a) Formulate the null and alternative hypotheses. Justify your formulation.
b) Calculate the test statistics for both tests.
c) Use the p-value approach to draw your conclusions.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.
11-1D Hypothesis Testing: Single population variance – Two tailed test.
An instructor in the TRU School of Business and Economics has produced a series of problems
and accompanying video walkthroughs in hopes of improving his statistics students’
understanding of course content. Having taught the course many times, he determines the
historical average to be 76% and the population standard deviation to be σ = 17.3. At the end of
the following semester, his class of 45 students, who had access to the video walkthroughs, had
an average grade of x̄ = 81.3% with a standard deviation of s = 13.6. Using a level of
significance of α = 0.05, test to determine whether those with access to the videos have the same
variance in grades as the historical population.
a) Formulate the null and alternative hypothesis. Justify your formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.
f) Develop a 95% confidence interval for the population standard deviation.
11-2A Hypothesis Testing: Two population variance – One tailed test.
One of the challenges in writing in-class exams is to ensure that they can be completed within
the allotted time. Even though the average completion time is within requirements, the variance
can often be a problem. Some students might finish in ten minutes, while others run out of time.
In an attempt to reduce the variance of completion times, a new computerized method of testing
has been implemented. In order to determine if the new method was successful at reducing the
variance of completion times, a sample of 30 students was asked to write the exam using the old
method and a sample of 35 students wrote the exam using the new method. Those using the old
method finished in 78 minutes, on average, with a standard deviation in completion time
of s = 10.3 minutes. Their average grade was 72.3% with standard deviation s = 13. Those
using the new method finished in 75 minutes, on average, with a standard deviation of s = 7.5
minutes. Their average grade was 67.2% with standard deviation 14.
a) Formulate a test to determine if the new method succeeded at reducing the variance in
completion times.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.
One of your colleagues has suggested that although the new method may succeed at reducing the
variance in completion times, it comes at a cost to student performance. Test this claim.
a) State the appropriate test.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Interpret your conclusion.
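The test statistic for the first part (variance of completion times) can be sketched as below. It assumes the usual upper-tail F test, with the old method's variance in the numerator because the alternative hypothesis is that the old variance is larger. The variable names are illustrative.

```python
# Completion-time figures from problem 11-2A.
n_old, s_old = 30, 10.3   # old method: sample size, standard deviation (minutes)
n_new, s_new = 35, 7.5    # new method: sample size, standard deviation (minutes)

# H0: var_old <= var_new   Ha: var_old > var_new (the new method reduced the variance).
# Putting the variance expected to be larger in the numerator gives an upper-tail test.
f_stat = s_old**2 / s_new**2
df_num = n_old - 1
df_den = n_new - 1

print(f"F = {f_stat:.3f} with ({df_num}, {df_den}) degrees of freedom")
```

The p-value and critical value are then read from an F table or software using these degrees of freedom.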
11-2B Hypothesis Testing: Two population variance – One tailed test.
When the seasons change, those living in colder climates need to switch the tires on their cars to
match the road conditions. Winter tires are made with a different rubber compound that allows
them to remain softer in cold temperatures in order to maintain a better grip on cold and
slippery roads. Although average stopping distance is an important selling point, so too is the
variance in stopping distances. Suppose a consumer’s digest collects the following data on two
tires: the first is a popular winter tire that has been on the market for many years. The second, a
new tire advertised to have a new and improved rubber compound capable of stopping at least as
quickly but even more consistently.
a) Develop the appropriate tests to evaluate these claims. Use a 10% level of significance.
b) Draw your conclusion using the p-value and the critical value approaches.
c) Interpret your findings.
11-2C Hypothesis Testing: Two population variance – Two tailed tests.
A friend once told you that Golden Retrievers are a much faster breed of dog than Border Collies.
As a dog lover, you become interested in determining whether or not the data would support
such a claim. Assume you manage to gather 29 Golden Retrievers and 31 Border Collies for a
massive 100-meter dog race! After the race, you gather all their times. You find the average
time for the Golden Retrievers to be 7.2 seconds and for the Border Collies, 8.1 seconds. You
calculate the sample standard deviations to be s = 1.83 and s = 1.57 seconds for the Retrievers
and Border Collies, respectively. Use α = 0.05.
a) Develop a test to determine if the assumption of equal variances was appropriate.
b) Calculate your test statistic.
c) State your p-value, critical value and conclusion at the 0.05 level of significance.
d) Interpret your results.
11-2D Hypothesis Testing: Two population variance – Two tailed test.
University classes are becoming increasingly diverse, with students moving from all parts of the
planet to study in different countries. Imagine your statistics instructor gives you the following
assignment:
Measure the heights of the students in your classes and sort them by continent of origin. Perform
a hypothesis test to determine if the average height of students from North America is different
than the average height of students from Europe.
As the good student that you are, you awkwardly go around asking all of your classmates in your
classes how tall they are. You obtain the following data:
                     European   North American
Count                33         41
Mean (inches)        70         68
Standard deviation   5.3        3.7
11-2E Hypothesis Testing: Two population variance – Two tailed test.
The TRU School of Business and Economics recently did a survey of alumni salaries from the
past 5 years. The initial survey included only students in the finance major and the economics
major. Test for any difference between the two using a 10% level of significance.
                     Economics   Finance
Count                54          63
Mean salary          96,480      94,315
Standard deviation   6340        6001
Module 12: Multiple Proportions, Independence and Goodness of Fit
12-1A Testing Equality Across Multiple Population Proportions
As part of an undergraduate research project, you decide to determine whether pet owners are
satisfied with their choice of pet. In order to gather sample data, you develop a survey to ask
respondents what type of pet they have (Dog, Cat, Other) and whether they are likely to adopt
the same pet upon its death, or a different one. The following table contains the observed
frequencies:
                            Type of Pet
                            Cat    Dog    Other    Total
Likely to readopt   Yes     48     51     44       143
                    No      31     26     68       125
                    Total   79     77     112      268
12-1B Testing Equality Across Multiple Population Proportions
After the most recent election, you decide to determine if there was a difference in the proportion
of voters who changed their voting intentions at the last minute. Some voters choose early on in
the campaign who they will vote for and stick with it, while others may change their minds when
new information becomes available. This might shed some light on which voters are more
susceptible to information released close to election day. In order to gather data, you produce a
survey that asks each respondent which party they voted for and if this was a result of a change
of intentions within the 4 weeks prior to Election Day. The following table provides the
observed frequencies:
                                   Political Party
                                   Conservatives   Liberals   Other   Total
Changed voting intention   Yes     96              82         87      265
                           No      98              79         49      226
                           Total   194             161        136     491
a) Formulate the null and alternative hypothesis. Use a 10% level of significance.
b) Compute the Expected frequencies.
c) Use the p-value approach to draw your conclusion.
d) Interpret your conclusion.
e) If appropriate, use the Marascuilo procedure to determine where any differences exist.
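If the test in part (c) rejects equality of the proportions, the Marascuilo comparisons in part (e) can be sketched as below. This is an illustration under stated assumptions: each pairwise difference in the proportion who changed their vote is compared with a critical range of √(χ²α) × √(pᵢ(1 − pᵢ)/nᵢ + pⱼ(1 − pⱼ)/nⱼ), and the chi-square critical value is computed from the closed form −2 ln(α), which holds only because this problem has 2 degrees of freedom.

```python
import math
from itertools import combinations

# Observed frequencies from problem 12-1B: (changed intention, total respondents).
groups = {
    "Conservatives": (96, 194),
    "Liberals": (82, 161),
    "Other": (87, 136),
}
alpha = 0.10

# k = 3 populations gives k - 1 = 2 degrees of freedom. For 2 df only, the
# upper-tail chi-square critical value equals -2 * ln(alpha).
chi2_crit = -2 * math.log(alpha)

results = {}
for (name_i, (x_i, n_i)), (name_j, (x_j, n_j)) in combinations(groups.items(), 2):
    p_i, p_j = x_i / n_i, x_j / n_j
    diff = abs(p_i - p_j)
    critical_range = math.sqrt(chi2_crit) * math.sqrt(
        p_i * (1 - p_i) / n_i + p_j * (1 - p_j) / n_j
    )
    # A pair differs significantly when the absolute difference exceeds its critical range.
    results[(name_i, name_j)] = (diff, critical_range, diff > critical_range)

for pair, (diff, critical_range, significant) in results.items():
    print(pair, f"|diff| = {diff:.4f}", f"critical range = {critical_range:.4f}", significant)
```

For tables with more than three populations the critical value must be taken from a chi-square table instead of the closed form used here.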
12-1C Testing Equality Across Multiple Population Proportions
A local car dealership is interested in determining customer satisfaction and brand loyalty. They
sample owners of three different brands of vehicles and ask whether or not they are likely to buy
the same brand when they purchase a new car, or shop around. The following table contains the
observed frequencies:
                               Brand of vehicle
                               Ford   GMC   Chevy   Total
Likely to repurchase   Yes     104    79    110     293
                       No      72     89    90      251
                       Total   176    168   200     544
12-2A Tests of Independence
The local animal shelter is interested in knowing if people’s decision to adopt a pet or purchase
from a breeder is independent of the type of pet. Knowing which pets are more likely to be
adopted will help them manage their inventories. The following table provides the observed
frequencies:
            Type of Pet
            Cat   Dog   Other   Total
Adopt       60    62    61      183
Purchase    38    71    65      174
Total       98    133   126     357
a) Formulate the null and alternative hypothesis. Use a 10% level of significance.
b) Compute the expected frequencies.
c) Use the p-value approach to draw your conclusion.
d) Interpret your conclusion.
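Parts (b) and (c) can be sketched as follows. Each expected frequency is (row total × column total) / grand total, and the test statistic sums (observed − expected)² / expected over all cells. The p-value line relies on the fact that, for 2 degrees of freedom only, the chi-square upper-tail area equals exp(−χ²/2); other table sizes need a chi-square table or software.

```python
import math

# Observed frequencies from problem 12-2A.
# Rows: Adopt, Purchase. Columns: Cat, Dog, Other.
observed = [
    [60, 62, 61],
    [38, 71, 65],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequencies under the null hypothesis of independence.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Closed-form upper-tail area, valid for df = 2 only.
p_value = math.exp(-chi2 / 2) if df == 2 else None

print([[round(e, 2) for e in row] for row in expected])
print(f"chi-square = {chi2:.3f}, df = {df}, p-value = {p_value:.4f}")
```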
12-2B Tests of Independence
Is political affiliation independent of gender? The following table provides the observed
frequencies from a recent survey of students at your college:
12-2C Tests of Independence
A local car dealership has reason to believe that whether a customer is married or single will
influence the brand of vehicle they choose to buy. The following table contains the observed
frequencies collected using a survey.
12-3A Goodness of Fit – Multinomial Distribution
                           Type of Pet
                           AquaBear   WaterDog   H2Osprey   RainyCat   Total
Historical Market Share    0.33       0.25       0.21       0.21       1
Revenues (Current Year)    90         63         78         69         300
a) Formulate the null and alternative hypothesis. Use a 10% level of significance.
b) Compute the expected frequencies.
c) Use the p-value approach to draw your conclusion.
d) Interpret your conclusion.
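The expected frequencies and test statistic for parts (b) and (c) can be sketched as below, assuming the null hypothesis is that the current-year figures follow the historical market shares. The variable names are illustrative.

```python
# Figures from problem 12-3A, in the order AquaBear, WaterDog, H2Osprey, RainyCat.
historical_share = [0.33, 0.25, 0.21, 0.21]
observed = [90, 63, 78, 69]

n = sum(observed)

# Expected frequency for each category: historical share times the total.
expected = [share * n for share in historical_share]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1   # k - 1 categories

print([round(e, 1) for e in expected])
print(f"chi-square = {chi2:.3f} with {df} degrees of freedom")
```

The p-value is the upper-tail area of the chi-square distribution with 3 degrees of freedom, read from a table or software.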
12-3B Goodness of Fit – Multinomial Distribution
As part of a research project for your statistics class, you decide to gather data on your
classmates’ favourite candy to determine if there’s a statistically significant difference in their
preferences. You group their responses as follows:
12-4A Goodness of Fit – Normal Distribution
A simple random sample of 30 grades from a principles of microeconomics course are listed
below. They have been sorted from smallest to largest for convenience. The mean grade is 61,
with a standard deviation of 17.
32.65 61.18 74.43
44.85 61.61 75.91
45.89 64.37 80.77
46.82 66.95 82.56
48.35 67.09 84.90
52.37 67.13 85.90
55.57 68.73 91.06
60.45 70.19 92.63
60.74 71.08 93.29
61.15 72.80 101.20
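One way to set up this test is sketched below. It uses the mean of 61 and standard deviation of 17 as stated in the problem, and it assumes six equal-probability intervals so that each has an expected frequency of 30/6 = 5. The number of intervals is an assumption; a different choice gives a different statistic.

```python
from statistics import NormalDist

# Grades from problem 12-4A.
grades = [
    32.65, 44.85, 45.89, 46.82, 48.35, 52.37, 55.57, 60.45, 60.74, 61.15,
    61.18, 61.61, 64.37, 66.95, 67.09, 67.13, 68.73, 70.19, 71.08, 72.80,
    74.43, 75.91, 80.77, 82.56, 84.90, 85.90, 91.06, 92.63, 93.29, 101.20,
]
mean, sd = 61, 17   # as stated in the problem
k = 6               # assumed number of equal-probability intervals

# Interval boundaries: the 1/6, 2/6, ..., 5/6 quantiles of Normal(61, 17).
dist = NormalDist(mean, sd)
cutoffs = [dist.inv_cdf(i / k) for i in range(1, k)]

# Count the grades falling in each interval.
observed = [0] * k
for g in grades:
    index = sum(g >= c for c in cutoffs)   # number of boundaries at or below g
    observed[index] += 1

expected = len(grades) / k
chi2 = sum((o - expected) ** 2 / expected for o in observed)

# Degrees of freedom: k - p - 1, with p = 2 estimated parameters (mean and sd).
df = k - 2 - 1

print([round(c, 2) for c in cutoffs])
print(observed, f"chi-square = {chi2:.2f}, df = {df}")
```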
12-4B Goodness of Fit – Normal Distribution
A local painting company that employs students is doing some analysis on the completion times
of its employees. One part of the analysis is to determine if the completion times are normally
distributed or not. The data below consists of the number of minutes it took each of 30
employees to finish painting a small bedroom. The average completion time was 74 minutes
with a standard deviation of 15. The data has been sorted from smallest to largest for
convenience.
54.93 64.86 79.31
57.49 65.75 80.00
57.88 66.85 83.77
58.03 67.01 89.45
58.42 67.45 91.38
60.21 67.75 91.47
60.33 69.47 92.21
60.50 72.63 94.22
60.67 75.08 102.10
62.69 77.93 104.41
Module 13: Analysis of Variance
13-1A Single Factor Analysis of Variance (ANOVA) – Completely Randomized Design
WhiteTooth Inc. is developing an additive for its line of toothpastes that is designed to whiten
teeth in as little time as possible. It currently has two variations of the additive, Type A, and B,
but only wishes to produce and market one. In order to determine the effectiveness of these new
additives, a focus group consisting of 27 people is organized. Nine are given type A, nine are
given type B and nine are the control group and are given a placebo. Each person is asked to use
the toothpaste and record the number of days it takes before their teeth achieve a predetermined
shade of white. The following table contains the data collected:
5 4 9
5 5 7
7 6 8
4 5 9
6 8 7
6 7 7
8 4 8
7 6 5
6 5 6
Mean 6 5.56 7.33
Variance 1.5 1.78 1.75
a) Test to determine whether or not there is a difference between the two types of additives and
the control group.
b) Perform a Fisher’s LSD test if necessary.
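Both parts can be sketched as below. Two assumptions are made: the column order Type A, Type B, Placebo is inferred from the problem text because the table has no header row, and Fisher's LSD uses α = 0.05, which the problem does not state. The value t_crit = 2.064 is the tabled two-tailed t value for 24 degrees of freedom at that α.

```python
from statistics import mean, variance

# Days to reach the target shade, from problem 13-1A (column order assumed).
groups = {
    "Type A": [5, 5, 7, 4, 6, 6, 8, 7, 6],
    "Type B": [4, 5, 6, 5, 8, 7, 4, 6, 5],
    "Placebo": [9, 7, 8, 9, 7, 7, 8, 5, 6],
}
k = len(groups)
n_total = sum(len(v) for v in groups.values())
grand_mean = mean(x for v in groups.values() for x in v)

# Between-treatments and within-treatments sums of squares.
sstr = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
sse = sum((len(v) - 1) * variance(v) for v in groups.values())

mstr = sstr / (k - 1)
mse = sse / (n_total - k)
f_stat = mstr / mse

# Fisher's LSD for equal group sizes (alpha = 0.05 is an assumption).
t_crit = 2.064
n_per_group = 9
lsd = t_crit * (mse * (2 / n_per_group)) ** 0.5

names = list(groups)
significant_pairs = [
    (a, b)
    for i, a in enumerate(names)
    for b in names[i + 1:]
    if abs(mean(groups[a]) - mean(groups[b])) > lsd
]

print(f"SSTR = {sstr:.2f}, SSE = {sse:.2f}, F = {f_stat:.2f}")
print(f"LSD = {lsd:.3f}", significant_pairs)
```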
13-1B Single Factor Analysis of Variance (ANOVA) – Completely Randomized Design
A new type of glass is being developed to use in areas at risk of earthquakes. Three types of
glass have been developed, but the company only wishes to manufacture one. In order to test the
strength of the glass, windowpanes of identical sizes were placed in a machine designed to shake
the glass in a manner that simulates the stress it would have to endure in an earthquake. In the
most severe earthquakes, the shaking can last as long as 5 minutes. The amount of time before
each windowpane shattered was recorded. The following table contains the summaries of the
data collected:
a) Test to determine whether or not there is a difference between the three types of glass.
b) Perform a Fisher’s LSD test if necessary.
13-1C Single Factor Analysis of Variance (ANOVA) – Observational Study
The TRU School of Business and Economics recently did a survey of alumni salaries from the
past 5 years. The initial survey included only students in the finance major and the economics
major. This time we also have data for the marketing major. Test for any difference among the
three majors using a 10% level of significance.
a) Test to determine whether or not there is a difference between the salaries of the three majors.
Use a 10% level of significance.
b) Perform a Fisher’s LSD test if necessary. Use a 10% level of significance.
13-1D Single Factor Analysis of Variance (ANOVA) – Observational Study
Students in different college majors are always complaining (or bragging) about how difficult
their field of study is relative to another. You decide that perhaps you could use the number of
hours spent studying as a proxy for the level of difficulty. The more hours spent studying, the
more difficult the subject matter must be. You survey students across 3 fields of study and ask
them how many hours per day they spend studying outside of class time. The following table
contains summaries of the data collected:
Count 16 14 17
Mean (Hours) 4.27 4.31 3.81
Standard deviation 0.71 1.02 0.94
a) Test to determine whether there is a difference in the average number of hours spent studying
between the three college majors.
b) Perform a Fisher’s LSD test if necessary.
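When only counts, means and standard deviations are available, the ANOVA quantities can be rebuilt from those summaries. The sketch below assumes the three columns of the table correspond to the three majors in the order given. The variable names are illustrative.

```python
# Summary statistics from problem 13-1D, one tuple per major: (n, mean, std dev).
samples = [(16, 4.27, 0.71), (14, 4.31, 1.02), (17, 3.81, 0.94)]

k = len(samples)
n_total = sum(n for n, _, _ in samples)

# Grand mean is the weighted average of the group means.
grand_mean = sum(n * m for n, m, _ in samples) / n_total

# Between-treatments sum of squares uses the group means.
sstr = sum(n * (m - grand_mean) ** 2 for n, m, _ in samples)

# Within-treatments sum of squares uses the group variances.
sse = sum((n - 1) * s**2 for n, _, s in samples)

mstr = sstr / (k - 1)
mse = sse / (n_total - k)
f_stat = mstr / mse

print(f"SSTR = {sstr:.2f}, SSE = {sse:.2f}, F = {f_stat:.2f}")
print(f"degrees of freedom: ({k - 1}, {n_total - k})")
```

Because the inputs are rounded summaries, the results can differ slightly from values computed on the raw data.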
13-2A Single Factor Analysis of Variance (ANOVA) – Randomized Block Design
Everything Co. frequently relies on courier services to deliver sensitive documents between its
regional offices. There are three courier companies available, and each offers loyalty discounts
that give customers an incentive to choose one courier and stick with it. You decide to perform
a test to determine if there is a difference in delivery times between the three courier options you
have. You send three packages of equal size to each of five regional offices. Each package is
sent through one of the three couriers. The following table contains the delivery times, in hours:
Regional Office   Option A   Option B   Option C   Block Mean
a) Test to determine whether or not there is a difference between the three courier services. Use a
5% level of significance.
(Hint: SST = 1304.73)
13-2B Single Factor Analysis of Variance (ANOVA) – Randomized Block Design
Canine Munchies Inc. is developing a new brand of dog food designed specifically for less active
dogs. They have developed two types of food, both with a lower fat, higher protein blend of
ingredients in order to minimize weight gain, a common problem among its target market of
inactive dogs. In order to determine if there is a difference between the two new brands of food
as compared to the dogs’ regular diet, a group of five dogs were each fed the three types of food
for a period of 30 days. Two of the three were the new brands, and the third was the dogs’
original diet. The data below shows the difference in the dogs’ weight between the first and 30th
day on the diet. A positive number indicates a weight gain; a negative number indicates a weight
loss.
a) Test to determine whether or not there is a difference between the two types of dog food and
their original diet in terms of their impact on the dogs’ weight. (Hint: SST = 22.62)
13-2C Single Factor Analysis of Variance (ANOVA) – Randomized Block Design
Standardized testing is very common in many countries and is frequently used as a screening tool
in college applications. Students write exams in three areas: Reading Comprehension,
Mathematics and Grammar. Each test is scored on a 1000 point scale. The following table
contains test scores for five students.
Student   Reading   Math   Grammar   Block Mean
a) Test to determine whether or not there is a difference in average grade across the three subjects.
Use a 10% level of significance. (Hint: SST = 337360.00)
13-3A Two Factor with Replication Analysis of Variance (ANOVA) – Factorial Design.
The local animal rescue shelter is interested in knowing if there is a significant difference in the
number of animal adoptions between its three largest shelters during its busiest weekend of the
year. It also is interested in knowing if there is a significant difference between the number of
cats and dogs adopted as well. The following data shows the number of each animal adopted at
each of its three shelters on each day of the long weekend.
                    A      B      C      Treatment Means
                    2      2      1
Dogs                3      1      2      2.11
                    3      1      4
Interaction Means   2.67   1.33   2.33
                    4      2      4
Cats                5      4      3      4.00
                    6      3      5
Interaction Means   5.00   3.00   4.00   Grand mean
Treatment Means     3.83   2.17   3.17   3.06
a) Test to determine whether or not there is a difference between the average number of
adoptions by location, by animal type, and for interaction. (SST = 36.94)
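The sums of squares for this factorial design can be sketched as below. The layout assumes animal type is one factor, shelter is the other, and the three daily counts in each cell are the replications. The variable names are illustrative.

```python
from statistics import mean

# Adoption counts from problem 13-3A: data[animal][shelter] holds the three daily counts.
data = {
    "Dogs": {"A": [2, 3, 3], "B": [2, 1, 1], "C": [1, 2, 4]},
    "Cats": {"A": [4, 5, 6], "B": [2, 4, 3], "C": [4, 3, 5]},
}
animals = list(data)
shelters = list(data["Dogs"])
r = 3   # replications per cell

all_values = [x for a in animals for s in shelters for x in data[a][s]]
grand = mean(all_values)

sst = sum((x - grand) ** 2 for x in all_values)

# Main effects: each factor-level mean compared with the grand mean.
ss_animal = len(shelters) * r * sum(
    (mean(x for s in shelters for x in data[a][s]) - grand) ** 2 for a in animals
)
ss_shelter = len(animals) * r * sum(
    (mean(x for a in animals for x in data[a][s]) - grand) ** 2 for s in shelters
)

# Interaction is what remains of the between-cells variation after the main effects.
ss_cells = r * sum((mean(data[a][s]) - grand) ** 2 for a in animals for s in shelters)
ss_interaction = ss_cells - ss_animal - ss_shelter
sse = sst - ss_cells

df_animal = len(animals) - 1
df_shelter = len(shelters) - 1
df_interaction = df_animal * df_shelter
df_error = len(animals) * len(shelters) * (r - 1)

mse = sse / df_error
f_animal = (ss_animal / df_animal) / mse
f_shelter = (ss_shelter / df_shelter) / mse
f_interaction = (ss_interaction / df_interaction) / mse

print(f"SST = {sst:.2f}, SSE = {sse:.2f}")
print(f"F(animal) = {f_animal:.2f}, F(shelter) = {f_shelter:.2f}, F(interaction) = {f_interaction:.2f}")
```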
13-3B Two Factor with Replication Analysis of Variance (ANOVA) – Factorial Design.
A designer of commercial retail space is doing a study to determine which method of managing
line ups at the till works best. Method A involves many smaller line ups at individual tills.
Method B involves one large line up being served by multiple tills. The table below contains the
wait times, in minutes, of customers in three different retail settings using each of the two
proposed methods.
a) Test to determine whether or not there is a difference in the average number of minutes
waiting by queue method, retail setting and for interaction. (SST = 63.61)
13-3C Two Factor with Replication Analysis of Variance (ANOVA) – Factorial Design.
As part of a study being done on regional wage differences, data has been collected on wages of
two different trades, Carpentry and Welding, across three regions: west coast, central and east
coast. You’ve been tasked with performing the appropriate analysis to determine whether there
exists a difference in hourly wage rates between these two professions across these three regions.
a) Test to determine whether or not there is a difference in the average hourly wage by trade,
by region and for interaction. (SST = 68.00)
Module 14: Simple Linear Regression
14-1A Simple Linear Regression
There’s a strong belief that student performance is directly linked to the amount of time the
student spends studying. In order to test this claim, you gather the following data to estimate:
E(G) = β₀ + β₁ Hrs
Observation   Grade   Hours of study
1             43      3.2
2             71      3.9
3             36      2.4
4             75      3.7
5             81      5.1
Mean          61.2    3.66
Regression Statistics
Multiple R
R Square
Adj. R Square NA
Standard Error
Observations
ANOVA
             df    SS         MS    F    P-value
Regression         1301.62               0.04
Error
Total              1644.80
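The blank cells of the output can be filled in from the five observations using the least squares formulas. The sketch below takes Grade as the dependent variable and Hours of study as the independent variable, matching the equation E(G) = β₀ + β₁ Hrs. The variable names are illustrative.

```python
# Data from problem 14-1A.
hours = [3.2, 3.9, 2.4, 3.7, 5.1]
grades = [43, 71, 36, 75, 81]

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(grades) / n

# Sums of cross-products and squares about the means.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, grades))
s_xx = sum((x - x_bar) ** 2 for x in hours)

# Least squares slope and intercept.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# ANOVA decomposition.
sst = sum((y - y_bar) ** 2 for y in grades)
ssr = b1 * s_xy
sse = sst - ssr

r_squared = ssr / sst
mse = sse / (n - 2)
std_error = mse ** 0.5
f_stat = (ssr / 1) / mse

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"SSR = {ssr:.2f}, SSE = {sse:.2f}, SST = {sst:.2f}")
print(f"R Square = {r_squared:.3f}, Standard Error = {std_error:.3f}, F = {f_stat:.2f}")
```

The SSR and SST produced this way agree with the two values printed in the ANOVA table, which is a useful check on the arithmetic.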
14-1B Simple Linear Regression
E(Q) = β₀ + β₁ P
Regression Statistics
Multiple R
R Square 0.88
Adj R Square NA
Standard Error 1.50
Observations
ANOVA
             df    SS       MS    F    P-value
Regression         47.26
Error
Total              54.00
14-1C Simple Linear Regression
My parents once told me that the longer they were married, the happier they were. Let’s test this
claim. You find the following data:
Observation   Years Married   Happiness Index
1             15              80
2             10              60
3             36              75
4             44              90
5             54              95
Mean          31.8            80
Regression Statistics
Multiple R 0.84
R Square
Adjusted R Square NA
Standard Error
Observations
ANOVA
             df    SS        MS    F       p-value
Regression                         7.43    0.07
Residual           215.77
Total

                Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept       60.47                           7.46     0.00
Years Married
Module 15: Multiple Regression
15-1A Multiple Linear Regression
The following estimated regression equation states that quantity sold of a good is a function of
its own price ( Px ), the price of a related good ( Py ), advertising expenditures ( A ) and average
household income ( M ):
Prices are measured in dollars, while advertising and income are measured in thousands of
dollars. Quantity is measured in units. The following table provides the estimated regression
results:
Regression Statistics
Multiple R
R Square
Adj. R Square
Standard Error 88.71
Observations 30
ANOVA
df SS MS F p-value
Regression 103851.04 0.00
Error
Total
15-1B Multiple Linear Regression
The following estimated regression equation states that wheat yield (pounds) is a function of
the average monthly rainfall in inches ( R ), the density of seed dispersion ( S ) in seeds per
square inch, average daily temperature, degrees Fahrenheit ( T ) and an index measuring the
quality of fertilizer ( F ):
E(Y) = β₀ + β₁R + β₂S + β₃T + β₄F
Regression Statistics
Multiple R
R Square 0.92
Adj. R Square 0.91
Standard Error
Observations
ANOVA
df SS MS F p-value
Regression 70.24 0.00
Error 272956.16
Total 29
15-1C Multiple Linear Regression
With a belief that an individual’s salary can be predicted by their age and experience, the
following regression equation is estimated:
Salary is measured in thousands of dollars, while experience and age are measured in years.
Regression Statistics
Multiple R
R Square
Adj. R Square
Std Error
Observations
ANOVA
df SS MS F p-value
Regression 11721418.63 55.90 0.00
Residual
Total 29
15-2A Multiple Linear Regression – Dummy Variables
In problem 15-1C, we developed an estimated regression equation that demonstrated the problem
of multicollinearity. This revised model excludes years of experience, which was found to be
highly correlated with age. We have now added one dummy variable to the model to determine
the effect having a graduate degree has on salary. The new model is as follows:
Salary is measured in thousands of dollars; age is measured in years. The dummy variable DEG
equals one if the individual has a graduate degree, zero otherwise. The estimated regression output
is below.
Regression Statistics
Multiple R 0.96
R Square 0.92
Adjusted R Square 0.91
Standard Error 213.97
Observations 30
ANOVA
df SS MS F p-value
Regression 2 13315989.47 6657994.74 145.42 0.00
Residual 27 1236199.89 45785.18
Total 29 14552189.36
               Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept      -108.09        142.52           -0.76    0.45      -400.52     184.35
Age            47.18          3.18             14.82    0.00      40.65       53.72
Grad. Degree   472.29         79.34            5.95     0.00      309.50      635.07
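The summary figures in this output are all tied together by the ANOVA sums of squares, and the dummy coefficient has a direct interpretation. The sketch below recomputes the summary statistics and uses the estimated equation for a point prediction. The 40-year-old is a hypothetical individual chosen only for illustration.

```python
import math

# ANOVA figures from the 15-2A regression output.
n, k = 30, 2    # observations, independent variables
ssr, sse, sst = 13315989.47, 1236199.89, 14552189.36

msr = ssr / k
mse = sse / (n - k - 1)
f_stat = msr / mse
r_squared = ssr / sst
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
std_error = math.sqrt(mse)

# Estimated coefficients from the same output.
b0, b_age, b_deg = -108.09, 47.18, 472.29


def predict_salary(age, has_grad_degree):
    """Point estimate of salary, in the units of the output, for a given age and degree status."""
    return b0 + b_age * age + b_deg * (1 if has_grad_degree else 0)


# Hypothetical example: two 40-year-olds who differ only in degree status.
gap = predict_salary(40, True) - predict_salary(40, False)

print(f"F = {f_stat:.2f}, R Square = {r_squared:.2f}, Adjusted R Square = {adj_r_squared:.2f}")
print(f"Standard Error = {std_error:.2f}, graduate degree gap = {gap:.2f}")
```

Holding age constant, the predicted difference between the two individuals is the Grad. Degree coefficient, whatever age is chosen.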
15-2B Multiple Linear Regression – Dummy Variables
In problem 15-2A, we determined that having a graduate degree was a statistically significant
determinant of salary. In this revised model, we have further separated our sample by highest
level of educational attainment. The new model is as follows:
Salary is measured in thousands of dollars; age is measured in years. Those with a Bachelor’s
degree represent the base case. The dummy variable MA equals one if the individual has a
Master’s degree, zero otherwise, while the dummy variable PhD equals one if the individual has
a doctorate, zero otherwise. The estimated regression output is below.
Regression Statistics
Multiple R 0.96
R Square 0.92
Adj. R Square 0.91
Std Error 217.39
Observations 30
ANOVA
df SS MS F p-value
Regression 3 13323419.21 4441139.74 93.97 0.00
Residual 26 1228770.15 47260.39
Total 29 14552189.36
15-2C Multiple Linear Regression – Dummy Variables
Prices are measured in dollars, while advertising is measured in thousands of dollars. The dummy
variable takes the value zero for men and one for women. The following table provides the
estimated regression results:
Regression Statistics
Multiple R 0.85
R Square 0.72
Adj. R Square 0.67
Standard Error 83.01
Observations 30
ANOVA
df SS MS F p-value
Regression 4 439877.47 109969.37 15.96 0.00
Residual 25 172247.24 6889.89
Total 29 612124.71
15-3A Multiple Linear Regression – ANOVA using Dummies
Students in different college majors are always complaining (or bragging) about how difficult
their field of study is relative to another. You decide that perhaps you could use the number of
hours spent studying as a proxy for the level of difficulty. The more hours spent studying, the
more difficult the subject matter must be. You survey students across 3 fields of study and ask
them how many hours per day they spend studying outside of class time. We will use regression
analysis to estimate the following regression equation:
We have defined Accounting to be the base case, with Phy identifying students who major in
physics, and Soci identifying sociology students. The following table provides the estimated
regression results:
Regression Statistics
Multiple R 0.26
R Square 0.07
Adj. R Square 0.02
Standard Error 0.89
Observations 47
ANOVA
df SS MS F p-value
Regression 2 2.52 1.26 1.57 0.22
Residual 44 35.18 0.80
Total 46 37.70
15-3B Multiple Linear Regression – ANOVA Analysis using dummies
WhiteTooth Inc. is developing an additive for its line of toothpastes that is designed to whiten
teeth in as little time as possible. It currently has two variations of the additive, Type A, and B,
but only wishes to produce and market one. In order to determine the effectiveness of these new
additives, a focus group consisting of 27 people is organized. Nine are given type A, nine are
given type B and nine are the control group and are given a placebo. Each person is asked to use
the toothpaste and record the number of days it takes before their teeth achieve a predetermined
shade of white. We will use regression analysis to estimate the following regression equation:
We have defined the placebo to be the base case, with TA identifying Type A toothpaste, and TB
identifying Type B toothpaste. The following table provides the estimated regression results:
Regression Statistics
Multiple R 0.53
R Square 0.28
Adj. R Square 0.22
Standard Error 1.29
Observations 27
ANOVA
df SS MS F P-value
Regression 2 15.41 7.70 4.60 0.02
Residual 24 40.22 1.68
Total 26 55.63
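The link between this regression and single factor ANOVA can be sketched as below. When the only regressors are group dummies, the least squares intercept is the base-case mean and each dummy coefficient is that group's mean minus the base-case mean. The data are the observations listed in problem 13-1A, and the column order (Type A, Type B, Placebo) is an assumption because that table has no header row.

```python
from statistics import mean

# Days to reach the target shade, from problem 13-1A (column order assumed).
type_a = [5, 5, 7, 4, 6, 6, 8, 7, 6]
type_b = [4, 5, 6, 5, 8, 7, 4, 6, 5]
placebo = [9, 7, 8, 9, 7, 7, 8, 5, 6]

# With the placebo as the base case, least squares on the two dummies gives:
b0 = mean(placebo)             # intercept: placebo mean
b_ta = mean(type_a) - b0       # TA coefficient: Type A mean minus placebo mean
b_tb = mean(type_b) - b0       # TB coefficient: Type B mean minus placebo mean


def fitted(ta, tb):
    """Fitted value for an observation with dummy values ta and tb."""
    return b0 + b_ta * ta + b_tb * tb


# Each observation as (days, TA, TB).
observations = (
    [(y, 0, 0) for y in placebo]
    + [(y, 1, 0) for y in type_a]
    + [(y, 0, 1) for y in type_b]
)
y_bar = mean(y for y, _, _ in observations)

sse = sum((y - fitted(ta, tb)) ** 2 for y, ta, tb in observations)
sst = sum((y - y_bar) ** 2 for y, _, _ in observations)
ssr = sst - sse
f_stat = (ssr / 2) / (sse / (len(observations) - 3))

print(f"b0 = {b0:.2f}, TA = {b_ta:.2f}, TB = {b_tb:.2f}")
print(f"SSR = {ssr:.2f}, SSE = {sse:.2f}, SST = {sst:.2f}, F = {f_stat:.2f}")
```

The sums of squares and F statistic produced this way match the ANOVA table in the regression output above, which is why the F test here answers the same question as the single factor ANOVA.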