Statistics Two Workbook V2


Introduction to Statistics for Business and Economics
Workbook 2
(Version 2.0)

Peter Dalley

© Peter Dalley, 2021

1
Table of Contents
Module 9: Hypothesis Testing – Single Population Mean and Proportion................................ 3
Module 10: Hypothesis Testing – Two population means and proportions ........................... 20
Module 11: Hypothesis Testing – Variances .......................................................................... 36
Module 12: Multiple Proportions, Independence and Goodness of Fit .................................. 46
Module 13: Analysis of Variance ........................................................................................... 57
Module 14: Simple Linear Regression ................................................................................... 68
Module 15: Multiple Regression ........................................................................................... 72

2
Module 9: Hypothesis Testing – Single Population Mean and Proportion

3
9-1A Single population mean, one-tail test, σ known

A local craft brewery claims the amount of beer in its bottles is 12oz (340ml). It knows that
making false claims on its labels would result in serious penalties if it overstated the true volume.
Every Monday morning, a sample of 25 bottles is taken to test the accuracy of their filling
machines. Over the past few years of weekly sampling, they have calculated the standard
deviation of the population to be σ = 2.1oz. This week, the sample resulted in a mean filling
volume of x̄ = 11.4oz. Are they at risk of facing any penalties? Use α = 0.05.
a) Formulate the null and alternative hypothesis. Justify your formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.
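
The mechanics of parts b) to d) can be sketched in Python. This is a minimal illustration rather than the workbook's official solution: it assumes a lower-tail alternative (Ha: µ < 12), and the helper name `z_test_mean` is hypothetical. It uses only the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(x_bar, mu0, sigma, n, tail="lower"):
    """One-sample z test for a mean when sigma is known.

    tail is "lower" (Ha: mu < mu0), "upper" (Ha: mu > mu0) or "two".
    Returns the test statistic and its p-value.
    """
    z = (x_bar - mu0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf(z)
    if tail == "lower":
        p_value = cdf
    elif tail == "upper":
        p_value = 1 - cdf
    else:
        p_value = 2 * min(cdf, 1 - cdf)
    return z, p_value


alpha = 0.05
# Assumption: the penalty risk corresponds to Ha: mu < 12 (a lower-tail test).
z, p_value = z_test_mean(x_bar=11.4, mu0=12, sigma=2.1, n=25, tail="lower")
z_crit = NormalDist().inv_cdf(alpha)  # lower-tail critical value

print(f"z = {z:.3f}, p-value = {p_value:.4f}, critical value = {z_crit:.3f}")
print("p-value approach rejects H0:", p_value <= alpha)
print("critical value approach rejects H0:", z <= z_crit)
```

The two approaches always agree, because the p-value is at most α exactly when the statistic lies at or beyond the critical value. The same helper covers Problems 9-1B and 9-1C by passing `tail="upper"`.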

4
9-1B Single population mean, one tail test, σ known

An instructor in the TRU School of Business and Economics has produced a series of problems
and accompanying video walkthroughs in hopes of improving his statistics students’
understanding of course content. Having taught the course many times, he determines the
historical average to be 76% and the population standard deviation to be σ = 17.3. At the end of the
following semester, his class of 45 students, who had access to the video walkthroughs, had an
average grade of x̄ = 81.3%. Using a level of significance of α = 0.05, test to determine
whether this shows an improvement over the historical average.
a) Formulate the null and alternative hypothesis. Justify your formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.

5
9-1C Single population mean, one tail test, σ known

Assume car salespeople historically sell an average of 96 cars per year. In an effort to increase
sales, James Barnett, the regional manager, proposed a new commission structure in hopes of
increasing the incentives for salespeople to sell more cars. After the first year of the new
commission structure, a sample of 26 salespeople from within his region had average car sales of
x̄ = 98. Assume the population variance is known to be σ² = 49. Do we have evidence to show
that the new commission structure has increased the average number of cars sold? Perform the
test at α = 0.05.
a) Formulate the null and alternative hypothesis. Justify your formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.

6
9-1D Single population mean, two-tail test, σ known

A water bottle manufacturing facility produces bottles that are designed to hold 24oz (750ml).
The bottles are produced by pouring liquid plastic into a mold. Once the plastic hardens, it is
removed from the mold, polished, and has labels printed on it to mark various volumes: 8oz,
16oz and 24oz. The company’s quality assurance team periodically takes a sample of 30 bottles
and fills them to the 24oz line with water. The water from each bottle is then measured to
determine the actual volume of water contained in the bottle when filled to the 24oz line. Over
the years, they have determined the population standard deviation of filling volume to be σ = 1.4.
The most recent sample had an average volume of water at the 24oz line of 23.6oz. Use a 10%
level of significance to test the accuracy of the labels.
a) Formulate a test to determine if the 24oz label on the bottle is accurate. Justify your
formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusions.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.
f) Verify your findings using the confidence interval approach.
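
For part f), the confidence interval approach can be sketched as follows. This is an illustration under the assumption of a two-tail test of H0: µ = 24 at α = 0.10; the function name `z_confidence_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, alpha):
    """(1 - alpha) confidence interval for a mean when sigma is known."""
    z_half = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_half * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


mu0 = 24  # hypothesized mean volume at the 24oz line
lower, upper = z_confidence_interval(x_bar=23.6, sigma=1.4, n=30, alpha=0.10)

# A two-tail test at level alpha rejects H0 exactly when mu0 is outside the interval.
reject_h0 = not (lower <= mu0 <= upper)
print(f"90% CI: ({lower:.3f}, {upper:.3f}); reject H0: {reject_h0}")
```

The interval is built from the same standard error and the same z value as the two-tail test, which is why it must lead to the same decision as parts c) and d).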

7
9-1E Single population mean, two-tail test, σ known

A local farmer produces hay for nearby cattle ranchers. The hay is rolled into 50lb (22kg) bales,
which are sold by quantity. Therefore, when a rancher buys 100 bales of hay, they can expect to
receive 5,000lbs of hay. To ensure the cattle ranchers are getting what they expect, the farmer
periodically samples batches of 40 hay bales to test that they are averaging 50lbs. The most
recent batch provided a sample mean weight of x̄ = 51.2lbs. The population standard deviation is
known to be σ = 3.73. Using a level of significance of α = 0.05, test to determine whether the
farmer is producing what the ranchers are expecting.
a) Formulate the null and alternative hypotheses. Justify your formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Verify your conclusion using the critical value approach.
e) Confirm your results with a confidence interval.
f) Interpret your results.

8
9-2A Probability of Type Two Error – One Tailed Test

Copied from Problem 9-1A:

A local craft brewery claims the amount of beer in its bottles is 12oz (340ml). It knows that
making false claims on its labels would result in serious penalties if it overstated the true volume.
Every Monday morning, a sample of 25 bottles is taken to test the accuracy of their filling
machines. Over the past few years of weekly sampling, they have calculated the standard
deviation of the population to be σ = 2.1oz. This week, the sample resulted in a mean filling
volume of x̄ = 11.4oz. Are they at risk of facing any penalties? Use α = 0.05.
a) Formulate the null and alternative hypothesis. Justify your formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.
f) If the actual population mean is µa = 11oz, what is the probability of committing a Type II error?
Interpret this value. What if it is µa = 11.5oz? What if it is µa = 12.5oz?
g) If the manager states that she is willing to risk a β = 0.01 probability of not rejecting the null if
the average volume is within 1oz of specification, how large should the sample size be?
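
Parts f) and g) can be sketched as below. The code assumes the lower-tail formulation (H0: µ ≥ 12, Ha: µ < 12) and reads "within 1oz of specification" as a true mean of µa = 11oz; both are assumptions, and the function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def beta_lower_tail(mu0, mu_a, sigma, n, alpha):
    """P(fail to reject H0) for a lower-tail z test when the true mean is mu_a."""
    std_error = sigma / sqrt(n)
    # H0 is rejected when the sample mean falls below this cutoff.
    cutoff = mu0 + STD_NORMAL.inv_cdf(alpha) * std_error
    # Type II error: the sample mean lands at or above the cutoff anyway.
    return 1 - NormalDist(mu_a, std_error).cdf(cutoff)


def sample_size_one_tail(mu0, mu_a, sigma, alpha, beta):
    """Smallest n giving the requested alpha and beta for a one-tail z test."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    z_beta = STD_NORMAL.inv_cdf(1 - beta)
    return ceil(((z_alpha + z_beta) * sigma / (mu0 - mu_a)) ** 2)


beta_11 = beta_lower_tail(mu0=12, mu_a=11, sigma=2.1, n=25, alpha=0.05)
beta_11_5 = beta_lower_tail(mu0=12, mu_a=11.5, sigma=2.1, n=25, alpha=0.05)
n_needed = sample_size_one_tail(mu0=12, mu_a=11, sigma=2.1, alpha=0.05, beta=0.01)

print(f"beta at 11oz: {beta_11:.4f}, beta at 11.5oz: {beta_11_5:.4f}")
print("sample size for beta = 0.01 at 11oz:", n_needed)
```

Under these hypotheses a true mean of 12.5oz lies inside the null region, so a Type II error (failing to reject a false null) cannot occur there. β also grows as µa moves closer to 12, since smaller departures from the null are harder to detect.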

9
9-2B Probability of Type Two Error– One Tailed Test

Copied from Problem 9-1C:

Assume car salespeople historically sell an average of 96 cars per year. In an effort to increase
sales, James Barnett, the regional manager, proposed a new commission structure in hopes of
increasing the incentives for salespeople to sell more cars. After the first year of the new
commission structure, a sample of 26 salespeople from within his region had average car sales of
x̄ = 98. Assume the population variance is known to be σ² = 49. Do we have evidence to show
that the new commission structure has increased the average number of cars sold? Perform the
test at α = 0.05.
a) Formulate the null and alternative hypothesis. Justify your formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.
f) If the actual population mean is µa = 100 cars, what is the probability of committing a Type II
error? Interpret this value. What if it is µa = 99? What if it is µa = 95?
g) If the regional manager states that he is willing to risk a β = 0.10 probability of not rejecting the
null if the average number of cars sold is 99, how large should the sample size be?

10
9-2C Probability of Type Two Error – Two Tailed Test

Copied from Problem 9-1D:

A water bottle manufacturing facility produces bottles that are designed to hold 24oz (750ml).
The bottles are produced by pouring liquid plastic into a mold. Once the plastic hardens, it is
removed from the mold, polished, and has labels printed on it to mark various volumes: 8oz,
16oz and 24oz. The company’s quality assurance team periodically takes a sample of 30 bottles
and fills them to the 24oz line with water. The water from each bottle is then measured to
determine the actual volume of water contained in the bottle when filled to the 24oz line. Over
the years, they have determined the population standard deviation of filling volume to be σ = 1.4.
The most recent sample had an average volume of water at the 24oz line of 23.6oz. Use a 10%
level of significance to test the accuracy of the labels.
a) Formulate a test to determine if the 24oz label on the bottle is accurate. Justify your
formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusions.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.
f) Verify your findings using the confidence interval approach.
g) If the actual population mean is µa = 23.2oz, what is the probability of committing a Type II
error? Interpret this value.
h) If the manager states that she is willing to risk a β = 0.10 probability of not rejecting the null if
the average volume is within 0.5oz of specification, how large should the sample size be?

11
9-3A Single population mean, one-tail test, σ unknown

Red light cameras are often an effective way to reduce the number of people who run red
lights at intersections. However, they can be expensive to install and maintain. For these
reasons, they are only installed at intersections where drivers running red lights tend to cause the
most accidents. Let’s assume the threshold number of accidents is 12 per year; any
more than this warrants a camera. At one location, researchers found the average number of
accidents over a ten-year period was 12.6. Based on this data, they immediately submitted a
recommendation to install a red light camera at this intersection. After all, 12.6 is greater than
12.
a) Was this the correct decision?
b) What is the proper approach to making this decision? Show each step. Use α = 0.05.
(Hint: Assume you have access to their data and were able to calculate the sample standard
deviation to be s = 1.35)
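
When σ is unknown, the test statistic follows a t distribution with n − 1 degrees of freedom. The sketch below treats the ten yearly counts as a sample of n = 10 and assumes an upper-tail alternative (Ha: µ > 12). Because the standard library has no t distribution, the hypothetical helper `t_upper_tail` integrates the t density numerically with Simpson's rule; a t table or statistical software would normally be used instead.

```python
from math import exp, lgamma, pi, sqrt


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for Student's t, by Simpson's rule on the density over [0, |t|]."""
    const = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0) + pdf(a)
    total += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area = total * h / 3  # P(0 < T < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


n, x_bar, s, mu0, alpha = 10, 12.6, 1.35, 12, 0.05
t_stat = (x_bar - mu0) / (s / sqrt(n))
p_value = t_upper_tail(t_stat, df=n - 1)

print(f"t = {t_stat:.3f}, df = {n - 1}, p-value = {p_value:.4f}")
print("reject H0:", p_value <= alpha)
```

The comparison "12.6 is greater than 12" ignores sampling variability; the test statistic scales that 0.6 difference by the standard error before any decision is made.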

12
9-3B Single population mean, one-tail test, σ unknown

Efficiency in a manufacturing process is crucial for the profitability of firms in competitive
markets. Currently, one particular part of the process is requiring 2.3 minutes (138 seconds) to
complete. In an effort to improve efficiency, a number of employees have developed a new
approach, which they believe will reduce the amount of time required to less than 2.1 minutes
(126 seconds). After making the appropriate adjustments and operating on the new system, they
collect a sample of production times for 41 units and find a sample mean of 123 seconds with a
standard deviation of 41.4. Use a 5% level of significance to test whether the new process
achieved its objective.
a) Formulate the null and alternative hypothesis. Justify your formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Interpret your results.
e) Develop and perform a test to determine if they succeeded at reducing time relative to the
previous standard.

13
9-3C Single population mean, one-tail test, σ unknown

A local fire department has a goal of responding to house fires in under 14 minutes. In order to
determine whether they are achieving their goal, samples of 20 response times are tested every
week. The most recent sample resulted in a mean response time of x̄ = 13.2 minutes with a
sample standard deviation of s = 2.3. Use 𝛼 = 0.05.
a) Formulate the appropriate null and alternative hypothesis. Justify your formulation.
b) Calculate your test statistic and discuss your results. Show all your work.
c) What would it mean to commit a Type I or Type II error in this scenario?

14
9-3D Single population mean, two-tail test, σ unknown

It is common among some universities to target a specific average grade in certain courses. At
the end of each semester, instructors faced with this constraint are required to determine whether
their class average is statistically different from the targeted average for the course. If they are
either above or below the target, adjustments must be made. Let us assume that in a particular
course, instructors are expected to have an average grade of 70%. This semester, the class
average was 73% with 44 students. The sample standard deviation is s = 0.13, or 13 percentage
points. Use α = 0.05.
a) Formulate the appropriate null and alternative hypothesis. Justify your formulation.
b) Calculate your test statistic and discuss your results. Show all your work.
c) Produce a 95% confidence interval estimate consistent with this test.

15
9-4A Single population proportion, one-tail test

Election years always bring us reports full of statistics on who’s winning in the latest polls. One
recent pollster argued that the conservative candidate has support from more than half of
registered voters. After digging a little deeper into the article, you find more details on the
pollster’s findings. In a footnote at the bottom of the page, you find the following information:
Out of a sample of 175 registered voters, 95 (or 54.29%) stated that they support the conservative
candidate.
a) On what information is the pollster’s statement based? Is this fair and accurate?
b) Formulate the appropriate null and alternative hypothesis for a proper test.
c) Calculate your test statistic, p-value and conclusion.
d) State and interpret your conclusion.
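
A one-sample test of a proportion uses the standard error implied by the hypothesized proportion p0, not by the sample proportion. The sketch below assumes an upper-tail alternative (Ha: p > 0.5) for the pollster's claim; the function name is hypothetical, and the problem does not state a significance level, so only the statistic and p-value are computed.

```python
from math import sqrt
from statistics import NormalDist


def z_test_proportion(successes, n, p0, tail="upper"):
    """One-sample z test for a proportion using the normal approximation."""
    p_hat = successes / n
    std_error = sqrt(p0 * (1 - p0) / n)  # standard error under H0
    z = (p_hat - p0) / std_error
    cdf = NormalDist().cdf(z)
    if tail == "upper":
        p_value = 1 - cdf
    elif tail == "lower":
        p_value = cdf
    else:
        p_value = 2 * min(cdf, 1 - cdf)
    return p_hat, z, p_value


p_hat, z, p_value = z_test_proportion(successes=95, n=175, p0=0.5, tail="upper")
print(f"p-hat = {p_hat:.4f}, z = {z:.3f}, p-value = {p_value:.4f}")
```

The normal approximation is usually considered reasonable when both n·p0 and n·(1 − p0) are at least 5, which holds here (87.5 each). Problems 9-4B to 9-4D follow the same pattern with `tail="lower"` or `tail="two"`.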

16
9-4B Single population proportion, one-tail test

According to the Institution of Consumer Goods Waste, 60% of products that are returned to a
store for refund are not in resaleable condition and are simply thrown away. A local
department store sampled 90 returns from the previous month and discovered 46 of them were in
such condition and had to be thrown away. The manager is hopeful that her department store
does a better job of accepting returns and issuing refunds only for goods that can be resold.
a) Formulate the appropriate null and alternative hypothesis. Justify your formulation.
b) Calculate your test statistic and discuss your results. Show all your work.
c) Interpret your results.

17
9-4C Single population proportion, two-tail test

It has been argued that drunk drivers cause 50 percent of fatal accidents on the nation’s
highways. In order to test this claim, you obtain the following information: out of the last 120
accidents in your state, you find that 72 of them were caused by drunk driving.
a) Formulate the appropriate null and alternative hypothesis. Justify your formulation.
b) Calculate your test statistic and p-value.
c) State your conclusion and interpret its meaning.

18
9-4D Single population proportion, two-tail test

While at a dinner party and having a lively debate with an old uncle about labour force
unionization, he claims that labour markets in the US are more competitive than in Canada. He
supports this claim by stating that in Canada roughly 30% of all workers are unionized, while in
the US it’s closer to 11%. You can’t believe unionization rates are so low in the US, so you do
some research. Suppose in one article you find that out of a sample of 225 workers, just 32 of
them are unionized. Do these numbers support your uncle’s claims about US unionization rates?
a) Formulate the appropriate null and alternative hypothesis. Justify your formulation.
b) Calculate your test statistic and p-value.
c) State your conclusion and interpret its meaning.

19
Module 10: Hypothesis Testing – Two population means and proportions

20
10-1A Two population means, one-tail test, σ known

A friend of yours has claimed that Subaru owners are, on average, faster drivers than Mazda
owners. Although you may not disagree with him, being the young statistician that you are, you
decide to gather some data and perform a test. You set up a radar on the highway and begin
collecting data. After one week, you’ve found the average speed of 47 Mazdas was 63.3mph and
the average speed of 51 Subarus was 65.7mph. Assume that we know the population variance of
Mazda speeds to be σM² = 26 and of Subaru speeds to be σS² = 29.
a) Formulate the null and alternative hypothesis. Use a 5% level of significance. Justify your
formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Interpret your conclusion.
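
With two independent samples and known population variances, the standard error of the difference in sample means is the square root of the sum of each variance divided by its sample size. The sketch below assumes an upper-tail alternative (Ha: µ_Subaru − µ_Mazda > 0) and a hypothesized difference of 0; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_means(mean1, var1, n1, mean2, var2, n2, d0=0.0):
    """z statistic and upper-tail p-value for Ha: mu1 - mu2 > d0 (variances known)."""
    std_error = sqrt(var1 / n1 + var2 / n2)
    z = ((mean1 - mean2) - d0) / std_error
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


alpha = 0.05
# Population 1 = Subaru, population 2 = Mazda (order chosen so Ha is upper tail).
z, p_value = z_test_two_means(mean1=65.7, var1=29, n1=51,
                              mean2=63.3, var2=26, n2=47)
print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {p_value <= alpha}")
```

The `d0` argument allows a nonzero hypothesized difference, as in Problems 10-1C and 10-1E. When a problem gives standard deviations rather than variances, square them before passing them in.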

21
10-1B Two population means, One tail test, σ known

Many students tend to choose courses based on which professor gives the easiest marks. One
student seems very certain that Prof. Dalley is much easier than Prof. Fraser. In order to test this
claim, you talk with other students who have taken courses with each of them. After talking with
41 of Prof. Dalley’s past students, you find the average grade was 73.2%. You talk with 49 of
Prof. Fraser’s past students and calculate their average grade to be 66.4%. Assume that we know
the population standard deviation of Prof. Dalley’s grades to be σD = 0.14 and Prof. Fraser’s grades
to be σF = 0.18 (that is, 14 and 18 percentage points).
a) Formulate a hypothesis test to evaluate the student’s claim. Use a 3% level of significance. Justify
your formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.

22
10-1C Two population means, one tail test, σ known

Suppose your cousin, who just recently discovered you’re taking a statistics course, thought
they’d be a nuisance, just for the fun of it. They claim, without any hesitation, that the average
weight of a white onion is at least 50 grams more than the average weight of an ambrosia apple.
And of course, they ask you to prove them wrong. As the mature student that you are, you
decide to turn this into a teachable moment for your cousin. You both go to the grocery store and
select a random sample of white onions and apples. The following table contains your sample
data:
                             White Onions    Ambrosia Apples
Count                        28              25
Average Weight (grams)       300             256
Population Std. deviation    9.5             7

a) Formulate a hypothesis test to challenge your cousin’s claim. Use α = 0.05.


b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Interpret your conclusion.

23
10-1D Two population means, Two tail test, σ known

Anybody who has siblings knows how competitive they can become. Imagine, two brothers
playing catch with a baseball. It may start off as innocent play, but it won’t be long before it
becomes competitive, with one bragging about his ability to throw further than the other. In
order to settle the argument, Dad comes out to take measurements. After each brother throws the
ball 50 times, Dad calculates that Peter had an average distance of 44 feet and David had an average
distance of 46 feet. As one would expect, David begins bragging as soon as he hears this news.
Dad suggests that, on average, the two are throwing the ball an equal distance. Assume that we know
the population standard deviation of Peter’s throws to be σP = 7.8 and David’s to be σD = 8.2.
a) Formulate a hypothesis test to evaluate Dad’s claim. Use α = 0.05. Justify your formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.
f) Confirm your findings with a confidence interval estimate.

24
10-1E Two population means, Two tail test, σ known

Imagine the following results come from two independent random samples taken from two
populations.
                            Sample A    Sample B
Count                       35          30
Mean (lbs)                  113         105
Population Std deviation    5.1         5.4

a) Formulate a hypothesis test to determine whether the difference between the two means is 5
pounds. Use α = 0.03.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Interpret your conclusion.
e) Confirm your results using a confidence interval estimate.

25
10-2A Two population means, One tail test, σ unknown but assumed equal

A friend once told you that Golden Retrievers are a much faster breed of dog than Border Collies.
As a dog lover, you become interested in determining whether or not the data would support
such a claim. Assume you manage to gather 29 Golden Retrievers and 31 Border Collies for a
massive 100-meter dog race! After the race, you gather all their times. You find the average
time for the Golden Retrievers to be 7.2 seconds and for the Border Collies, 8.1 seconds. You
calculate the sample standard deviations to be s = 1.83 and s = 1.57 seconds for the Retrievers
and Border Collies, respectively. Use α = 0.05.
a) Formulate a hypothesis to test your friend’s claim. Justify your formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.
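
When the two population variances are unknown but assumed equal, the sample variances are pooled and the statistic has n1 + n2 − 2 degrees of freedom. The sketch below assumes a lower-tail alternative (Ha: µ_Retriever − µ_Collie < 0, since a faster dog has a shorter time). The helper names are hypothetical, and the t tail probability is obtained by numerically integrating the density because the standard library has no t distribution.

```python
from math import exp, lgamma, pi, sqrt


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for Student's t, by Simpson's rule on the density over [0, |t|]."""
    const = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0) + pdf(a)
    total += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area = total * h / 3
    return 0.5 - area if t >= 0 else 0.5 + area


def pooled_t_statistic(mean1, s1, n1, mean2, s2, n2, d0=0.0):
    """t statistic and degrees of freedom using the pooled variance estimate."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return ((mean1 - mean2) - d0) / std_error, df


alpha = 0.05
# Population 1 = Golden Retrievers, population 2 = Border Collies (times in seconds).
t_stat, df = pooled_t_statistic(7.2, 1.83, 29, 8.1, 1.57, 31)
p_value = t_upper_tail(-t_stat, df)  # lower-tail p-value: P(T < t_stat)

print(f"t = {t_stat:.3f}, df = {df}, p-value = {p_value:.4f}")
print("reject H0:", p_value <= alpha)
```

In Problem 10-2B the same data are expressed as speeds, so "faster" corresponds to a larger mean and the alternative flips to the upper tail.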

26
10-2B Two population means, One tail test, σ unknown but assumed equal
(This question is intentionally similar to the previous one)

A friend once told you that Golden Retrievers are a much faster breed of dog than Border Collies.
As a dog lover, you become interested in determining whether or not the data would support
such a claim. Assume you manage to gather 29 Golden Retrievers and 31 Border Collies for a
massive 100-meter dog race! After the race, you gather all their times and calculate their speeds.
You find the average speed for the Golden Retrievers to be 50 km/h and for the Border Collies,
44.4 km/h. You calculate the sample standard deviations to be s = 12.71 km/h and s = 8.62
Km/h for the Retrievers and Border Collies, respectively. Use α = 0.05.
a) Formulate a hypothesis to test your friend’s claim. Justify your formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.

27
10-2C Two population means, Two tail test, σ unknown but assumed equal

University classes are becoming increasingly diverse, with students moving from all parts of the
planet to study in different countries. Imagine your statistics instructor gives you the following
assignment:
Measure the heights of the students in your classes and sort them by continent of origin. Perform
a hypothesis test to determine if the average height of students from North America is different
than the average height of students from Europe.
As the good student that you are, you awkwardly go around asking all of your classmates in your
classes how tall they are. You obtain the following data:

                      European    North American
Count                 33          41
Mean (inches)         70          68
Standard deviation    5.3         3.3

a) Formulate the appropriate hypothesis test. Use α = 0.05. Justify your formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.

28
10-2D Two population means, Two tail test, σ unknown but assumed unequal.

The TRU School of Business and Economics recently did a survey of alumni salaries from the
past 5 years. The initial survey included only students in the finance major and the economics
major. Test for any difference between the two using a 10% level of significance. (See Problem
13-1C to see how we compare 3 or more majors)
                      Economics    Finance
Count                 54           63
Mean ($)              96,480       94,315
Standard deviation    6,340        6,001

a) Formulate the appropriate hypothesis test. Justify your formulation.


b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.
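
When the population variances are not assumed equal, the standard error uses each sample variance separately and the degrees of freedom come from the Welch–Satterthwaite approximation. The sketch below assumes a two-tail test of H0: µ_Econ − µ_Fin = 0. The helper names are hypothetical, the tail probability is computed by numerical integration, and the unrounded degrees of freedom are used (many textbooks round down to the nearest integer, which barely changes the p-value here).

```python
from math import exp, lgamma, pi, sqrt


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for Student's t, by Simpson's rule on the density over [0, |t|]."""
    const = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0) + pdf(a)
    total += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area = total * h / 3
    return 0.5 - area if t >= 0 else 0.5 + area


def welch_t_statistic(mean1, s1, n1, mean2, s2, n2, d0=0.0):
    """t statistic and Welch-Satterthwaite degrees of freedom (unequal variances)."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = ((mean1 - mean2) - d0) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


alpha = 0.10
t_stat, df = welch_t_statistic(96480, 6340, 54, 94315, 6001, 63)
p_value = 2 * t_upper_tail(abs(t_stat), df)  # two-tail p-value

print(f"t = {t_stat:.3f}, df = {df:.1f}, p-value = {p_value:.4f}")
print("reject H0:", p_value <= alpha)
```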

29
10-3A Two population means, One tail test, Matched Sample

As a bilingual country, there are benefits to speaking both official languages in Canada. This is
especially true for employees of the Canadian Public Service. In order to promote bilingualism,
public servants have the opportunity to take courses in their second language at no cost to
themselves. As a taxpayer funded training program, it’s important to verify that it is effective.
In order to test this, students are given an entry exam when they begin language training and an
exit exam once they are finished. The difference between their grades on these two exams is
used to determine if they have improved in their language proficiency. The following data are
from the most recent data collection:

Student    Entrance Exam    Exit Exam    Difference (Exit minus Entrance)
1          26               34           8
2          44               51           7
3          66               69           3
4          52               51           -1
5          54               59           5
6          42               48           6

a) Formulate the appropriate hypothesis test using a 5% level of significance. Justify your
formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.
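
A matched-sample test reduces to a one-sample t test on the differences. The sketch below assumes an upper-tail alternative (Ha: µ_d > 0, where d is exit minus entrance). The helper `t_upper_tail` is hypothetical and integrates the t density numerically in place of a t table.

```python
from math import exp, lgamma, pi, sqrt
from statistics import mean, stdev


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for Student's t, by Simpson's rule on the density over [0, |t|]."""
    const = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0) + pdf(a)
    total += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area = total * h / 3
    return 0.5 - area if t >= 0 else 0.5 + area


entrance = [26, 44, 66, 52, 54, 42]
exit_exam = [34, 51, 69, 51, 59, 48]
diffs = [after - before for before, after in zip(entrance, exit_exam)]

alpha = 0.05
n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
t_stat = (d_bar - 0) / (s_d / sqrt(n))
p_value = t_upper_tail(t_stat, df=n - 1)

print(f"mean difference = {d_bar:.3f}, s_d = {s_d:.3f}, t = {t_stat:.3f}")
print(f"p-value = {p_value:.4f}, reject H0: {p_value <= alpha}")
```

For Problem 10-3B, the hypothesized difference is 20 rather than 0, so the numerator becomes `d_bar - 20`.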

30
10-3B Two population means, One tail test, Matched Sample

My fiancée is currently in the market for a new single serving blender for making healthy
smoothies. One manufacturer produces two different models branded as basic consumer and
professional athlete. According to the company website, the suggested retail price of the
professional athlete model is $20 more than the basic consumer model. You gather the following
sample of prices from various retailers:

Retailer    Professional    Consumer    Difference (Professional minus Consumer)
1           119             99          20
2           124             94          30
3           109             82          27
4           114             79          35
5           112             94          18
6           99              79          20

a) Formulate the appropriate hypothesis test to determine if the price difference between the two
is no more than $20. Justify your formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.

31
10-3C Two population means, Two-tail test, Matched Sample

Retail gasoline outlets frequently advertise the benefits of their fuel additives in maintaining a
clean and smooth-running engine. However, there is some disagreement on whether or not it
affects fuel efficiency. In order to determine if the additive affects fuel efficiency, one chain of
gas stations measured fuel efficiency of 5 cars without the additive, then again with the additive.
The following table contains the data they collected:

Car    Without Additive    With Additive    Difference (without minus with)
1      19                  23               -4
2      17                  16               1
3      21                  23               -2
4      15                  18               -3
5      23                  25               -2

a) Formulate the appropriate hypothesis test at the 0.05 level of significance. Justify your
formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Interpret your conclusion.

32
10-4A Two population proportion, One tail test

Tourism plays an important role in the local economies of many small towns. Because tourism
can be a difficult industry to define, due to the broad spectrum of activities involved, hotel
occupancy rates are frequently used as a rough measure of the industry’s overall performance.
The following table provides hotel occupancy data for your hometown, during the month of
August for two consecutive years.
                   This Year    Last Year
Occupied Rooms     1535         1332
Total Rooms        1840         1650

a) Formulate a test to determine if occupancy rates have increased.


b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion at the α = 0.05 level of significance.
d) Interpret your conclusion.
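
When the null hypothesis states that two proportions are equal, the standard error is built from the pooled proportion. The sketch below assumes an upper-tail alternative (Ha: p_this − p_last > 0); the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_proportions(x1, n1, x2, n2):
    """z statistic and upper-tail p-value for Ha: p1 - p2 > 0 (pooled estimate)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion assumed under H0
    std_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_error
    return p1, p2, z, 1 - NormalDist().cdf(z)


alpha = 0.05
# Sample 1 = this year, sample 2 = last year.
p1, p2, z, p_value = z_test_two_proportions(1535, 1840, 1332, 1650)
print(f"p1 = {p1:.4f}, p2 = {p2:.4f}, z = {z:.3f}, p-value = {p_value:.4f}")
print("reject H0:", p_value <= alpha)
```

For a lower-tail test such as Problem 10-4B, order the samples so that the alternative is still "first minus second is positive", or use the lower-tail probability instead. For a two-tail test such as Problem 10-4C, double the smaller tail probability.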

33
10-4B Two population proportion, One tail test

Extension of Problem 9-4B:

In problem 9-4B, we learned that according to the Institution of Consumer Goods Waste, 60% of
products that are returned to a store for refund are not in resaleable condition and are simply
thrown away. A local department store sampled 90 returns from the previous month and
discovered 46 of them were in such condition and had to be thrown away. In problem 9-4B, we
found that this represented a statistically significant difference, meaning the store was successful
at throwing away a smaller proportion of their refunded products. Since then, the manager has
set a goal to reduce this proportion every month. This month, from a sample of 105 returns, they
found that 45 had to be thrown away. Were they successful at reducing the proportion of returns
that had to be thrown away?
a) Formulate a test to determine if the proportion of returns that had to be thrown away
decreased.
b) Calculate the test statistic and corresponding p-value.
c) Draw your conclusion at the α=0.05 level of significance.
d) Interpret your conclusion.

34
10-4C Two population proportion, two tail test

Many employers are beginning to recognize the importance of maintaining a happy and healthy
workforce. Some employers even provide their workers with arcades, child-care facilities,
libraries and all kinds of sporting activities such as tennis courts, pickle ball courts and more.
However, some people argue that these resources are not enjoyed equally by men and women.
Therefore, it is important to ensure that the resources being offered are benefitting both genders
equally. In order to ensure this to be true, one employer periodically samples its employees and
asks whether or not they are satisfied with their current working environment. The following
table provides the data collected.
                              Women    Men
Satisfied respondents         51       48
Total number of employees     59       63

a) Formulate a test to determine whether both genders are equally satisfied with their workplace
environment.
b) Draw your conclusion at the α=0.05 level of significance.
c) Interpret your conclusion.

35
Module 11: Hypothesis Testing – Variances

36
11-1A Hypothesis Testing: Single population variance – One tailed test.

An extension to Problem 9-1A:

A local craft brewery claims the amount of beer in its bottles is 12oz (340ml). It knows that
making false claims on its labels would result in serious penalties if it overstated the true volume.
Every Monday morning, a sample of 25 bottles is taken to test the accuracy of their filling
machines. Over the past few years of weekly sampling, they have calculated the standard
deviation of the population to be σ = 2.1oz. They would like this to be the maximum level of
variation in their filling process so, periodically, this is tested as well. This week, the sample
resulted in a mean filling volume of x̄ = 11.4oz with a sample standard deviation of s = 2.55.
Perform a test to determine if they are exceeding their maximum desired variance. Use α =
0.05.
a) Formulate the null and alternative hypothesis. Justify your formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.
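
A test of a single population variance uses the statistic (n − 1)s²/σ0², which follows a chi-square distribution with n − 1 degrees of freedom when the population is normal. The sketch below assumes an upper-tail alternative (Ha: σ² > 2.1²). The helper `chi2_upper_tail` is hypothetical; it integrates the chi-square density numerically and is only intended for df > 2. A chi-square table or statistical software would normally be used instead.

```python
from math import exp, lgamma, log


def chi2_upper_tail(x, df, steps=2000):
    """P(X > x) for a chi-square variable with df > 2, via Simpson's rule on [0, x]."""
    log_norm = (df / 2) * log(2) + lgamma(df / 2)

    def pdf(v):
        if v <= 0:
            return 0.0  # the density is 0 at the origin when df > 2
        return exp((df / 2 - 1) * log(v) - v / 2 - log_norm)

    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0) + pdf(x)
    total += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    return 1 - total * h / 3


n, s, sigma0, alpha = 25, 2.55, 2.1, 0.05
chi2_stat = (n - 1) * s ** 2 / sigma0 ** 2
p_value = chi2_upper_tail(chi2_stat, df=n - 1)

print(f"chi-square = {chi2_stat:.3f}, df = {n - 1}, p-value = {p_value:.4f}")
print("reject H0:", p_value <= alpha)
```

For a lower-tail test such as Problem 11-1B, the p-value is the area to the left of the statistic, which is one minus the value returned by this helper.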

37
11-1B Hypothesis Testing: Single population variance – One tailed test.

An extension to Problem 9-1E:

A local farmer produces hay for nearby cattle ranchers. The hay is rolled into 50lb (22kg) bales
that are sold by quantity. Therefore, when a rancher buys 100 bales of hay, they can expect to
receive 5,000lbs of hay. To ensure the cattle ranchers are getting what they expect, the farmer
periodically samples batches of 40 hay bales to test that they are averaging 50lbs. It is important
that the bales be relatively consistent in size as well. The historical population standard deviation
is known to be σ = 3.73lbs, but this is considered to be too high. In response, an effort has
been made to improve quality controls and reduce the variation in weights. The most recent
batch provided a sample mean weight of x̄ = 51.2lbs and a sample standard deviation of s = 2.99lbs.
Using a level of significance of α = 0.05, test to determine whether the variance in the size of hay
bales has been successfully reduced.

a) Formulate the null and alternative hypotheses. Justify your formulation.


b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.

11-1C Hypothesis Testing: Single population variance – two tailed test.

An extension to Problem 9-1D:

A water bottle manufacturing facility produces bottles that are designed to hold 24oz (750ml).
The bottles are produced by pouring liquid plastic into a mold. Once the plastic hardens, it is
removed from the mold, polished, and has labels printed on it to mark various volumes: 8oz,
16oz and 24oz. The company’s quality assurance team periodically takes a sample of 30 bottles
and fills them to the 24oz line with water. The water from each bottle is then measured to
determine the actual volume of water contained in the bottle when filled to the 24oz line. Over
the years, they have determined the population standard deviation of filling volume to be σ = 1.4.
This has been the industry standard for the variation in filling volumes for decades and it is the
goal of the facility to meet this standard. The most recent sample had an average volume of water
at the 24oz line of 23.6oz and a standard deviation of s = 1.82. Use a 1% level of significance.
a) Formulate the null and alternative hypotheses. Justify your formulation.
b) Calculate the test statistics for both tests.
c) Use the p-value approach to draw your conclusions.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.

11-1D Hypothesis Testing: Single population variance – Two tailed test.

Extension of Problem 9-1B:

An instructor in the TRU School of Business and Economics has produced a series of problems
and accompanying video walkthroughs in hopes of improving his statistics students’
understanding of course content. Having taught the course many times, he determines the
historical average to be 76% and the population standard deviation to be σ = 17.3. At the end of
the following semester his class of 45 students, who had access to the video walkthroughs, had
an average grade of x = 81.3% with a standard deviation of s = 13.6. Using a level of
significance of α = 0.05, test to determine whether those with access to the videos have the same
variance in grades as the historical population.
a) Formulate the null and alternative hypothesis. Justify your formulation.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.
f) Develop a 95% confidence interval for the population standard deviation.

11-2A Hypothesis Testing: Two population variance – One tailed test.

One of the challenges in writing in-class exams is to ensure that they can be completed within
the allotted time. Even though the average completion time is within requirements, the variance
can often be a problem. Some students might finish in ten minutes, while others run out of time.
In an attempt to reduce the variance of completion times, a new computerized method of testing
has been implemented. In order to determine if the new method was successful at reducing the
variance of completion times, a sample of 30 students were asked to write the exam using the old
method and a sample of 35 students wrote the exam using the new method. Those using the old
method finished in 78 minutes, on average, with a standard deviation in completion time
of s = 10.3 minutes. Their average grade was 72.3% with standard deviation s = 13. Those
using the new method finished in 75 minutes, on average, with a standard deviation of s = 7.5
minutes. Their average grade was 67.2% with standard deviation 14.

a) Formulate a test to determine if the new method succeeded at reducing the variance in
completion times.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Verify your conclusion using the critical value approach.
e) Interpret your conclusion.
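
For parts b) and c), the test statistic is the ratio of the two sample variances, F = s₁²/s₂², with (n₁ − 1, n₂ − 1) degrees of freedom. The sketch below puts the old method in the numerator (an assumption that matches an upper-tail test of "the new method has the smaller variance") and approximates the p-value by integrating the F density with Simpson's rule, a stand-in for the F table or statistical software. The function names are illustrative.

```python
import math

def f_pdf(x, d1, d2):
    """Density of the F distribution; returning 0 at x = 0 is valid for d1 > 2."""
    if x <= 0:
        return 0.0
    log_beta = (math.lgamma(d1 / 2) + math.lgamma(d2 / 2)
                - math.lgamma((d1 + d2) / 2))
    log_pdf = ((d1 / 2) * math.log(d1) + (d2 / 2) * math.log(d2)
               + (d1 / 2 - 1) * math.log(x)
               - ((d1 + d2) / 2) * math.log(d1 * x + d2) - log_beta)
    return math.exp(log_pdf)

def f_upper_tail(f_stat, d1, d2, steps=2000):
    """P(F > f_stat), using Simpson's rule on [0, f_stat]; steps must be even."""
    h = f_stat / steps
    total = f_pdf(0, d1, d2) + f_pdf(f_stat, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return 1 - total * h / 3

s_old, n_old = 10.3, 30   # completion times, old method
s_new, n_new = 7.5, 35    # completion times, new method
f_stat = s_old**2 / s_new**2
p_value = f_upper_tail(f_stat, n_old - 1, n_new - 1)
print(f"F = {f_stat:.3f}, df = ({n_old - 1}, {n_new - 1}), p-value = {p_value:.4f}")
```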

One of your colleagues has suggested that although the new method may succeed at reducing the
variance in completion times, it comes at a cost to student performance. Test this claim.
a) State the appropriate test.
b) Calculate the test statistic.
c) Use the p-value approach to draw your conclusion.
d) Interpret your conclusion.

11-2B Hypothesis Testing: Two population variance – One tailed test.

When the seasons change, those living in colder climates need to switch the tires on their cars to
match the road conditions. Winter tires are made with a different rubber compound that allows
them to remain softer in cold temperatures in order to maintain a better grip on cold and
slippery roads. Although average stopping distance is an important selling point, so too is the
variance in stopping distances. Suppose a consumer’s digest collects the following data on two
tires. The first is a popular winter tire that has been on the market for many years. The second is a
new tire advertised to have a new and improved rubber compound capable of stopping at least as
quickly but even more consistently.

                                                    Standard Winter Tire   New and Improved Tire
Count                                               25                     30
Average braking distance, 100 km/h to 0 (meters)    53.2                   52.0
Standard deviation                                  4.25                   3.24

a) Develop the appropriate tests to evaluate these claims. Use a 10% level of significance.
b) Draw your conclusion using the p-value and the critical value approaches.
c) Interpret your findings

11-2C Hypothesis Testing: Two population variance – Two tailed tests.

Copied from Problem 10-2A:

A friend once told you that Golden Retrievers are a much faster breed of dog than Border Collies.
As a dog lover, you become interested in determining whether or not the data would support
such a claim. Assume you manage to gather 29 Golden Retrievers and 31 Border Collies for a
massive 100-meter dog race! After the race, you gather all their times. You find the average
time for the Golden Retrievers to be 7.2 seconds and for the Border Collies, 8.1 seconds. You
calculate the sample standard deviations to be s = 1.83 and s = 1.57 seconds for the Retrievers
and Border Collies, respectively. Use α = 0.05.
a) Develop a test to determine if the assumption of equal variances was appropriate.
b) Calculate your test statistic
c) State your p-value, critical value and conclusion at the 0.05 level of significance.
d) Interpret your results.

11-2D Hypothesis Testing: Two population variance – Two tailed test.

Copied from Problem 10-2C:

University classes are becoming increasingly diverse, with students moving from all parts of the
planet to study in different countries. Imagine your statistics instructor gives you the following
assignment:
Measure the heights of the students in your classes and sort them by continent of origin. Perform
a hypothesis test to determine if the average height of students from North America is different
than the average height of students from Europe.
As the good student that you are, you awkwardly go around asking all of your classmates in your
classes how tall they are. You obtain the following data:
European North American
Count 33 41
Mean (inches) 70 68
Standard deviation 5.3 3.7

a) Develop a test to determine if the assumption of equal variances was appropriate.


b) Calculate your test statistic
c) State your p-value, critical value and conclusion at the 0.05 level of significance.
d) Interpret your results.

11-2E Hypothesis Testing: Two population variance – Two tailed test.

Copied from Problem 10-2D:

The TRU School of Business and Economics recently did a survey of alumni salaries from the
past 5 years. The initial survey included only students in the finance major and the economics
major. Test for any difference between the two using a 10% level of significance.

Economics Finance
Count 54 63
Mean ($) 96,480 94,315
Standard deviation 6340 6001

a) Develop a test to determine if the assumption of unequal variances was appropriate.


b) Draw your conclusions using the P-value approach and critical value approach.

Module 12: Multiple Proportions, Independence and Goodness of Fit

12-1A Testing Equality Across Multiple Population Proportions

As part of an undergraduate research project, you decide to determine whether pet owners are
satisfied with their choice of pet. In order to gather sample data, you develop a survey to ask
respondents what type of pet they have (Dog, Cat, Other) and whether they are likely to adopt
the same pet upon its death, or a different one. The following table contains the observed
frequencies:

                          Type of Pet
                          Cat    Dog    Other   Total
Likely to readopt: Yes    48     51     44      143
Likely to readopt: No     31     26     68      125
Total                     79     77     112     268

a) Formulate the null and alternative hypothesis. Use a 5% level of significance.


b) Compute the Expected frequencies.
c) Use the p-value approach to draw your conclusion.
d) Interpret your conclusion.
e) If appropriate, use the Marascuilo procedure to determine where any differences exist.
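
Parts b) and c) follow a fixed recipe: each expected frequency is (row total × column total) / grand total, and the test statistic sums (observed − expected)² / expected over all cells. The sketch below applies this to the pet table. The function name chi_square_table is illustrative, and the p-value line relies on the closed form exp(−χ²/2), which holds only when there are 2 degrees of freedom, as in this 2 × 3 table.

```python
import math

observed = [[48, 51, 44],   # likely to readopt: Yes (Cat, Dog, Other)
            [31, 26, 68]]   # likely to readopt: No

def chi_square_table(observed):
    """Return expected frequencies, the chi-square statistic and its df."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, stat, df

expected, chi2_stat, df = chi_square_table(observed)
p_value = math.exp(-chi2_stat / 2)  # exact upper-tail area when df == 2 only
for row in expected:
    print([round(e, 2) for e in row])
print(f"chi-square = {chi2_stat:.3f}, df = {df}, p-value = {p_value:.5f}")
```

The same function works unchanged for the tables in 12-1B, 12-1C and the tests of independence in 12-2.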

12-1B Testing Equality Across Multiple Population Proportions

After the most recent election, you decide to determine if there was a difference in the proportion
of voters who changed their voting intentions at the last minute. Some voters choose early on in
the campaign who they will vote for and stick with it, while others may change their minds when
new information becomes available. This might shed some light on which voters are more
susceptible to information released close to election day. In order to gather data, you produce a
survey that asks each respondent which party they voted for and if this was a result of a change
of intentions within the 4 weeks prior to Election Day. The following table provides the
observed frequencies:

                                 Political Party
                                 Conservatives   Liberals   Other   Total
Changed voting intention: Yes    96              82         87      265
Changed voting intention: No     98              79         49      226
Total                            194             161        136     491

a) Formulate the null and alternative hypothesis. Use a 10% level of significance.
b) Compute the Expected frequencies.
c) Use the p-value approach to draw your conclusion.
d) Interpret your conclusion.
e) If appropriate, use the Marascuilo procedure to determine where any differences exist.
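
For part e), the Marascuilo procedure compares each pairwise difference in sample proportions with a critical value CV = sqrt(χ²_α) × sqrt(p_i(1 − p_i)/n_i + p_j(1 − p_j)/n_j). The sketch below is a minimal illustration for this table. It assumes exactly three populations, so the chi-square critical value has 2 degrees of freedom and can be written in closed form as −2 ln(α). The variable names are illustrative.

```python
import math
from itertools import combinations

yes = {"Conservatives": 96, "Liberals": 82, "Other": 87}
totals = {"Conservatives": 194, "Liberals": 161, "Other": 136}
alpha = 0.10

# For df = 2 the upper-tail area is exp(-x / 2), so the critical value is -2 ln(alpha).
chi2_crit = -2 * math.log(alpha)

p = {party: yes[party] / totals[party] for party in yes}
results = {}
for a, b in combinations(yes, 2):
    diff = abs(p[a] - p[b])
    cv = math.sqrt(chi2_crit) * math.sqrt(
        p[a] * (1 - p[a]) / totals[a] + p[b] * (1 - p[b]) / totals[b])
    results[(a, b)] = (diff, cv, diff > cv)
    verdict = "significant" if diff > cv else "not significant"
    print(f"{a} vs {b}: |difference| = {diff:.4f}, CV = {cv:.4f}, {verdict}")
```

A pair is flagged as different only when its absolute difference exceeds its own critical value.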

12-1C Testing Equality Across Multiple Population Proportions

A local car dealership is interested in determining customer satisfaction and brand loyalty. They
sample owners of three different brands of vehicles and ask whether or not they are likely to buy
the same brand when they purchase a new car, or shop around. The following table contains the
observed frequencies:

                            Brand of vehicle
                            Ford   GMC    Chevy   Total
Likely to repurchase: Yes   104    79     110     293
Likely to repurchase: No    72     89     90      251
Total                       176    168    200     544

a) Formulate the null and alternative hypothesis. Use a 5% level of significance.


b) Compute the Expected frequencies.
c) Use the p-value approach to draw your conclusion.
d) Interpret your conclusion.
e) If appropriate, use the Marascuilo procedure to determine where any differences exist.

12-2A Tests of Independence

The local animal shelter is interested in knowing if people’s decision to adopt a pet or purchase
from a breeder is independent of the type of pet. Knowing which pets are more likely to be
adopted will help them manage their inventories. The following table provides the observed
frequencies:

Type of Pet
Cat Dog Other Total
Adopt 60 62 61 183
Purchase 38 71 65 174
Total 98 133 126 357

a) Formulate the null and alternative hypothesis. Use a 10% level of significance.
b) Compute the expected frequencies.
c) Use the p-value approach to draw your conclusion.
d) Interpret your conclusion.

12-2B Tests of Independence

Is political affiliation independent of gender? The following table provides the observed
frequencies from a recent survey of students at your college:

Political Party   Male   Female   Totals
Republican        48     43       91
Democratic        36     56       92
Other             21     16       37
Totals            105    115      220

a) Formulate the null and alternative hypothesis. Use a 5% level of significance


b) Compute the expected frequencies.
c) Use the p-value approach to draw your conclusion.
d) Interpret your conclusion.

12-2C Tests of Independence

A local car dealership has reason to believe that whether a customer is married or single will
influence the brand of vehicle they choose to buy. The following table contains the observed
frequencies collected using a survey.

Brand of Vehicle   Married   Single   Totals
Ford               90        80       170
Honda              69        91       160
Mazda              139       108      247
Totals             298       279      577

a) Formulate the null and alternative hypothesis. Use a 5% level of significance.


b) Compute the expected frequencies.
c) Use the p-value approach to draw your conclusion.
d) Interpret your conclusion.

12-3A Goodness of Fit – Multinomial Distribution

Monopolistically competitive markets are defined by the degree of product differentiation
between competing firms. Firms with products that are, in the minds of consumers, not close
substitutes for others, have greater ability to price their products above marginal costs. Suppose
a market consists of four firms (okay, yes this is more oligopolistic, but that’s not the point!).
Over the past 5 years, their market shares have stabilized to those in the table below. In January
of this year, H2Osprey introduced a new product. Perform the appropriate test to determine if
this new product has changed the distribution of market shares between the four firms.

                           Firm
                           AquaBear   WaterDog   H2Osprey   RainyCat   Total
Historical Market Share    0.33       0.25       0.21       0.21       1
Revenues (Current Year)    90         63         78         69         300

a) Formulate the null and alternative hypothesis. Use a 10% level of significance
b) Compute the expected frequencies.
c) Use the p-value approach to draw your conclusion.
d) Interpret your conclusion.
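
In a multinomial goodness of fit test, each expected frequency is the sample total multiplied by the hypothesized share, and the statistic has k − 1 degrees of freedom. The sketch below treats the current-year revenue figures as observed frequencies, which is an assumption about how the table is meant to be read. The helper chi2_sf_df3 is illustrative and uses a closed form that is valid only for 3 degrees of freedom.

```python
import math

shares = [0.33, 0.25, 0.21, 0.21]   # AquaBear, WaterDog, H2Osprey, RainyCat
observed = [90, 63, 78, 69]

n = sum(observed)
expected = [n * share for share in shares]
chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

def chi2_sf_df3(x):
    """Upper-tail chi-square probability, closed form for df = 3 only."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))

p_value = chi2_sf_df3(chi2_stat)
print("expected:", [round(e, 1) for e in expected])
print(f"chi-square = {chi2_stat:.3f}, df = {df}, p-value = {p_value:.4f}")
```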

12-3B Goodness of Fit – Multinomial Distribution

As part of a research project for your statistics class, you decide to gather data on your
classmates’ favourite candy to determine if there’s a statistically significant difference in their
preferences. You group their responses as follows:

Chocolate Bar Hard Candy Gummies Total


Shares 0.33 0.33 0.33 1
Observed 80 50 70 200

a) Formulate the null and alternative hypothesis. Use a 5% level of significance


b) Compute the expected frequencies.
c) Use the p-value approach to draw your conclusion.
d) Interpret your conclusion.

12-4A Goodness of Fit – Normal Distribution

A simple random sample of 30 grades from a principles of microeconomics course are listed
below. They have been sorted from smallest to largest for convenience. The mean grade is 61,
with a standard deviation of 17.
32.65 61.18 74.43
44.85 61.61 75.91
45.89 64.37 80.77
46.82 66.95 82.56
48.35 67.09 84.90
52.37 67.13 85.90
55.57 68.73 91.06
60.45 70.19 92.63
60.74 71.08 93.29
61.15 72.80 101.20

a) Formulate the null and alternative hypothesis. Use a 5% level of significance.


b) Compute the expected frequencies and test statistic.
c) Use the p-value approach to draw your conclusion.
d) Interpret your conclusion.
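
One common way to set up part b) is to split the hypothesized normal distribution into intervals of equal probability and count how many observations land in each. The sketch below assumes six intervals, so that each expected frequency is 30/6 = 5, and uses the mean of 61 and standard deviation of 17 stated in the problem. The number of intervals is an assumption, not something the problem specifies.

```python
from bisect import bisect_right
from statistics import NormalDist

grades = [32.65, 44.85, 45.89, 46.82, 48.35, 52.37, 55.57, 60.45, 60.74, 61.15,
          61.18, 61.61, 64.37, 66.95, 67.09, 67.13, 68.73, 70.19, 71.08, 72.80,
          74.43, 75.91, 80.77, 82.56, 84.90, 85.90, 91.06, 92.63, 93.29, 101.20]

k = 6                                   # assumed number of equal-probability classes
dist = NormalDist(mu=61, sigma=17)      # mean and sd as stated in the problem
cuts = [dist.inv_cdf(i / k) for i in range(1, k)]   # interior class boundaries

observed = [0] * k
for g in grades:
    observed[bisect_right(cuts, g)] += 1            # index of the class holding g

expected = len(grades) / k
chi2_stat = sum((o - expected) ** 2 / expected for o in observed)
df = k - 2 - 1                          # two parameters (mean, sd) were estimated

print("boundaries:", [round(c, 2) for c in cuts])
print("observed:", observed, "expected per class:", expected)
print(f"chi-square = {chi2_stat:.2f}, df = {df}")
```

The statistic is then compared with the chi-square distribution with k − 3 degrees of freedom.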

12-4B Goodness of Fit – Normal Distribution

A local painting company that employs students is doing some analysis on the completion times
of its employees. One part of the analysis is to determine if the completion times are normally
distributed or not. The data below consists of the number of minutes it took each of 30
employees to finish painting a small bedroom. The average completion time was 74 minutes
with a standard deviation of 15. The data has been sorted from smallest to largest for
convenience.
54.93 64.86 79.31
57.49 65.75 80.00
57.88 66.85 83.77
58.03 67.01 89.45
58.42 67.45 91.38
60.21 67.75 91.47
60.33 69.47 92.21
60.50 72.63 94.22
60.67 75.08 102.10
62.69 77.93 104.41

a) Formulate the null and alternative hypothesis.


b) Compute the actual and expected frequencies.
c) Use the p-value approach to draw your conclusion.
d) Interpret your conclusion.

Module 13: Analysis of Variance

13-1A Single Factor Analysis of Variance (ANOVA) – Completely Randomized Design

WhiteTooth Inc. is developing an additive for its line of toothpastes that is designed to whiten
teeth in as little time as possible. It currently has two variations of the additive, Type A, and B,
but only wishes to produce and market one. In order to determine the effectiveness of these new
additives, a focus group consisting of 27 people is organized. Nine are given type A, nine are
given type B and nine are the control group and are given a placebo. Each person is asked to use
the toothpaste and record the number of days it takes before their teeth achieve a predetermined
shade of white. The following table contains the data collected:

Type A Type B Control Group

5 4 9
5 5 7
7 6 8
4 5 9
6 8 7
6 7 7
8 4 8
7 6 5
6 5 6
Mean 6 5.56 7.33
Variance 1.5 1.78 1.75

a) Test to determine whether or not there is a difference between the two types of additives and
the control group.
b) Perform a Fisher’s LSD test if necessary.
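
For part a), the completely randomized design splits the variation into a between-treatments sum of squares (SSTR) and a within-treatments sum of squares (SSE), and the test statistic is F = MSTR / MSE. The sketch below computes these from the raw data in the table. The variable names are illustrative, and the p-value would still come from an F table or software.

```python
from statistics import mean

groups = {
    "Type A":  [5, 5, 7, 4, 6, 6, 8, 7, 6],
    "Type B":  [4, 5, 6, 5, 8, 7, 4, 6, 5],
    "Control": [9, 7, 8, 9, 7, 7, 8, 5, 6],
}

all_values = [x for g in groups.values() for x in g]
grand_mean = mean(all_values)

# Between-treatments and within-treatments sums of squares
sstr = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
sse = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)

df_tr = len(groups) - 1                 # k - 1
df_e = len(all_values) - len(groups)    # n_T - k
f_stat = (sstr / df_tr) / (sse / df_e)

print(f"SSTR = {sstr:.3f}, SSE = {sse:.3f}")
print(f"F = {f_stat:.3f} with ({df_tr}, {df_e}) degrees of freedom")
```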

13-1B Single Factor Analysis of Variance (ANOVA) – Completely Randomized Design

A new type of glass is being developed to use in areas at risk of earthquakes. Three types of
glass have been developed, but the company only wishes to manufacture one. In order to test the
strength of the glass, windowpanes of identical sizes were placed in a machine designed to shake
the glass in a manner that simulates the stress it would have to endure in an earthquake. In the
most severe earthquakes, the shaking can last as long as 5 minutes. The amount of time before
each windowpane shattered was recorded. The following table contains the summaries of the
data collected:

Type I Type II Type III


Count 13.00 10.00 11.00
Mean (Minutes) 5.30 5.60 5.70
Standard deviation 0.62 0.55 0.59

a) Test to determine whether or not there is a difference between the three types of
glass.
b) Perform a Fisher’s LSD test if necessary.

13-1C Single Factor Analysis of Variance (ANOVA) – Observational Study

Copied from Problem 10-2D:

The TRU School of Business and Economics recently did a survey of alumni salaries from the
past 5 years. The initial survey included only students in the finance major and the economics
major. This time we have data for the marketing major. Test for any difference between the three
using a 10% level of significance.

Economics Finance Marketing


Count 54 63 61
Mean ($) 96,213 94,315 92,416
Standard deviation 6,340 6,001 6,235

a) Test to determine whether or not there is a difference between the salaries of the three majors.
Use a 10% level of significance.
b) Perform a Fisher’s LSD test if necessary. Use a 10% level of significance.

13-1D Single Factor Analysis of Variance (ANOVA) – Observational Study

Students in different college majors are always complaining (or bragging) about how difficult
their field of study is relative to another. You decide that perhaps you could use the number of
hours spent studying as a proxy for the level of difficulty. The more hours spent studying, the
more difficult the subject matter must be. You survey students across 3 fields of study and ask
them how many hours per day they spend studying outside of class time. The following table
contains summaries of the data collected:

Accounting Physics Sociology

Count 16 14 17
Mean (Hours) 4.27 4.31 3.81
Standard deviation 0.71 1.02 0.94

a) Test to determine whether there is a difference in the average number of hours spent studying
between the three college majors.
b) Perform a Fisher’s LSD test if necessary.

13-2A Single Factor Analysis of Variance (ANOVA) – Randomized Block Design

Everything Co. frequently relies on courier services to deliver sensitive documents between its
regional offices. There are three courier companies available, each one offers loyalty discounts
giving the incentive for customers to choose one courier and stick with it. You decide to perform
a test to determine if there is a difference in delivery times between the three courier options you
have. You send three packages of equal size to each of five regional offices. Each package is
sent through one of the three couriers. The following table contains the delivery times, in hours:

Regional Office   Option A   Option B   Option C   Block Mean
1                 27.6       23.1       28.4       26.37
2                 48.3       45.1       45.6       46.33
3                 22.6       21.2       22.7       22.17
4                 35.4       28.5       25.1       29.67
5                 23.5       19.6       22.3       21.80
Treatment Mean    31.48      27.5       28.82      29.27

a) Test to determine whether or not there is a difference between the three courier services. Use a
5% level of significance.
(Hint: SST = 1304.73)
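
In a randomized block design, the total sum of squares is partitioned as SST = SSTR + SSBL + SSE, and the treatment test uses F = MSTR / MSE with (k − 1) and (k − 1)(b − 1) degrees of freedom. The sketch below works through that partition for the courier data and can be checked against the SST given in the hint. Variable names are illustrative.

```python
# Rows are regional offices (blocks); columns are couriers A, B, C (treatments)
times = [
    [27.6, 23.1, 28.4],
    [48.3, 45.1, 45.6],
    [22.6, 21.2, 22.7],
    [35.4, 28.5, 25.1],
    [23.5, 19.6, 22.3],
]

b, k = len(times), len(times[0])        # number of blocks and treatments
grand = sum(map(sum, times)) / (b * k)
treatment_means = [sum(col) / b for col in zip(*times)]
block_means = [sum(row) / k for row in times]

sst = sum((x - grand) ** 2 for row in times for x in row)
sstr = b * sum((m - grand) ** 2 for m in treatment_means)
ssbl = k * sum((m - grand) ** 2 for m in block_means)
sse = sst - sstr - ssbl                 # what remains after treatments and blocks

f_stat = (sstr / (k - 1)) / (sse / ((k - 1) * (b - 1)))
print(f"SST = {sst:.2f}, SSTR = {sstr:.2f}, SSBL = {ssbl:.2f}, SSE = {sse:.2f}")
print(f"F = {f_stat:.3f} with ({k - 1}, {(k - 1) * (b - 1)}) degrees of freedom")
```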

13-2B Single Factor Analysis of Variance (ANOVA) – Randomized Block Design

Canine Munchies Inc. is developing a new brand of dog food designed specifically for less active
dogs. They have developed two types of food, both with a lower fat, higher protein blend of
ingredients in order to minimize weight gain: a common problem among its target market of
inactive dogs. In order to determine if there is a difference between the two new brands of food
as compared to the dogs’ regular diet, a group of five dogs were each fed the three types of food
for a period of 30 days. Two of the three were the new brands, and the third was the dogs’
original diet. The data below shows the difference in the dogs’ weight between the first and 30th
day on the diet. A positive number indicates a weight gain; a negative number indicates a weight
loss.

Dog Type 1 Type 2 Original Diet Block Mean


1 -1.2 -1.3 -0.3 -0.93
2 -1.4 -1.6 1.5 -0.50
3 -0.7 -1.5 1.2 -0.33
4 0.8 1.3 1.6 1.23
5 -1.8 -1.4 -0.4 -1.20
Treatment Mean -0.86 -0.9 0.72 -0.35

a) Test to determine whether or not there is a difference between the two types of dog food and
their original diet in terms of their impact on the dogs’ weight. (Hint: SST = 22.62)

13-2C Single Factor Analysis of Variance (ANOVA) – Randomized Block Design

Standardized testing is very common in many countries and is frequently used as a screening tool
in college applications. Students write exams in three areas: Reading Comprehension,
Mathematics and Grammar. Each test is scored on a 1000 point scale. The following table
contains test scores for five students.

Student          Reading   Math   Grammar   Block Mean
1                865       900    850       871.67
2                730       755    790       758.33
3                805       780    815       800.00
4                405       460    505       456.67
5                765       820    870       818.33
Treatment Mean   714       743    766       741.00

a) Test to determine whether or not there is a difference in average grade across the three subjects.
Use a 10% level of significance. (Hint: SST = 337360.00)

13-3A Two Factor with Replication Analysis of Variance (ANOVA) – Factorial Design.

The local animal rescue shelter is interested in knowing if there is a significant difference in the
number of animal adoptions between its three largest shelters during its busiest weekend of the
year. It also is interested in knowing if there is a significant difference between the number of
cats and dogs adopted as well. The following data shows the number of each animal adopted at
each of its three shelters on each day of the long weekend.

                    A      B      C      Treatment Means
Dogs                2      2      1
                    3      1      2      2.11
                    3      1      4
Interaction Means   2.67   1.33   2.33
Cats                4      2      4
                    5      4      3      4.00
                    6      3      5
Interaction Means   5.00   3.00   4.00   Grand mean
Treatment Means     3.83   2.17   3.17   3.06

a) Test to determine whether or not there is a difference between the average number of
adoptions by location, by animal type, and for interaction. (SST = 36.94)
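
A two-factor design with replication splits SST into a sum of squares for each factor (SSA, SSB), one for interaction (SSAB) and one for error (SSE). The sketch below computes the partition for the adoption data, reading each column of the table as one shelter and each block of three rows as the three days. Variable names are illustrative, and the three F ratios still need to be compared with F critical values or p-values.

```python
from statistics import mean

# data[animal][shelter] holds the three daily counts (replications)
data = {
    "Dogs": {"A": [2, 3, 3], "B": [2, 1, 1], "C": [1, 2, 4]},
    "Cats": {"A": [4, 5, 6], "B": [2, 4, 3], "C": [4, 3, 5]},
}
animals, shelters = list(data), list(data["Dogs"])
a, b, r = len(animals), len(shelters), 3

all_values = [x for an in animals for sh in shelters for x in data[an][sh]]
grand = mean(all_values)

sst = sum((x - grand) ** 2 for x in all_values)
ssa = b * r * sum(
    (mean([x for sh in shelters for x in data[an][sh]]) - grand) ** 2
    for an in animals)
ssb = a * r * sum(
    (mean([x for an in animals for x in data[an][sh]]) - grand) ** 2
    for sh in shelters)
sse = sum((x - mean(data[an][sh])) ** 2
          for an in animals for sh in shelters for x in data[an][sh])
ssab = sst - ssa - ssb - sse

mse = sse / (a * b * (r - 1))
f_animal = (ssa / (a - 1)) / mse
f_shelter = (ssb / (b - 1)) / mse
f_interaction = (ssab / ((a - 1) * (b - 1))) / mse
print(f"SST = {sst:.2f}, SSA = {ssa:.2f}, SSB = {ssb:.2f}, "
      f"SSAB = {ssab:.2f}, SSE = {sse:.2f}")
print(f"F(animal) = {f_animal:.2f}, F(shelter) = {f_shelter:.2f}, "
      f"F(interaction) = {f_interaction:.2f}")
```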

13-3B Two Factor with Replication Analysis of Variance (ANOVA) – Factorial Design.

A designer of commercial retail space is doing a study to determine which method of managing
line ups at the till works best. Method 1 involves many smaller line ups at individual tills.
Method 2 involves one large line up being served by multiple tills. The table below contains the
wait times, in minutes, of customers in three different retail settings using each of the two
proposed methods.

                    Grocery   Electronics   Toys   Treatment Means
Method 1            6         9             3
                    7         7             2      5.33
                    5         6             3
Interaction Means   6.00      7.33          2.67
Method 2            4         6             4
                    3         4             5      4.11
                    2         3             6
Interaction Means   3.00      4.33          5.00   Grand mean
Treatment Means     4.50      5.83          3.83   4.72

a) Test to determine whether or not there is a difference in the average number of minutes
waiting by queue method, retail setting and for interaction. (SST = 63.61)

13-3C Two Factor with Replication Analysis of Variance (ANOVA) – Factorial Design.

As part of a study being done on regional wage differences, data has been collected on wages of
two different trades, Carpentry and Welding, across three regions: west coast, central and east
coast. You’ve been tasked with performing the appropriate analysis to determine whether there
exists a difference in hourly wage rates between these two professions across these three regions.

                    West Coast   Central   East Coast   Treatment Means
Carpentry           26           25        27
                    24           24        26           25.44
                    27           25        25
Interaction Means   25.67        24.67     26.00
Welding             26           26        30
                    30           26        31           27.89
                    28           27        27
Interaction Means   28.00        26.33     29.33        Grand mean
Treatment Means     26.83        25.5      27.67        26.67

a) Test to determine whether or not there is a difference in the average hourly wage
by trade, by region and for interaction. (SST = 68.00)

Module 14: Simple Linear Regression

14-1A Simple Linear Regression

There’s a strong belief that student performance is directly linked to the amount of time the
student spends studying. In order to test this claim, you gather the following data to estimate:

$E(G) = \beta_0 + \beta_1 \text{Hrs}$

Observation Grade Hours of study
1 43 3.2
2 71 3.9
3 36 2.4
4 75 3.7
5 81 5.1
Mean 61.2 3.66

a) Fill in the blanks in the table below.


b) Write the estimated regression equation and interpret the results.
c) Use the estimated regression equation to develop a confidence interval estimate for the
average grade for somebody who studies 5 hours.

Regression Statistics
Multiple R
R Square
Adj. R Square NA
Standard Error
Observations

ANOVA
df SS MS F P-value
Regression 1301.62 0.04
Error
Total 1644.80

Coefficients Std Error t Stat P-value Lower 95% Upper 95%


Intercept 20.31 0.81
Hours
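
The entries in the output table all follow from a few sums. The slope is b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², the intercept is b0 = ȳ − b1·x̄, and SST = SSR + SSE. The sketch below computes these for the five observations so that the results can be checked against the two sums of squares already given in the ANOVA table. Variable names are illustrative.

```python
hours = [3.2, 3.9, 2.4, 3.7, 5.1]     # independent variable (x)
grades = [43, 71, 36, 75, 81]         # dependent variable (y)

n = len(hours)
x_bar, y_bar = sum(hours) / n, sum(grades) / n

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, grades))
sxx = sum((x - x_bar) ** 2 for x in hours)

b1 = sxy / sxx                        # estimated slope
b0 = y_bar - b1 * x_bar               # estimated intercept

sst = sum((y - y_bar) ** 2 for y in grades)
ssr = b1 * sxy                        # regression sum of squares
sse = sst - ssr
r_squared = ssr / sst
std_error = (sse / (n - 2)) ** 0.5    # standard error of the estimate

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
print(f"SSR = {ssr:.2f}, SSE = {sse:.2f}, SST = {sst:.2f}")
print(f"R Square = {r_squared:.2f}, Standard Error = {std_error:.2f}")
```

The same steps apply to 14-1B and 14-1C after swapping in their data.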

14-1B Simple Linear Regression

Use the following information to estimate the corresponding demand curve:

$E(Q_x) = \beta_0 + \beta_1 P$

Observation Quantity Price ($)


1 5 63
2 8 58
3 10 59
4 13 54
5 14 55
Mean 10 57.8

a) Fill in the blanks in the table below.


b) Write the estimated regression equation and interpret the results.
c) Use the estimated regression equation to develop a prediction interval estimate for the quantity
demanded at a price of 56.

Regression Statistics
Multiple R
R Square 0.88
Adj R Square NA
Standard Error 1.50
Observations

ANOVA
df SS MS F P-value
Regression 47.26
Error
Total 54.00

Coefficients Std Error t Stat P-value Lower 95% Upper 95%


Intercept 0.01 27.02 104.48
Price 0.02

14-1C Simple Linear Regression

My parents once told me that the longer they were married, the happier they became. Let’s test this
claim. You find the following data:
Observation Years Married Happiness Index
1 15 80
2 10 60
3 36 75
4 44 90
5 54 95
Mean 31.8 80

a) Fill in the blanks in the table below.


b) Write the estimated regression equation and interpret the results.

Regression Statistics
Multiple R 0.84
R Square
Adjusted R Square NA
Standard Error
Observations

ANOVA
df SS MS F p-value
Regression 7.43 0.07
Residual 215.77
Total

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 60.47 7.46 0.00
Years Married

Module 15: Multiple Regression

15-1A Multiple Linear Regression

The following estimated regression equation states that quantity sold of a good, is a function of
its own price ( Px ), the price of a related good ( Py ), advertising expenditures ( A ) and average
household income ( M ):

$E(Q) = \beta_0 + \beta_1 P_x + \beta_2 P_y + \beta_3 A + \beta_4 M$

Prices are measured in dollars, while advertising and income are measured in thousands of
dollars. Quantity is measured in units. The following table provides the estimated regression
results:

Regression Statistics
Multiple R
R Square
Adj. R Square
Standard Error 88.71
Observations 30

ANOVA
df SS MS F p-value
Regression 103851.04 0.00
Error
Total

Coefficients Std Error t Stat P-value Lower 95% Upper 95%


Intercept 0.00 241.71 1027.31
Px -7.84 1.75 0.00
Py -10.14 0.00 -5.45
A 1.80 0.00 11.53
M 5.10 1.72 0.10 11.20

a. Fill in the blanks and write the estimated regression equation.


b. Interpret each of the estimated coefficients and the corresponding interval estimates.
c. Interpret the R-squared.
d. Interpret the results of the tests for individual parameter significance and overall model
significance.

15-1B Multiple Linear Regression

The following estimated regression equation states that wheat yield (pounds) is a function of
the average monthly rainfall in inches ( R ), the density of seed dispersion ( S ) in seeds per
square inch, average daily temperature, degrees Fahrenheit ( T ) and an index measuring the
quality of fertilizer ( F ):
$E(Y) = \beta_0 + \beta_1 R + \beta_2 S + \beta_3 T + \beta_4 F$

The following table provides the estimated regression results:

Regression Statistics
Multiple R
R Square 0.92
Adj. R Square 0.91
Standard Error
Observations

ANOVA
df SS MS F p-value
Regression 70.24 0.00
Error 272956.16
Total 29

Coefficients Std Error t Stat P-value Lower 95% Upper 95%


Intercept 1056.63 0.00 735.36
Rain 32.29 0.86 0.40 109.42
Seeds 3.14 0.00 2.32 11.16
Fert 15.47 2.38 6.50 0.00
Temp -9.06 0.74 -64.52 46.39

a. Write the estimated regression equation.


b. Interpret the R-square.
c. Interpret the coefficients and corresponding confidence interval estimates
d. Interpret the p-values for the tests on individual parameter significance and overall model
significance.
e. See corresponding video for further discussion.

15-1C Multiple Linear Regression

With a belief that an individual’s salary can be predicted by their age and experience, the
following regression equation is estimated:

$E(\text{Salary}) = \beta_0 + \beta_1 (\text{Exp}) + \beta_2 (\text{Age})$

Salary is measured in thousands of dollars, while experience and age are measured in years.

Regression Statistics
Multiple R
R Square
Adj. R Square
Std Error
Observations

ANOVA
df SS MS F p-value
Regression 11721418.63 55.90 0.00
Residual
Total 29

Coefficients Std Error t Stat P-value Lower 95% Upper 95%


Intercept 199.73 0.47 0.64
Age 30.84 0.27 -28.79
Experience 14.90 0.52 0.61 74.09

a. Fill in the blanks and write the estimated regression equation.


b. Interpret the R-squared.
c. Interpret the p-values and confidence interval estimates.
d. Discuss the results of the tests on individual parameters and the model.
e. See corresponding video for further discussion.

15-2A Multiple Linear Regression – Dummy Variables
In problem 15-1C, we developed an estimated regression equation that demonstrated the problem
of multicollinearity. This revised model excludes years of experience, which was found to be
highly correlated with age. We have now added one dummy variable to the model to determine
the effect having a graduate degree has on salary. The new model is as follows:

$E(S) = \beta_0 + \beta_1 (\text{Age}) + \beta_2 (\text{DEG})$

Salary is measured in thousands of dollars; age is measured in years. The dummy variable DEG
equals one if the individual has a graduate degree, zero otherwise. The estimated regression output
is below.

Regression Statistics
Multiple R 0.96
R Square 0.92
Adjusted R Square 0.91
Standard Error 213.97
Observations 30

ANOVA
              df             SS             MS         F   p-value
Regression     2    13315989.47     6657994.74    145.42      0.00
Residual      27     1236199.89       45785.18
Total         29    14552189.36

                Coefficients  Standard Error    t Stat    P-value    Lower 95%    Upper 95%
Intercept            -108.09          142.52     -0.76       0.45      -400.52       184.35
Age                    47.18            3.18     14.82       0.00        40.65        53.72
Grad. Degree          472.29           79.34      5.95       0.00       309.50       635.07

a. Write the estimated regression equation.


b. Interpret the R-square.
c. Interpret the coefficients and corresponding confidence interval estimates.
d. Interpret the p-values for the tests on individual parameter significance and overall model
significance.
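
The summary figures in the 15-2A output are tied together by the usual ANOVA identities, so the printed table can be checked by hand. The following is a minimal sketch, not part of the original workbook, that recomputes the F statistic, R Square, Adjusted R Square and Standard Error from the sums of squares in the table. The variable names are illustrative assumptions.

```python
# Values copied from the 15-2A ANOVA table.
ss_regression = 13315989.47
ss_residual = 1236199.89
ss_total = 14552189.36
n_obs = 30   # Observations
k = 2        # independent variables in the model: Age and DEG

# Degrees of freedom for a model with an intercept.
df_regression = k
df_residual = n_obs - k - 1

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

f_stat = ms_regression / ms_residual
r_square = ss_regression / ss_total
adj_r_square = 1 - (1 - r_square) * (n_obs - 1) / df_residual
std_error = ms_residual ** 0.5   # standard error of the estimate

print(f"F = {f_stat:.2f}")
print(f"R Square = {r_square:.2f}")
print(f"Adjusted R Square = {adj_r_square:.2f}")
print(f"Standard Error = {std_error:.2f}")
```

To two decimal places, these results should agree with the values printed in the regression output above.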

15-2B Multiple Linear Regression – Dummy Variables
In problem 15-2A, we determined that having a graduate degree was a statistically significant
determinant of salary. In this revised model, we have further separated our sample by highest
level of educational attainment. The new model is as follows:

$E(\text{Salary}) = \beta_0 + \beta_1(\text{Age}) + \beta_2(\text{MA}) + \beta_3(\text{PhD})$

Salary is measured in thousands of dollars; age is measured in years. Those with a Bachelor’s
degree represent the base case. The dummy variable MA equals one if the individual has a
Master’s degree, zero otherwise, while the dummy variable PhD equals one if the individual has
a doctorate, zero otherwise. The estimated regression output is below.

Regression Statistics
Multiple R 0.96
R Square 0.92
Adj.R Square 0.91
Std Error 217.39
Observations 30

ANOVA
              df             SS             MS         F   p-value
Regression     3    13323419.21     4441139.74     93.97      0.00
Residual      26     1228770.15       47260.39
Total         29    14552189.36

                Coefficients    Std Error    t Stat    P-value    Lower 95%    Upper 95%
Intercept            -112.44       145.22     -0.77       0.45      -410.94       186.06
Age                    47.29         3.25     14.57       0.00        40.62        53.96
Master’s              447.34       102.25      4.38       0.00       237.17       657.51
Doctorate             490.93        93.32      5.26       0.00       299.11       682.75

a. Write the estimated regression equation.


b. Interpret the R-square.
c. Interpret the coefficients and corresponding confidence interval estimates.
d. Interpret the p-values for the tests on individual parameter significance and overall model
significance.
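
With more than one dummy variable, each coefficient is read relative to the base case (here, a Bachelor’s degree). As a rough illustration, not part of the original workbook, the sketch below plugs the 15-2B coefficients into the estimated equation for a hypothetical 40-year-old at each education level. The function name and the chosen age are assumptions made for the example.

```python
def predicted_salary(age, ma=0, phd=0):
    """Point estimate of salary from the 15-2B coefficient table.

    ma and phd are the dummy variables (1 or 0); both zero is the
    Bachelor's degree base case.
    """
    return -112.44 + 47.29 * age + 447.34 * ma + 490.93 * phd


age = 40  # hypothetical age, chosen only for illustration

bachelor = predicted_salary(age)
master = predicted_salary(age, ma=1)
doctorate = predicted_salary(age, phd=1)

# Each dummy coefficient is the estimated gap relative to the base case,
# holding age constant.
print(f"Master's vs Bachelor's:  {master - bachelor:.2f}")
print(f"Doctorate vs Bachelor's: {doctorate - bachelor:.2f}")
# The gap between two non-base categories is the difference in coefficients.
print(f"Doctorate vs Master's:   {doctorate - master:.2f}")
```

Because age enters the equation additively, the gaps between education levels are the same at any age. Only the predicted salary levels change.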

15-2C Multiple Linear Regression – Dummy Variables

In Problem 15-1A, we estimated a demand equation and found income to be statistically
insignificant. The following regression equation removes income from the model, but now
includes a dummy variable for gender, with the view that men and women have different
consumption habits (non-binary gender identifications would require additional dummy variables
as described in problem 15-2B). The model now states that quantity demanded is a function of
its own price (Px), the price of a related good (Py), advertising expenditures (A) and the gender
of the consumer (G).
$E(Q_d) = \beta_0 + \beta_1 P_x + \beta_2 P_y + \beta_3 A + \beta_4 G$

Prices are measured in dollars, while advertising is measured in thousands of dollars. The dummy
variable takes the value zero for men and one for women. The following table provides the
estimated regression results:

Regression Statistics
Multiple R 0.85
R Square 0.72
Adj. R Square 0.67
Standard Error 83.01
Observations 30

ANOVA
              df             SS             MS         F   p-value
Regression     4      439877.47      109969.37     15.96      0.00
Residual      25      172247.24        6889.89
Total         29      612124.71

                Coefficients    Std Error    t Stat    P-value    Lower 95%    Upper 95%
Intercept             908.47        68.63     13.24       0.00       767.13      1049.82
G                      82.56        31.35      2.63       0.01        17.99       147.13
Px                     -8.90         1.64     -5.43       0.00       -12.28        -5.53
Py                    -10.35         2.13     -4.87       0.00       -14.73        -5.97
A                       7.41         1.63      4.55       0.00         4.06        10.77

a. Write the estimated regression equation.


b. Interpret the R-square.
c. Interpret the coefficients and corresponding confidence interval estimates.
d. Interpret the p-values for the tests on individual parameter significance and overall model
significance.
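
Each t Stat and 95% confidence interval in the 15-2C output follows from the coefficient and its standard error. The sketch below, which is not part of the original workbook, rebuilds them as coefficient / standard error and coefficient ± t × standard error. The critical value 2.0595 is an assumption taken from a standard t table (upper-tail area 0.025, 25 degrees of freedom), and the helper name is illustrative.

```python
# Critical value from a t table: upper-tail area 0.025 with
# n - k - 1 = 30 - 4 - 1 = 25 degrees of freedom (assumed, not from the workbook).
T_CRIT = 2.0595

# (coefficient, standard error) pairs copied from the 15-2C table.
rows = {
    "G": (82.56, 31.35),
    "Px": (-8.90, 1.64),
    "Py": (-10.35, 2.13),
    "A": (7.41, 1.63),
}


def t_and_interval(coef, se, t_crit=T_CRIT):
    """Return the t statistic and the lower and upper 95% interval limits."""
    margin = t_crit * se
    return coef / se, coef - margin, coef + margin


results = {name: t_and_interval(coef, se) for name, (coef, se) in rows.items()}

for name, (t_stat, lower, upper) in results.items():
    print(f"{name}: t = {t_stat:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Since the table inputs are themselves rounded to two decimals, the recomputed values may differ from the printed ones in the last digit.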

15-3A Multiple Linear Regression – ANOVA using Dummies

Refer to Problem 13-1D to see the original data set.

Students in different college majors are always complaining (or bragging) about how difficult
their field of study is relative to another. You decide that perhaps you could use the number of
hours spent studying as a proxy for the level of difficulty. The more hours spent studying, the
more difficult the subject matter must be. You survey students across 3 fields of study and ask
them how many hours per day they spend studying outside of class time. We will use regression
analysis to estimate the following regression equation:

$E(\text{Hrs}) = \beta_0 + \beta_1(\text{Phy}) + \beta_2(\text{Soci})$

We have defined Accounting to be the base case, with Phy identifying students who major in
physics, and Soci identifying sociology students. The following table provides the estimated
regression results:

Regression Statistics
Multiple R 0.26
R Square 0.07
Adj. R Square 0.02
Standard Error 0.89
Observations 47

ANOVA
              df             SS             MS         F   p-value
Regression     2           2.52           1.26      1.57      0.22
Residual      44          35.18           0.80
Total         46          37.70

                Coefficients    Std Error    t Stat    P-value    Lower 95%    Upper 95%
Intercept               4.27         0.22     19.11       0.00         3.82         4.72
Physics                 0.03         0.33      0.10       0.92        -0.63         0.69
Sociology              -0.47         0.31     -1.49       0.14        -1.09         0.16

a. Write the estimated regression equation.


b. Interpret the coefficients and corresponding confidence interval estimates.
c. Interpret the p-values for the tests on individual parameter significance and overall model
significance.
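
When every independent variable is a dummy for group membership, the intercept is the estimated mean of the base group and each coefficient is the difference between a group's mean and the base mean. The following minimal sketch, not part of the original workbook, recovers the three implied group means from the 15-3A coefficients. The function and variable names are illustrative assumptions.

```python
# Coefficients copied from the 15-3A table; Accounting is the base case.
intercept = 4.27
coef_physics = 0.03
coef_sociology = -0.47


def expected_hours(phy=0, soci=0):
    """Estimated mean study hours per day for the given dummy values."""
    return intercept + coef_physics * phy + coef_sociology * soci


implied_means = {
    "Accounting": expected_hours(),       # both dummies zero
    "Physics": expected_hours(phy=1),
    "Sociology": expected_hours(soci=1),
}

for major, hours in implied_means.items():
    print(f"{major}: {hours:.2f} hours per day")
```

These implied means can be compared with the sample means computed from the original data in Problem 13-1D.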

15-3B Multiple Linear Regression – ANOVA using Dummies

Refer to Problem 13-1A to see the original data set.

WhiteTooth Inc. is developing an additive for its line of toothpastes that is designed to whiten
teeth in as little time as possible. It currently has two variations of the additive, Type A and Type B,
but only wishes to produce and market one. In order to determine the effectiveness of these new
additives, a focus group consisting of 27 people is organized. Nine are given Type A, nine are
given Type B and nine are the control group and are given a placebo. Each person is asked to use
the toothpaste and record the number of days it takes before their teeth achieve a predetermined
shade of white. We will use regression analysis to estimate the following regression equation:

$E(\text{days}) = \beta_0 + \beta_1(\text{TA}) + \beta_2(\text{TB})$

We have defined the placebo to be the base case, with TA identifying Type A toothpaste, and TB
identifying Type B toothpaste. The following table provides the estimated regression results:

Regression Statistics
Multiple R 0.53
R Square 0.28
Adj. R Square 0.22
Standard Error 1.29
Observations 27

ANOVA
              df             SS             MS         F   p-value
Regression     2          15.41           7.70      4.60      0.02
Residual      24          40.22           1.68
Total         26          55.63

                Coefficients    Std Error    t Stat    P-value    Lower 95%    Upper 95%
Intercept               7.33         0.43     16.99       0.00         6.44         8.22
Type A                 -1.33         0.61     -2.18       0.04        -2.59        -0.07
Type B                 -1.78         0.61     -2.91       0.01        -3.04        -0.52

a. Write the estimated regression equation.


b. Interpret the coefficients and corresponding confidence interval estimates.
c. Interpret the p-values for the tests on individual parameter significance and overall model
significance.
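
A regression on group dummies gives the same overall F test as a one-way ANOVA because its fitted values are simply the group means. The regression sum of squares is then the between-group sum of squares, and the residual sum of squares is the within-group sum of squares. The sketch below, not part of the original workbook, illustrates this on a small hypothetical data set (these are NOT the data from Problem 13-1A). All names and numbers in it are assumptions made for the example.

```python
from statistics import mean

# Hypothetical days-to-whiten data, invented for illustration only.
groups = {
    "placebo": [8, 7, 9, 6],
    "type_a": [6, 5, 7, 6],
    "type_b": [5, 6, 4, 5],
}
base = "placebo"

group_means = {name: mean(values) for name, values in groups.items()}

# With one dummy per non-base group, the least-squares fit reproduces the
# group means: the intercept is the base mean and each dummy coefficient
# is that group's mean minus the base mean.
intercept = group_means[base]
coefficients = {
    name: m - intercept for name, m in group_means.items() if name != base
}

all_values = [x for values in groups.values() for x in values]
grand_mean = mean(all_values)

# Between-group variation (regression SS) and within-group variation (residual SS).
ss_regression = sum(
    len(values) * (group_means[name] - grand_mean) ** 2
    for name, values in groups.items()
)
ss_residual = sum(
    (x - group_means[name]) ** 2
    for name, values in groups.items()
    for x in values
)

df_regression = len(groups) - 1
df_residual = len(all_values) - len(groups)
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)

print("Intercept:", intercept)
print("Coefficients:", coefficients)
print(f"F = {f_stat:.2f} with ({df_regression}, {df_residual}) degrees of freedom")
```

The same logic applies to the 15-3B output: the intercept estimates the placebo mean, and the Type A and Type B coefficients estimate how many fewer days each additive takes relative to the placebo.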
