Comm291 Practice Midterm
Notes: This exam has 9 questions. The duration is 2 hours. Books, notes, and calculators are allowed, but not computers, cellphones or on-line connectivity.
MT2013: Question 1
a) The Human Resources Department of a large university maintains records on its faculty members. The table displays some of these data.
Place an X in the ...
_Payroll Number
_Years of Employment
_Birth date
_Teaching Rating
_Faculty Classification
_Salary
b) Which of the following is (are) based on cross-sectional data?
_A. Company quarterly profits
_B. Percentage of Canadian adults who work full-time
_C. Historical closing stock prices
_D. Yearly student enrolments
_E. Annual costs
c) Which of the following is (are) time series data?
_A. Number of employees in 2012
_B. This month's demand for an automotive part
_C. This quarter's sales of automobiles
_D. Weekly receipts at a clothing boutique
_E. Percentage of employees who are female
d) The administration of a large university wants to study the types of wellness programs that would interest its employees. They plan to survey a random sample of employees. Under consideration are several sampling plans. Beside each plan, write the number of the sampling strategy given in the following list. Choose from among:
1 Simple Random Sampling
2 Stratified Random Sampling
3 Cluster Sampling
4 Systematic Sampling
_ (i) There are five categories of employees (administration, faculty, professional staff, clerical and maintenance). Randomly select ten individuals from each category.
_ (ii) Each employee has an ID number. Randomly select 50 numbers.
_ (iii) Randomly select a school within the university (e.g., Business School) and survey all of the individuals (administration, faculty, professional staff, clerical and maintenance) who work in that school.
_ (iv) The HR Department has an alphabetized list of newly hired employees (hired within the last five years). After starting the process by randomly selecting an employee from the list, every fifth name is chosen to be included in the sample.
e) A manufacturer of toys claims that less than 3% of his toys are defective. When 100 toys were drawn from one production run of 5,000 toys, 5% were found to be defective. For each term on the left, select the matching answer from the list to the right, and write the number in the blank.
The 3% value
The 5% value
MT2013: Question 2
A magazine that publishes product reviews conducted a survey of teenagers' preferences for cell phones. Three brands of cell phone designed specifically with teens in mind were the focus of the study. The table summarizes responses by brand and gender.

          Brand (three columns)     Total
Male       55     99    196          350
Female     87    150    113          350
Total     142    249    309          700

a) Which of the following charts would be appropriate for displaying the marginal distribution of cell phone brand?
Bar Chart; Histogram; Line Graph; Boxplot; _E. Stem and Leaf Display
b) What percent
_A.
50%
-F,.20%
E. 16%
c) What percent
50%
d) What percent
e) Which of the following statements is true?
_A. It appears that cell phone brand preference and gender are not related.
_B. It appears that cell phone brand preference and gender are not independent.
_C. It appears that cell phone brand preference and gender are independent.
_D. A scatterplot will be more informative here than a table.
_E. None of the above
MT2013: Question 3
a) You have a set of 30 numbers. The standard deviation of these numbers is zero. You can be certain that:
_A. Half of the numbers are above the mean
_B. All of the numbers in the set are zero
_C. All of the numbers in the set are equal
_D. The numbers are evenly spaced below and above the mean
b) Here is the five number summary of the hourly wages for sales managers.
Min     Q1      Median   Q3      Max
20.94   37.64   44.77    49.24   67.11
(i) The
Symmetric
-D.
(iv) Are there any outliers, as defined by the "inner fences" criterion?
_A. Yes, only on the left side of the distribution
_B. Yes, only on the right side of the distribution
_C. Yes, on both sides of the distribution
_D. No
(v) Suppose there had been an error and that the lowest hourly wage for sales managers was $18.50 instead of $20.94. Indicate how this change would affect the following summary statistics (increase, decrease, or stay about the same):
a. Mean
c) In a perfectly symmetrical distribution, which of the following statements is false?
The distance from Q1 to Q2 is equal to the distance from Q2 to Q3
_D.
d) Here is a
12 13 14 15 16
134578
347
26
17 J 18 9
(i) How many students were in the course? (ii) What was the maximum score?
e) An office supply chain has stores in Toronto and Vancouver. One of these stores is to be closed within the coming year. To help make this decision, management reviews sales data. Below are boxplots of unit sales for both locations.
Which of the following statements is not correct?
Monthly sales are higher in Toronto compared ...
Both distributions are fairly symmetric.
Monthly sales are more variable in Vancouver compared to Toronto.
MT2013: Question 4
a) A consumer research group investigating the relationship between the price of meat (per kg) and the fat content (grams) gathered data that produced the following scatterplot.
[Figure: Scatterplot of Fat Grams vs Price/kg]
(i) Which best describes the association between the price of meat and fat content?
_A. Negative, moderately strong
_B. Negative, weak
_C. Positive, strong
_D. Positive, weak
_E. No apparent association
(ii) If the point in the lower left hand corner ($2.00 per kilogram, 6 grams of fat) is removed, the correlation would most likely (remain the same; become stronger negative; become weaker negative; become positive; become zero)
b) For each of the following pairs of variables, would you expect a large negative correlation, a large positive correlation, or a small correlation? Circle your choices.
1.
The
of
Large
Large
c) For each of the following statements about the correlation coefficient r, decide whether it is True or False. Circle your choices as appropriate.
True / False 1. r equals the proportion of times two variables lie on a straight line
True / False 2. r will be +1.0 only if all the data lie exactly on a horizontal straight line
True / False 3. r measures the fraction of outliers that appear in a scatterplot
True / False 4. If the correlation between X and Y is r, the correlation between Y and X is -r
True / False 5. r is a unitless number and must always lie between -1.0 and +1.0 inclusive.
MT2013: Question 5
oolf
A labour efficiency consultant collected some data on several employees of a manufacturing operation: their stress levels (X, on a scale from 0 to 10) and their productivity levels (Y, in parts made per hour). She only recorded some of the relevant computations, as follows:
s_x = 3.3
s_y = 11.1
s_e = 4.3
a)
b)
Space
c) For each additional unit on the stress scale, the productivity level ... parts per hour.
d) What percentage of the variation in productivity levels can be explained by the stress level variable? Give your answer here, to the nearest whole percent.
e) Estimate the
g:
f) Suppose the employee in part e) has an actual productivity level of 60 parts per hour. Compute the residual and use the fact that the standard deviation of the residuals is 4.3 to decide whether this data point would be considered an outlier. Explain why in one sentence only.
Residual:
Explanation:
Outlier?
Yes
No
g) Estimate the
h) Give an interval range in which the productivity level of 95% of employees would be expected to fall. Report to the nearest whole numbers.
____ to ____
MT2013: Question 6
I. Height of women II. Shoe sizes of men III. Age (years) of first-year university students
I & II only
I & III only
(i) Find the mean and the standard deviation of the professor's solving time.
Mean =
SD =
15 and 25 minutes?
_A. 0.38
_B. 0.17
_C. 0.68
_D. 0.06
_E. 0.12
c) A soft drink machine dispenses a cup, syrup and carbonated water, hopefully in order! The amount of syrup injected is normally distributed with mean 15 ml and variance 10 ml². The amount of water injected is normally distributed with mean 80 ml and variance 15 ml². The two amounts are independent of one another.
(i) Find the mean and standard deviation of the total amount of syrup and water dispensed.
Mean:
SD:
(ii) If 25 drinks are dispensed in a day, what are the mean and standard deviation of the total amount of liquid (syrup and water) that are required?
Mean:
SD:
d) Suppose the time it takes for a purchasing agent to complete an online ordering process is normally distributed with a mean of 8 minutes and a standard deviation of 2 minutes. Suppose a random sample of 25 ordering processes is selected.
(i) The standard deviation of the sampling distribution of mean times is
_A. 0.4 minutes
_D.
_E. 0.12 minutes
(ii) What is the probability that the sample mean will be less than 7.5 minutes?
_A. 0.3944
e) ... height ... UBC students is 65 inches, with SD 4 inches. You measure the heights of random samples of 100 males and 100 females. Which result is the most unlikely? To decide, compute the z-score for each result and write the values in the space provided.
All females in your sample having an average height of 68 inches or more
All males in your sample having an average height of 73 inches or more
... a height of 74 inches or more
z-score for A =
z-score for B =
z-score for C =
z-score for D =
MT2013: Question 7
"Work with confidence!"
a) EU (European Union) countries report that 46% of their labour force is female. Is the percentage of females in the Canadian labour force the same? Statscan plans to check a random sample selected from more than 10,000 employment records on file to estimate the percentage of females in the Canadian labour force.
(i) Statscan wants to estimate the percentage of females in the Canadian labour force to within ±5% with 90% confidence. How many employment records should be sampled?
_A. 121
_B. 269
_C. 451
_D. 382
_E. 1000
(ii) Suppose that Statscan wants to be confident of estimating the percentage of females in the labour force to within ±2% of the true percentage. Which of the following would they have to do?
_A. Decrease the sample size
_B. Select the same number of employment records
_C. Increase the sample size
_D. Decrease the precision
_E. Increase the sampling error
(iii) ... select a random sample of 525 employment records, and find that 229 of the people are females. The 90% confidence interval is closest to:
_A. 40.1% to 47.2%
_B. 27.5% to
59.7%  69.4%
_E.
True   False
2. CIs are more informative than point estimates because they show how much the population parameters can vary.   True   False
3. The interval is wider ...   True   False
4. 95% of data values will fall in the range of a 95% CI for the mean.   True   False
5. We are 95% confident that the confidence interval includes the sample mean.   True   False
6. If we took many additional samples and computed a 95% CI for each, then approximately 95% of those intervals would contain the population mean.   True   False
MT2013: Question 8
"Hypothetically speaking"
Suppose that a report indicates that 28% of Canadians have experienced difficulty in making mortgage payments. Further suppose that a news organization randomly sampled 400 Canadians from 10 cities and found that 136 reported such difficulty. Does this indicate that the problem is more severe among these cities?
a) The correct null and alternative hypotheses are
_A. H0: p = 0.28 and HA: p > 0.28
_B. H0: p = 0.28 and HA: p < 0.28
_C. H0: p = 0.28 and HA: p ≠ 0.28
_D. H0: p ... 0.28 and HA: p ... 0.28
_E. H0: p > 0.28 and HA: p ... 0.28
is:
_A.
_8.
_c.
_E,
c) The
-1.28 -2.67
2.67
1.96
_D.
-1.28
_E. 0.0038
0.2119
d) At α = .05, we can conclude that the percentage of Canadians in these cities experiencing difficulty making mortgage payments ...
_A. is significantly higher than 28%
_B. is significantly lower than 28%
_C. is not significantly different from 28%
_D. is equal to 28%
_E. is none of the above; no conclusion can be drawn with the given information.
e) Using the P-value in part c), which one of the following statements is true?
_A. A 90% confidence interval for p would contain 28%
_B. A 95% confidence interval for p would contain 28%
_C. A 95% confidence interval for p would not contain 28%
_D. None of the above
through e):
f) An opinion poll in a city of 200,000 was based on a simple random sample of 2000 people. Another poll is to be taken in the same way in a second city of population 400,000. In order for this poll to have the same margin of error as the poll in the first city, the sample size
_A. 1000
_B. 2000
_C. 4000
_D. 8000
MT2013: Question 9
"No Surprise:
Insurance companies track life expectancy information to assist in determining the cost of life insurance policies. Last year the average life expectancy of all policyholders was ... years. ABI Insurance wants to determine if their clients now have a longer life expectancy, on average, so they randomly sample ... They will only change their premium structure if there is evidence that people who buy their policies are living longer than before. The sample has a mean of 78.6 years and a standard deviation of 4.48 years.
86 75 85 83 76 70 84 76 81 77
81 78 73 79 79 74 79 81 83
a) The appropriate null ...
H0:
b) Give the formula for the appropriate test statistic and compute its value.
Formula:
Space for work:
Computed value:
c) The corresponding P-value is:
_A. Greater than 0.20
_B. Between 0.10 and 0.20
_C. Between 0.05 and 0.10
_D. Between 0.025 and 0.05
_E. Between 0.01 and 0.025
_F. Less than 0.01
d) State your conclusion using α = .05. Write ... sentence that tells ABI Insurance whether there ...
e) Suppose ABI randomly samples 100 recently paid policies. This sample yields a mean of 77.7 years and a standard deviation of 3.6 years. Compute a 95% confidence interval. Report it in the form [xx.x, xx.x] with one decimal place.
MT2013: Answer 1
a) Years of Employment, Teaching Rating b) B. c) D. d) 2, 1, 3, 4. e) 5, 3, 4, 1, 2. Population: All toys produced; Sample: 100 toys; Sampling Frame: 5,000 toys; Parameter: 3%; Statistic: 5%
Details and Comments:
a) Years of Employment has units (yrs); Teaching Rating does not have units but the rating is an average of ordinal data over a number of courses, and can range from 1 to 5 with fractional values possible.
b) "Percentage of Canadian adults who work full-time" is measured at one time point, hence cross-sectional. The other variables are measured repeatedly over time, hence longitudinal or time-series.
c) Only "Weekly receipts at a clothing boutique" is measured at more than one time point. The other variables are measured once each.
d) (i) The five categories are strata; random samples are taken within each one. (ii) Each employee has the same chance of being selected for the sample. (iii) One school is a reasonable representative of the entire university, hence a cluster. (iv) Choosing "every fifth name" makes it systematic.
e) The sampling frame is the production run, namely, that part of the population from which the sample can be drawn.
MT2013: Answer 2 a) C. b) E. c) A. d) A. e) B.
Details and Comments:
a) Categorical data are displayed with a bar chart. Histograms, stem-and-leaf displays, boxplots (and usually line graphs) are for quantitative data.
b) 20% (142/700)
c) 43% (150/350)
d) 63% (196/309)
e) The column percentages for males are different from those for females, which suggests that cell phone brand preference and gender are related (not independent).
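The three percentages in parts b) to d) can be checked with a short sketch. It assumes the counts recovered from the survey table (Male 55, 99, 196; Female 87, 150, 113); the brand names are not available here, so the columns are referred to only by position, and all variable names are illustrative.

```python
# Counts from the cell phone survey table (rows: gender, columns: three brands).
# Columns are identified by position only; the brand names are not assumed.
counts = {
    "Male": [55, 99, 196],
    "Female": [87, 150, 113],
}

def pct(part, whole):
    """Return part/whole as a percentage."""
    return 100 * part / whole

col_totals = [sum(col) for col in zip(*counts.values())]  # brand totals
row_totals = {g: sum(row) for g, row in counts.items()}   # gender totals
grand_total = sum(col_totals)

# b) marginal percentage: first brand out of all respondents (142/700)
pct_b = pct(col_totals[0], grand_total)
# c) conditional percentage: second brand among females (150/350)
pct_c = pct(counts["Female"][1], row_totals["Female"])
# d) conditional percentage: males among third-brand respondents (196/309)
pct_d = pct(counts["Male"][2], col_totals[2])

print(round(pct_b), round(pct_c), round(pct_d))
```

The only difference between a marginal and a conditional percentage is the denominator: the grand total for b), a row or column total for c) and d).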
MT2013: Answer 3
b) (iii) Lower inner fence = 20.24; Upper inner fence = 66.64 (iv) B. (v) Decrease, Stay the same, Increase, Stay the same c) D. d) 15, 189, 138 e) E.
Details and Comments:
a) Look at the formula for standard deviation. If all numbers are equal, then they are all equal to the mean, so all the deviations are zero. This is the only way the standard deviation can be zero.
b) (i) The median is closer to Q3 than to Q1 so the distribution is skewed to the left.
(ii) IQR = Q3 - Q1 = 49.24 - 37.64 = 11.6
(iii) Lower inner fence = 37.64 - 1.5 × 11.6 = 20.24; Upper inner fence = 49.24 + 1.5 × 11.6 = 66.64
(iv) Yes, only on the right side of the distribution since the maximum exceeds 66.64.
(v) Decreasing the lowest data value decreases the sum, and hence the mean. But it doesn't really affect which is the middle value or the quartiles. The range increases.
c) Quartiles divide the area of the distribution into four equal sections.
d) (i) Count up the number of data values. Don't forget to attach the leaf to the stem for the maximum and median.
e) Monthly sales are more variable in Vancouver compared to Toronto since the box is taller.
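The inner-fence check in part b) can be sketched in a few lines. The values come from the five number summary in Question 3; the variable names are illustrative.

```python
# Five number summary of hourly wages (from Question 3 b).
minimum, q1, median, q3, maximum = 20.94, 37.64, 44.77, 49.24, 67.11

iqr = q3 - q1                   # interquartile range
lower_fence = q1 - 1.5 * iqr    # lower inner fence
upper_fence = q3 + 1.5 * iqr    # upper inner fence

# A value is flagged as an outlier if it falls outside the inner fences.
# With only the summary available, checking the min and max is enough
# to tell on which side(s) outliers exist.
outlier_on_left = minimum < lower_fence
outlier_on_right = maximum > upper_fence

print(round(iqr, 2), round(lower_fence, 2), round(upper_fence, 2))
print(outlier_on_left, outlier_on_right)
```

Because the minimum (20.94) sits just above the lower fence, only the right side shows an outlier, which matches answer (iv) B.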
MT2013: Answer 4 a) (i) A. Negative, moderately strong b) Large Neg.; Large Pos.; Small
Details and Comments:
a) (i) Top left to bottom right is negative association. (ii) Removing the lower left point reduces the scatter.
b) 1. The older the car, the lower the price. 2. The taller the person, the heavier the person. 3. Height has no connection with IQ.
c) 1. "Creative" but completely wrong. 2. The points must lie exactly on a straight line with a positive slope. 3. "Creative" but also completely wrong. 4. Corr of X and Y = Corr of Y and X. The roles are interchangeable. 5. Two of the properties of r.
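The properties of r used in part c) can be illustrated numerically. This is a minimal sketch on made-up data (the x and y values below are invented for illustration and do not come from the exam); `pearson_r` is a hypothetical helper written from the usual definition of the correlation coefficient.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation computed directly from its definition."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up data with a negative association (illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [9.0, 7.5, 6.0, 5.5, 3.0]

r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)          # swapping the roles leaves r unchanged
# Rescaling either variable by a positive constant leaves r unchanged (unitless).
r_rescaled = pearson_r([0.5 * v for v in x], [100 * v for v in y])
# Points exactly on a line with positive slope give r = +1.
r_line = pearson_r(x, [2 * v + 1 for v in x])

# Note: for a horizontal line the y-values have zero spread, so the
# denominator is zero and r is undefined rather than +1.
print(r_xy, r_yx, r_rescaled, r_line)
```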
MT2013: Answer 5
a) ŷ = 74.73 - 3.19x b) -0.95 c) "decreases by 3.19" d) 90% e) 49
f) Residual = 60 - 49.21 = 10.79; it is an outlier since the residual is more than two standard deviations of the residuals (2 × 4.3 = 8.6) from zero.
Details and Comments:
a) b0 = ȳ - b1 x̄ = 57.5 - (-3.19)(5.4) = 74.73
b) Rearrange the formula for b1: r = b1(s_x/s_y) = (-3.19)(3.3/11.1) = -0.95
c) Interpretation of slope.
d) r² = (-0.95)² = 0.90 or 90%
e) ŷ(8) = 74.73 - 3.19(8) = 49.21
g) Since x is unknown, just use the mean of y.
h) Use the 68-95-99.7 Rule, i.e., 57.5 ± 2(11.1) = 35.3, 79.7
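The chain of computations in this answer can be reproduced from the summary statistics. The sketch below takes x̄ = 5.4, ȳ = 57.5 and b1 = -3.19 as read from the worked solution, and treats "more than two residual standard deviations" as the outlier rule; that threshold is an assumption about the intended criterion.

```python
# Summary statistics (s_x, s_y, s_e from the question; means and slope
# as they appear in the worked solution).
x_bar, y_bar = 5.4, 57.5
s_x, s_y, s_e = 3.3, 11.1, 4.3
b1 = -3.19

b0 = y_bar - b1 * x_bar          # a) intercept
r = b1 * s_x / s_y               # b) correlation from the slope
r_squared = r ** 2               # d) fraction of variation explained

y_hat_8 = b0 + b1 * 8            # e) predicted productivity at stress level 8
residual = 60 - y_hat_8          # f) actual minus predicted
# Assumed rule of thumb: flag residuals beyond 2 residual SDs.
is_outlier = abs(residual) > 2 * s_e

# h) 68-95-99.7 Rule: about 95% of values within 2 SDs of the mean of y.
interval_95 = (y_bar - 2 * s_y, y_bar + 2 * s_y)

print(round(b0, 2), round(r, 2), round(r_squared, 2))
print(round(y_hat_8, 2), round(residual, 2), is_outlier)
print(tuple(round(v, 1) for v in interval_95))
```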
MT2013: Answer 6
c) (i) Mean = 95; SD = 5
Details and Comments:
a) First-year students' ages will vary only slightly since most are within a year or two in age. There might be some older students, i.e. those returning to school etc., but it is highly unlikely to have students who are much younger than 18 or 19!
b) (i) Computations: Pr(Z > z) = 0.50 => z = 0, so X = μ + zσ gives 20 = μ + 0 => μ = 20. Pr(Z > z) = 0.1587 => z = 1, so X = μ + zσ gives 30 = 20 + 1σ => σ = 10.
(ii) Computations: Pr(15 < X < 25) = Pr((15 - 20)/10 < Z < (25 - 20)/10) = Pr(-0.5 < Z < 0.5) = 1 - 2(0.3085) = 0.383
c) (i) Computations: E(X + Y) = E(X) + E(Y) = 15 + 80 = 95; Var(X + Y) = Var(X) + Var(Y) (since independent) = 10 + 15 = 25, so SD = √25 = 5
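The normal-distribution steps in b) and c) can be checked with the standard library. This sketch assumes, as the computations suggest, that the solving time exceeds 20 minutes with probability 0.50 and exceeds 30 minutes with probability 0.1587; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

# b)(i) Back out mu and sigma from two tail probabilities.
mu = 20.0                                  # Pr(X > 20) = 0.50, so 20 is the mean
z_30 = std_normal.inv_cdf(1 - 0.1587)      # z with upper-tail area 0.1587 (about 1)
sigma = (30 - mu) / z_30                   # from 30 = mu + z * sigma

# b)(ii) Pr(15 < X < 25) for X ~ N(20, 10).
solving_time = NormalDist(mu=20, sigma=10)
p_15_to_25 = solving_time.cdf(25) - solving_time.cdf(15)

# c)(i) Sum of independent syrup and water amounts:
# means add, and variances (not SDs) add.
total_mean = 15 + 80
total_sd = sqrt(10 + 15)

print(round(sigma, 1), round(p_15_to_25, 3), total_mean, total_sd)
```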
e) Computations:
z-score for
z-score
19
MT2013: Answer 7 a) (i) B. 269 (ii) C. Increase the sample size (iii) A. [40.1%, 47.2%] b) 1. False; 2. False; 3. True; 4. False; 5. False; 6. True
Details and Comments:
a) (i) n = (1.645²)(0.46)(0.54)/(0.05²) = 269
(ii) Look at the formula for the CI. The sample size is in the denominator of the margin of error, so increasing the sample size decreases the margin of error.
(iii) p̂ = 229/525 = 0.4362; 90% CI: 0.4362 ± 1.645 √((0.4362)(0.5638)/525) = 0.4362 ± 0.0356 or [0.4006, 0.4718]
b) 1. The interval changes from sample to sample
2. Population parameters don't vary; sample statistics vary
3. Higher confidence requires wider intervals
4. CIs are not about individual data values; they are about estimates
5. All CIs for the mean include the sample mean; only 95% include the population mean
6. Definition of a CI
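The sample-size and confidence-interval calculations in part a) can be reproduced as follows. This is a sketch using the standard one-proportion formulas; the critical value is computed rather than read from a table, so it differs from 1.645 only in later decimals.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Critical value for 90% confidence (5% in each tail), about 1.645.
z_star = NormalDist().inv_cdf(0.95)

# a)(i) Sample size for margin of error 0.05, using the EU value 0.46
# as the planning estimate for p. Always round up.
n_required = ceil(z_star ** 2 * 0.46 * 0.54 / 0.05 ** 2)

# a)(iii) 90% confidence interval from 229 females in 525 records.
n = 525
p_hat = 229 / n
margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - margin, p_hat + margin)

print(n_required, round(p_hat, 4), round(margin, 4))
print(tuple(round(v, 3) for v in ci))
```

Halving the margin of error roughly quadruples the required sample size, since n grows with 1/ME²; that is the reasoning behind a)(ii).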
MT2013: Answer 8 a) A. H0: p = 0.28 and HA: p > 0.28 b) C. 2.67 c) E. 0.0038 d) A. e) C. f) B. 2000
Details and Comments:
a) One-sided alternative since the question asks whether the problem is "more severe."
c) The P-value is the area to the right of 2.67 on a standard normal curve.
d) Since the P-value is less than 0.05, the null hypothesis is rejected; the true population proportion is significantly higher than 28%.
e) Rejecting the null hypothesis for a two-tailed alternative is equivalent to the usual (two-sided) confidence interval not containing the hypothesized value.
f) Sampling variability only depends on sample size, as long as the population is large.
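The test statistic and P-value for this question can be verified with a short sketch of the one-proportion z-test. The standard error uses the null value p0 = 0.28, as is usual for a test (rather than the sample proportion); variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.28          # hypothesized proportion under H0
n = 400            # Canadians sampled
successes = 136    # reported difficulty with mortgage payments

p_hat = successes / n
se_null = sqrt(p0 * (1 - p0) / n)      # standard error under H0
z = (p_hat - p0) / se_null             # test statistic

# One-sided (upper-tail) P-value, since HA: p > 0.28.
p_value = 1 - NormalDist().cdf(z)
reject_at_5_percent = p_value < 0.05

print(round(z, 2), round(p_value, 4), reject_at_5_percent)
```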
MT2013: Answer 9
b) Computed value: t = 1.597
c) C. Between 0.05 and 0.10 d) There is not sufficient evidence that the mean length of life of people who buy their policies is higher, so do not increase premiums. e) [77.0, 78.4]
Details and Comments:
a) One-sided alternative since the question asks whether policy-buyers are "living longer" than before.
c) Use the t-table with 19 degrees of freedom.
d) Since the P-value is greater than 0.05, do not reject the null hypothesis.
e) 77.7 ± 1.984 × 3.6/√100 = 77.7 ± 0.7
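The t statistic and the confidence interval can be checked with the sketch below. Two inputs are assumptions rather than values stated in the surviving text: n = 20 is inferred from the 19 degrees of freedom, and the null value of 77 years is inferred as the value that reproduces t = 1.597 with a sample mean of 78.6 and SD of 4.48. The critical values are standard t-table entries.

```python
from math import sqrt

# Part b): one-sample t statistic.
n = 20            # assumed: 19 degrees of freedom implies n = 20
x_bar = 78.6      # sample mean (years)
s = 4.48          # sample standard deviation (years)
mu_0 = 77.0       # assumed null value, inferred from t = 1.597

t_stat = (x_bar - mu_0) / (s / sqrt(n))

# Part c): bracket the one-sided P-value using t-table critical values
# for 19 degrees of freedom (upper-tail areas 0.10 and 0.05).
t_crit_10, t_crit_05 = 1.328, 1.729
p_between_05_and_10 = t_crit_10 < t_stat < t_crit_05

# Part e): 95% confidence interval from the second sample of 100 policies,
# using the critical value 1.984 quoted in the solution.
n2, x_bar2, s2, t_star = 100, 77.7, 3.6, 1.984
margin = t_star * s2 / sqrt(n2)
ci = (x_bar2 - margin, x_bar2 + margin)

print(round(t_stat, 3), p_between_05_and_10)
print(tuple(round(v, 1) for v in ci))
```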
Notes: This exam has 9 questions. The duration is 2 hours. Books, notes, and calculators are allowed, but not computers, cellphones or on-line connectivity.
MT2012: Question 1
"A sole practitioner"
ASW, a regional shoe chain, has recently launched an online store. Sales via the Internet have been sluggish compared to their brick and mortar stores, and management suspects that its regular customers have concerns regarding the security of online transactions. To determine if this is the case, they plan to survey a sample of their regular customers.
a) Suppose that ASW's regular customers belong to a rewards program and have a customer rewards ID number. ASW decides to randomly select 100 numbers. This sampling plan is called:
_A. Simple Random Sampling
_B. Stratified Sampling
_C. Cluster Sampling
_D. Systematic Sampling
_E. Convenience Sampling
b) Suppose that ASW has an alphabetized list of regular customers who belong to their rewards program. After randomly selecting a customer on the list, every 25th customer from that point on is chosen to be in the sample. This sampling plan is called:
_A. Simple Random Sampling
_B. Stratified Sampling
_C. Cluster Sampling
_D. Systematic Sampling
_E. Convenience Sampling
c)
as the
of the study.
Which of the following is the parameter of interest in the ASW study?
_A. All regular ASW customers
_B. % of regular ASW customers who have concerns about online security
_C. ASW customers who belong to the rewards program
_D. % of ASW customers who belong to the rewards program but don't shop online
_E. None of the above
e) One member of the management team at ASW suggests that their survey could be done online. Customers logging on to the online store would be asked to complete a survey and offered a coupon as incentive to participate. Which statement is true?
_A. This is a voluntary response sample
_B. This would result in an unbiased random sample
_C. This would result in a biased sample
_D. Both A and B
_E. Both A and C
MT2012: Question 2
"Planning
A brokerage firm gathered information on how their clients were investing for retirement. Here is a small sample of the data they collected.
a) Place an X in the ...
_Respondent Number
_Age
_Gender
_Household
_Self-directed
_Book value
Based on age, clients were categorized according to where the largest percentage of their retirement portfolio was invested and shown in the table below.
t
a)
Total
64
82
Bonds Total
19
23
86
102
42 188
b) The percentage of clients who are over age 50 and invest in mutual funds is:
_A. 53.1% _B. 33.3% _C. 18.1% _D. 34.0% _E. 54.3%
c) Of the clients over age 50, the percentage who invest in mutual funds is:
_A. 53.1% _B. 33.3% _C. 18.1% _D. 34.0% _E. 54.3%
d) Of the clients who invest in mutual funds, the percentage over age 50 is:
54.3%
e) The percentage
f) Consider the following side-by-side bar chart for the data below:
[Figure: side-by-side bar chart]
Yes
No
MT2012: Question 3
"Mmm
Here is a histogram and the five number summary of salaries for marketing managers.
[Figure: Histogram of Marketing Manager Salaries]
Min      Q1       Median   Q3       Max
46360    69693    77020    91750    129420
right
')r'
b) Which of the following is true?
_A. Mode < Median < Mean
_B. Median < Mode < Mean
_C. Mean < Median < Mode
_D. Mean < Mode < Median
c) Which of the following is closest to the standard deviation?
_A. $3,676 _B. $13,843 _C. $20,765 _D. $83,060 _E. Can't tell without the data
d) The
_c.
_E.
$69,693
_D. $77,020
$14,566
e) Compute the lower and upper
Space for calculations
The mean would increase
The median would increase
The range would stay the same
The IQR would increase
The IQR would decrease

The next two parts are not related to parts (a) through (g) above.
The boxplots below show monthly sales revenue figures ($ thousands) for a discount office supply company with locations in three different regions of Canada (Atlantic, Central and West).
h) Which of the following statements is true?
_A. Central has the lowest sales revenues
_B. Central has the lowest median sales revenue
_C. West has the lowest mean sales revenue
_D. West has the lowest median sales revenue
_E. Atlantic has the lowest mean sales revenue
i) Which of the following statements is true?
_A. West has the most variable sales revenues.
_B. West has the largest IQR.
_C. Central has the smallest IQR.
_D. Atlantic has the most variable sales revenues.
_E. Central has the least variable sales revenues.
MT2012: Question 4
To determine whether the cash bonus paid by a company is related to annual pay, data were gathered for 10 account executives at Outstanding Management Group (OMG) who received cash bonuses in 2007. The data and summary statistics are shown below.
                     ANNUAL PAY    CASH BONUS
                     $  70,609     $ 11,225
                     $  58,487     $  6,238
                     $ 104,561     $ 14,194
                     $  43,922     $  4,188
                     $  82,613     $ 11,863
                     $ 116,250     $ 13,671
                     $  76,751     $  7,759
                     $  68,513     $ 20,760
                     $ 137,000     $ 55,000
                     $  94,469     $ 34,368
Mean                 $  85,318     $ 17,927
Standard Deviation   $  28,077     $ 15,618
Correlation          0.735
a)
b) What would the correlation be if the dollars were converted to Euros at the current conversion rate (1 Canadian Dollar = 0.76 Euros)?
c) Estimate the linear regression model that relates the response variable (cash bonus) to the predictor variable (annual pay).
Space for work:
d) From the equation in part c), estimate the cash bonus for an executive at OMG earning $82,613 a year, and compute the residual for this estimate.
Residual:
e) Would you be confident in using your regression equation to estimate the cash bonus for an executive at OMG earning $200,000 a year?
Yes
No
Reason:
f) Below is a plot showing residuals versus fitted values for the estimated regression
equation relating cash bonus to pay for the account executives at OMG.
[Figure: plot of residuals versus fitted values]
Circle the conditions for linear regression which are violated, if any.
None are violated        Linearity        Normality
Parts (g) through (i) are unrelated to parts (a) through (f):
g) In commenting on the increase in home foreclosures (i.e. banks repossessing homes), a news reporter stated "there appears to be a strong correlation between home foreclosures and job loss of the head of household." Comment on this statement; use one sentence only.
h) A research study investigated the relationship between number of hours individuals spend on the Internet and age. Which is the predictor variable? Circle your choice.
Hours on Internet        Age
MT2012: Question 5
"Greater
The Survey of Study Habits and Attitudes (SSHA) is a psychological test that measures academic motivation and study habits. Females score higher, on average, than males. The distribution of SSHA scores among the female students at a university has mean 120 and standard deviation 28; the distribution among male students has mean 105 and standard deviation 35. Scores are normally distributed; assume also that scores are independent.
t:;*l ;;;f
3.ffi:f-'ffiTi:r::ffi ;?.t#ave
62?
Report your
b) What SSHA score is exceeded by only 10% of female students? Round your answer to the nearest whole number.
c) Compute the lower and upper quartiles for the distribution of scores of female students.
d) Suppose you select a single female student and a single male student at random and give them the SSHA test. What are the mean and the standard deviation of the difference (female minus male) between their scores? Report to one decimal place.
Mean =
Standard Deviation =
e) Using your answers from part d), compute the probability that the chosen female has a higher score than the chosen male.
f) Suppose Angelina (a female) scores 78 on the SSHA, while Brad (a male) scores ... on the SSHA. Use an appropriate calculation to determine who did worse compared to the average for their gender. Circle the name of the person who did worse.
Angelina        Brad
Explanation:
MT2012: Question 6
"A convenient truth"
Part I. A convenience store owner suspects that only 10% of the customers buy
magazines and thinks that he might be able to sell something more profitable. In order to decide whether he should stop selling them, he tracks the number of customers who buy magazines on a given day.
a) On that day he had 300 customers. Assuming it was a typical day and that his estimate is correct, what are the mean and standard deviation of the number of customers who buy magazines each day? Report your answers to one decimal place.
Mean:
Standard Deviation:
day?
c) How many magazine sales would you consider to be very strong evidence that his 10% estimate was too low. That is, what number of sales would be extremely unusually high? Hints: Use The Empirical (68-95-99.7) Rule. Remember to give a whole number answer.
Part II. Past records indicate that the magazines he sells on any day have an average revenue of $150 with a standard deviation of $30. Suppose he takes a random sample of 36 past days' sales receipts and records the dollar value of magazine sales.
a) Describe the sampling distribution for the sample mean by naming the model and telling its mean and standard deviation.
b) Suppose the resulting sample mean is $130. Do you think that this sample result is unusually small? Explain.
MT2012: Question 7
One division of a telecommunications equipment company reports that 12% of non-electrical components are reworked. Management wants to determine if this percentage is the same as the percentage rework for electrical components manufactured by the company. The Quality Control Department plans to check a random sample of the over 10,000 electrical components manufactured across all divisions.
a) The Quality Control Department wants to estimate the true percentage of rework for electrical components to within ±4%, with 99% confidence. How many components should they sample?
_A. 651
of
_D. [0.0541, 0.1499] _E. Cannot be determined with the given information.
c) The 95% confidence interval based on these data is 0.0742 to 0.1302. Which one of the following is the correct interpretation?
_A. The percentage of electronic components that are reworked is between 7.4% and 13.0%.
_B. We ... components are reworked.
_C. The margin of error for the true percentage of electrical components that are reworked is between 7.4% and 13.0%.
_D. All samples of size 450 will yield a percentage of reworked electrical components that falls within 7.4% and 13.0%.
_E. There is a 95% chance that 7.4% to 13.0% of the electrical components have to be reworked.
d) Based on the 95% confidence interval, should the Quality Control Department conclude that the percentage of rework for the electrical components is lower than the rate of 12% for non-electrical components?
_A. Yes, because the lower limit of the confidence interval is 7.4%.
_B. Yes, because 12% is contained within the 95% confidence interval.
_C. No, because 12% is contained within the 95% confidence interval.
_D. No, because the upper limit of the confidence interval is 13.0%.
_E. We cannot say since the sample size is not large enough.
e) All ... will...:
_A. ...tighten the confidence interval
_B. ...decrease the margin of error
_C. ...increase precision
_D. ...increase the margin of error
_E. ...increase the margin of error and tighten the confidence interval
MT2012: Question 8
"A dip in chips"
A company manufacturing computer chips finds that 8% of all chips manufactured are defective. Management is concerned that high employee turnover is partially responsible for the high defect rate. In an effort to decrease the percentage of defective chips, management decides to provide additional training to those employees hired within the last year. After training was implemented, a sample of 450 chips revealed only 27 with defects. Was the additional training effective in lowering the defect rate?
a) The appropriate Ho:
b) Give the formula for the appropriate test statistic and compute its value.
c) Assume that the value of the test statistic is -1.4. Don't use your computed value from part b). The P-value associated with the given test statistic is closest to: _A. 0.0404 _B. 0.05 _C. 0.0808 _D. 0.1616 _E. 0.9192
d) From the P-value in part c), and using a 1% significance level (i.e. α = 0.01), which of the following is true? _A. Conclude that additional training significantly lowered the defect rate. _B. Conclude that additional training did not significantly lower the defect rate. _C. Conclude that additional training significantly increased the defect rate. _D. Conclude that additional training did not affect the defect rate. _E. No conclusion can be made with the given information.
MT2012: Question 9 "The non-profit motive"
A large software development firm recently relocated its facilities. Top management has [...] their professional employees to engage in local service activities. They [...] that the firm's professionals volunteer an average of more than 15 hours per month. If this is not the case, they will institute an incentive program to increase it. A sample of 24 professionals reported the following number of hours:
12 13 14 14 15 15 15 16
16 16 16 16 17 17 17 18
18 18 19 19 19 20 20 22
The sample has a mean of 16.75 [...]
a) The correct [...]
_A. x̄ > 15 _B. μ > 15 _C. μ < 15 _D. μ ≠ 15 _E. μ = 15
b) [...]
_A. 3.572 _B. -3.572 _C. 1.327 _D. -1.327 _E. 0.729
c) Which of the following conclusions is correct?
_A. We reject the alternative hypothesis at the 5% significance level.
_B. We fail to reject the null hypothesis at the 5% significance level.
_C. An incentive program is needed since the evidence indicates professional employees volunteer an average of no more than 15 hours per month.
_D. We reject the null hypothesis; the firm shouldn't need to institute an incentive program since the evidence indicates that professional employees volunteer an average of more than 15 hours per month.
_E. No conclusion can be reached about the hypothesis with the information that is given.
_A. The data are a simple random sample from the population of interest _B. The distribution of the sample data appears to be approximately normal _C. Volunteer hours is likely to be independent across employees _D. All of the above
e) A 95% confidence interval for the true mean number of hours of volunteer time is closest to:
- END OF QUESTIONS;
MT2012: Answer 1 a) A. b) D. c) C. d) B. e) E.
Details and Comments: a) Each regular customer has the same chance of being selected for the sample. b) Choosing "every 25th customer" makes it systematic. c) The target population is the "universe" for which you want to be able to generalize. d) A parameter is a numerical characteristic such as a mean or a proportion/percentage. e) Since people can decide whether to answer or not, it is a voluntary response, and hence subject to bias. People who decide to participate may not be like people who decide not to participate.
MT2012: Answer 2 a) Age, Household Income, Book value of portfolio b) C. 18.1% c) B. 33.3% d) A. 53.1% e) E. 54.3% f) Yes: The age distribution (ratio of younger to older) is about the same for each mode (i.e. type) of investment.
Details and Comments: a) Age (yr), Household Income ($), and Book Value ($) all have units and are measured on a continuum, so they are quantitative. b) 34/188 = 0.181 c) 34/102 = 0.333 d) 34/64 = 0.531 e) 102/188 = 0.543 f) Look for differences across the clusters of bars.
MT2012: Answer 3 a) C. Skewed to the right b) A. Mode < Median < Mean c) B. $13,843 d) B. $22,057 e) Lower inner fence: $36,607.50; Upper inner fence: $124,835.50
a.
i)D.
b.
Details and Comments: a) Long right-hand tail: more of the area is piled up to the left. b) The mode is the peak and it is clearly to the left of the median value of 77020. The median is less than the mean for a right-skewed distribution. c) Use the rule of thumb: s = Range/6 d) IQR = Q3 - Q1 = 91750 - 69693 = 22,057 e) Lower inner fence = 69,693 - 1.5×22,057 = $36,607.50; Upper inner fence = 91,750 + 1.5×22,057 = $124,835.50 f) The maximum is larger than the upper fence but the minimum is not smaller than the lower fence. g) The sum is increased so the mean is increased. h) The median is the line in the interior of the box. i) Variability is shown by the length of the box.
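The IQR and inner-fence arithmetic in parts d) and e) can be checked with a short sketch. The function name `inner_fences` is an illustrative assumption, not something from the exam; the quartile values are the ones quoted in the comments above.

```python
def inner_fences(q1, q3, k=1.5):
    """Return (IQR, lower inner fence, upper inner fence) for the 1.5 x IQR rule."""
    iqr = q3 - q1
    return iqr, q1 - k * iqr, q3 + k * iqr

# Quartiles quoted in the answer comments: Q1 = 69,693 and Q3 = 91,750.
iqr, lower, upper = inner_fences(69693, 91750)
print(f"IQR = {iqr}, lower fence = {lower}, upper fence = {upper}")
```

Any observation below the lower fence or above the upper fence would be flagged as a suspected outlier.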
ttt
c)
vt
a
e)
d)
At
a)
b) Unchanged at 0.735
or
54%o
d) 9 e)
Residual
No; a prediction at $200,000 [...] extrapolation beyond the range of data. f) Constant variance [...] two variables are categorical, not quantitative, [...] correlation is not appropriate. i) E. 0.00
t 6,e 68
+ 0.40s(A;,2' i il
:$
: 0.409; : _16,968;
I 6, 82 I
i:
_16,968 + 0.409x
ill;;
;;;ffi#ft;ltJl"_0,,
fi]ft:
*l
Details and Comments: a) This is the definition of r-squared. b) The correlation coefficient has no units; it doesn't change if the measurement units [...] c) Straightforward application of least squares regression line formulas.
,U:'i;',Hff ;H*ii:T,*HiT","'-ilffi?il!.'i,n"p..IiJ.avrheresiduar
dG.ii;;;#,,
hn;i, il"Jr"rr"f.
Hours on rnrernet.
MT2012: Answer 5
d)
"j angelini:-1.s,
162
i.izgior
b) Find the value of z = 1.28; X = 120 + 1.28(28) = 155.8 c) Find z-values that have an area [...]
is r.5 sDs above the average. Find the areato the right of
z-thathrt;;;;
the
1201/2g)
;ffJfffi:ize.
'since
* io.atsj?ti':
= 0.6293 or 0.63 or 63% f) Z-score for Angelina = (78 - 120)/28 = -1.5; Z-score for Brad = (70 - 105)/35 = -1.0; Angelina did worse relative to the reference populations since her Z-score is more negative.
[0-ts1ii+-.t1': p:(7>_,0.33)
: ffifrg
:
44.g
[...] = 5.2
b) [...] Pr(-0.96 < Z < 0.96) = 1 - 2(0.1685) = 0.663.
c) From the Empirical Rule, 3 SDs above the mean is extremely unusual; μ + 3σ = 30 + 3(5.2) = 45.6. Sales of 46 or more would be extremely unusual.
Part II.
a) Normal: Mean = 150 and SD = 30/√36 = 5
b) Pr(x̄ < 130) = Pr(Z < [130 - 150]/5) = Pr(Z < -4) < 0.0001. There is an extremely small probability of getting a sample mean this small.
Details and Comments:
Part I.
a) Use the mean and standard deviation of a count.
b) Use the normal sampling distribution of a count. (Note: Continuity correction was not needed, but if you used it correctly you would get an answer of 0.711.)
Part II.
a) Use the mean and standard deviation of a mean. (Note: The CLT applies here, but it is not necessary to say this in the answer.) b) Use the normal sampling distribution of a mean.
MT2012: Answer 7 a) D. b) A. c) B. d) C. e) D.
# isl
43 8
d)'
= 0.1022 ± 0.0368
d)t!
c) Notice the wording and the use of the term "95% confident". d) Values inside a confidence interval are likely values of the parameter. Evidence of a change or a difference depends on the target value being outside the CI. e) Examine the CI formula; a higher confidence level requires a larger multiplier/critical value.
c)
c.
p̂ = 27/450 = 0.06
1aa
:1.?::l?,ttlHt:ffi:l'."
confidence interval.
c) Find the area to the left of -1.4 on the standard normal curve. d) Since the P-value is not less than 0.05, the evidence is not statistically significant.
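As a rough check on the chip-defect test in MT2012 Question 8, the sketch below computes the one-proportion z statistic from the data in the question (27 defects in 450 chips, null value 8%) and, separately, the lower-tail P-value for the test statistic of -1.4 that part c) tells you to assume. The helper name `one_proportion_z` is hypothetical, and the standard error uses the null value p0, which is the usual convention for this test.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z(successes, n, p0):
    """Return (sample proportion, z statistic) for H0: p = p0."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)  # standard error computed under H0
    return p_hat, (p_hat - p0) / se

p_hat, z = one_proportion_z(27, 450, 0.08)

# Part c) says to assume z = -1.4; the alternative is one-sided (lower tail).
p_value_given = NormalDist().cdf(-1.4)

print(f"p-hat = {p_hat:.2f}, z = {z:.3f}, P-value for z = -1.4: {p_value_given:.4f}")
```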
MT2012: Answer 9 a) B. b) A. c) D. d) D. e) A.
f) H0: μ = 15 and Ha: μ > 15.
Details and Comments: a) Use a one-sided alternative since the question is about "increasing" the volunteer time. b) t = (x̄ - μ0)/(s/√n) = (16.75 - 15)/(2.40/√24) = 3.572
c) The P-value is much smaller than 0.05 so reject the null hypothesis. The volunteer time is greater than 15 hours, so no incentive program is needed to get past 15 hours. d) These are the assumptions/conditions for a one-sample t-test.
e) 16.75 ± 2.069 × 2.40/√24 = 16.75 ± 1.016
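The volunteer-hours calculations in MT2012 Question 9 can be reproduced with a short sketch. The 24 data values below are a reading of the partly garbled listing in the question (they do average to the stated 16.75), so treat them as an assumption. The critical value 2.069 is taken from the answer key rather than computed, since the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

# Assumed reading of the 24 reported hours (sorted).
hours = [12, 13, 14, 14, 15, 15, 15, 16, 16, 16, 16, 16,
         17, 17, 17, 18, 18, 18, 19, 19, 19, 20, 20, 22]

n = len(hours)
x_bar = mean(hours)          # sample mean
s = stdev(hours)             # sample SD (n - 1 in the denominator)
se = s / sqrt(n)             # standard error of the mean

t_stat = (x_bar - 15) / se   # test of H0: mu = 15 against Ha: mu > 15
t_crit = 2.069               # t critical value for 23 df, from the answer key
margin = t_crit * se         # margin of error for the 95% CI

print(f"n = {n}, mean = {x_bar}, SD = {s:.2f}, t = {t_stat:.3f}")
print(f"95% CI: {x_bar - margin:.2f} to {x_bar + margin:.2f}")
```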
MT2011: Question 1
a) At the beginning of the term we asked all Commerce 291 students to complete our on-line survey. This survey was most likely designed to be:
_A. a census of all C291 students _B. a random sample of business students
_C. a random sample of 2nd [...] _D. all of the above _E. a random sample of all C291 students
b) The survey asked a wide range of questions. For each variable, circle the description which best describes the type of data the variable represents.
Ethnic background
# hrs online per day
c) From the survey results, we can estimate that, on average, students spent 15.2 hours per week studying. This number seems high given that for a course load of 4 courses students spend 12 hours per week in the classroom and nearly half of the students reported doing paid work. What is the most likely explanation?
_A. the data are very skewed and the median is a better numerical summary
_B. the data are bimodal, the two groups are those that work and those that do not
_C. women study more than men
_D. none of the above
d) Unfortunately, not every C291-registered student responded to the survey. If it were true that students who didn't respond also spend less time studying, then our estimate of study time from the survey is:
_A. biased above the true average study time of C291 students _B. biased below the true average study time of C291 students
_C. not a good estimate for study time of C291 students but we can't say whether it is too high or too low.
e) From the survey we find that the Commerce 290 Grade (call this variable X) has a symmetric, bell-shaped distribution. Also, 95% of the grades fall in the range 53 to 93. Use that information to compute the mean and standard deviation of X. Report to one decimal place.
Mean of X:
SD of X:
MT2011: Question 2 "stock answers are sufficient here"
a) The following data are the price-to-earnings ratios (P/E ratio) for a random sample of 25 stocks traded on the NYSE. The data values have been sorted from smallest to largest.
1,g
39 The mean of these values is 19.0 and the standard deviation is g.5.
i) Median:   Q1:   Q3:   IQR:
Inner fences:
Outliers: (Note: [...] if there are no outliers, write "None")
ii) Is the distribution symmetric or skewed? (Note: You do not have to draw a graph to answer this.) Circle your choice. Then give your reason.
Symmetric
Skewed
iii) Sketch a boxplot of these data. Use the version based on the five-number summary; do not use the modified version using fences.
1. If [...] only takes positive values, the distribution is symmetric. 2. If the mean and median are equal, the distribution must be normal. 3. If the mean and median are equal, the mode must also equal the mean and median. 4. The SD and IQR are always equal for a symmetric distribution. 5. The SD of a set of data values can never be zero.
MT2011: Question 3 "To-fu or not to-fu, that is the question"
Read the following survey design plan and then answer the questions after it.
Get Healthy, a producer of health foods, conducts a survey of the Lower Mainland to determine how receptive high school students would be to its TOFU BURGH product and what market potential (sales) it could expect. It plans the survey as follows:
i. From the list of all schools in the area, two groups are defined, public and private high schools, called PUBS and PRIS.
ii. From the PUBS, four schools are chosen randomly.
iii. From the PRIS, one school is chosen randomly.
iv. In the PUBS schools selected, on one day, researchers give every fifteenth student to exit the school a TOFU BURGH and a self-addressed, stamped postcard (like the one here).
v. In the PRIS school, researchers set up a stand outside the school and give a free TOFU BURGH and postcard to any student who comes to the stand.
[Postcard from Get Healthy Foods: asks the student to circle how many TOFU BURGHs they would buy in a typical week (0, 1, 2, 3, 4 or more) and to mail the post card before April 30, 2002, with spaces for Name, Address and Tel.]
a) The overall survey sampling design planned by the company can best be described as:
_A. convenience sampling _B. multi-stage sampling _C. stratified sampling _D. simple random sampling _E. cluster sampling
b) In the PUBS selected, the sampling design uses:
_A. systematic sampling _B. voluntary response strategy _C. unacceptable bribery of students _D. anecdotal responses
c) In the PRIS selected, the sampling design uses:
_A. systematic sampling _B. voluntary response strategy _C. unacceptable bribery of students _D. anecdotal responses
d) One parameter of interest is likely to be:
_A. [...]
_B. the number of high school students in the Lower Mainland
_C. the number of students who replied they would buy at least one TOFU BURGH in a typical week
_D. the proportion of students who replied they would buy at least one TOFU BURGH in a typical week
e) Which of the two samples is likely to have non-response bias?
_A. PUBS schools only _B. PRIS school only _C. Both PUBS and PRIS schools _D. Neither will have non-response bias
MT2011: Question 4 "how ironic, [...]"
[...] is important in deciding how reliable survey results are. Here are the data on responses to this [...]

              Small  Medium  Large
Response        375     160     40
No Response     225     240    160
Total           600     400    200
(ii) How is non-response related to the size of the business? Use percents to make your statement precise.
b) Investment reports now often include correlations. Following a table of correlations among mutual funds, a report adds, "Two funds can have perfect correlation, yet different levels of risk. For example, Fund A and Fund B may be perfectly correlated, yet Fund A moves 20% whenever Fund B moves 10%." Explain to someone who knows no statistics how this can happen.
c) A study shows that there is a positive correlation between the size of a hospital (measured by its number of beds, x) and the median number of days, y, that patients remain in the hospital. Does this mean that you can shorten a hospital stay by choosing a small hospital? Explain your answer choice.
Yes     No
Reason:
MT2011: Question 5
a) At a well-known business school the grade point averages (GPA) of its 1000 undergraduates are normally distributed with mean 2.84 and standard deviation 0.40.
(i) What percentage of the undergraduates have GPAs below 2.00 (i.e. "on probation")?
Answer:
(ii) What GPA will be exceeded by only 20% of the student body?
Answer:
Ql :
Q3=
IQR:
b) Bart scores 725 on the mathematics section of the Scholastic Aptitude Test (SAT). In a reference population, SAT scores are normally distributed with mean 500 and standard deviation 100. Lisa scores 33 on the American College Test (ACT) mathematics test; ACT scores are normally distributed with mean 18 and standard deviation 6.
(i) What are the z-scores for each student?
Bart:
Lisa:
(ii) Circle either the name Bart or Lisa (above) based on who did better relative to the reference populations.
MT2011: Question 6
a) To test the strength of building materials such as steel girders, engineers place increasing loads on the girders until they break. The pressure exerted by the load that
eventually breaks the material is called the 'strength' of the girder. Generally speaking, the longer the girder, the less the strength. Your company makes steel girders. The engineer in charge of testing tells you that he has tested 10 girders to breaking point and has obtained data linking the length of each girder (in metres) to its strength (in kg per square centimetre). But his computer crashed just after he ran a regression analysis on the data and all he can remember is the lengths of the girders and a few strengths. He did manage to record the means and standard deviations of all the lengths and strengths and the r² of
the regression, which was 0.719.
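[Table: length (m) and strength (kg per square centimetre) for the 10 girders; most strength values are marked "Lost". Legible strength entries: 91, 77, 76.]

        Length   Strength
Mean     3.00     82.60
SD       1.49     10.72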
Note: The means and standard deviations are calculated for the ENTIRE data set, including those that are missing.
(i) What is the correlation between length and strength? Report to three decimal places.
from length.
Equation:
(iii) You notice that the purchaser of your girders requires the 5 m girders to support an average load of 75 kg per square centimetre. Do you feel confident your girders will do
that? Give a numerical rationale.
b) What is the correlation coefficient for the following three points in the X-Y plane?
Answer:
d) An economist studied salaries of 321 bank employees with five or fewer years of employment in a national bank. He found that the relationship between years of service and salary was linear and that the regression equation predicting salary (in thousands of dollars) was: Salary = 21.5 + 3.1 * Years. He concludes that employees with 10 years of service should make an average salary of $52,500. Is his conclusion correct? If not, say why.
e) In part d) the economist has used the regression equation to make a prediction. Which of these numbers best measures the precision of this prediction?
_A. The standard deviation of y (sy) _B. The standard deviation of x (sx) _C. The square of the correlation coefficient (r²) _D. [...] _E. The ratio of the two standard deviations (sx/sy)
f) An investigator measuring various characteristics of a large group of athletes found that the correlation coefficient between the weight of the athlete and the weight that the athlete could lift was r = 0.60. Determine whether each statement is true or false. Circle your choice.
(i) If an athlete gains 5 kg, he/she will be able to lift an additional 3 kg.   True   False
(ii) The more an athlete can lift, on the average the more that athlete weighs.   True   False
(iii) 36 per cent of the athlete's lifting ability can be attributed to his or her weight alone.   True   False
(iv) 60 per cent of the athlete's lifting ability can be attributed to his or her weight alone.   True   False
MT2011: Question 7
o6Pack
An important part of the customer service responsibilities of a telephone company relates to the speed with which troubles in residential service can be repaired. Suppose that past data indicate that there is a probability of 0.70 that service troubles can be repaired on the same day they are reported.
a) Suppose the company receives 100 trouble calls on a particular day. What is the approximate chance that 80% or more will receive same-day repairs?
b) Suppose it is also known that the repair time for a trouble call has a mean of 480 minutes and a standard deviation of 250 minutes. A random sample of 400 trouble calls was taken and the repair times recorded. Compute the probability that the mean of the 400 repair times is less than 500 minutes.
MT2011: Question 8
An established clothing retailer, CHAP, is interested in customer response to a new logo. A survey randomly samples 100 customers; 55 of them say they would prefer the new logo to the previous one. However, CHAP will only change its logo if it is convinced that the newly designed logo is preferred by the majority (i.e. more than half) of its customers. Based on this information answer the following questions.
a) The sample estimate of the proportion of customers who prefer the newly designed logo over the previous one is: _A. 0.55 _B. 55 _C. 100
_ _ _
c.
0.071
D.0.50
c) The 95% confidence interval for the true proportion of the customers who prefer the new logo over the previous one is closest to: _A. 0.55 ± 0.098 _B. 0.55 ± 0.98 _C. 0.55 ± 0.0049 _D. 55 ± 9.8
d) How large a sample n would you need to estimate p, the proportion of people who prefer the newly designed logo over the previous one, with margin of error 0.05 with 99% confidence? Use the guess 0.5 as the value for p. _A. 384 _B. 664 _C. 26 _D. 271
e) If a hypothesis test were conducted on these data, the test statistic would be 1.00. If the alternative hypothesis were one-sided, what would the P-value be? _A. 0.0794 _B. 0.1587 _C. 0.3174 _D. 0.8413
f) [...] the hypothesis test in part e)?
_A. Customers definitely prefer the new logo _B. Customers definitely do not prefer the new logo _C. There is not enough evidence to say customers prefer the new logo _D. There is not enough evidence to say customers do not prefer the new logo
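The three numerical parts of the CHAP logo question (the 95% margin of error, the sample size for a 0.05 margin at 99% confidence, and the one-sided P-value for a test statistic of 1.00) can be sketched with the standard library's normal distribution. This is a minimal illustration, assuming the usual large-sample formulas for a proportion; the variable names are not from the exam.

```python
from math import ceil, sqrt
from statistics import NormalDist

std_normal = NormalDist()

# c) 95% confidence interval for p, using p-hat = 55/100.
n, successes = 100, 55
p_hat = successes / n
z95 = std_normal.inv_cdf(0.975)
margin95 = z95 * sqrt(p_hat * (1 - p_hat) / n)

# d) Sample size for margin 0.05 at 99% confidence, with the guess p = 0.5.
z99 = std_normal.inv_cdf(0.995)
n_needed = ceil((z99 / 0.05) ** 2 * 0.5 * 0.5)

# e) One-sided (upper-tail) P-value for a test statistic of 1.00.
p_value = 1 - std_normal.cdf(1.00)

print(f"CI: {p_hat} +/- {margin95:.3f}; n needed: {n_needed}; P-value: {p_value:.4f}")
```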
MT2011: Question 9
You are the new Operations Manager of the local public transportation company and are especially interested in the reliability of bus service. You plan, on a monthly basis, to take a random sample of major bus stops and observe whether the buses depart on time or late and how late they are. (Buses never leave early since, if they arrive early, they wait until their departure will be exactly on time.)
a) The first month, you gather a random sample of 121 bus departures from a variety of times of day, days of the week, routes and locations. The sample has an average lateness of departure of 6.4 minutes with a standard deviation of 1.8 minutes. Which of the following is closest to a 95% confidence interval for the average lateness of departures for the entire bus system this month?
_A. 6.4 ± 0.029 _B. 6.4 ± 0.271
Five years ago, the system-wide mean lateness of departure was known to be 6.8 minutes. Using a 5% level of significance and the sample results of part a), carry out a hypothesis test to decide whether the system is improving; that is, whether the mean lateness has decreased from five years ago.
c) The appropriate Ho:
d) Give the formula for the appropriate test statistic and compute its value.
f) From the P-value associated with this test statistic, which of the following is [...]
_A. Do not reject H0 at the 10% significance level
_B. Reject H0 at the 10% significance but not at the 5% significance level
_C. Reject H0 at the 5% significance level but not at the 1% significance level
_D. Reject H0 at the 1% significance level
g) Using the 5% significance level, state your conclusion in a sentence that the bus company management can understand.
h) The distribution of lateness of departure is strongly skewed to the right. However, it is still appropriate to test the mean because:
_ A. The data are a simple random sample from the population of interest _ B. The sample size is large enough for the Central Limit Theorem to apply _ C. Since the sample is random, bus departures are independent of one another _ D. All of the above
BONUS: In what century did the "equals" sign first appear in print?
In it)
111
MT2011
G. 1900s
MT2011: Answer 1
c)
?tl::"?:#;,;:6ff#i,11f;;unt'Quantitarive;c2e0gradeeuantitative;
census.
of
(cm,%o,and hrs,
;i,ji"!:"*rtit4x;*::*{r#la*mwi,hahighnumberof
ir,i.r, are missing ror a reason flriiq;ff:,ff#f:'r",Xi::T.-lu'i.'",, ?,Iffi:ni:ff :i',.#T[l;:,'.g*1.,f#,#:1,Rure):73t2(10)
MT20lll. Answer Zi) Median: f Z, ei';13,_et ,ni,,ine iuiu.,
a)
D.iI;.';H i*t.*"o
= 24,
Ieft = 11.
(0,40.5).J There are no outriers. i, quit, Jin r.", from rhe median.
ril6il;.un
10
20
30
40
b)
are False.
a) r) With
fiTllg,T:i.t::ff g:.?6?iJf
,-?rti:T:Y, ii?Jh: sketch musr show
tt ts also acceprable to reporr ir-"r o oi*uy p/E the disrribu,il;;"#";; ro box and
;itn"
j.
*'.l.rt
rhe skewnes
iT"m;:f.ffi.-,Hfllj;*?1"
Nor
symmerric.
j. pere
MT2011: Answer 3 a) B or C; b) A; c) B; d) D; e) C;
Details and Comments: a) Both multi-stage sampling and stratified sampling are acceptable answers. Technically, multi-stage sampling is the preferred answer, since for PUBS, four schools are chosen randomly but the actual students are selected systematically. b) Since every fifteenth student is selected, the selection is systematic, not random. c) Since students are free to come, or not, to the stand, this is voluntary response. d) Counts are not parameters because they are not adjusted for sample size; however, proportions are parameters. e) Cards are handed out either to every fifteenth student or to volunteers; however, in each group not everyone who receives a card will mail the card in; that's non-response.
MT2011: Answer 4 a) (i) 52% (625/1200 = 0.52) (ii) Non-response rates are: Small: 37.5%, Medium: 60%, Large: 80%. The larger the company the higher the expected rate of non-response. b) Correlation is not the same as slope. So a perfect correlation does not mean that the slope is 1, hence a 1 unit increase in x does not mean a 1 unit increase in y. c) No: Larger hospitals are more likely to take more serious cases requiring longer length of stay.
Details and Comments: a) (i) Sum across the columns to get the row totals of 575 Respondents and 625 Nonrespondents. Then divide by the overall total of 1200. (ii) Column percentages are needed here, not row percentages. b) Remember the formula for slope: b1 = r(sy/sx). Even if r = 1, the slope is still the ratio of the SDs, which need not be equal. c) Look for lurking variables to explain unusual or nonsensical correlations.
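The column percentages in part a) can be reproduced with a short sketch. The counts are a reconstruction of the flattened two-way table in MT2011 Question 4, chosen so that they agree with the totals quoted above (575 respondents, 625 non-respondents, 1200 overall); the dictionary layout is an illustrative assumption.

```python
# Reconstructed counts: business size -> (responses, non-responses).
counts = {
    "Small": (375, 225),
    "Medium": (160, 240),
    "Large": (40, 160),
}

# Column percentages: non-response rate within each business size.
rates = {size: no / (yes + no) for size, (yes, no) in counts.items()}

# Overall non-response rate across all businesses.
total_no = sum(no for _, no in counts.values())
total = sum(yes + no for yes, no in counts.values())
overall = total_no / total

for size, rate in rates.items():
    print(f"{size}: {rate:.1%} non-response")
print(f"Overall: {overall:.1%} non-response")
```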
b)
MT2011: Answer 5
a) (i) Pr(X < 2.00) = Pr(Z < [2.00 - 2.84]/0.40) = Pr(Z < -2.10) = 0.0179 or 1.79%.
(ii) Z = 0.84; X = 2.84 + 0.84(0.40) = 3.18 (or 3.176)
(iii) Q1 = 2.57; Q3 = 3.11; IQR = 0.54
Q1 for Z = -0.675; X = 2.84 + (-0.675)(0.40) = 2.57
Q3 for Z = 0.675; X = 2.84 + (0.675)(0.40) = 3.11
b) Z-score for Bart = (725 - 500)/100 = 2.25; Z-score for Lisa = (33 - 18)/6 = 2.50; Lisa did better relative to the reference populations since her positive Z-score is higher.
Details and Comments: a) Remember to make sketches of the required areas so that you get the correct parts of the normal curve. In (i), standardize X to Z and find the corresponding area; in (ii) and (iii), begin with the area, find Z and "unstandardize" to get X.
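The "standardize, then unstandardize" steps can be checked with `statistics.NormalDist`. This is only a sketch: software uses exact z-values rather than table values rounded to two decimals, so the last digit may differ slightly from a hand calculation.

```python
from statistics import NormalDist

# GPA model from the question: mean 2.84, standard deviation 0.40.
gpa = NormalDist(mu=2.84, sigma=0.40)

below_2 = gpa.cdf(2.00)             # (i) proportion with GPA below 2.00
top20_cutoff = gpa.inv_cdf(0.80)    # (ii) GPA exceeded by only 20%
q1, q3 = gpa.inv_cdf(0.25), gpa.inv_cdf(0.75)   # (iii) quartiles
iqr = q3 - q1

# b) z-scores relative to each test's reference population.
z_bart = (725 - 500) / 100
z_lisa = (33 - 18) / 6

print(f"P(GPA < 2.00) = {below_2:.4f}; cutoff = {top20_cutoff:.2f}")
print(f"Q1 = {q1:.2f}, Q3 = {q3:.2f}, IQR = {iqr:.2f}")
print(f"Bart z = {z_bart}, Lisa z = {z_lisa}")
```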
MT2011: Answer 6
a)
=!
d) No
ll,i,;*l r;l 1,3",JR b) Perfect negative correlation' : r c) r:0.46, unchanged (coneration -? tpr"t p o"i, o"irir, ,t .y fall on a straight is i'nuaeant
-predictionJut
rb.
-;.;;,^-'@!rv'rD'v
100.9
y1.r..w;s
illiq[J"tiffi
J:'X*iin:ajn/;;;"iH'i,u,,or,i,prov-.,,q
line.) to thi -"ur*"..nt scales.) extrapolation beyond the range of data(that is,
*";iJ;; #:1i#,','#ni;:,:'f#??fi ii;a;;*;;;#?"'.bu'dingmighrralrdown b) Remember to makg.a ptoruerore doing the calculations. | (i) is farse becausS;d* 9ir[r *ir'i.un;;;;i;; fift of 3 kg onry on averase. A gain of 5 kg mighl eiieaoiiti#itift
Details and Comments: a) The minus sign is vital; the correlation is negative since the longer the girder, the lower the strength. If you forget the minus sign, your calculations of the slope, intercept and regression equation will be incorrect and you will end up concluding that 5 m girders [...]
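The girder calculation in MT2011 Question 6 a) can be sketched from the summary statistics alone. The sketch assumes the reading of the question's table in which length has mean 3.00 and SD 1.49 and strength has mean 82.60 and SD 10.72, and it takes the negative square root of r² = 0.719 because longer girders are weaker.

```python
from math import sqrt

# Summary statistics as read from the question (assumed assignment of values).
r_squared = 0.719
mean_x, sd_x = 3.00, 1.49      # girder length in metres
mean_y, sd_y = 82.60, 10.72    # strength in kg per square centimetre

r = -sqrt(r_squared)                    # negative: strength falls as length rises
slope = r * sd_y / sd_x                 # b1 = r * (sy / sx)
intercept = mean_y - slope * mean_x     # b0 = y-bar - b1 * x-bar

pred_5m = intercept + slope * 5         # predicted strength of a 5 m girder

print(f"r = {r:.3f}")
print(f"strength-hat = {intercept:.1f} + ({slope:.2f}) * length")
print(f"Predicted strength at 5 m: {pred_5m:.1f}")
```

Comparing the prediction at 5 m with the purchaser's requirement of 75 kg per square centimetre gives the numerical rationale asked for in part (iii).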
[...] some people and less than 3 on average; (iii) uses the definition of r² [...]
MT2011: Answer 7
a) Pr(p̂ > 0.80) = Pr(Z > [0.80 - 0.70]/√(0.70 × 0.30/100)) = Pr(Z > 2.18) = 0.0145 or 1.45%
b) Pr(x̄ < 500) = Pr(Z < [500 - 480]/[250/√400]) = Pr(Z < 1.60) = 0.945 or 94.5%
Details and Comments: a) Use the sampling distribution of p̂. b) Use the sampling distribution of x̄ (i.e. remember the √n in the denominator). Both of these situations depend on [...]
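Both parts of MT2011 Question 7 use a normal approximation to a sampling distribution, which the sketch below works through. The repair-time standard deviation is garbled in the question; the value 250 minutes used here is an assumption, chosen because it reproduces the 94.5% in the answer.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()

# a) Sampling distribution of a sample proportion: p = 0.70, n = 100 calls.
p, n_calls = 0.70, 100
se_p = sqrt(p * (1 - p) / n_calls)
prob_a = 1 - std_normal.cdf((0.80 - p) / se_p)   # P(p-hat > 0.80)

# b) Sampling distribution of a sample mean: mu = 480, sigma = 250 (assumed), n = 400.
mu, sigma, n = 480, 250, 400
se_mean = sigma / sqrt(n)
prob_b = std_normal.cdf((500 - mu) / se_mean)    # P(x-bar < 500)

print(f"a) P(p-hat > 0.80) = {prob_a:.4f}")
print(f"b) P(x-bar < 500)  = {prob_b:.4f}")
```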
p̂ = 55/100 = 0.55
< 6.8
d) t = (6.4 - 6.8)/(1.8/√121) = -2.44
decreased)
h) B or D (either is acceptable)
f
J
Details and Comments: a) Reason: t120 = 1.980; CI = 6.4 ± 1.980(1.8/√121) = 6.4 ± 0.324 b) Examine the effect of each of these by referring to the formula for the CI. c) This is a one-tailed alternative since the question asks whether mean lateness has decreased from five years ago. d) Remember the minus sign on the test statistic. e), f) & g) Reject H0 since the P-value is less than 0.01. Remember to state your conclusion in a sentence that answers the original question. h) B is the most important of the three, but A and C are also needed for the test to work.
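The bus-lateness interval and test statistic from MT2011 Question 9 can be checked from the summary statistics. This is a minimal sketch; the critical value 1.980 (120 degrees of freedom) is taken from the answer comments rather than computed, since the Python standard library has no t distribution.

```python
from math import sqrt

# Summary statistics from the question.
n, x_bar, s = 121, 6.4, 1.8
mu0 = 6.8                      # mean lateness five years ago

se = s / sqrt(n)               # standard error of the mean

# a) 95% confidence interval, using the t critical value quoted in the comments.
t_crit = 1.980
margin = t_crit * se

# d) Test statistic for H0: mu = 6.8 against Ha: mu < 6.8.
t_stat = (x_bar - mu0) / se

print(f"95% CI: {x_bar} +/- {margin:.3f}")
print(f"t = {t_stat:.2f}")
```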
c),
his
o r
o o
card
Categorical Quantitative Neither Categorical Quantitative Neither Categorical Quantitative Neither Categorical Quantitative Neither
b) Credit card customers were divided into two groups: Canadian residents and visitors to Canada. The average amount spent by all Canadian residents was $200. The average amount spent by all visitors to Canada was $300. What must be true about the average amount spent by all customers? _A. It must be $250 _B. It must be larger than the median expenditure _C. It could be any number between $200 and $300 _D. It must be larger than $250
c) A sample of 500 cash sales had a mean of $20 and a standard deviation of $40. The histogram of the data would most likely be: _A. skewed to the left (i.e. long left-hand tail) _B. approximately symmetric _C. skewed to the right (i.e. long right-hand tail) _D. bimodal
d) Which of the following is likely to have a mean that is smaller than the median? _A. The salaries of all National Hockey League players _B. The grades of students (out of 100) on a very easy exam on which most score very high or perfectly, but a few do very poorly _C. The prices of homes in Vancouver _D. The grades of students (out of 100) on a very difficult exam on which most score poorly, but a few do very well
Hudson's Bay Company [...]

Age (years)   Frequency
15-19              2
20-24             10
25-29             19
30-34             27
35-39             16
40-44             10
45-49              6
50-54              5
55-59              3
60-64              2
Total            100
_A. About 34 because about half are younger than 34 and half are older
_B. Above the median because the distribution is approximately symmetric
_C. Above the median because the distribution is skewed to the right
_D. None of the above
f) Based on the following figure, decide whether each of the statements below the figure is more likely to be True or False. (Note: House income means "total household income" and is referred to simply as "income" in the statements.)
[Figure: household income (vertical axis, 0 to 350,000) by car brand: BMW, Cadillac, Lexus, Lincoln, Mercedes.]
Mercedes buyers have the highest variability in income.
For each car type, the incomes are reasonably symmetric.
There is a positive correlation between income and brand.
Consider a standard normal random variable, Z (i.e. with mean 0 and standard deviation 1). Find the median, lower and upper quartiles, and interquartile range (IQR) of Z.
Median of Z:
Lower quartile (Q1):
Upper quartile (Q3):
Interquartile Range:
[...] of the median? That is, find the total percentage below "Median - 1.5×IQR" or above "Median + 1.5×IQR".
d) This part is unrelated to parts a), b) and c). Scores on the Wechsler Adult Intelligence Scale (WAIS), a standard IQ test, are approximately normally distributed for all age groups; however, the means and standard deviations of scores differ across different age groups. For the 20 to 34 age group, the mean is 110 and the standard deviation is 25, while for the 60 to 64 age group, the mean is 90 and the standard deviation is 25. Sarah is 29 and her mother Ann is 62. Sarah scores 135 on the WAIS while Ann scores 120. Which of the two has the higher score relative to her age group? Explain your choice with appropriate calculations.
Sarah     Ann
MT2010: Question 3 "Contender for gender offender"
A university offers only two degree programs, one in Engineering and one in English. Admission to the programs is competitive, and a women's group suspects discrimination against women in the admissions process. They obtain the following data from the university, a two-way classification of all applicants by gender and admissions decision.

               Female   Male
Admitted         20      35
Not Admitted     40      45
a) Is there evidence of an association between the applicants' gender and success in
b) The university replies that there is no discrimination. In its defence, it produces a three-way table that classifies applicants by gender, admission decision AND the program to which they applied.

                   English          Engineering
               Female   Male      Female   Male
Admitted         10       5         10      30
Not Admitted     30      15         10      30
Is there an association between admission rates and gender in either program? Explain why or why not.
c) Are the answers in parts a) and b) contradictory? If so, how can you explain the contradiction?
d) After disregarding gender, are admission rates different in the two programs? Support your conclusion with an appropriate two-way table (i.e. admission decision by program).
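The comparison this question is driving at (admission rates by gender overall versus within each program) can be tabulated with a short sketch. The cell counts are a reading of the flattened three-way table, assigned so that they add up to the two-way table given earlier; treat that assignment, and the names used here, as assumptions.

```python
# program -> gender -> (admitted, not admitted); assumed reading of the table.
admissions = {
    "English": {"Female": (10, 30), "Male": (5, 15)},
    "Engineering": {"Female": (10, 10), "Male": (30, 30)},
}

def rate(admitted, rejected):
    """Admission rate as a proportion of applicants."""
    return admitted / (admitted + rejected)

# Admission rate for each gender within each program.
by_program = {
    program: {gender: rate(*cells) for gender, cells in groups.items()}
    for program, groups in admissions.items()
}

# Admission rate for each gender after pooling the two programs.
overall = {}
for gender in ("Female", "Male"):
    admitted = sum(admissions[p][gender][0] for p in admissions)
    rejected = sum(admissions[p][gender][1] for p in admissions)
    overall[gender] = rate(admitted, rejected)

# Admission rate for each program after disregarding gender (part d).
by_program_only = {
    program: rate(sum(c[0] for c in groups.values()), sum(c[1] for c in groups.values()))
    for program, groups in admissions.items()
}

print(by_program)
print(overall)
print(by_program_only)
```

Within each program the two genders have the same rate, while the pooled rates differ; the program-only rates show why.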
[Figure: "Frolder Study" scatterplot; horizontal axis: Weight (kg); both axes run from 0 to 20.]
b)
c) Which of the following values is the correct correlation coefficient for this data? Note: You can reason this out without doing the calculation. _A. r = 0.5 _B. r = 0.975 _C. r = 0
e) A journalist reporting on this study claims that being heavier causes a frolder to grow more eyes. What is wrong with this statement?
f) Do you think these five frolders represent a random sample? Why or why not?
SG.53
[Figure: scatterplot of corrosion rates; horizontal axis: Wire (normal use)]
a) In this study, the response variable is:
_ A. Corrosion rate for a dam wire
_ B. Corrosion rate for a wire in normal use
_ C. Either rate; it does not matter which is considered the response variable
_ D. Neither; the instrument used to measure corrosion is the response
b) Is linear regression appropriate here? Choose the single best statement.
_ A. Yes, the scatterplot is straight enough
_ B. No, there is not enough scatter
_ C. No, there is too much scatter
_ D. Yes, there are no outliers
c) Summary statistics are presented below. Use them to calculate the regression line. Show the formulas and your work. Report your final answers to three decimal places.
r = 0.8691, s_x = 196.4466, x̄ = 304.6667, ȳ = 554.0000, s_y = 286.6104
SG-54
d) A new type of wire has a corrosion rate measure of 555. What does the model predict for the corrosion measure of this type of wire used at a dam?
e) One of the data points is (220, 245). What is the value of the residual for this point?
g) Can the regression line be used to reliably estimate the dam wire corrosion rate for a wire which has a rate of 2500 mil under normal use? Give a reason.
_Yes _No
Reason:
h) Fill in each blank with the letter of the ending that fits best.
_.
(ii) If the units are changed for both x and y variables,
(iii) If the units are changed for just the x variable,
(iv) If a constant is added to the y variable,
Endings:
A. ...the slope will change but the averages will not change.
B. ...s_x will change but ȳ will not change.
C. ...the data will be normally distributed.
D. ...only the correlation will change.
E. ...the correlation, slope, and standard deviations will remain the same.
F. ...the correlation and slope will both change.
G. ...the slope will change, and s_x and s_y will also change.
SG-55
b) Complete silver medals (i.e. medal plus ribbon) weigh 38 grams on average with a standard deviation of 5 grams. Find the mean, variance and standard deviation of a pair of complete medals (gold and silver) combined.
c) You were instructed to assume that the weights of the gold medals, silver medals, and lengths of ribbon are all independent. Is this a reasonable assumption? Explain why or why not in one brief sentence at most.
d) In some winter Olympic events, such as the snowboard parallel giant slalom, the winner is the rider with the best combined time over two runs. In some summer Olympic events, such as the javelin throw, the winner is the athlete with the best single distance out of four tries. Generally speaking, does the sum of two random times or the maximum of four random distances have greater variability?
A. Sum of two random times
B. Maximum of four random distances
C. Cannot say because time and distance are unrelated variables
Approximately what percentage of all chocolate bars produced by this machine would be expected to be between 240 and 246 grams?
b) A quality control manager initially plans to take a random sample of size n from the production line. If he were to double his sample size to 2n, the standard deviation of the sampling distribution of the sample mean x̄ would be multiplied by:
_D.2
* C. \n
c) The quality control manager plans to take a random sample of size n from the production line. How big should n be so that the sampling distribution of x̄ has standard deviation 0.3 grams?
_ B. 100 _ C. 1000 _ D. Cannot be determined unless we know that the population is normal.
d) If the manager takes a random sample of nine chocolate bars from the production line, what is the probability that the sample mean weight of the nine sample chocolate bars will be less than 240 grams?
_A.
10
*;;
_A.0
SG.57
MT2010: Question 8 "Shooters for the shooters?" A radio talk show host with a large audience is interested in the proportion p of adults in his listening area who think the drinking age should be lowered to 18. To find out, he poses the following question to his listeners: "Do you think that the drinking age should be reduced to 18, in light of the fact that 18-year-olds are eligible for military service?" He asks listeners to phone in and vote "yes" if they agree the drinking age should be lowered and "no" if not. Of the 100 people who phoned in, 70 answered "yes".
a) The sample estimate, p̂, of the proportion of adults who think the drinking age should be reduced is:
_ A. 70 _ B. 0.70 _ C. 0.69 _ D. Not able to be determined from the information given
_ C. 0.0021 _ D 010045
c) The margin of error for a 90% confidence interval is closest to:
_ A. 0.046 _ B. 0.075 _ C. 0.090 _ D. 0.690
d) How large a sample n would you need to estimate p with margin of error 0.01 with 95% confidence? Use the guess 0.6 as the value for p.
_ A. 6768 _ B. 9220 _ C. 9502 _ D. 9596
e) Which of the following assumptions for inference about a proportion using a confidence interval are violated in this case? A. The data are a simple random sample from the population of interest B. The success/failure condition C. A third choice of no opinion needed to be included D. There appear to be no violations
fl:tr'il:"lui.|.t#:"Jr"i|;H.:"1ffi r'uu"*ot;Hffi
a) Give the appropriate
oosiarservicer,ui".r,*g.o
b) Give the formula for the appropriate test statistic and compute its value.
t"l?:d;trJ:;il:x^;;:tj["r;ti"J;lfr
sie;hcrr;;i#;;;" ve! uv! _ D. Reject lroat the t%o,ifiiinrun..
i;r;i
fl B' Reject Ho atttr. rol' c. Rejecr,,Hp ar the s%
ri*in.iir.-u"t;;;,h.
I;;:J*3;orthero,,owingisco*e*?
5% significance level ther%significance rever
"y8lf"ff
l"lli:"t#;ir'Jj*rn:T
_ A. 7.0 ± 0.2 _ B. 7.0 ± 0.4 _ C. 7.0 ± 2.0 _ D. 7.0 ± 4.0
;Tf:;'#Jj"tT::ff*ffi
Ring I
Bonus Question: Just for Fun and Bragging Rights over the r 7 davs winter.orympics you saw the olympic rings rogo "111r times. In the officiar countless logo, not rr,. roro version, each of the five
.rgr.-.il; t;;,i,#
RG'
Ring 4
RG'
SG-59
MIDTERM EXAM 2010:
MT2010: Answer 1 a) Quantitative; Categorical; Categorical; Neither b) C c) C d) B e) (i) 10% (ii) B (iii) C f) True; True; False
Details and Comments: a) Although the text considers an identifier variable, such as a Visa credit card number, a type of categorical variable, it is useless in that form; it is best thought of as Neither. You aren't likely to do any analysis on the Visa card number! b) The average must lie between the minimum and maximum, but depending on skewness it could be smaller or larger than the midpoint or median. c) The minimum value is 0 but the maximum can be very large, hence right-skewed. d) All except B are likely to have a long right-hand tail, where the mean exceeds the median.
e) (i) (5+3+2)/100 = 10% (ii) 41% of values (2+20+19) are less than 30; including the 30-34 interval increases the cumulative count to 58% (2+20+19+27).
f) Incomes are not exactly symmetric, but for all practical purposes and especially for data analysis, they certainly are reasonably symmetric.
MT2010: Answer 2
a) Median = 0; Q1 = -0.675; Q3 = 0.675; IQR = 1.35
b) 2 × Pr(Z > 2.025) = 2 × 0.0215 = 0.0430, or about 4%
c) The boxplot is symmetric around 0, with the ends of the box at Q1 and Q3 at -0.675 and 0.675 (from part a). Since Z has no limits, the whiskers can't extend to the minimum and maximum. Instead, use inner fences; the whiskers should extend to -2.7 and 2.7.
d) Ann has a higher rank. Ann's z-score: (120-90)/25 = 1.2; Sarah's z-score: (135-110)/25 = 1
Details and Comments: a) Z is symmetric so the median equals the mean. It is acceptable to report answers to two decimal places: for Q1: -0.68 or -0.67; for Q3: 0.68 or 0.67; for IQR: 1.36 or 1.34. b) If you used IQR of 1.36, the probability is 0.0414. If you used IQR of 1.34, the probability is 0.0444. c) Since the distribution is unbounded, any reasonable choice of whiskers is acceptable.
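The Answer 2 calculations can be checked with a short sketch. It uses Python's standard-library `statistics.NormalDist`; the helper name `z_score` is an illustrative choice, and reading part b) as "the probability that Z lies more than 1.5 IQRs from the median" is an assumption inferred from the 2.025 cut-off in the answer key.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Standardize a raw score: how many SDs x lies above its group mean."""
    return (x - mean) / sd

# Part d): WAIS scores relative to each age group (values from the question).
z_sarah = z_score(135, 110, 25)   # 20 to 34 age group
z_ann = z_score(120, 90, 25)      # 60 to 64 age group

# Part a): quartiles and IQR of the standard normal variable Z.
Z = NormalDist()                  # mean 0, SD 1
q1, q3 = Z.inv_cdf(0.25), Z.inv_cdf(0.75)
iqr = q3 - q1

# Part b), as assumed above: two-tailed probability beyond 1.5 IQRs from 0.
p_far = 2 * (1 - Z.cdf(1.5 * iqr))

print(f"Sarah z = {z_sarah:.2f}, Ann z = {z_ann:.2f}")
print(f"Q1 = {q1:.3f}, Q3 = {q3:.3f}, IQR = {iqr:.3f}, P = {p_far:.4f}")
```

Because the sketch carries unrounded quartiles, its probability can differ slightly in the last digit from the table-based values quoted above.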
MT2010: Answer 3
a) Yes: Percent of males admitted = 35/80 = 0.4375 or 43.75%; percent of females admitted = 20/60 = 0.33 or 33%.
b) No: Half of engineers of either sex are admitted. One-quarter of English students of either sex are admitted.
c) The English program is harder to get into, and that is where more females applied. This is an illustration of Simpson's paradox.
d)
                Engineering   English   Row Total
Admitted             40          15         55
Not Admitted         40          45         85
Column Total         80          60        140
Details and Comments: When a two-way table is provided, it is useful to add the row totals and the column totals. They are needed to compute conditional probabilities. Simpson's paradox is one of the most revealing illustrations of the need to dig deeper into the relationship between categorical variables. What might appear to be the result for a two-way table may well be reversed when a third variable is incorporated.
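The reversal described above can be reproduced from the counts in Question 3. This is a minimal sketch; the dictionary layout and the helper names `rate` and `overall` are illustrative choices, not part of the exam.

```python
# Applicant counts from Question 3: (admitted, not admitted) by program and gender.
counts = {
    "English":     {"Female": (10, 30), "Male": (5, 15)},
    "Engineering": {"Female": (10, 10), "Male": (30, 30)},
}

def rate(admitted, rejected):
    """Admission rate as a conditional proportion."""
    return admitted / (admitted + rejected)

# Within each program, the rates for women and men are identical.
by_program = {
    program: {gender: rate(*cell) for gender, cell in genders.items()}
    for program, genders in counts.items()
}

def overall(gender):
    """Admission rate for one gender after pooling both programs."""
    admitted = sum(counts[p][gender][0] for p in counts)
    rejected = sum(counts[p][gender][1] for p in counts)
    return rate(admitted, rejected)

print(by_program)
print({g: round(overall(g), 4) for g in ("Female", "Male")})
```

Pooling hides the fact that most women applied to the program with the lower admission rate, which is why the aggregated rates differ even though the within-program rates do not.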
MT2010: Answer 4
[Figure: scatterplot; horizontal axis: Weight (kg)]
d) Yes; there is a clear linear relationship. e) Correlation does not imply causation. f) No. They were the slower ones, or the easier ones to catch.
Details and Comments: c) Since the correlation is strong and positive, only 0.975 is a sensible choice for r. d) Correlation coefficients require linear relationships.
SG.61
MT2010: Answer 5
a) A b) A
c) b1 = r(s_y/s_x) = 0.8691(286.6104/196.4466) = 1.268
b0 = ȳ - b1·x̄ = 554.0000 - 1.268(304.6667) = 167.683 (or 167.684)
ŷ = 167.683 + 1.268x (or ŷ = 167.684 + 1.268x)
d) ŷ = 167.683 + 1.268(555) = 871.423 (or 871.424)
e) ŷ = 167.683 + 1.268(220) = 446.643 (or 446.644); Residual = e = 245 - 446.643 = -201.643 (or -201.644)
f) r² = 0.8691² = 0.755
g) No; this is extrapolation far beyond the range of data.
h) (i) A (ii) G (iii) B (iv) E
Details and Comments: a) Response variable is on the vertical axis. c) Beware of round-off error. Carry all available decimal places in the intermediate calculations, but report fewer as instructed. d) Simple substitution. e) Use the definition of residual: observed minus predicted. f) This is the definition of r-squared. g) Although it is mathematically correct to substitute 2500 into the regression equation, extrapolation far beyond the range of data is a major misuse of regression. h) Examine the formulas for slope, intercept and correlations and test out the effect of the suggested changes. For (ii), correlation does not depend on units, but slope and SDs do change if both variables change. For (iv), the scatterplot is simply moved straight up, so SDs, slope, and correlation are not affected.
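The Answer 5 arithmetic can be reproduced from the summary statistics alone. This is a sketch, not the official solution method: the function name `regression_from_summary` is hypothetical, and it rounds the slope to three decimals before computing the intercept, which mirrors the first variant in the answer key (carrying the unrounded slope gives the "or 167.684" variant).

```python
def regression_from_summary(r, s_x, s_y, x_bar, y_bar, ndigits=3):
    """Least-squares line from summary statistics: b1 = r*s_y/s_x, b0 = y_bar - b1*x_bar."""
    b1 = round(r * s_y / s_x, ndigits)
    b0 = round(y_bar - b1 * x_bar, ndigits)
    return b0, b1

# Summary statistics from part c) of the question.
b0, b1 = regression_from_summary(
    r=0.8691, s_x=196.4466, s_y=286.6104, x_bar=304.6667, y_bar=554.0
)

def predict(x):
    """Predicted dam-wire corrosion for a normal-use corrosion value x."""
    return b0 + b1 * x

prediction_555 = predict(555)        # part d)
residual_220 = 245 - predict(220)    # part e): observed minus predicted
r_squared = 0.8691 ** 2              # part f)

print(b0, b1, round(prediction_555, 3), round(residual_220, 3), round(r_squared, 3))
```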
MT2010: Answer 6
a) Mean(X-Y) = Mean(X) - Mean(Y) = 48 - 8 = 40
Var(X-Y) = Var(X) + Var(Y) = 36 + 4 = 40
SD(X-Y) = √40 = 6.32
b) Mean(X+Y) = Mean(X) + Mean(Y) = 48 + 38 = 86
Var(X+Y) = Var(X) + Var(Y) = 36 + 25 = 61
SD(X+Y) = √61 = 7.81
c) Yes: Heavier ribbons are not expected to be found only on heavier medals.
d) A
Details and Comments: a) and b) The variance of a sum or difference of two independent variables is always the sum of the individual variances. Remember that calculations are not done with standard deviations; combine variances first and then take the square root. d) The sum of two random variables generally has greater variability than a single random variable. However, if the question had asked about the "mean" of two measures rather than the sum, then the mean would have lesser variability than a single measure.
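The rule for combining independent variables can be written as a small helper. This is a sketch under the independence assumption stated in the question; the function name `combine_independent` and its `sign` parameter are illustrative.

```python
from math import sqrt

def combine_independent(mean_x, sd_x, mean_y, sd_y, sign=1):
    """Mean, variance and SD of X + sign*Y for independent X and Y.

    Means add or subtract according to sign; variances always add.
    """
    mean = mean_x + sign * mean_y
    variance = sd_x ** 2 + sd_y ** 2
    return mean, variance, sqrt(variance)

# Part a): X has mean 48 and SD 6, Y has mean 8 and SD 2; difference X - Y.
diff = combine_independent(48, 6, 8, 2, sign=-1)
# Part b): gold (mean 48, SD 6) plus silver (mean 38, SD 5).
total = combine_independent(48, 6, 38, 5, sign=1)

print(diff)
print(total)
```

Note that the standard deviations themselves are never added: 6 + 5 = 11 would overstate the spread of the pair, whose SD is about 7.81.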
MT2010: Answer 7
'
240)
rin.i","#"d;i?;1,[XliffJ*,1l]l:l'.rro, * rj:
-3)
: 0 00r 3
liif,'.T.#:oi,ntmffi
dardizationuses
(240,246)
MT2010: Answer 8
a) B b) B c) B d) B
:70/100
o/16 = 3Ni.
aJ Keason:
: 0.70
d) Reason: = e) The data arc'a""nuJni.iri. .iioiii,ro.ot21= e220 since peopte cnoose "'vw vvupro choose whefher or not to phone in! MT20l0: Answer a) He:p =7.5;Hu:
fr&?i;;ffi:soo46 , (i.66
p*7.5
9 ..F
il;JJ
d)
c
'rD]
have worked for
lepn
o statistic is negative, uaurln'tt"rrfr""gi,fri'uutr* t-ta6le you rook up the t;;*i1,]it[test O Since the p-value" i, r.otirin;rr.r " the null hypothesis rl?r-0r,, at the 5%o revet;but since
") p.ositive
jrj;j*,:_.",,,""
cneg*-
TO MIDTERM
2O1O
SG.63
Questions in each topic area are arranged from the most recent year and go back in time. Following the questions in each topic area is a set of answers and explanations/comments about the answers. The comments give details of calculations and common errors made by students.
Since the teaching of any course is dynamic and always undergoing change, there may still be some terminology or notation or even a few parts of questions which are unfamiliar to you. If you are unclear whether a particular question or topic is relevant to the current year, please ask your instructor.
Question A1 (MT2009-Q1)
6oNot
a)
Male
Female
Can,t
tell
Same size
'?.H;llT;*ft?i:?1,flil:**,y3#f,:f' 2122n30
c) For each of the three measures below,
:il3,l:'illl:'.f
;i.,ffi
in the numerical value in the blank provided or none , o r rhe s e (c ircr e one
Value:     Is a measure of:
_____      Shape   Centre   Spread   None
_____      Shape   Centre   Spread   None
_____      Shape   Centre   Spread   None
is: (circle
Symmeffic
Symmetric
SG.65
f) The mean male age is 22.5 years. One of the members of the male team is 22 years old and has a z-score of -0.25. What is the standard deviation of male ages?
g) If we assume that male ages are normally distributed, what proportion of males on the team are 22 years of age or younger?
h) Which of the following is the best justification for the assumption of normality made in part g)? (Check the best response) A. The Law of Large Numbers B. The Central Limit Theorem C. Least squares regression D. None of the above
_ _ -
i) Team members are required to take a course in the history of underwater basketweaving. The professor records the values of several variables for each student. These variables are listed below. For each one, decide whether it has been recorded as quantitative or categorical.
Score on the final exam (out of 200 points)
j) Universities across North America require underwater basket-weaving students to take a quantitative skills test. Percentage scores on this test have a mean of 30% and a standard deviation of 10%. Give a range within which you would expect to find the middle 95% of all North American underwater basket-weaving student test scores.
sl
ol
In
A1
u
IQ
;1f,:TJ;"tljl.* lts em
_".u;au-ysli
E
*-r^
/f
I IZJ4
of the dara set in which cyberStat corporati on records information Surname Age Gender Salary Job Type Srnith 39 remale $62,100 MAttn".o^onJ Jones ?7 Male $47,350 Chan 27 Female $zs.zso utencal W'ono 48 Male s / /,600 Management
variables below which are recorded as quanrirarive scate variables Gender
tffifJni:tr;""r1:te
EmPloyee
Job Type
Surname Age
Salary
b) Three small Statistics classes all took the same test. Histograms of the scores for each class are shown below.
[Figure: histograms of test scores for Class 1, Class 2 and Class 3; horizontal axes from 40 to 100]
(i) Which class had the highest mean score? Which class had the highest median score?
c) For each of these variables, decide whether its distribution is more likely symmetric, skewed right (long right-hand tail) or skewed left (long left-hand tail). Circle one.
Individual incomes in the United States: Symmetric / Skewed right / Skewed left
Age of male heart attack victims: Symmetric / Skewed right / Skewed left
Lifetimes of electric light bulbs: Symmetric / Skewed right / Skewed left
IQ scores of the Canadian population: Symmetric / Skewed right / Skewed left
sG-67
Question A3 (MT2008-Q2)
"A Nash-ional Game"
The data set to the right contains all the point differentials or margins in all NBA games played by the Phoenix Suns up to February 13 of the 2007/08 season. Negative numbers indicate losses, positive numbers indicate wins. The data have been arranged in ascending order for you (biggest loss to biggest win).
[Table: point differentials listed by observation number (Obs# 1 to 52)]
a) Compute the various numerical summaries and put them into the table below part b) under "original data." Some have been computed for you.
NOTE: Part b) is not part of the current curriculum. You can ignore it. But think of it as a challenge question. It is easy to figure out. Instructions are given in the Answers/Comments.
b) Suppose the data undergo a transformation such that X* = 2X - 3, where X = the original variable and X* is the transformed ("new") variable. Find all of the numerical summaries for X* and put them into the table below under "transformed data".

            Original Data (X)   Transformed Data (X*)
Mean              5.6
Median
Range
Q1
Q3
IQR
Std dev           11.7

c) Are there any outliers? Use the "inner fences" definition of outliers and the original data (not the transformed data) to identify any outliers.
Question A4 (MT2007-Q1)
uData,data,
a) A sample of shoppers at a mall was asked the following questions. Decide whether the type of data are more likely to be quantitative or categorical. (Circle your choice)
What is your age (in years)? Categorical Quantitative
How much did you spend (in $)? Categorical Quantitative
What is your marital status? Categorical Quantitative
Rate the availability of parking (Excellent, Good, Fair, Poor). Categorical Quantitative
b) Here is a table of sources of electricity in Canada and the US and the percentage of electricity generated by each. Construct a bar graph to compare Canada and the US. Do NOT use separate sets of axes for each.
c) "Of the 411 players on National Basketball Association rosters in February 1998, only 139 made more than the league ______ salary of $2.36 million." Which word should go in the blank, mean or median? That is, is $2.36 million the mean or median salary for NBA players?
d) A study was made ... Which of the following is likely to be the standard deviation?
_8. _C.
1 year
5 years
SG.69
e) The following histogram displays the December 2000 percentage unemployment rates in the 50 U.S. states and Puerto Rico. The labels on the horizontal axis should be interpreted as follows: the bar labelled "1" represents rates of 1.0% to 1.9%, the bar labelled "2" represents rates of 2.0% to 2.9%, etc.
[Figure: histogram of unemployment rates; horizontal axis: Unemployment Rate, bars labelled 1 to 8]
[Figure: two density curves, Curve A and Curve B]
(i) Under the given assumptions, which of the two curves better represents the distribution of prices of homes sold in the past few months? Circle your answer choice.
Curve A    Curve B
(ii) A potential buyer offers to give you the mean, the median or the mode of the prices of all the homes sold in the past few months in your neighborhood. Assuming that the density curve is the one you chose in (i) directly above, which numerical measure would you prefer? Circle your answer choice.
A: Mean    Median    Mode
B: Mean    Median    Mode
(iii) You are told that the mean price of 50 houses sold is $700,000. However, you notice that there was a mistake in the calculation, and that one of the buyers paid $500,000 instead of the $800,000 that was used when making this calculation. What is the actual mean price of the 50 houses sold?
SG-70
: 27, None
d) Symmetric e) Skewed to the right f) Z = -0.25 = (22 - 22.5)/σ, so σ = (22 - 22.5)/(-0.25) = 2 g) Pr(Z < -0.25) = 0.4013 h) D. None of the above i) Quantitative, Categorical, Quantitative, Categorical j) Empirical (68-95-99.7) Rule: 30 ± 20 (10, 50) (Also accept 30 ± 19.6) Note: Parts g), h) and j) are about "Sampling Distributions and the Normal Model". Check your notes or the textbook.
Details and Comments: a) Boxplots do not show sample sizes; they only show: min, Q1, median, Q3, and max. b) Since the age distribution for females is strongly skewed to the right, the mean is greater than the median. The median (from the graph) is 22, so the mean must be a little larger, hence 23. Note that 30 is close to the maximum and far above Q3 so it is not a realistic estimate of the mean. c) IQR (Males) = 24 - 21 = 3; 50th percentile (Females) = median = 22; Oldest Male = max = 27. f) Use the formula for standardizing X to Z; however, here both the values of X and Z are given and it is the value of σ which is unknown. h) The Central Limit Theorem cannot be used as the reason here since the sample is unlikely to be large.
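Parts f), g) and j) all rest on the standardizing formula Z = (X - mean)/σ. The sketch below rearranges it for each part using Python's standard-library `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

# Part f): X = 22, mean = 22.5 and Z = -0.25 are known; solve Z = (X - mean)/sigma for sigma.
x, mean_age, z = 22, 22.5, -0.25
sigma = (x - mean_age) / z

# Part g): proportion of males aged 22 or younger, assuming ages are normal.
p_22_or_younger = NormalDist().cdf(z)

# Part j): middle 95% of test scores by the 68-95-99.7 rule (mean 30, SD 10).
test_mean, test_sd = 30, 10
middle_95 = (test_mean - 2 * test_sd, test_mean + 2 * test_sd)

print(sigma, round(p_22_or_younger, 4), middle_95)
```

Replacing the multiplier 2 with 1.96 gives the slightly narrower range that the answer key also accepts.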
Answer to Question A2 (MT2008-Q1) a) The quantitative variables are Age and Salary. b) Answers: 3, 3, 3, 1.
c) Answers:
Individual incomes in the United States: Skewed right (long right-hand tail)
Age of male heart attack victims: Skewed left (long left-hand tail)
Lifetimes of electric light bulbs: Skewed right (long right-hand tail)
IQ scores of the Canadian population: Symmetric (equal tails)
Details and Comments: a) Gender and Job Type are categorical; Employee # and Surname are simply strings and used as identifier variables. Taking the mean of the Employee # would not make sense. b) Class 3 has much more area to the right than Class 1 or Class 2 so the mean and median are also shifted to the right. And since the histogram for Class 3 shows the greatest skewness, it has the greatest difference between mean and median. Class 1 is less spread out (the tails are both smaller than in the other two classes) so it has the smallest standard deviation.
SG-7I
c) Incomes are skewed right because fewer people have very large incomes, more people have incomes at the lower end or middle. Age of heart attack victims is skewed left because heart attacks are much more likely in older people. Lifetimes of bulbs are skewed right because most bulbs last the amount of time they are engineered to last but some will last much longer; that is, quality is designed in. Only a few will fail early. Lifetimes in general are skewed right.
            Original Data (X)   Transformed Data (X*)
Mean              5.6                  8.2
Median            6.5                  10
Range             55                   110
Q1                -3                   -9
Q3                11                   19
IQR               14                   28
Std dev           11.7                 23.4

c) Lower inner fence = -3 - 1.5(14) = -24
Upper inner fence = 11 + 1.5(14) = 32
Observation numbers of outliers: 52
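The inner-fence rule used in part c) can be sketched as follows. The function name `inner_fences` is an illustrative choice; the quartiles are the ones reported in the table.

```python
def inner_fences(q1, q3):
    """Inner fences: 1.5 IQRs below Q1 and 1.5 IQRs above Q3."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def is_outlier(value, q1, q3):
    """A value is flagged as an outlier if it lies outside the inner fences."""
    lower, upper = inner_fences(q1, q3)
    return value < lower or value > upper

lower, upper = inner_fences(q1=-3, q3=11)
print(lower, upper)
```

Any margin below -24 or above 32 would be flagged; the answer key identifies observation 52 as the only such point.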
Details and Comments: Note that the question asked for the observation number(s), not the margin! For part b): Suppose the data are transformed (linearly) as follows: X* = a + bX; that is, multiply the original observations by "b" and then add "a". That shifts all the values of X up or down by the amount "a" and changes the size of the unit of measurement by "b".
Mean(X*) = a + b×Mean(X); Median(X*) = a + b×Median(X)
Range(X*) = b×Range(X) [the effect of "a" is cancelled]
Q1(X*) = a + b×Q1(X); Q3(X*) = a + b×Q3(X)
IQR(X*) = b×IQR(X) [the effect of "a" is cancelled]
SD(X*) = b×SD(X) [the effect of "a" is cancelled]
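These rules can be checked numerically. In the sketch below, the six-value `sample` list is hypothetical (the actual 52 margins are not reproduced here); only the transformation X* = 2X - 3 and the summary values 5.6 and 11.7 come from the question.

```python
from math import isclose
from statistics import mean, median, stdev

A, B = -3, 2  # X* = A + B*X, the transformation in part b)

def transform(x):
    return A + B * x

# Hypothetical margins, used only to check the rules numerically.
sample = [-9, -3, 2, 6, 11, 20]
new = [transform(x) for x in sample]

# Measures of location are shifted and scaled.
location_ok = (isclose(mean(new), transform(mean(sample)))
               and isclose(median(new), transform(median(sample))))
# Measures of spread are scaled only; the shift A cancels.
spread_ok = (isclose(max(new) - min(new), B * (max(sample) - min(sample)))
             and isclose(stdev(new), B * stdev(sample)))

# Applying the same rules to the summaries given in the question.
new_mean = transform(5.6)   # location: shift and scale
new_sd = B * 11.7           # spread: scale only

print(location_ok, spread_ok, round(new_mean, 1), round(new_sd, 1))
```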
SG.72
b) [Figure: bar graph "Sources of Electricity" comparing Canada and the US]
c) "Of the 411 players on National Basketball Association rosters in February 1998, only 139 made more than the league MEAN salary of $2.36 million." If it were the median, then half of the 411 players (i.e. 205 or 206) would exceed the value.
d) 1 year is the typical difference in age between entering first-year university students.
e) (i) 5/51 = 0.098, so 9.8%. It is also acceptable to round to 10%. (ii) The median is in the 3.0-3.9 interval, so the median is best estimated as the midpoint of that interval at 3.5%. Comment: It is also acceptable to give the range 3.0-3.9. It is not acceptable to estimate
f) (i) Curve B (ii) If you chose Curve A: Mean If you chose Curve B: Mode Note: The two choices offered in part (ii) are to give you a chance to get the correct answer to part (ii) even if you made the wrong choice in part (i).
SG-73