Home Assignment - OSTA
Exercise 5:
a) Relative and percent frequency distributions
[Figure: bar chart of percent frequency by name; vertical axis: Frequency (%)]
b)
c)
Name        Frequency   Relative Frequency   Percent Frequency (%)
Garcia          6            0,12                  12
Brown           7            0,14                  14
Jones           7            0,14                  14
Williams        8            0,16                  16
Johnson        10            0,20                  20
Smith          12            0,24                  24
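The relative and percent frequencies in the table can be reproduced with a short Python sketch. The function name `distribution` is only illustrative; the counts are copied from the table.

```python
# Frequencies copied from the table above.
frequencies = {
    "Garcia": 6, "Brown": 7, "Jones": 7,
    "Williams": 8, "Johnson": 10, "Smith": 12,
}

total = sum(frequencies.values())  # total number of observations


def distribution(counts):
    """Return {name: (relative frequency, percent frequency)}."""
    n = sum(counts.values())
    return {name: (f / n, 100 * f / n) for name, f in counts.items()}


table = distribution(frequencies)
for name, (rel, pct) in table.items():
    print(f"{name:<10}{frequencies[name]:>4}{rel:>8.2f}{pct:>6.0f}")
```

Dividing each frequency by the total gives the relative frequency, and multiplying by 100 gives the percent frequency.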
[Figure: bar chart of percent frequency by name; vertical axis: Frequency (%)]
[Figure: percent frequency by name (12%, 14%, 14%, 16%, 20%, 24%)]
Range (class width 10)   Frequency
30-39.9                      4
40-49.9                      9
50-59.9                      3
60-69.9                      1
70-79.9                      1
80-89.9                      1
90-99.9                      0
100-109.9                    1
[Figure: histogram of Frequency against the ranges of number of passengers (million)]
c.
Most of the airports (9) handle 40-49.9 million passengers. The second most common range is 30-39.9 million, with 4 airports. Next come the 50-59.9 and 60-69.9 ranges, with 4 airports between them (3 and 1). The 70-79.9, 80-89.9 and 100-109.9 ranges contain only 1 airport each, and the 90-99.9 range is the only one with no airports.
Exercise 41:
Male Female
a)
b) Older people are more likely to have hypertension than younger people.
c) At younger ages, women are less likely than men to have hypertension, but the pattern reverses at older ages.
Exercise 57:
a.
[Figure: bar chart of electric plug-in vehicle sales by region, 2013 and 2015]
b.
Region 2013 2015
China 7,04% 37,88%
Western Europe 33,44% 32,62%
United States 45,59% 20,38%
Japan 13,48% 8,19%
Canada 0,44% 0,93%
Total 100,00% 100,00%
Data on electric plug-in vehicle sales in the world's top markets in 2013 and 2015
c. Part (a) is more insightful because it lets us compare not only the sales among regions within a year but also the sales within a region between 2013 and 2015. Moreover, it shows more detail and provides a better illustration. In contrast, part (b) only lets us compare shares between the two years; it cannot compare the sales of the five regions at the same time because it does not show the base (the total sales) behind the percentages.
Chapter 3:
Exercise 9.
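a) Mean = 2.5475
b) Median = 2.265
c) First quartile: position $= \frac{25(n+1)}{100} = 3.25$. The data set has an even number of values, so I took $Q_1 = \frac{3\text{rd} + 4\text{th}}{2} = 2.03$
Third quartile: position $= \frac{75(n+1)}{100} = 9.75$. The data set has an even number of values, so I took $Q_3 = \frac{9\text{th} + 10\text{th}}{2} = 2.735$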
Exercise 33.
a) Mean: $\mu = \frac{\sum x}{N} = 76$ (2017 data)
Standard deviation: $\sigma = \sqrt{\frac{\sum (x-\mu)^2}{N}} = 2.070197$
Mean: $\mu = \frac{\sum x}{N} = 76$ (2018 data)
Standard deviation: $\sigma = \sqrt{\frac{\sum (x-\mu)^2}{N}} = 5.264436$
a. Difference: Both years have the same mean, but the 2018 scores vary more than the 2017 scores.
Exercise 53.
a. Median: the data set has an even number of values, so I took $\frac{25\text{th} + 26\text{th}}{2} = 13.9\%$
c)
Min             -13,9
Median           13,9
Max             117,1
1st Quartile      4,1
IQR              25,7
3rd Quartile     29,8
L Bound         -34,45
U Bound          68,35
d) To find the outliers, we compare every value with the lower and upper bounds (Q1 - 1.5 x IQR and Q3 + 1.5 x IQR). Any value that falls outside one of the two bounds is an outlier.
Using the COUNTIFS function, I found 3 values that fall outside the bounds.
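The bound check can also be sketched in Python as an alternative to the spreadsheet formula. The function names are illustrative, and the three values passed to `outliers` are only the minimum, median and maximum from the table, since the full data set is not listed here.

```python
def iqr_bounds(q1, q3, k=1.5):
    """Lower and upper outlier bounds from the quartiles (k = 1.5 is the usual rule)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outliers(values, q1, q3):
    """Return the values that fall outside the bounds."""
    lower, upper = iqr_bounds(q1, q3)
    return [v for v in values if v < lower or v > upper]


# Quartiles taken from the summary table above.
lower, upper = iqr_bounds(4.1, 29.8)
print(round(lower, 2), round(upper, 2))

# Only min, median and max are available here, not the full data set.
flagged = outliers([-13.9, 13.9, 117.1], 4.1, 29.8)
print(flagged)
```

Applied to the full data set, the length of the returned list would correspond to the count obtained with the spreadsheet.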
e)
Exercise 63
a) [Figure: scatter diagram of 4-yr Grad. Rate (%) (vertical axis) against Admit Rate (%) (horizontal axis)]
b) The sample correlation coefficient: $\rho_{x,y} = \frac{\mathrm{Cov}(x,y)}{\sigma_x \sigma_y} = -0.760352079$
Chapter 4:
Exercise 17:
a) Let L be the event that the design stage is over budget
=> L = {(4,6), (4,7), (4,8)}
b) The probability that the design stage is over budget:
$P(L) = P(4,6) + P(4,7) + P(4,8) = \frac{2}{40} + \frac{4}{40} + \frac{6}{40} = 0.3$
c) Let M be the event that the construction stage is over budget
=> M = {(2,8), (3,8), (4,8)}
d) The probability that the construction stage is over budget:
$P(M) = P(2,8) + P(3,8) + P(4,8) = \frac{2}{40} + \frac{2}{40} + \frac{6}{40} = 0.25$
e) The probability that both stages are over budget:
$P(L \cap M) = P(4,8) = \frac{6}{40} = 0.15$
Exercise 27
Let A be the event that a randomly selected U.S. adult uses social media.
Let B be the event that a randomly selected U.S. adult is between 18 and 29 years old.
a) P(A) = 1 - P(U.S. adult does not use social media) = 1 - 0.35 = 0.65
b) P(B) = 1 - P(U.S. adult is 30 or older) = 1 - 0.78 = 0.22
c) P(A ∩ B) = P(A) + P(B) - P(A ∪ B) = 0.65 + 0.22 - 0.672 = 0.198
Exercise 35
a. Develop a joint probability table
Respondent I am My Spouse We are Equal Total
b. Comment: the share of "We are equal" responses is the same for husbands and wives, while for both husbands and wives the "I am" response is far more common than "My spouse" or "We are equal".
c. Let X be the event that the husband feels he is better at getting deals than his wife.
$P(X) = \frac{0.275}{0.502} = 0.5483$
d. Let Y be the event that the wife feels she is better at getting deals than her husband.
$P(Y) = \frac{0.287}{0.503} = 0.5765$
e. Let Z be the event that a "My spouse" response came from a husband.
$P(Z) = \frac{0.126}{0.236} = 0.5336$
f. Let V1 be the event that a "We are equal" response came from a husband.
Let V2 be the event that a "We are equal" response came from a wife.
$P(V_1) = \frac{0.101}{0.202} = 0.5$
$P(V_2) = \frac{0.101}{0.202} = 0.5$
Exercise 45
a. Let X be the age of a randomly selected person in the United States.
$P(X \ge 65) = 1 - 0.228 - 0.614 = 0.158$
b. Let Y be the age of a randomly selected uninsured person in the United States.
$P(\text{uninsured}) = 0.228 \times 0.051 + 0.614 \times 0.124 + 0.158 \times 0.011 = 0.089502$
$P(Y \ge 65) = \frac{0.158 \times 0.011}{0.089502} = 0.0194$
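Part b is an application of the law of total probability followed by Bayes' rule. A minimal Python sketch of the same arithmetic follows; the list ordering (two younger age groups, then 65 and older) and the function name `posterior` are assumptions made for illustration.

```python
# Age-group shares and uninsured rates, in the same order as in the calculation above.
# The last entry is the 65-and-older group.
age_share = [0.228, 0.614, 0.158]
uninsured_rate = [0.051, 0.124, 0.011]


def posterior(prior, likelihood):
    """Total probability of the evidence and the posterior for each group."""
    joint = [p * l for p, l in zip(prior, likelihood)]
    total = sum(joint)
    return total, [j / total for j in joint]


p_uninsured, age_given_uninsured = posterior(age_share, uninsured_rate)
print(round(p_uninsured, 6))
print(round(age_given_uninsured[2], 4))  # P(65 or older | uninsured)
```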
Chapter 5:
Exercise 27
X/Y      1      2      3      Total
1        0,14   0,13   0,01   0,28
2        0,11   0,21   0,18   0,5
3        0,01   0,05   0,16   0,22
Total    0,26   0,39   0,35   1
Variance: $\mathrm{Var}[x] = E[x^2] - \mu_x^2 = (1^2 \times 0.28 + 2^2 \times 0.5 + 3^2 \times 0.22) - 1.94^2 = 0.4964$
Variance: $\mathrm{Var}[y] = E[y^2] - \mu_y^2 = (1^2 \times 0.26 + 2^2 \times 0.39 + 3^2 \times 0.35) - 2.09^2 = 0.6019$
d) Covariance: $\sigma_{xy} = \frac{\mathrm{Var}(x+y) - \mathrm{Var}(x) - \mathrm{Var}(y)}{2} = \frac{1.6691 - 0.4964 - 0.6019}{2} = 0.2854 > 0$
Since the covariance is positive, we can conclude that as the quality rating increases, the meal price tends to increase as well.
e) Correlation: $\mathrm{cor}(x,y) = \frac{\mathrm{cov}(x,y)}{\sqrt{\mathrm{Var}(x)\,\mathrm{Var}(y)}} = \frac{0.2854}{\sqrt{0.4964 \times 0.6019}} = 0.5221$, which lies between 0.4 and 0.6.
The correlation is positive and moderate rather than strong. Because quality tends to rise with price, a low-cost but high-quality restaurant is unlikely to be found, although a moderate correlation does not rule it out. Such restaurants would be common only if the correlation were negative.
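The variances, covariance and correlation can be cross-checked directly from the joint probability table. The sketch below assumes x indexes the rows and y the columns, and it computes the covariance as E[xy] minus the product of the means, which is a different but equivalent route to the Var(x+y) identity used above. The helper name `expectation` is illustrative.

```python
from math import sqrt

# Joint probabilities from the table: keys are (x, y).
joint = {
    (1, 1): 0.14, (1, 2): 0.13, (1, 3): 0.01,
    (2, 1): 0.11, (2, 2): 0.21, (2, 3): 0.18,
    (3, 1): 0.01, (3, 2): 0.05, (3, 3): 0.16,
}


def expectation(f):
    """Expected value of f(x, y) under the joint distribution."""
    return sum(f(x, y) * p for (x, y), p in joint.items())


mean_x = expectation(lambda x, y: x)
mean_y = expectation(lambda x, y: y)
var_x = expectation(lambda x, y: x * x) - mean_x ** 2
var_y = expectation(lambda x, y: y * y) - mean_y ** 2
cov_xy = expectation(lambda x, y: x * y) - mean_x * mean_y
corr_xy = cov_xy / sqrt(var_x * var_y)

print(round(var_x, 4), round(var_y, 4), round(cov_xy, 4), round(corr_xy, 4))
```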
Exercise 39
b)
c) In 15 seconds we have $\lambda = \frac{10}{4} = 2.5$ ( because
Exercise 57
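$N = 15$, $n = 3$, $r$ = number of restaurants that exceed 50 dollars $= \frac{1}{3} \times 15 = 5$
a) The probability that no restaurant exceeds 50 dollars: $\frac{\binom{10}{3}}{\binom{15}{3}} = \frac{120}{455} = 0.2637$
b) The probability that one restaurant exceeds 50 dollars: $\frac{\binom{10}{2}\binom{5}{1}}{\binom{15}{3}} = \frac{225}{455} = 0.4945$
c) The probability that two restaurants exceed 50 dollars: $\frac{\binom{10}{1}\binom{5}{2}}{\binom{15}{3}} = \frac{100}{455} = 0.2198$
d) The probability that three restaurants exceed 50 dollars: $\frac{\binom{5}{3}}{\binom{15}{3}} = \frac{10}{455} = 0.022$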
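The four probabilities follow the hypergeometric distribution, so they can be checked with `math.comb`. This is only a sketch; the function name `hypergeom_pmf` is illustrative.

```python
from math import comb


def hypergeom_pmf(k, N, r, n):
    """P(exactly k successes) when drawing n items from N, of which r are successes."""
    return comb(r, k) * comb(N - r, n - k) / comb(N, n)


# N = 15 restaurants, r = 5 exceed 50 dollars, n = 3 are selected.
probs = [hypergeom_pmf(k, N=15, r=5, n=3) for k in range(4)]
for k, p in enumerate(probs):
    print(k, round(p, 4))
```

The four values sum to 1, which is a quick check that no case has been left out.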
Chapter 6:
Exercise 7.
a) $P(10000 \le X \le 12000) = \frac{12000 - 10000}{15000 - 10000} = 0.4$
b) $P(10000 \le X \le 14000) = \frac{14000 - 10000}{15000 - 10000} = 0.8$
c) To maximize the probability of getting the property, the bid should be $15000, because a bid at that level is certain to be accepted.
d) I would not bid less than $15000, because I want to be sure that my bid is accepted. I can then sell the land to the buyer who is willing to pay more.
Exercise 21.
With a mean of 100 and a standard deviation of 15, the score a person needs to qualify for Mensa satisfies:
$P(X > x) = 0.02 \iff 1 - P(X \le x) = 0.02 \iff P(X \le x) = 0.98 \iff P\left(\frac{X-\mu}{\sigma} \le \frac{x-100}{15}\right) = 0.98$
CDF table: $\frac{x-100}{15} = 2.054 \iff x = 100 + 15(2.054) \iff x \approx 130.81$
=> IQ score = 131
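The table lookup can be replaced by the inverse CDF in Python's standard library. The sketch below assumes Python 3.8 or later, where `statistics.NormalDist` is available.

```python
import math
from statistics import NormalDist

# IQ scores modelled as normal with mean 100 and standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

z = NormalDist().inv_cdf(0.98)   # z-value with 98% of the area to its left
cutoff = iq.inv_cdf(0.98)        # the same point on the IQ scale

print(round(z, 3), round(cutoff, 2), math.ceil(cutoff))
```

Rounding the cutoff up to the next whole score gives the qualifying IQ.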
Exercise 29.
a) $P(X \ge 12) = P(X = 12) + P(X = 13) + P(X = 14) + P(X = 15)$
$= \frac{15!}{12!(15-12)!}\,0.71^{12}(1-0.71)^{15-12} + \frac{15!}{13!(15-13)!}\,0.71^{13}(1-0.71)^{15-13} + \frac{15!}{14!(15-14)!}\,0.71^{14}(1-0.71)^{15-14} + \frac{15!}{15!(15-15)!}\,0.71^{15}(1-0.71)^{15-15} = 0.3268$
b) Mean $= np = 150 \times 0.71 = 106.5$
Standard deviation: $\sqrt{np(1-p)} = \sqrt{150 \times 0.71 \times (1 - 0.71)} = 5.55$
$P(X \ge 100 - 0.5) = 1 - P(X < 99.5) = 1 - P\left(Z < \frac{99.5 - 106.5}{5.55}\right) = 1 - (1 - 0.8962) = 0.8962$
c) The advantage of using the normal probability distribution to approximate the binomial probabilities is that it decreases the number of calculations.
The disadvantage is that it makes the calculations less accurate.
d) A statistician would definitely use the method of section 6.3, because it can compute probabilities over a very wide range of values with a large sample, while with section 5.5 we have to split the problem into many cases and then add them all together.
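Both approaches can be compared numerically. The sketch below computes the exact binomial tail by summing individual terms and the normal approximation with a continuity correction. The function name `binom_tail` is illustrative, and `statistics.NormalDist` requires Python 3.8 or later.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_tail(n, p, k_min):
    """Exact P(X >= k_min) for X ~ Binomial(n, p), summing each case."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Part a: n = 15, exact calculation.
exact_15 = binom_tail(15, 0.71, 12)

# Part b: n = 150, normal approximation with continuity correction.
n, p = 150, 0.71
mu, sigma = n * p, sqrt(n * p * (1 - p))
approx_150 = 1 - NormalDist(mu, sigma).cdf(99.5)

# For comparison: the exact tail, which needs 51 separate terms.
exact_150 = binom_tail(n, p, 100)

print(round(exact_15, 4), round(approx_150, 4), round(exact_150, 4))
```

The exact sum for n = 150 adds 51 terms, while the approximation needs a single CDF evaluation, which illustrates the trade-off discussed in parts c and d.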
Exercise 37.
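Mean $= \mu = 3.4$ days
a) $P(X \le 1) = F(1) = 1 - e^{-\lambda t} = 1 - e^{-\frac{1}{\mu}} = 1 - e^{-\frac{1}{3.4}} = 0.2548$
b) $P(2 \le X \le 3) = P(X \le 3) - P(X \le 2) = \left(1 - e^{-\frac{3}{3.4}}\right) - \left(1 - e^{-\frac{2}{3.4}}\right) = 0.1415$
c) $P(X > 5) = e^{-\lambda t} = e^{-\frac{5}{3.4}} = 0.2298$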
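These three exponential probabilities can be verified with a few lines of Python. The function name `exp_cdf` and the constant `MEAN_DAYS` are illustrative.

```python
from math import exp

MEAN_DAYS = 3.4  # mean of the exponential distribution, so the rate is 1 / 3.4


def exp_cdf(t, mean=MEAN_DAYS):
    """P(X <= t) for an exponential distribution with the given mean."""
    return 1 - exp(-t / mean)


p_a = exp_cdf(1)                # P(X <= 1)
p_b = exp_cdf(3) - exp_cdf(2)   # P(2 <= X <= 3)
p_c = 1 - exp_cdf(5)            # P(X > 5)

print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```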
Chapter 7:
Exercise 15
We find that n = 12, and the point estimate of the mean is
$\bar{x} = \frac{\sum x}{n} = \frac{87+91+86+82+72+91+60+77+80+79+83+96}{12} = 82$
The point estimate of the standard deviation is
$s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n-1}} = \sqrt{\frac{87^2+91^2+86^2+82^2+72^2+91^2+60^2+77^2+80^2+79^2+83^2+96^2 - \frac{968256}{12}}{11}} = \sqrt{92.9} \approx 9.64$
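The point estimates can be checked with Python's `statistics` module, whose `variance` and `stdev` functions use the n - 1 divisor. The sketch prints the sample variance and its square root, the sample standard deviation.

```python
from statistics import mean, stdev, variance

# The 12 observations listed in the calculation above.
scores = [87, 91, 86, 82, 72, 91, 60, 77, 80, 79, 83, 96]

x_bar = mean(scores)           # point estimate of the mean
s_squared = variance(scores)   # sample variance (divides by n - 1)
s = stdev(scores)              # sample standard deviation, sqrt of the variance

print(x_bar, round(s_squared, 1), round(s, 2))
```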
Exercise 25
a) $\bar{X} \sim N\left(533, \frac{100^2}{90}\right)$
Standard deviation of the sample mean: $\frac{100}{\sqrt{90}} = 10.54$
Standardize: $P(523 < \bar{X} < 543) = P\left(\frac{523-533}{10.54} < Z < \frac{543-533}{10.54}\right) = P(-0.95 < Z < 0.95)$
$= [P(Z < 0.95) - P(Z < 0)] \times 2 = (0.8289 - 0.5) \times 2 = 0.6578$
b) $\bar{X} \sim N\left(527, \frac{100^2}{90}\right)$
Standard deviation of the sample mean: $\frac{100}{\sqrt{90}} = 10.54$
Standardize: $P(517 < \bar{X} < 537) = P\left(\frac{517-527}{10.54} < Z < \frac{537-527}{10.54}\right) = P(-0.95 < Z < 0.95)$
$= [P(Z < 0.95) - P(Z < 0)] \times 2 = (0.8289 - 0.5) \times 2 = 0.6578$
c) The mean differs between parts a and b, but after standardizing the results are the same. This is because the interval extends the same distance (10) on each side of the mean in both cases, and the standard deviation of the sample mean is also the same.
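A short Python sketch shows why the two parts give the same answer: only the margin around the mean and the standard error enter the calculation. The function name `prob_within` is illustrative. Because the code does not round z to two decimals, its result differs slightly from the table-based value.

```python
from math import sqrt
from statistics import NormalDist


def prob_within(mu, sigma, n, margin):
    """P(mu - margin < sample mean < mu + margin) for a sample of size n."""
    standard_error = sigma / sqrt(n)
    dist = NormalDist(mu, standard_error)
    return dist.cdf(mu + margin) - dist.cdf(mu - margin)


p_a = prob_within(533, 100, 90, 10)  # part a
p_b = prob_within(527, 100, 90, 10)  # part b

print(round(p_a, 4), round(p_b, 4))
```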
Exercise 35
$\bar{p} \sim N\left(p, \frac{p(1-p)}{n}\right) = N(0.3, 0.0021)$
b) The probability that the sample proportion $\bar{p}$ is between 0.2 and 0.4 is
$P(0.2 < \bar{p} < 0.4) = P\left(\frac{0.2-0.3}{0.0458} < Z < \frac{0.4-0.3}{0.0458}\right) = P(-2.18 < Z < 2.18) = (0.9854 - 0.5) \times 2 = 0.9708$
c) The probability that the sample proportion $\bar{p}$ is between 0.25 and 0.35 is
$P(0.25 < \bar{p} < 0.35) = P\left(\frac{0.25-0.3}{0.0458} < Z < \frac{0.35-0.3}{0.0458}\right) = P(-1.09 < Z < 1.09) = (0.8621 - 0.5) \times 2 = 0.7242$
Chapter 8:
Exercise 9:
Sample mean: $\bar{x} = \frac{1050500}{55} = 19100$
The mean damage that results from fires caused by careless use of tobacco is 19100.
For a 95% confidence level, the confidence interval is:
$95\%\ \text{CI} = \bar{x} \pm z_{0.975}\frac{\sigma}{\sqrt{n}} = 19100 \pm 1.96 \times \frac{3027}{\sqrt{55}} = 19100 \pm 800$
According to the confidence interval, the estimated average cost of fires caused by careless use of tobacco is somewhat higher than the average cost of fires across all causes.
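The interval can be reproduced in Python as a check. The sketch assumes a known population standard deviation of 3027, as used above; the function name `z_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for a mean when the population sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin, margin


low, high, margin = z_interval(1050500 / 55, 3027, 55)
print(round(low), round(high), round(margin))
```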
Exercise 17:
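Sample mean: $\bar{x} = 6.34$ (Excel)
Standard deviation: $s = 2.16285903$ (Excel)
d.f. = 50 - 1 = 49 => $t_{0.975} \approx 2$
Confidence interval:
$95\%\ \text{CI} = \bar{x} \pm t_{0.975}\frac{s}{\sqrt{n}} = 6.34 \pm 2 \times \frac{2.16285903}{\sqrt{50}} \approx 6.34 \pm 0.61$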
Exercise 41:
a. The sample proportion in 1995 is p = 0.639
The sample size is n = 1200
At 95% confidence, the margin of error for the proportion of eligible people under 20 years old who had a driver's license in 1995 is
$e = z_{0.975}\sqrt{\frac{p(1-p)}{n}} = 1.96\sqrt{\frac{0.639(1-0.639)}{1200}} = 0.027$
At 95% confidence, the interval estimate of the proportion of eligible people under 20 years old who had a driver's license in 1995 is
95% CI: $p - e \le \pi \le p + e$
$0.639 - 0.027 \le \pi \le 0.639 + 0.027$
$0.612 \le \pi \le 0.666$
b. The sample proportion in 2016 is p = 0.417
The sample size is n = 1200
At 95% confidence, the margin of error for the proportion of eligible people under 20 years old who had a driver's license in 2016 is
$e = z_{0.975}\sqrt{\frac{p(1-p)}{n}} = 1.96\sqrt{\frac{0.417(1-0.417)}{1200}} = 0.028$
At 95% confidence, the interval estimate of the proportion of eligible people under 20 years old who had a driver's license in 2016 is
95% CI: $p - e \le \pi \le p + e$
$0.417 - 0.028 \le \pi \le 0.417 + 0.028$
$0.389 \le \pi \le 0.445$
Therefore, the confidence interval is (0.389; 0.445)
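Both interval estimates use the same formula, so a single helper can check them. This is a sketch; the function name `proportion_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(p_hat, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin


ci_1995 = proportion_interval(0.639, 1200)
ci_2016 = proportion_interval(0.417, 1200)

for label, (low, high, margin) in (("1995", ci_1995), ("2016", ci_2016)):
    print(label, round(margin, 3), round(low, 3), round(high, 3))
```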
Exercise 9
b) The p-value:
From the CDF table: $P(Z \le -2.12) = 0.017$
c) Since p-value = 0.017 < $\alpha$ = 0.05, we reject the null hypothesis at the 5% level.
d) With $\alpha = 0.05$, the critical value from the CDF table is c = -1.645.
Decision rule: if $Z_c > c$, do not reject; if $Z_c < c$, reject. Since -2.12 < -1.645,
=> We reject the null hypothesis at the 5% level.
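The p-value approach and the critical-value approach always lead to the same decision, which a short sketch can confirm. It assumes a lower-tail test with test statistic -2.12, as in the parts above; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()

z_stat = -2.12   # test statistic used above
alpha = 0.05     # significance level

p_value = std_normal.cdf(z_stat)       # lower-tail area to the left of z_stat
critical = std_normal.inv_cdf(alpha)   # lower-tail critical value

reject_by_p_value = p_value < alpha
reject_by_critical_value = z_stat < critical

print(round(p_value, 3), round(critical, 3), reject_by_p_value, reject_by_critical_value)
```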
Exercise 17
a)
Exercise 41
a) H0: p ≥ 71%
Ha: p < 71%
b) $z_p = \frac{\bar{p} - \pi_0}{\sqrt{\frac{\pi_0(1-\pi_0)}{n}}} = \frac{165/300 - 0.71}{\sqrt{\frac{0.71(1-0.71)}{300}}} = -6.107$
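The test statistic can be recomputed with a short Python sketch. The function name `proportion_z` is illustrative, and the lower-tail p-value at the end is an extra step that is not part of the answer above.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z(successes, n, p0):
    """z statistic for a one-sample test of a proportion against p0."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


z = proportion_z(165, 300, 0.71)
p_value = NormalDist().cdf(z)  # lower-tail p-value for Ha: p < 0.71

print(round(z, 3), p_value)
```

A z value this far below zero gives a p-value that is practically zero.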