
Practise Set 5 (2)


Practise set 5

Lec 30-41
Proofs Required
• Mean, Variance of Bernoulli, Binomial, Geometric,
Negative Binomial, Poisson distributions
• Mean and Variance of Uniform and Exponential
Distributions
• Least square estimation
• Bayes Theorem
• Mathematical Explanation of Monty Hall Problem
• Chebyshev and Markov inequality
• Central limit theorem, Weak law of large numbers
• Sampling distribution
P1
• Two balls are selected at random from a box containing
3-red, 2-green, 4-white.
• If X and Y are the number of red balls and green balls
respectively, included among the two balls drawn from
the box, find
• Joint probability distribution of X and Y
• Marginal Probability of X and Y
• Conditional distribution of X given Y = 1

Solution in lec 30 workout uploaded in LMS
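
As a cross-check on the hand solution for P1, the joint distribution can be enumerated by counting equally likely pairs of balls. This is only a sketch: the variable names are illustrative and exact fractions are used so the values can be compared with a worked table.

```python
from fractions import Fraction
from math import comb

RED, GREEN, WHITE, DRAWN = 3, 2, 4, 2
TOTAL = comb(RED + GREEN + WHITE, DRAWN)  # number of equally likely pairs

# Joint pmf: P(X = x, Y = y) = C(3, x) * C(2, y) * C(4, 2 - x - y) / C(9, 2)
joint = {
    (x, y): Fraction(comb(RED, x) * comb(GREEN, y) * comb(WHITE, DRAWN - x - y), TOTAL)
    for x in range(DRAWN + 1)
    for y in range(DRAWN + 1)
    if x + y <= DRAWN
}

# Marginals: sum the joint pmf over the other variable.
marginal_x = {x: sum(p for (i, _), p in joint.items() if i == x) for x in range(DRAWN + 1)}
marginal_y = {y: sum(p for (_, j), p in joint.items() if j == y) for y in range(DRAWN + 1)}

# Conditional pmf of X given Y = 1: P(X = x | Y = 1) = P(x, 1) / P(Y = 1)
x_given_y1 = {x: joint[(x, 1)] / marginal_y[1] for x in range(DRAWN + 1) if (x, 1) in joint}

print("joint:", joint)
print("marginal of X:", marginal_x)
print("marginal of Y:", marginal_y)
print("X given Y = 1:", x_given_y1)
```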


P2
• A program consists of two modules. The number of errors, X, in
the first module and the number of errors, Y , in the second
module have the joint distribution,
• P(0, 0) = P(0, 1) = P(1, 0) = 0.2, P(1, 1) = P(1, 2) = P(1, 3) = 0.1,
P(0, 2) = P(0, 3) = 0.05. Find
• (a) the marginal distributions of X and Y ,
• (b) the probability of no errors in the first module, and
• (c) the distribution of the total number of errors in the program.
• (d) find out if errors in the two modules occur independently.

Solution in lec 30 workout uploaded in LMS
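
The four parts of P2 follow mechanically from the joint table, so a short script can be used to check a hand solution. The sketch below takes the probabilities exactly as stated in the problem; the dictionary layout and names are illustrative choices.

```python
from collections import defaultdict
from fractions import Fraction

# Joint pmf P(x, y) as given in the problem statement.
joint = {
    (0, 0): Fraction("0.2"), (0, 1): Fraction("0.2"), (1, 0): Fraction("0.2"),
    (1, 1): Fraction("0.1"), (1, 2): Fraction("0.1"), (1, 3): Fraction("0.1"),
    (0, 2): Fraction("0.05"), (0, 3): Fraction("0.05"),
}

marginal_x = defaultdict(Fraction)    # (a) marginal of X
marginal_y = defaultdict(Fraction)    # (a) marginal of Y
total_errors = defaultdict(Fraction)  # (c) distribution of X + Y
for (x, y), p in joint.items():
    marginal_x[x] += p
    marginal_y[y] += p
    total_errors[x + y] += p

p_no_errors_first = marginal_x[0]     # (b) P(X = 0)

# (d) X and Y are independent only if P(x, y) = P(x) * P(y) for every pair.
independent = all(p == marginal_x[x] * marginal_y[y] for (x, y), p in joint.items())

print(dict(marginal_x), dict(marginal_y))
print(p_no_errors_first, dict(total_errors), independent)
```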


P3
• If f(x, y), defined for X>0, Y>0 and 0 elsewhere, is a joint density function, then

• Find k.
• Marginal densities of X and Y.
• Check whether X and Y are independent.

Solution in lec 31 workout uploaded in LMS


P4
If two variables x and y have a joint pdf f(x, y) that is
nonzero on a given region and 0 elsewhere,
• Find the probability that x will assume a value on
the interval 0 to ½ given that the value of y is ½.

Ans. 5/12
Solution in lec 31 workout uploaded in LMS
P5
Construct a 95% confidence interval for the population mean based on a
sample of measurements
2.5, 7.4, 8.0, 4.5, 7.4, 9.2
if measurement errors have Normal distribution, and the measurement
device guarantees a standard deviation of σ = 2.2.

Ans. [4.74,8.26]
• Do the same question for 90% confidence and 99% confidence.

Ans. [5.02,7.98], [4.19,8.81]


Solution in lec 32 and 33 uploaded on LMS.
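
The intervals in P5 all use the same formula, x̄ ± z(α/2) · σ/√n, with only the confidence level changing. A minimal sketch using the Python standard library is shown here; the function name `z_interval` is a hypothetical helper, not something from the lectures.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_interval(sample, sigma, confidence):
    """Two-sided confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z_{alpha/2}
    margin = z * sigma / sqrt(len(sample))
    center = mean(sample)
    return center - margin, center + margin

measurements = [2.5, 7.4, 8.0, 4.5, 7.4, 9.2]
for level in (0.90, 0.95, 0.99):
    low, high = z_interval(measurements, sigma=2.2, confidence=level)
    print(f"{level:.0%}: [{low:.2f}, {high:.2f}]")
```

The same helper applies to any problem where σ is given and the raw sample is available; when only the sample mean and n are given, the margin z · σ/√n can be computed directly.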
P6
In order to ensure efficient usage of a server, it is necessary to estimate the mean
number of concurrent users. According to records, the average number of
concurrent users at 100 randomly selected times is 37.7, with a standard deviation σ
= 9.2. Construct a 90% confidence interval for the expectation of the number of
concurrent users.

Ans. [36.19,39.21]

P7
Installation of a certain hardware takes random time with a standard deviation of
5 minutes. A computer technician installs this hardware on 64 different computers,
with the average installation time of 42 minutes. Compute a 95% confidence interval
for the population mean installation time.

Ans. [40.775,43.225]
Solution in lec 32 and 33 uploaded on LMS.
P8
If an unauthorized person accesses a computer account with the correct
username and password (stolen or cracked), can this intrusion be detected?
Recently, a number of methods have been proposed to detect such
unauthorized use. The time between keystrokes, the time a key is depressed, the
frequency of various keywords are measured and compared with those of the
account owner. If there are significant differences, an intruder is detected. The
following times between keystrokes were recorded when a user typed the
username and password:
.24, .22, .26, .34, .35, .32, .33, .29, .19, .36, .30, .15, .17, .28, .38, .40, .37, .27
seconds
As the first step in detecting an intrusion, let’s construct a 99% confidence
interval for the mean time between keystrokes assuming Normal distribution of
these times.

Ans. [0.24,0.34]
Z Test – Sample
P8
The number of concurrent users of an
internet service provider has always
averaged 5000 with a standard deviation of 800.
After an equipment upgrade, the average
number of users at 100 randomly selected
moments of time is 5200. Does it indicate,
at a 5% level of significance, that the
mean number of concurrent users has
increased? Assume the standard deviation of the
number of concurrent users has not
changed.
Z Test - Sample Problem
We have to test the null hypothesis
H0 : µ = 5000
against a one-sided right-tail alternative
HA : µ > 5000
because we are only interested in whether the mean number of users has increased.
Step 1: Test statistic
σ = 800, n = 100, α = 0.05, µ0 = 5000, x̄ = 5200
The test statistic is
Z = (x̄ - µ0) / (σ/√n) = (5200 - 5000) / (800/√100) = 2.5

Step 2: Acceptance and rejection region
The critical value is zα = z0.05 = 1.645
reject H0 if Z > 1.645
accept H0 if Z ≤ 1.645

Step 3: Result
Our test statistic Z = 2.5 belongs to the rejection region, therefore we reject H0.
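
The three steps of this right-tail Z test can be reproduced in a few lines. This is a sketch using the numbers from the problem; the critical value is taken from the standard Normal quantile function instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, xbar, alpha = 5000, 800, 100, 5200, 0.05

# Step 1: test statistic
z = (xbar - mu0) / (sigma / sqrt(n))

# Step 2: right-tail critical value z_alpha
z_critical = NormalDist().inv_cdf(1 - alpha)

# Step 3: decision
reject_h0 = z > z_critical

print(f"Z = {z:.2f}, critical value = {z_critical:.3f}, reject H0: {reject_h0}")
```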
P9
• Previously, Bennett University reported that students spent 4.5
hours per week, on average, downloading movies from torrents.
The IT department now thinks that, currently, the mean is higher.
Fifteen randomly chosen students were asked how many hours per
week they spend downloading movies. The sample mean was 4.75
hours with a sample standard deviation of 2.0. Conduct
a hypothesis test. Is the mean time actually higher at the 95%
confidence level?
P10
• A particular brand of tires claims that its deluxe
tire averages at least 50,000 miles before it
needs to be replaced. From past studies of this
tire, the standard deviation is known to be
8,000. A survey of owners of that tire design is
conducted. From the 28 tires surveyed, the
mean lifespan was 46,500 miles with a standard
deviation of 9,800 miles. Using α=0.05 , is the
data highly inconsistent with the claim?

Source: Introductory Statistics Book by Barbara Illowsky and Susan Dean



P11
• From generation to
generation, the mean age
when smokers first start
to smoke varies. However,
the standard deviation of
that age remains constant
at around 2.1 years. A
survey of 40 smokers of
this generation was done
to see if the mean starting
age is at least 19. The
sample mean was 18.1
with a sample standard
deviation of 1.3. Do the
data support the claim at
the 5% level?
Source: Introductory Statistics Book by Barbara Illowsky and Susan Dean

P12
• The cost of a daily
newspaper varies from
city to city. However, the
variation among prices
remains steady with a
standard deviation of
20¢. A study was done to
test the claim that the
mean cost of a daily
newspaper is $1.00.
Twelve costs yield a
mean cost of 95¢ with a
standard deviation of
18¢. Do the data support
the claim at the 1% level?
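
Problems P10 to P12 each state a population standard deviation that is known from past studies, so one possible reading is a Z test that uses that value and ignores the sample standard deviation. The sketch below follows that assumption and uses the p-value form of the decision rule; the helper `z_test`, the choice of tails, and the hypotheses in the comments are interpretations of the problem statements, not solutions taken from the source.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alpha, tail):
    """Return (z, p_value, reject_h0); tail is 'left', 'right' or 'two'."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf(z)
    if tail == "left":
        p_value = cdf
    elif tail == "right":
        p_value = 1 - cdf
    elif tail == "two":
        p_value = 2 * min(cdf, 1 - cdf)
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    return z, p_value, p_value < alpha

# Assumed readings of the three problems (sigma is the stated known value):
tires = z_test(46_500, 50_000, 8_000, 28, 0.05, "left")   # H0: mu >= 50,000
smokers = z_test(18.1, 19, 2.1, 40, 0.05, "left")         # H0: mu >= 19
newspaper = z_test(0.95, 1.00, 0.20, 12, 0.01, "two")     # H0: mu = 1.00

for name, (z, p, reject) in [("P10", tires), ("P11", smokers), ("P12", newspaper)]:
    print(f"{name}: z = {z:.3f}, p-value = {p:.4f}, reject H0: {reject}")
```

Rejecting H0 when the p-value is below α is equivalent to comparing the statistic with the critical value, so either form can be used to check a hand solution.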
T distribution –
Sample P13
If an unauthorized person accesses a computer account
with the correct username and password, measurements
such as the time between keystrokes and the time a key
is depressed are compared with those of the
account owner. If there are noticeable differences, an
intruder is detected.

The following times between keystrokes were recorded
when a user typed the username and password:
0.46, 0.38, 0.31, 0.24, 0.20, 0.31, 0.34, 0.42, 0.09, 0.18,
0.46, 0.21 sec

Construct a 90% confidence interval for the mean time
between keystrokes, assuming a Normal distribution.
T distribution – Sample problem
Sample size n = 12
Sample mean x̄ = 0.3 sec
Sample std. dev. s = 0.1183
Critical value tα/2 = t0.05 = 1.796 (11 degrees of freedom)

Then, the 90% confidence interval for the mean time is
x̄ ± tα/2 · s/√n
= 0.3 ± (1.796)(0.1183/√12)
= [0.2387, 0.3613]
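
The summary statistics and the interval can be recomputed from the raw keystroke times. In this sketch the critical value 1.796 is hard-coded from the solution, since the Python standard library has no t-distribution quantile function.

```python
from math import sqrt
from statistics import mean, stdev

times = [0.46, 0.38, 0.31, 0.24, 0.20, 0.31, 0.34, 0.42, 0.09, 0.18, 0.46, 0.21]
T_CRITICAL = 1.796  # t_{0.05} with 11 degrees of freedom, as quoted in the solution

n = len(times)
xbar = mean(times)
s = stdev(times)                     # sample standard deviation (divides by n - 1)
margin = T_CRITICAL * s / sqrt(n)
low, high = xbar - margin, xbar + margin

print(f"n = {n}, mean = {xbar:.4f}, s = {s:.4f}")
print(f"90% CI: [{low:.4f}, {high:.4f}]")
```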
T Test –
Sample P14
(b) An authorized user takes 0.2 sec between keystrokes. One
day the following data are recorded as someone types the correct
username and password:
0.46, 0.38, 0.31, 0.24, 0.20,
0.31, 0.34, 0.42, 0.09, 0.18,
0.46, 0.21 sec
At a 1% level of significance, is this evidence of an
unauthorized attempt?
Steps of the test:
• Collect the data.
• Define the null and the alternate hypothesis
• Choose the significance level
• Find critical values
• Find the test statistic
• Conclusion

(b) We have to test
H0 : µ = 0.2 vs HA : µ ≠ 0.2
at α = 0.01.

Computing the t statistic with n = 12, x̄ = 0.3 and s = 0.1183:
t = (x̄ - µ0) / (s/√n) = (0.3 - 0.2) / (0.1183/√12) = 2.928

Acceptance region: [-3.106, 3.106]
We used the T-distribution with 12 - 1 = 11 degrees of freedom and α/2 = 0.005
because of the two-sided alternative.

Since t = 2.928 lies inside the acceptance region, we do not reject the null
hypothesis: at the 1% level these data do not give significant evidence of
an unauthorized use of that account.
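
Because the decision depends on a single number, it is worth recomputing the t statistic directly from the raw data as a check on the hand calculation. The sketch below does that; the acceptance-region boundary 3.106 is hard-coded from the solution, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

times = [0.46, 0.38, 0.31, 0.24, 0.20, 0.31, 0.34, 0.42, 0.09, 0.18, 0.46, 0.21]
mu0 = 0.2            # mean time of the authorized user under H0
T_CRITICAL = 3.106   # boundary of the acceptance region (11 df, alpha/2 = 0.005)

# t = (sample mean - mu0) / (s / sqrt(n)), with s the sample standard deviation
t = (mean(times) - mu0) / (stdev(times) / sqrt(len(times)))
reject_h0 = abs(t) > T_CRITICAL      # two-sided test

print(f"t = {t:.4f}, reject H0: {reject_h0}")
```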
P15
• For the time-independent Markov chain
described by the picture below, what is its 2-step
transition matrix?
P16
• Once people arrive in Thailand, they want to
enjoy the sun and beaches on 2 popular
islands in the south: Samui Island & Phangan
Island.
• From survey data, when on the mainland,
70% of tourists plan to go to Samui Island,
20% to Phangan Island, and only 10%
remain on shore the next day.
• When on Samui Island, 40% continue to stay
on Samui, 50% plan to go to Phangan Island,
and only 10% return to mainland the next
day.
• Finally, when on Phangan Island, 30%
prolong their stay here, 30% divert to Samui
Island, and 40% go back to mainland the
next day.
• Starting from the mainland, what is the
probability (in percentage) that the travelers
will be on the mainland at the end of a 3-day
trip?
Solution
• Let the first column of the transition matrix indicate the mainland as
destination, the second Samui, and the third Phangan, while the rows
correspond to the starting location in the same order. Then the
transition matrix P is:

            Mainland   Samui   Phangan
Mainland      0.1       0.7      0.2
Samui         0.1       0.4      0.5
Phangan       0.4       0.3      0.3

The 3-step transition matrix is P³ = P · P · P. Hence, starting from the mainland
and ending at the mainland, there is a probability of 0.229, i.e. 22.9% (the entry
in the first row and first column of P³).
Alternate Solution

• The tourist starts from the mainland. So, the initial
prediction is x0 = [1, 0, 0], and the prediction for the end of day 3 is x3 = x0 P³,
where P is the transition matrix.
• x3 has 3 components: the first gives the
probability of being on the mainland, the second the probability
of being on Samui, and the third the probability of being on
Phangan. So, the required answer is the first component.
• Note: If the tourist starts from Samui, then the initial prediction is
[0, 1, 0], and the predictions proceed in the same way. Similarly, the initial
prediction is [0, 0, 1] for starting at Phangan.
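
Both routes to the answer (cubing the transition matrix, or pushing an initial state vector forward day by day) can be checked numerically. The sketch below builds the matrix from the percentages in the problem, with rows as the current location and columns as the next-day location; `mat_mul` is an illustrative helper written for this example.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

# Rows = current location, columns = next-day location,
# both ordered as (mainland, Samui, Phangan).
P = [
    [0.1, 0.7, 0.2],
    [0.1, 0.4, 0.5],
    [0.4, 0.3, 0.3],
]

# Route 1: 3-step transition matrix.
P3 = mat_mul(mat_mul(P, P), P)

# Route 2: push the initial state vector forward one day at a time.
state = [1.0, 0.0, 0.0]              # start on the mainland
for _ in range(3):
    state = [sum(state[i] * P[i][j] for i in range(3)) for j in range(3)]

print(f"P^3[mainland][mainland] = {P3[0][0]:.3f}")
print(f"state after 3 days = {[round(p, 3) for p in state]}")
```

Starting from Samui or Phangan only requires changing the initial `state` vector.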
P17
• Fit the following data in straight line using LSE
(1,3), (2,4), (3,5), (4,7).

Ans. Y=1.5+1.3x
P18
• Fit the following data in straight line using LSE
(0,-1), (2,5), (5,12), (7,20).

Ans. Y=-1.138+2.897x
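
For a straight line y = a + bx, the least-squares estimates come from the normal equations, which give b = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²) and a = (Σy - bΣx) / n. A minimal sketch applying these formulas to the two data sets above (the function name `fit_line` is illustrative):

```python
def fit_line(points):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # slope
    a = (sy - b * sx) / n                           # intercept
    return a, b

for data in ([(1, 3), (2, 4), (3, 5), (4, 7)],
             [(0, -1), (2, 5), (5, 12), (7, 20)]):
    a, b = fit_line(data)
    print(f"y = {a:.3f} + {b:.3f}x")
```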
P19
• Fit the following data in
z=a+bx+cy
using LSE
(0,0,2), (1,1,4), (2,3,3), (4,2,16), (6,8,8).

Ans. Z=2+5x-3y
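
Fitting z = a + bx + cy leads to three normal equations, (AᵀA)β = Aᵀz, where each row of A is (1, x, y). The sketch below assumes each data triple is ordered (x, y, z) and solves the 3×3 system with a small Gauss-Jordan routine; both helper functions are written for this example only.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_plane(points):
    """Least-squares fit of z = a + b*x + c*y from (x, y, z) triples (assumed order)."""
    rows = [(1.0, x, y) for x, y, _ in points]
    zs = [z for _, _, z in points]
    # Normal equations: (A^T A) beta = A^T z
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atz = [sum(r[i] * z for r, z in zip(rows, zs)) for i in range(3)]
    return solve(ata, atz)

a, b, c = fit_plane([(0, 0, 2), (1, 1, 4), (2, 3, 3), (4, 2, 16), (6, 8, 8)])
print(f"z = {a:.3f} + {b:.3f}x + {c:.3f}y")
```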
