Prob Dist Updated-wps
D. PHARM PROGRAMME
There are two types of probability distributions: the continuous
probability distribution (e.g., the Normal distribution) and the discrete probability
distribution (e.g., the Bernoulli distribution).
QUESTION
i) What is the probability that a student attends lectures at least 3 days in a week?
ii) Calculate the expected value of x.
iii) Calculate the standard deviation of x and determine the interval one standard deviation about the mean.
Solutions
2) (a) Discrete random variables are counted whilst continuous random variables are measured.
(b) (i) 0 ≤ p(x) ≤ 1 for every x
(ii) Σ p(xᵢ) = 1
c)
x     p(x)   x·p(x)   x²      x²·p(x)
0     0.20   0.00     0.00    0.00
1     0.25   0.25     1.00    0.25
2     0.30   0.60     4.00    1.20
3     0.15   0.45     9.00    1.35
4     0.10   0.40     16.00   1.60
Σ     1.00   1.70             4.40
(i) P(at least 2 cars sold) = P(X ≥ 2) = 0.30 + 0.15 + 0.10 = 0.55
(ii) µ = Σxp(x) = 1.70
(iii) Var(x) = σ² = Σx²p(x) − µ² = 4.40 − 2.89 = 1.51
σ = √1.51 = 1.22882 ≈ 1.229
(iv) (µ − σ, µ + σ) = (0.47, 2.93)
d)
x     p(x)   x·p(x)   x²      x²·p(x)
1     0.10   0.10     1.00    0.10
2     0.15   0.30     4.00    0.60
3     0.35   1.05     9.00    3.15
4     0.20   0.80     16.00   3.20
5     0.20   1.00     25.00   5.00
Σ     1.00   3.25             12.05
(i) P(at least 3 days in a week) = P(X ≥ 3) = 0.35 + 0.20 + 0.20 = 0.75
(ii) µ = Σxp(x) = 3.25
(iii) Var(x) = σ² = Σx²p(x) − µ² = 12.05 − 10.5625 = 1.4875 ≈ 1.49
σ = √1.4875 ≈ 1.2196 ≈ 1.22
(iv) (µ − σ, µ + σ) = (2.03, 4.47)
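Tables (c) and (d) are both worked with the same recipe: µ = Σxp(x) and σ² = Σx²p(x) − µ². The minimal Python sketch below applies that recipe to both tables. The helper name `summarize` is our own and does not come from the notes.

```python
import math

def summarize(xs, ps):
    """Return (mean, variance, standard deviation) of a discrete distribution."""
    # Conditions from part (b): each 0 <= p(x) <= 1 and the p(x) sum to 1
    assert all(0 <= p <= 1 for p in ps)
    assert math.isclose(sum(ps), 1.0)
    mean = sum(x * p for x, p in zip(xs, ps))       # mu = sum of x*p(x)
    ex2 = sum(x * x * p for x, p in zip(xs, ps))    # sum of x^2*p(x)
    var = ex2 - mean ** 2                           # sigma^2 = sum x^2 p(x) - mu^2
    return mean, var, math.sqrt(var)

tables = {
    # label: (values x, probabilities p(x), threshold k for P(X >= k))
    "cars sold (c)": ([0, 1, 2, 3, 4], [0.20, 0.25, 0.30, 0.15, 0.10], 2),
    "lecture days (d)": ([1, 2, 3, 4, 5], [0.10, 0.15, 0.35, 0.20, 0.20], 3),
}

results = {}
for label, (xs, ps, k) in tables.items():
    mu, var, sd = summarize(xs, ps)
    at_least_k = sum(p for x, p in zip(xs, ps) if x >= k)
    results[label] = (at_least_k, mu, var, sd)
    print(f"{label}: P(X >= {k}) = {at_least_k:.2f}, mu = {mu:.2f}, "
          f"var = {var:.4f}, sd = {sd:.4f}, mu +/- sd = ({mu - sd:.2f}, {mu + sd:.2f})")
```

The printed intervals should agree with the hand-worked answers (0.47, 2.93) and (2.03, 4.47).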
QUESTION ON BUILDING A PROBABILITY FREQUENCY DISTRIBUTION TABLE
• A newly married couple plans to have three children. They are curious
about whether their children will be boys or girls.
a) Illustrate the expectations of the couple on a tree diagram and, from
the expected outcomes, build a probability frequency distribution table for
the random variable X, the number of boys.
b) Calculate the mean and standard deviation of the distribution.
c) Determine the interval two standard deviations about the mean and
illustrate the interval on a probability frequency distribution graph.
d) What is the probability that the couple will have two boys?
e) What is the probability that the couple will have two boys and a girl?
f) What is the probability that the couple will have two girls?
Tree diagram (at each branch the child is B or G, each with probability 1/2)
1st child   2nd child   3rd child   Experimental outcome   Probability
B           B           B           BBB                    1/8
B           B           G           BBG                    1/8
B           G           B           BGB                    1/8
B           G           G           BGG                    1/8
G           B           B           GBB                    1/8
G           B           G           GBG                    1/8
G           G           B           GGB                    1/8
G           G           G           GGG                    1/8
SOLUTION
• P(2 boys) = P(BBG) + P(BGB) + P(GBB) = 1/8 + 1/8 + 1/8 = 3/8
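The sketch below cross-checks the tree diagram. It enumerates the eight outcomes and builds the distribution of X, the number of boys, under the assumption that each child is equally likely to be a boy or a girl, independently of the others. It also computes the mean, the standard deviation and the interval two standard deviations about the mean that parts (b) and (c) ask for. These values are computed here and are not taken from the notes.

```python
from fractions import Fraction
from itertools import product
import math

# Assumption: each child is a boy (B) or girl (G) with probability 1/2, independently
outcomes = ["".join(o) for o in product("BG", repeat=3)]   # BBB, BBG, ..., GGG (8 outcomes)
p_each = Fraction(1, len(outcomes))                         # 1/8 per outcome

# Probability distribution of X = number of boys
dist = {}
for o in outcomes:
    x = o.count("B")
    dist[x] = dist.get(x, 0) + p_each

mean = sum(x * p for x, p in dist.items())                        # 3/2
var = sum(x * x * p for x, p in dist.items()) - mean ** 2         # 3/4
sd = math.sqrt(var)

for x in sorted(dist):
    print(f"x = {x}: P(X = x) = {dist[x]}")
print(f"mean = {mean}, variance = {var}, sd = {sd:.3f}")
print(f"mu +/- 2 sd = ({float(mean) - 2 * sd:.2f}, {float(mean) + 2 * sd:.2f})")
```

The distribution is 1/8, 3/8, 3/8, 1/8 for X = 0, 1, 2, 3. The mean is 3/2 and σ ≈ 0.866, so µ ± 2σ ≈ (−0.23, 3.23). X cannot be negative, so in practice this interval covers every possible value from 0 to 3.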
Special probability distributions
The Binomial Distribution
Two of the most widely used discrete probability distributions are the binomial and Poisson.
The binomial probability mass function
P(X = x) = nCx · p^x · q^(n−x), where nCx = n!/((n−x)! x!) and q = 1 − p      (equation 6)
provides the probability that x successes will occur in n trials of a binomial
experiment.
3) A Bernoulli trial can result in a success with probability p or a failure with probability q = 1 − p. Then the probability
distribution of the binomial random variable X, the number of successes in n
independent trials, is b(x; n, p) = nCx · p^x · q^(n−x), x = 0, 1, 2, …, n.
i) If 6 readings are taken, find the probability that the particle is detected more than 3 times.
ii) If 6 readings are taken, find the probability that the particle is detected fewer than 3 times.
iii) If 6 readings are taken, what is the expected number of times the particle is detected?
iv) If 6 readings are taken, what is the standard deviation of the number of times the particle is detected?
Answer
3(a)(i) i) There are n repeated trials.
ii) The trials are independent.
iii) Each trial has two outcomes, i.e., success and failure.
iv) The probability of success, p, remains constant from trial to trial.
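3(a)(ii) n = 6, p = 0.6, q = 0.4
P(X > 3) = P(X = 4) + P(X = 5) + P(X = 6)
P(X = 4) = 6C4 · (0.6)^4 · (0.4)^2 = 15 × 0.1296 × 0.16 = 0.3110
P(X = 5) = 6C5 · (0.6)^5 · (0.4)^1 = 6 × 0.07776 × 0.4 = 0.1866
P(X = 6) = 6C6 · (0.6)^6 · (0.4)^0 = 1 × 0.046656 × 1 = 0.0467
P(X > 3) = 0.3110 + 0.1866 + 0.0467 = 0.5443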
(iii) P(X < 3) = P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2) = 0.0041 + 0.0369 + 0.1382 = 0.1792
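The binomial answers can be checked numerically. Like the worked answer, the sketch below assumes n = 6 and p = 0.6. It also uses the standard binomial results E(X) = np and σ = √(npq) to answer parts iii and iv of the question. The notes do not state these two formulas, and the helper name `binom_pmf` is our own.

```python
from math import comb, sqrt

def binom_pmf(x, n, p):
    """b(x; n, p) = nCx * p^x * q^(n - x) with q = 1 - p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 6, 0.6                         # 6 readings, detection probability 0.6 per reading
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]

p_more_than_3 = sum(pmf[4:])          # P(X = 4) + P(X = 5) + P(X = 6)
p_less_than_3 = sum(pmf[:3])          # P(X = 0) + P(X = 1) + P(X = 2)
mean = n * p                          # expected number of detections, np
sd = sqrt(n * p * (1 - p))            # standard deviation, sqrt(npq)

print(f"P(X > 3) = {p_more_than_3:.4f}")
print(f"P(X < 3) = {p_less_than_3:.4f}")
print(f"mean = {mean:.1f}, sd = {sd:.2f}")
```

With these inputs the expected number of detections is 3.6 and the standard deviation is 1.2.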
• The Poisson probability distribution is often used as a model of the number of arrivals at a
facility within a given period of time.
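• The Poisson probability function is P(X = x) = μ^x e^(−μ) / x!, x = 0, 1, 2, …      (equation 7)
where μ = λt is the mean number of successes in the interval considered and e = 2.71828.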
If X is a Poisson random variable, its mean (μ) and variance (σ²) are equal: E(X) = V(X) = µ, where
• E(X) is the mean or expected value of X,
• V(X) is the variance of X, and the standard deviation is always equal to the square root of the mean: σ = √μ.
•λ is the mean rate of successes per period
•t is the length of time (or space) considered
PROPERTIES OF A POISSON PROCESS: The Poisson process has the following properties:
• 1. The numbers of successes in non-overlapping intervals are independent. A Poisson process has no memory.
• 2. The probability that a success will occur in an interval is the same for all intervals of equal size and
is proportional to the size of the interval. The mean process rate λ must remain constant for the entire
time span or space considered.
Here are some random variables that might follow a Poisson distribution:
1. The number of orders a firm receives in a day.
• On a particular river, overflow floods occur once every 100 years on average. Calculate the
probability of x = 0, 1, 2, 3, 4, 5, or 6 overflow floods in a 100-year interval, assuming the
Poisson model is appropriate.
• Because the average event rate is one overflow flood per 100 years, μ = 1
• P(x = 0 overflow floods in 100 years) = 0.368
• P(x = 1 overflow flood in 100 years) = 0.368
• P(x = 2 overflow floods in 100 years) = 0.184
x    P(x overflow floods in 100 years)
0    0.368
1    0.368
2    0.184
3    0.061
4    0.015
5    0.003
6    0.0005
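The short sketch below reproduces the flood table from the Poisson formula P(X = x) = μ^x e^(−μ)/x! with μ = 1. The function name `poisson_pmf` is our own.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) = mu^x * e^(-mu) / x!"""
    return mu ** x * exp(-mu) / factorial(x)

mu = 1.0  # one overflow flood per 100 years on average
table = {x: poisson_pmf(x, mu) for x in range(7)}
for x, prob in table.items():
    print(x, round(prob, 4))
```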
EXERCISE
• a) Suppose that the mean number of calls arriving in a 15-minute
period is 10. To compute the probability that 5 calls
come in within the next 15 minutes, μ = 10 and x = 5 are
substituted in equation 7, giving a probability of 0.0378.
• A study of the queues at the checkout registers of A&C Supermarket
revealed that during a certain period at the rush hour, the average
number of customers waiting is four. What is the probability that
during that period:
• No customers were waiting.
• Four customers were waiting.
• Four or fewer were waiting.
• Four or more were waiting.
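The sketch below works the supermarket exercise, assuming the number of customers waiting is Poisson with μ = 4. A Poisson variable has no upper limit, so "four or more" is computed through the complement 1 − P(X ≤ 3). The helper names are our own.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) = mu^x * e^(-mu) / x!"""
    return mu ** x * exp(-mu) / factorial(x)

def poisson_cdf(k, mu):
    """P(X <= k): add the individual probabilities from 0 to k."""
    return sum(poisson_pmf(x, mu) for x in range(k + 1))

mu = 4  # average number of customers waiting
p_none = poisson_pmf(0, mu)
p_four = poisson_pmf(4, mu)
p_four_or_fewer = poisson_cdf(4, mu)
p_four_or_more = 1 - poisson_cdf(3, mu)   # complement of "three or fewer"

print(f"{p_none:.4f} {p_four:.4f} {p_four_or_fewer:.4f} {p_four_or_more:.4f}")
```

Under this model the four answers are roughly 0.0183, 0.1954, 0.6288 and 0.5665.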
EXERCISE
• Example 3: The marketing manager of a company has noted that
she usually receives 15 complaint calls from customers during a
week (consisting of 5 working days) and that the calls occur at
random.
• Find the probability of her receiving exactly 5 calls in a single day.
• Find the probability of her receiving at most 2 calls in a single
day.
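Example 3 depends on rescaling the rate. 15 calls per 5-day week gives μ = 3 calls per day, assuming calls arrive at a constant rate (property 2 above). The sketch below makes that assumption. The helper name is our own.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) = mu^x * e^(-mu) / x!"""
    return mu ** x * exp(-mu) / factorial(x)

weekly_calls, days = 15, 5
mu_day = weekly_calls / days                            # rate rescaled to one day: 3 calls
p_exactly_5 = poisson_pmf(5, mu_day)
p_at_most_2 = sum(poisson_pmf(x, mu_day) for x in range(3))

print(f"P(X = 5) = {p_exactly_5:.4f}, P(X <= 2) = {p_at_most_2:.4f}")
```

This gives P(X = 5) ≈ 0.1008 and P(X ≤ 2) ≈ 0.4232.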
• Example 4: A sports journal reports that the average number
of goals in a World Cup soccer match is approximately 2.5
and the Poisson model is appropriate.[5] Because the
average event rate is 2.5 goals per match, µ = 2.5.
• Find P(x = 0,1,2,3,4,5,6,7)
x P(x goals in a World Cup soccer match)
0 0.082
1 0.205
2 0.257
3 0.214
4 0.134
5 0.067
6 0.028
7 0.010
The Normal Distribution
Standard Normal Distribution table (area under the curve between 0 and z)
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
Example: Percent of Population Between 0 and 0.45
Start at the row for 0.4, and read under 0.05 to get 0.45: there is the
value 0.1736
And 0.1736 is 17.36%
So 17.36% of the population are between 0 and 0.45 Standard
Deviations from the Mean.
Example: Percent of Population with Z Between −1 and 2
From −1 to 0 is the same as from 0 to +1 (the curve is symmetric):
At the row for 1.0, under column 0.00, there is the value 0.3413
From 0 to +2 is:
At the row for 2.0, under column 0.00, there is the value 0.4772
Add the two to get the total between -1 and 2:
0.3413 + 0.4772 = 0.8185
And 0.8185 is 81.85%
So 81.85% of the population are between -1 and +2 Standard Deviations from the Mean.
EXAMPLES
• Use the Standard Normal Distribution table to find P(−2.31 < Z ≤ 0.82) = 0.4896 + 0.2939 = 0.7835
• Use the Standard Normal Distribution table to find P(Z ≤ 2) = 0.5 + 0.4772 = 0.9772
• Use the Standard Normal Distribution table to find P(−1.65 < Z ≤ 1.93)
• Use the Standard Normal Distribution table to find P(0.85 < Z ≤ 2.23)
• Use the Standard Normal Distribution table to find P(Z > 1.75) = 0.5 − 0.4599 = 0.0401
• Use the Standard Normal Distribution table to find P(Z ≤ −0.69) = 0.5 − 0.2549 = 0.2451
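You can check the table look-ups with `statistics.NormalDist` from Python's standard library. Its `cdf` gives the area from −∞ to z, so subtracting 0.5 gives the 0-to-z area tabulated above. The table is rounded, so the results may differ slightly in the fourth decimal. The helper `area_0_to` is our own.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_0_to(z):
    """Area between 0 and z (z >= 0), the quantity in the table above."""
    return Z.cdf(z) - 0.5

examples = {
    "P(-2.31 < Z <= 0.82)": Z.cdf(0.82) - Z.cdf(-2.31),
    "P(Z <= 2)": Z.cdf(2),
    "P(-1.65 < Z <= 1.93)": Z.cdf(1.93) - Z.cdf(-1.65),
    "P(0.85 < Z <= 2.23)": Z.cdf(2.23) - Z.cdf(0.85),
    "P(Z > 1.75)": 1 - Z.cdf(1.75),
    "P(Z <= -0.69)": Z.cdf(-0.69),
}

print(f"area from 0 to 0.45 = {area_0_to(0.45):.4f}")
for label, prob in examples.items():
    print(f"{label} = {prob:.4f}")
```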
Example 1: A normal distribution of retail‐store purchases has a mean of ghc 14.31
and a standard deviation of 6.40. What percentage of purchases were under ghc
10.00?
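First, compute the z-score:
z = (X − μ)/σ = (10.00 − 14.31)/6.40 = −0.67
From the table, the area between 0 and 0.67 is 0.2486, so P(X < 10.00) = 0.5 − 0.2486 = 0.2514: about 25 percent of the purchases were under ghc 10.00.
• Conversely, the table gives an area of 0.3997 (close to 0.40) at z = 1.28, so the amount below which 10 percent of purchases fall is X = μ + zσ = 14.31 + (−1.28)(6.40) = ghc 6.12. Approximately 10 percent of the purchases were below ghc 6.12.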
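The sketch below repeats the same steps with `statistics.NormalDist`: compute the z-score, find the area, then do the reverse look-up for the 10th percentile. The code does not round z to two decimals, so its answers differ slightly from the table-based ones (about 0.250 instead of 0.2514, and about ghc 6.11 instead of 6.12).

```python
from statistics import NormalDist

purchases = NormalDist(mu=14.31, sigma=6.40)        # purchase amounts in ghc

z = (10.00 - purchases.mean) / purchases.stdev      # z-score for ghc 10.00
p_under_10 = purchases.cdf(10.00)                   # P(X < 10.00)
amount_10th_pct = purchases.inv_cdf(0.10)           # amount below which 10% fall

print(f"z = {z:.2f}, P(X < 10.00) = {p_under_10:.4f}, 10th percentile = {amount_10th_pct:.2f}")
```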
EXERCISE:
The ages of the population of a town are Normally distributed with mean 43
and standard deviation 14. What percentage of the population would you expect to be aged
between 22 and 57?
SOLUTION
X1 = 22, X2 = 57, µ = 43, σ = 14
P(22 ≤ X ≤ 57) = P(Z1 ≤ Z ≤ Z2)
Z1 = (X1 − µ)/σ = (22 − 43)/14 = −1.5      Reading: 0.4332
Z2 = (X2 − µ)/σ = (57 − 43)/14 = 1.0       Reading: 0.3413
Required area = 0.4332 + 0.3413 = 0.7745
So about 77.45% of the population is aged between 22 and 57.
For comparison, P(X > 57) = P(Z > 1.0) = 0.5 − 0.3413 = 0.1587.
STANDARD NORMAL AND SAMPLING DISTRIBUTIONS QUESTION
• Sampling distribution:
The sampling distribution of a statistic is a probability distribution for all possible
values of the statistic computed from a sample of size n.
Sampling distribution of the sample mean:
The sampling distribution of the sample mean is the probability distribution of all
possible values of the sample mean, x̄, computed from samples of size n drawn from a
population with mean µ and standard deviation σ.
• How Does it Work?
• Step 1: obtain a simple random sample of size n.
• Step 2: compute the sample mean, x̄.
• Step 3: assuming that we are sampling from a finite population, repeat steps 1 and 2 until all
distinct simple random samples of size n have been obtained.
• Note: once a particular sample is obtained, it cannot be obtained a second time.
• Plot the frequency distribution of the sample statistics obtained in the steps
above. The resulting graph is the sampling distribution.
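The steps above can be sketched for a tiny hypothetical finite population of five values (2, 4, 6, 8, 10) with samples of size n = 2. The population and sample size are chosen only for illustration. `itertools.combinations` produces every distinct sample exactly once, which matches the note that a sample cannot be obtained twice.

```python
from collections import Counter
from itertools import combinations
from statistics import mean

population = [2, 4, 6, 8, 10]   # hypothetical finite population (illustration only)
n = 2                           # sample size

# Steps 1-3: every distinct sample of size n, each taken exactly once
samples = list(combinations(population, n))
sample_means = [mean(s) for s in samples]

# Final step: the frequency distribution of the sample means is the sampling distribution
sampling_dist = Counter(sample_means)
for value in sorted(sampling_dist):
    print(f"x-bar = {value}: frequency {sampling_dist[value]}, "
          f"probability {sampling_dist[value] / len(samples)}")

print("mean of sample means:", mean(sample_means))
print("population mean:", mean(population))
```

The ten sample means average 6, which is the population mean.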
• Types of Sampling Distribution
• 1. Sampling distribution of mean
• As described above, you can calculate the mean of every sample group chosen from the
population and plot all the data points. For large enough samples the graph is approximately normal, and its center,
the mean of the sampling distribution, equals the mean of the entire population.
• 2. Sampling distribution of proportion
• It gives you information about proportions in a population. You select samples from the population and
compute each sample proportion. The mean of all the sample proportions calculated from the sample groups
equals the proportion of the entire population.
• 3. T-distribution
• The t-distribution is used when the sample size is small or the population standard deviation is unknown.
The Central Limit Theorem
• The most fundamental point and interval estimation process involves the estimation of a
population mean. Suppose it is of interest to estimate the population mean, μ, for a
quantitative variable. Data collected from a simple random sample can be used to
compute the sample mean, x̄, where the value of x̄ provides a point estimate of μ.
• When the sample mean is used as a point estimate of the population mean, some error
can be expected owing to the fact that a sample, or subset of the population, is used to
compute the point estimate. The absolute value of the difference between the sample
mean, x̄, and the population mean, μ, written |x̄ − μ|, is called the sampling error.
• Interval estimation incorporates a probability statement about the magnitude of the
sampling error. The sampling distribution of x̄ provides the basis for such a statement.
• Statisticians have shown that the mean of the sampling distribution of x̄ is equal to the
population mean, μ, and that the standard deviation is given by σ/√n, where σ is the
population standard deviation. The standard deviation of a sampling distribution is called
the standard error.
• For large sample sizes, the central limit theorem indicates that the sampling distribution
of x̄ can be approximated by a normal probability distribution. As a matter of practice,
statisticians usually consider samples of size 30 or more to be large.
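The small simulation below illustrates the two claims above. It assumes a skewed exponential population with μ = σ = 1, which we chose for illustration only. The means of samples of size 30 centre on μ, and their standard deviation is close to σ/√n ≈ 0.183.

```python
import random
import statistics

random.seed(1)            # fixed seed so the run is reproducible
n = 30                    # sample size ("large" by the rule of thumb above)
reps = 20_000             # number of simulated samples
sigma = 1.0               # exponential(rate=1) population: mean 1, standard deviation 1

# Draw many samples of size n from a skewed population and record each sample mean
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

print("mean of sample means:", round(statistics.fmean(sample_means), 3))
print("sd of sample means:  ", round(statistics.stdev(sample_means), 4))
print("sigma / sqrt(n):     ", round(sigma / n ** 0.5, 4))
```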
Example
Let X̄ be the mean of a random sample of size 50 drawn from a population with mean 112 and
standard deviation 40.
1) Find the mean and standard deviation of X̄.
2) Find the probability that X̄ assumes a value between 110 and 114.
3) Find the probability that X̄ assumes a value greater than 113.
Solution:
1) By the properties of sampling distribution of means,
μX̄ = μ = 112 and σX̄ = σ/√n = 40/√50 = 5.65685
2) P(110 < X̄ < 114) = P((110 − μX̄)/σX̄ < Z < (114 − μX̄)/σX̄)
= P((110 − 112)/5.65685 < Z < (114 − 112)/5.65685)
= P(−0.35 < Z < 0.35) = 0.1368 + 0.1368 = 0.2736
or, reading a cumulative table (areas from −∞), 0.6368 − 0.3632 = 0.2736.
3) P(X̄ > 113) = P(Z > (113 − μX̄)/σX̄) = P(Z > (113 − 112)/5.65685)
= P(Z > 0.18) = 0.5 − P(0 < Z < 0.18) = 0.5 − 0.0714 = 0.4286
or, from a cumulative table, 1 − 0.5714 = 0.4286.
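You can check the worked example with `statistics.NormalDist` by treating X̄ as normal with mean 112 and standard error 40/√50. The code uses unrounded z-values, so it returns about 0.276 and 0.430 rather than the table answers 0.2736 and 0.4286.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 112, 40, 50
sampling_dist = NormalDist(mu, sigma / sqrt(n))   # X-bar approximately N(112, 40/sqrt(50))

se = sampling_dist.stdev                                     # standard error, about 5.65685
p_between = sampling_dist.cdf(114) - sampling_dist.cdf(110)  # part 2
p_above_113 = 1 - sampling_dist.cdf(113)                     # part 3

print(f"mean = {sampling_dist.mean}, standard error = {se:.5f}")
print(f"P(110 < X-bar < 114) = {p_between:.4f}")
print(f"P(X-bar > 113) = {p_above_113:.4f}")
```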
Exercise
• The numerical population of grade point averages at a college has mean 2.61
and standard deviation 0.5. If a random sample of size 100 is taken from the
population, what is the probability that the sample mean will be between 2.51
and 2.71?
• Answer: 0.9544 (σX̄ = 0.5/√100 = 0.05, so P(2.51 < X̄ < 2.71) = P(−2 < Z < 2) = 0.4772 + 0.4772 = 0.9544)
Applications:
1) Suppose the mean number of days to germination of a variety of seed is 22, with standard deviation
2.3 days. Find the probability that the mean germination time of a sample of 160 seeds will be within
0.5 day of the population mean.
2) Suppose the mean length of time that a caller is placed on hold when telephoning a customer service
center is 23.8 seconds, with standard deviation 4.6 seconds. Find the probability that the mean length of
time on hold in a sample of 1,200 calls will be within 0.5 second of the population mean.
3) Suppose the mean amount of cholesterol in eggs labeled “large” is 186 milligrams, with standard
deviation 7 milligrams. Find the probability that the mean amount of cholesterol in a sample of 144 eggs
will be within 2 milligrams of the population mean.
4) Suppose that in one region of the country the mean amount of credit card debt per household in
households having credit card debt is $15,250, with standard deviation $7,125. Find the probability that
the mean amount of credit card debt in a sample of 1,600 such households will be within $300 of the
population mean.
5) Suppose speeds of vehicles on a particular stretch of roadway are normally distributed with mean 36.6
mph and standard deviation 1.7 mph.
a) Find the probability that the speed X of a randomly selected vehicle is between 35 and 40 mph.
b) Find the probability that the mean speed of 20 randomly selected vehicles is between 35 and 40 mph.
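Applications 1 to 4 all ask for P(|X̄ − μ| < d). This equals P(−z < Z < z) with z = d/(σ/√n). The sketch below uses a hypothetical helper, `prob_within`, for those four and then handles both parts of application 5. Part a uses the population distribution for a single vehicle. Part b uses σ/√20 for the mean of 20 vehicles, which is exactly normal here because the speeds themselves are normal.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()

def prob_within(d, sigma, n):
    """P(|X-bar - mu| < d) when X-bar is approximately N(mu, sigma / sqrt(n))."""
    z = d / (sigma / sqrt(n))
    return Z.cdf(z) - Z.cdf(-z)

answers = {
    "1) seeds": prob_within(0.5, 2.3, 160),
    "2) calls on hold": prob_within(0.5, 4.6, 1200),
    "3) eggs": prob_within(2, 7, 144),
    "4) credit card debt": prob_within(300, 7125, 1600),
}

speed = NormalDist(36.6, 1.7)                     # one randomly selected vehicle
answers["5a) one vehicle"] = speed.cdf(40) - speed.cdf(35)
mean_speed = NormalDist(36.6, 1.7 / sqrt(20))     # mean of 20 vehicles
answers["5b) mean of 20"] = mean_speed.cdf(40) - mean_speed.cdf(35)

for label, prob in answers.items():
    print(f"{label}: {prob:.4f}")
```

The results are roughly 0.994, 0.9998, 0.9994, 0.908, 0.804 and 1.000 respectively.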
The Sample Proportion
Often sampling is done in order to estimate the proportion of a population that has a
specific characteristic, such as the proportion of all items coming off an assembly line
that are defective or the proportion of all people entering a retail store who make a
purchase before leaving. The population proportion is denoted p and the sample
proportion is denoted p̂. Thus if in reality 43% of people entering a store make a purchase
before leaving, p = 0.43; if in a sample of 200 people entering the store, 78 make a
purchase, p̂ = 78/200 = 0.39.
The sample proportion is a random variable: it varies from sample to sample in a way
that cannot be predicted with certainty. Viewed as a random variable it will be written P̂.
It has a mean, μP̂, and a standard deviation, σP̂. Here are formulas for their values.
Suppose random samples of size n are drawn from a population in which the proportion
with a characteristic of interest is p. The mean, μP̂, and standard deviation, σP̂, of the
sample proportion P̂ satisfy:
μP̂ = p and σP̂ = √(pq/n), where q = 1 − p.
Example
Suppose that in a population of voters in a certain region 38% are in favor of a particular bond issue.
Nine hundred randomly selected voters are asked if they favor the bond issue.
1) Verify that the sample proportion Pˆ computed from samples of size 900 meets the condition that its sampling
distribution be approximately normal.
2) Find the probability that the sample proportion computed from a sample of size 900 will be within 5 percentage
points of the true population proportion.
Solution
1) The information given is that p = 0.38, hence q = 1 − p = 0.62.
First we use the formulas to compute the mean and standard deviation of P̂:
μP̂ = p = 0.38 and σP̂ = √(pq/n) = √((0.38)(0.62)/900) = 0.01618
Then 3σP̂ = 3(0.01618) = 0.04854 ≈ 0.05
So [p − 3σP̂, p + 3σP̂] = [0.38 − 0.05, 0.38 + 0.05] = [0.33, 0.43],
which lies wholly within the interval [0, 1], so it is safe to assume that P̂ is approximately normally distributed.
2) To be within 5 percentage points of the true population proportion 0.38 means to be between
0.38−0.05=0.33 and 0.38+0.05=0.43.
Thus,
P(0.33 < P̂ < 0.43) = P((0.33 − μP̂)/σP̂ < Z < (0.43 − μP̂)/σP̂)
= P((0.33 − 0.38)/0.01618 < Z < (0.43 − 0.38)/0.01618)
= P(−3.09 < Z < 3.09) = 0.4990 + 0.4990 = 0.9980
or, from a cumulative table, P(Z < 3.09) − P(Z < −3.09) = 0.9990 − 0.0010 = 0.9980
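The sketch below covers both steps of the bond-issue example. It computes σP̂ directly, checks that p ± 3σP̂ lies inside [0, 1], and finds the probability that P̂ lands within 0.05 of p under the normal approximation.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.38, 900
q = 1 - p
se = sqrt(p * q / n)                         # sigma of P-hat, about 0.01618

# Part 1: p +/- 3 standard errors must lie inside [0, 1] for the normal approximation
lower, upper = p - 3 * se, p + 3 * se
normal_ok = 0 <= lower and upper <= 1

# Part 2: P(0.33 < P-hat < 0.43) under the normal approximation
p_hat_dist = NormalDist(p, se)
p_within_5pts = p_hat_dist.cdf(p + 0.05) - p_hat_dist.cdf(p - 0.05)

print(f"se = {se:.5f}, interval = [{lower:.3f}, {upper:.3f}], normal ok: {normal_ok}")
print(f"P(0.33 < P-hat < 0.43) = {p_within_5pts:.4f}")
```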
Estimation and Confidence Intervals
Using algebra, we can rework this inequality so that the population mean (μ) is the
middle term.
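This step assumes the inequality in question is the usual standardized bound on the sample mean. The rearrangement multiplies through by σ/√n, subtracts x̄, and then multiplies by −1, which reverses the inequality signs:

```latex
\begin{aligned}
-z < \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} < z
&\iff -z\frac{\sigma}{\sqrt{n}} < \bar{x}-\mu < z\frac{\sigma}{\sqrt{n}} \\
&\iff -\bar{x}-z\frac{\sigma}{\sqrt{n}} < -\mu < -\bar{x}+z\frac{\sigma}{\sqrt{n}} \\
&\iff \bar{x}-z\frac{\sigma}{\sqrt{n}} < \mu < \bar{x}+z\frac{\sigma}{\sqrt{n}}
\end{aligned}
```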
Table - Z-Scores for Commonly Used Confidence Intervals
Desired Confidence Interval    Z Score
90%                            1.645
95%                            1.96
99%                            2.576
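The Z-scores in the table are the points that leave (1 − confidence)/2 in each tail. The sketch below recovers them with `NormalDist().inv_cdf` from Python's standard library.

```python
from statistics import NormalDist

z_scores = {}
for confidence in (0.90, 0.95, 0.99):
    alpha = 1 - confidence
    # Leave alpha/2 in each tail, so the critical value is the (1 - alpha/2) quantile
    z_scores[confidence] = NormalDist().inv_cdf(1 - alpha / 2)
    print(f"{confidence:.0%} confidence: Z = {z_scores[confidence]:.3f}")
```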
Example: Descriptive statistics on variables measured in a sample of n = 3,539
participants attending the 7th examination of the offspring in the Framingham
Heart Study are shown below:
Because the sample is large, we can generate a 95% confidence interval for
systolic blood pressure using the formula x̄ ± Z·s/√n, with Z = 1.96 for 95% confidence.
Suppose we compute a 95% confidence interval for the true systolic blood pressure using data in the
subsample. Because the sample size is small, we must now use the confidence interval formula that involves t
rather than Z.
The sample size is n=10, the degrees of freedom (df) = n-1 = 9. The t value for 95% confidence
with df = 9 is t = 2.262.
T DISTRIBUTION TABLE
SOLUTION:
Substituting the sample statistics and the t value for 95% confidence into x̄ ± t·s/√n, we have
121.2 ± 2.262·s/√10 = 121.2 ± 7.9, which gives the interval (113.3, 129.1).
Interpretation:
Based on this sample of size n=10, our best estimate of the true mean systolic
blood pressure in the population is 121.2.
Based on this sample, we are 95% confident that the true mean systolic blood
pressure in the population is between 113.3 and 129.1. Note that the margin of
error is larger here primarily because of the small sample size.
Confidence Interval for the Population Proportion
Example: During the 7th examination of the Offspring
cohort in the Framingham Heart Study there were 1,219
participants being treated for hypertension and 2,313 who
were not on treatment. If we call treatment a "success",
then x=1,219 and n=3,532.
• The sample proportion is p̂ = x/n = 1,219/3,532 = 0.345.
• This is the point estimate, i.e., our best estimate of the proportion of the
population on treatment for hypertension is 34.5%. The sample is large,
so the confidence interval can be computed using the formula p̂ ± Z√(p̂(1 − p̂)/n).
• Substituting our values we get
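The sketch below carries out the substitution, assuming a 95% confidence level (Z ≈ 1.96). The notes do not state the level for this example.

```python
from math import sqrt
from statistics import NormalDist

x, n = 1219, 3532                        # participants on treatment, total participants
p_hat = x / n                            # point estimate, about 0.345
z = NormalDist().inv_cdf(0.975)          # assumed 95% confidence level
margin = z * sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - margin, p_hat + margin)

print(f"p-hat = {p_hat:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Under that assumption the interval is about (0.329, 0.361).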