probability distribution-intro

Uploaded by

tum chris
Copyright
© © All Rights Reserved
LECTURE 11

CHAPTER 7: RANDOM VARIABLES AND THEIR PROBABILITY DISTRIBUTIONS
Purpose
To acquaint the student with the main probability distributions and to use them
to solve various problems related to statistics.
Objectives
a) Define a random variable.
b) State and describe the features of the following distributions:
i. Binomial distribution
ii. Poisson distribution
iii. Normal distribution
c) Use tables to read probabilities for the above distributions.
7.1 Introduction
Any variable can take a number of possible values. If the values occur unpredictably,
it is called a random variable. Each value of a discrete variable has a particular
chance, or probability, of occurring. The sum of the probabilities of all the possible
values is always 1. Recall that ‘discrete’ means the variable takes separate, countable
values; for counts these are the non-negative whole numbers. Example: We may have
zero, one, two, ... people in a room. The number is limited by the size of the room.
We cannot have negative two people or one and a half people.
Example: The number of people (X) in a moving car licensed to carry up to 6 people
follows the pattern below:
Number X    Probability P(X)
1           0.5
2           0.3
3           0.1
4           0.05
5           0.03
6           0.02
Total       1.0
From this ‘distribution’ we can find any of the following probabilities:
Exactly 3 people: P(X = 3) = 0.1
Up to 3 people: P(X ≤ 3) = 0.5 + 0.3 + 0.1 = 0.9
Fewer than 3 people: P(X < 3) = 0.5 + 0.3 = 0.8
More than 3 people: P(X > 3) = 0.05 + 0.03 + 0.02 = 0.1
At least 3 people: P(X ≥ 3) = 0.1 + 0.05 + 0.03 + 0.02 = 0.2
Between 2 and 5 people: P(2 ≤ X ≤ 5)* = 0.3 + 0.1 + 0.05 + 0.03 = 0.48
Anything but 3 people: P(X ≠ 3) = 0.5 + 0.3 + 0.05 + 0.03 + 0.02 = 0.9
* P(2 ≤ X ≤ 5) is read as ‘X is greater than or equal to 2 but less than or equal
to 5’.
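As a rough check, the probabilities listed for the car example can be recomputed from the table with a few lines of Python. This is only an illustrative sketch: the dictionary `dist` and the helper `prob` are names introduced here and are not part of the module.

```python
# Distribution from the car example: number of people X -> P(X).
dist = {1: 0.5, 2: 0.3, 3: 0.1, 4: 0.05, 5: 0.03, 6: 0.02}

def prob(event):
    """Sum P(X = x) over every value x for which event(x) is true."""
    return sum(p for x, p in dist.items() if event(x))

print(round(sum(dist.values()), 2))           # total probability
print(round(prob(lambda x: x <= 3), 2))       # up to 3 people
print(round(prob(lambda x: x >= 3), 2))       # at least 3 people
print(round(prob(lambda x: 2 <= x <= 5), 2))  # between 2 and 5 people
print(round(prob(lambda x: x != 3), 2))       # anything but 3 people
```

Writing each event as a condition on x keeps the code close to the notation P(X ≤ 3), P(X ≠ 3) and so on.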
7.2 The Binomial Distribution
The binomial distribution is used when there are exactly two mutually exclusive
outcomes of a trial. These outcomes are conventionally labeled "success" and "failure".
The binomial distribution is used to obtain the probability of observing x successes
in n trials, with the probability of success on a single trial denoted by p. The
binomial distribution assumes that p is fixed for all trials.
7.2.1 Features of a Binomial Distribution
A binomial experiment consists of:
• A fixed number of trials, n.
• Two possible outcomes on each trial, ‘success’ and ‘failure’, with probabilities p
and q = 1 – p (the complement). ‘Success’ is simply the outcome we are interested in.
For example, if the proportion of defective items produced by a factory in a day is 1%
and we are counting defective items, then p = 0.01.
• Independent trials (the outcome of one trial does not affect the outcome of any
other trial).
7.2.2 Using the Binomial Tables:
Different probabilities can be read from the binomial tables.
Each page focuses on a given value of ‘n’. Across the top are various values
of ‘p’. In
the left margin are the possible values of ‘k’. The body of the table gives P(X
≤ k).
Example 1
Find the probability of at least two defective items in a batch of 10 items
with a defective rate of
10%.
Here, we would need to find the following:-
P(X ≥ 2) = P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5) + P(X = 6) + P(X = 7) +
P(X = 8) + P(X = 9) + P(X = 10)
If we used P(X ≤ k) we would only need to find:
P(X ≥ 2) = 1 – P(X ≤ 1)
Your binomial tables are at the back of this Module.
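The shortcut P(X ≥ 2) = 1 – P(X ≤ 1) can be checked numerically. The sketch below uses the standard binomial formula instead of the tables, and the helper name `binom_pmf` is introduced here purely for illustration.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for n independent trials with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.10  # batch of 10 items, 10% defective rate

# Long way: add P(X = 2) through P(X = 10).
long_way = sum(binom_pmf(x, n, p) for x in range(2, n + 1))

# Short way: 1 - P(X <= 1), which needs only two terms.
short_way = 1 - (binom_pmf(0, n, p) + binom_pmf(1, n, p))

print(round(long_way, 4), round(short_way, 4))
```

Both routes give about 0.264, which is why the cumulative form P(X ≤ k) in the tables is convenient.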
Example 2:
A coin is tossed 10 times. What is the probability that there will be not more
than 4 ‘heads’?
Solution: n =10, p=.5 and we require P(X ≤ 4)
Directly from the tables using k=4 we get P(X ≤ 4) = 0.377
Example 3: 15 children are born. What is the probability that there will be
more than
9 girls?
Solution: n =15, p =.5 and we require P(X > 9)
P(X > 9) = P(X ≥ 10) = 1 - P(X ≤ 9)
= 1 - 0.849
= 0.151 (this may be written as 0.151 or .151).
Example 4: The chance of any computer chip being defective is 20%. If 15 chips are
selected at random, what is the probability that
a) Fewer than 3 will be defective?
b) None will be defective?
c) Exactly 5 will be defective?
d) Between 6 and 8 will be defective?
e) At least 10 will not be defective?
Solutions: In all cases n = 15, p = 0.2
Reading from the Binomial tables,
a) P(X < 3) = P(X ≤ 2)
= 0.398
b) P(X = 0) = P(X ≤ 0)
= 0.035
c) P(X = 5) = P(X ≤ 5) – P(X ≤ 4)
= 0.939 – 0.836
= 0.103
d) P(6 ≤ X ≤ 8) = P(X ≤ 8) – P(X ≤ 5)
= 0.999 – 0.939
= 0.060
e) ‘At least 10 not defective’ = ‘at most 5 defective’
P(X ≤ 5) = 0.939
Alternatively, we could have used n = 15, p = 0.8 (not defective)*, with X now
counting the non-defective chips:
P(X ≥ 10) = 1 – P(X ≤ 9)
= 1 – 0.061
= 0.939, which is the same result.
Note: *The designations for success and failure can be interchanged.
So far we have shown how to determine the probabilities of a binomial experiment
by using the Binomial Tables. What if the number of trials, n, or the probability, p,
cannot be found in the tables? In that case we can use the formula or Excel (Insert –
Function, then select ‘Statistical’ and ‘BINOMDIST’). However, you are not required to
use the formula in this course. You will use the tables. For your information only,
the formula is:
$$P(X = x \mid n, p) = \binom{n}{x}\, p^{x} (1 - p)^{n - x} \quad \text{for } x = 0, 1, 2, \ldots, n$$
where
$$\binom{n}{x} = \frac{n!}{x!\,(n - x)!}$$
X represents the name of the random variable; x represents a value of the random
variable. The factorial sign (!) is best explained by example: 6! means 6 x 5 x 4 x 3 x
2 x 1 = 720. You will find the ! button on your calculator.
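For values of n or p that the printed tables do not cover, the formula can be evaluated directly. The following is a minimal sketch, assuming Python is available; `binom_cdf` is a name chosen here for illustration, and the value p = 0.23 in the last line is made up to show a case a table would typically not list.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k): the quantity given in the body of the binomial tables."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))

# Coin example: n = 10, p = 0.5, not more than 4 heads.
print(round(binom_cdf(4, 10, 0.5), 3))

# Chip example: n = 15, p = 0.2.
print(round(binom_cdf(2, 15, 0.2), 3))                          # fewer than 3 defective
print(round(binom_cdf(5, 15, 0.2) - binom_cdf(4, 15, 0.2), 3))  # exactly 5 defective

# A made-up case that a printed table may not include.
print(round(binom_cdf(2, 15, 0.23), 3))
```

The first three results should agree with the table readings quoted in the worked examples to three decimal places.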
7.3 The Poisson Distribution
A particular event occurs ‘on average’ μ times within a certain period or situation.
What is the probability that it will occur k times in that period?
A Poisson probability distribution describes the number of occurrences per interval
of time or space.
Time? Example: A programmer makes, on average, 2 mistakes per hour.
Space? Example: A programmer makes, on average, 2 mistakes per program.
Example: The average number of arrivals at a doctor’s clinic is 3 per hour. What is
the probability that in a given hour there will be 5 arrivals? Here we have μ = 3 and
k = 5.
Note:
1) There is no maximum possible value for ‘k’.
Example: The possible number of accidents on a given stretch of road is
infinite (only restricted by time).
2) The stated mean applies only to the given specific situation or time period. If the
time period or the size of the situation changes, the mean must be adjusted accordingly.
Example: If 4 trucks cross a bridge in 20 minutes (on average), then 2 trucks will
cross the bridge in a 10 minute period (on average).
7.3.1 Features of the Poisson Experiment
• The number of successes that occur in a period of time or an interval of space is
independent of the number of successes that occur in any other interval.
• The probability of a success in an interval is the same for all equal-sized intervals
and proportional to the size of the interval (for example, an average of 4 in an hour is
equivalent to an average of 2 in 30 minutes (1/2 hour) and 8 in two hours).
7.3.2 Using the Poisson Tables:
Across the top are various values of ‘μ’. In the left margin are the possible
values of
‘k’. The body of the table gives P(X ≤ k).
E.g., if μ = 3 and k = 5 then P(X ≤ 5) = 0.916.
Note: This is not the probability of ‘exactly 5’ arrivals, but of ‘up to 5’ arrivals.
Illustration: Poisson Tables for various μ values are at the back of this module.
Example 1
The average number of faults in each TV set is 3. What is the probability that a TV
set chosen at random will have a) not more than 4 faults? b) more than 6 faults? c)
between 3 and 5 faults? d) exactly 2 faults?
Solutions: For all cases μ = 3.
a) P(X ≤ 4) = 0.815
b) P(X > 6) = P(X ≥ 7) = 1 – P(X ≤ 6)
= 1 – 0.966
= 0.034
c) P(3 ≤ X ≤ 5) = P(X ≤ 5) – P(X ≤ 2)
= 0.916 – 0.423
= 0.493
d) P(X = 2) = P(X ≤ 2) – P(X ≤ 1)
= 0.423 – 0.199
= 0.224
Example 2
The average number of telephone calls you receive in your office every
hour is 3.
a) You must leave the office on urgent business that is expected to take
half an hour. What is the probability that you will not miss any calls during
that time?
b) In fact the ‘urgent business’ takes two hours. What is the probability that
you have
missed at least 5 calls?
Solutions: Average number of calls per hour is 3.
a) μ = 1.5 (for half an hour)
P(X = 0) = P(X ≤ 0), because there are no values below 0
= 0.223
b) μ = 6.0 (for two hours)
P(X ≥ 5) = 1 – P(X ≤ 4)
= 1 – 0.285
= 0.715
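The rescaling of the mean in this example can be sketched in Python as follows. The function `poisson_cdf` and the variable names are introduced here for illustration only; the calculation simply evaluates the Poisson probabilities instead of reading them from the tables.

```python
from math import exp, factorial

def poisson_cdf(k, mu):
    """P(X <= k) for a Poisson variable with mean mu."""
    return sum(mu**i * exp(-mu) / factorial(i) for i in range(k + 1))

rate_per_hour = 3.0

# a) Half an hour away: the mean scales to 3 * 0.5 = 1.5 calls.
mu_half_hour = rate_per_hour * 0.5
p_no_calls = poisson_cdf(0, mu_half_hour)

# b) Two hours away: the mean scales to 3 * 2 = 6 calls.
mu_two_hours = rate_per_hour * 2
p_at_least_5 = 1 - poisson_cdf(4, mu_two_hours)

print(round(p_no_calls, 3), round(p_at_least_5, 3))
```

The key step is adjusting μ to the length of the interval before any probability is looked up or computed.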
Example 3
The average number of fire-crews engaged in fighting fires in a large
city at any one time is 2.5. How many fire-crews should be available on duty
so that
they can respond to at least 95% of all fire emergencies?
Solution:
Given μ = 2.5, we need to find the smallest value of k such that P(X ≤ k) ≥ 0.95.
k    P(X ≤ k)
3    0.758
4    0.891
5    0.958
Five crews are needed to cover at least 95% of emergencies.
This last problem is a ‘Poisson-reverse’ question. Normally the question gives the
x or k value and asks for a probability. Here the probability is given and the question
requires k.
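A ‘reverse’ question of this kind amounts to walking down the cumulative column until the target probability is reached. A minimal sketch, with `poisson_cdf` and `smallest_k` as illustrative names not taken from the module:

```python
from math import exp, factorial

def poisson_cdf(k, mu):
    """P(X <= k) for a Poisson variable with mean mu."""
    return sum(mu**i * exp(-mu) / factorial(i) for i in range(k + 1))

def smallest_k(mu, target):
    """Smallest k with P(X <= k) >= target."""
    k = 0
    while poisson_cdf(k, mu) < target:
        k += 1
    return k

# Fire-crew example: mean of 2.5 crews engaged, 95% coverage wanted.
print(smallest_k(2.5, 0.95))
```

The loop always terminates for a target below 1, because the cumulative probability approaches 1 as k grows.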
So far we have shown how to determine the probabilities of a Poisson experiment
by using the Poisson Tables. What if the value of the population mean (μ) cannot be
found in the tables? In that case we can use the formula or Excel (Insert – Function,
then select ‘Statistical’ and ‘POISSON’). However, you are not required to use the
formula in this course. You will use the tables. For your information only, the formula
is:
$$P(X = k) = \frac{\mu^{k} e^{-\mu}}{k!} \quad \text{for } k = 0, 1, 2, \ldots$$
where e is the base of the natural logarithm (approximately 2.71828).
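The Poisson formula gives the probability of exactly k occurrences in one step, where the tables need a subtraction of two cumulative values. The sketch below applies it to the clinic question posed earlier (μ = 3, k = 5); `poisson_pmf` is a name introduced here for illustration.

```python
from math import exp, factorial

def poisson_pmf(k, mu):
    """P(X = k) = mu^k * e^(-mu) / k!"""
    return mu**k * exp(-mu) / factorial(k)

# Clinic example: mu = 3 arrivals per hour, exactly k = 5 arrivals.
p_exactly_5 = poisson_pmf(5, 3)

# The same value as a difference of cumulative probabilities,
# P(X <= 5) - P(X <= 4), which is how the tables would be used.
cdf_5 = sum(poisson_pmf(i, 3) for i in range(6))
cdf_4 = sum(poisson_pmf(i, 3) for i in range(5))

print(round(p_exactly_5, 3), round(cdf_5 - cdf_4, 3))
```

Both values come to about 0.101, consistent with the table readings 0.916 and 0.815 for μ = 3.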
7.3.3 Conditions under which a Poisson distribution holds
• counts of rare events
• all events are independent
• average rate does not change over the period of interest
7.3.4 Examples of experiments where a Poisson distribution holds:
• birth defects
• number of sample defects on a car
• number of typographical errors on a page
Further examples of the Poisson probability distribution:
• The number of alpha particles emitted by a radioactive source in a known
interval of time.
• The number of phone calls arriving at a telephone exchange in a known
interval of time.
• The number of defective items in a packet of 100 produced by a reliable
manufacturer.
• The number of printing errors on each page of a book from a good publisher.
• The number of road accidents reported in a city at a particular junction at a
particular time.
7.3.5 Examples of experiments where a Poisson distribution may not hold
• number of insects on a tree (contagion?)
• number of males in families of size 4 (not ‘rare’ events)
LECTURE 12
7.4 The Normal Distribution
The normal distribution is probably one of the most important and widely used
continuous distributions. A random variable that follows it is called a normal random
variable, and its probability distribution is called a normal distribution. The
following are the characteristics of the normal distribution:
7.4.1 Features of the Normal Distribution:
1. It is bell shaped and is symmetrical about its mean.
2. It is asymptotic to the axis, i.e., it extends indefinitely in either direction
from the mean.
3. It is a continuous distribution.
4. It is a family of curves, i.e., every unique pair of mean and standard
deviation defines a
different normal distribution. Thus, the normal distribution is completely
described by two
parameters: mean and standard deviation. See the following figure.
5. Total area under the curve sums to 1, i.e., the area of the distribution on
each side of the mean
is 0.5.
6. It is unimodal, i.e., values mound up only in the center of the curve.
7. The probability that a random variable will have a value between any two
points is equal to
the area under the curve between those points.
7.5 The Standard Normal Distribution
Integral calculus is needed to find the area under a normal distribution curve.
However, this can be avoided by transforming any normal distribution to fit the
standard normal distribution. The conversion rescales the normal distribution axis
from its true units (time, weight, dollars, and so on) to a standard measure called the
Z score or Z value. A Z score is the number of standard deviations that a value, X, is
away from the mean. If the value of X is greater than the mean, the Z score is positive;
if the value of X is less than the mean, the Z score is negative. The Z score equation
is as follows:
Z = (X - Mean) / Standard deviation
A standard Z table can be used to find probabilities for any normal curve
problem that has been
converted to Z scores. The Z distribution is a normal distribution with a mean
of 0 and a standard
deviation of 1.
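• Probability function given by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^{2}}{2\sigma^{2}}}$$
where μ is the mean and σ is the standard deviation.
Standard Normal               Non-Standard Normal
Mean = 0 and Variance = 1     Mean is not 0 or Variance is not 1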
The following steps are helpful when working with the normal curve
problems:
1. Graph the normal distribution, and shade the area related to the
probability you want to find.
2. Convert the boundaries of the shaded area from X values to the standard
normal random
variable Z values using the Z formula above.
3. Use the standard Z table to find the probabilities or the areas related to
the Z values in step 2.
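Steps 2 and 3 can be sketched in Python. The code below is an illustration only: it replaces the printed Z table with the error function from the standard library, the names `z_score` and `area_mean_to_z` are introduced here, and the numbers (mean 100, standard deviation 15, X = 124.3) are made up.

```python
from math import erf, sqrt

def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

def area_mean_to_z(z):
    """Area under the standard normal curve between 0 and |z|,
    i.e. the quantity printed in the body of the Z table."""
    return 0.5 * erf(abs(z) / sqrt(2))

# Step 2: convert a boundary from X units to Z units (made-up numbers).
z = z_score(124.3, mean=100, sd=15)

# Step 3: look up the area between the mean and that boundary.
print(round(z, 2), round(area_mean_to_z(z), 4))
```

Because the curve is symmetrical, the function takes the absolute value of z; whether the area lies left or right of the mean is decided from the sign of the Z score, as in step 1.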
Example 1
Graduate Management Aptitude Test (GMAT) scores are widely used by
graduate schools of
business as an entrance requirement. Suppose that in one particular year,
the mean score for the
GMAT was 476, with a standard deviation of 107. Assuming that the GMAT
scores are normally
distributed, answer the following questions:
Question 1. What is the probability that a randomly selected score from this GMAT
falls between 476 and 650, i.e., P(476 <= X <= 650) = ? The following figure shows a
graphic representation of this problem.
Figure 4
Applying the Z equation, we get: Z = (650 - 476)/107 = 1.626, which is taken as 1.62
for the table lookup. The Z value of 1.62 indicates that the GMAT score of 650 is 1.62
standard deviations above the mean. The standard normal table gives the probability of
a value falling between 650 and the mean. The whole number and tenths place of the Z
score appear in the first column of the table, and the hundredths place appears across
the top of the table. Thus the answer is that 0.4474, or 44.74%, of the scores on the
GMAT fall between 476 and 650.
Question 2. What is the probability of receiving a score greater than 750 on a GMAT
test that has a mean of 476 and a standard deviation of 107, i.e., P(X >= 750) = ?
This problem asks for the area of the upper tail of the distribution. The Z score
is: Z = (750 - 476)/107 = 2.56. From the table, the probability for this Z score is
0.4948. This is the probability of a GMAT score between 476 and 750. The rule is that
when we want to find the probability in either tail, we must subtract the table value
from 0.50. Thus, the answer to this problem is: 0.5 - 0.4948 = 0.0052 or 0.52%. Note
that P(X >= 750) is the same as P(X > 750) because, in a continuous distribution, the
area under an exact number such as X = 750 is zero. The following figure shows a
graphic representation of this problem.
Figure 5
Question 3. What is the probability of receiving a score of 540 or less on a GMAT
test that has a mean of 476 and a standard deviation of 107, i.e., P(X <= 540) = ? We
are asked to determine the area under the curve for all values less than or equal to
540. The Z score is: Z = (540 - 476)/107 = 0.60. From the table, the probability for
this Z score is 0.2257, which is the probability of getting a score between the mean
(476) and 540. The rule is that when the required region extends across the mean, we
add the areas on the two sides of the mean: here the whole lower half (0.5) plus the
area between the mean and 540. Thus, the answer to this problem is: 0.5 + 0.2257 =
0.7257, or about 73%. The following figure shows a graphic representation of this
problem.
Figure 6
Question 4. What is the probability of receiving a score between 330 and 440 on a
GMAT test that has a mean of 476 and a standard deviation of 107, i.e.,
P(330 < X < 440) = ? The solution to this problem involves determining the area of the
shaded slice in the lower half of the curve in the following figure.
Figure 7
In this problem, the two values fall on the same side of the mean. The Z scores are:
Z1 = (330 - 476)/107 = -1.36, and Z2 = (440 - 476)/107 = -0.34. The probability
associated with Z = -1.36 is 0.4131, and the probability associated with Z = -0.34 is
0.1331. The rule is that when we want to find the probability between two values of X
on one side of the mean, we subtract the smaller area from the larger area. Thus, the
answer to this problem is: 0.4131 - 0.1331 = 0.28 or 28%.
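The four GMAT questions can be cross-checked without tables using `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch for checking purposes, not the method the module asks you to use; the variable names are chosen here for illustration.

```python
from statistics import NormalDist

gmat = NormalDist(mu=476, sigma=107)

q1 = gmat.cdf(650) - gmat.cdf(476)  # between the mean and 650
q2 = 1 - gmat.cdf(750)              # greater than 750
q3 = gmat.cdf(540)                  # 540 or less
q4 = gmat.cdf(440) - gmat.cdf(330)  # between 330 and 440

for label, value in [("Q1", q1), ("Q2", q2), ("Q3", q3), ("Q4", q4)]:
    print(label, round(value, 4))
```

The results differ slightly from the table answers in the third or fourth decimal place, because the table method rounds each Z score to two decimals before the lookup.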
Example 2:
Suppose that a tire factory wants to set a mileage guarantee on its new model, called
the LA 50 tire. Life tests indicated that the mean mileage is 47,900 miles, and the
standard deviation of the normally distributed mileage is 2,050 miles. The factory
wants to set the guaranteed mileage so that no more than 5% of the tires will have to
be replaced. What guaranteed mileage should the factory announce, i.e.,
P(X <= ?) = 5%?
In this problem, the mean and standard deviation are given, but X and Z are unknown.
The problem is to solve for an X value that has 5% or 0.05 of the X values less than
that value. If 0.05 of the values are less than X, then 0.45 lie between X and the mean
(0.5 - 0.05); see the following graph.
Refer to the standard normal distribution table and search the body of the table for
0.45. Since the exact number is not found in the table, search for the closest number
to 0.45. There are two values equidistant from 0.45: 0.4505 and 0.4495. Move to the
left from these values, and read the Z scores in the margin, which are 1.65 and 1.64.
Take the average of these two Z scores, i.e., (1.65 + 1.64)/2 = 1.645. Because X lies
below the mean, the Z score is negative: Z = -1.645. Plugging this number and the
values of the mean and the standard deviation into the Z equation gives:
Z = (X - mean)/standard deviation, or -1.645 = (X - 47,900)/2,050,
so X = 47,900 - 1.645 x 2,050 = 44,528 miles (rounded).
Thus, the factory should set the guaranteed mileage at 44,528 miles if the objective
is not to replace more than 5% of the tires.
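The tire problem runs the table in reverse: a probability is given and an X value is wanted. As a hedged cross-check, the inverse cumulative function `inv_cdf` of `statistics.NormalDist` (Python 3.8 or later) does the same job; the variable names below are illustrative.

```python
from statistics import NormalDist

tire_life = NormalDist(mu=47_900, sigma=2_050)

# Z score that leaves 5% of the area in the lower tail.
z_5pct = NormalDist().inv_cdf(0.05)

# Mileage below which 5% of tires are expected to fail.
guarantee = tire_life.inv_cdf(0.05)

# The same value from rearranging Z = (X - mean) / standard deviation.
by_hand = 47_900 + z_5pct * 2_050

print(round(z_5pct, 3), round(guarantee), round(by_hand))
```

The computed Z score rounds to -1.645, matching the average of the two table readings, and the mileage rounds to the same 44,528 miles.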
LECTURE 13
7.6 Finding z‐scores from probabilities
This is more challenging, and requires you to use the table inversely. You
must look up the
area between zero and the value on the inside part of the table, and then
read the z-score
from the outside. Finally, decide if the z-score should be positive or negative,
based on
whether it was on the left side or the right side of the mean. Remember, z-
scores can be
negative, but areas or probabilities cannot be.
Situation: Instructions
• Area between 0 and a value: Look up the area in the table. Make negative if on the
left side.
• Area in one tail: Subtract the area from 0.5000. Look up the difference in the
table. Make negative if in the left tail.
• Area including one complete half (less than a positive or greater than a negative):
Subtract 0.5000 from the area. Look up the difference in the table. Make negative if on
the left side.
• Within z units of the mean: Divide the area by 2. Look up the quotient in the table.
Use both the positive and negative z-scores.
• Two tails with equal area (more than z units from the mean): Subtract the area from
1.000. Divide the difference by 2. Look up the quotient in the table. Use both the
positive and negative z-scores.
You become proficient at using the table with practice, so work lots of the normal
probability problems!
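Each situation above reduces to finding the z whose area from 0 to z is known. The sketch below expresses that reduction with `statistics.NormalDist` (Python 3.8 or later) in place of the printed table. All function names and the example areas are introduced here for illustration, and every function returns the positive z-score; the sign still has to be chosen from the side of the mean, as the instructions say.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal: mean 0, standard deviation 1

def z_from_area_0_to_z(area):
    """z such that the area between 0 and z equals `area`."""
    return std.inv_cdf(0.5 + area)

def z_from_tail(area):
    """Area in one tail: subtract it from 0.5 first."""
    return z_from_area_0_to_z(0.5 - area)

def z_from_half_plus(area):
    """Area including one complete half: subtract 0.5 first."""
    return z_from_area_0_to_z(area - 0.5)

def z_from_central(area):
    """Area within z units of the mean: divide by 2 first."""
    return z_from_area_0_to_z(area / 2)

def z_from_two_tails(area):
    """Two equal tails: subtract from 1, then divide by 2."""
    return z_from_area_0_to_z((1 - area) / 2)

print(round(z_from_area_0_to_z(0.4474), 2))
print(round(z_from_tail(0.05), 3))
print(round(z_from_half_plus(0.8413), 2))
print(round(z_from_central(0.95), 2))
print(round(z_from_two_tails(0.05), 2))
```

Writing every case in terms of one helper mirrors the table, which only ever lists the area between 0 and z.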
[Figure: standard normal curve, with the horizontal Z axis marked from -3 to 3]
The shaded area between the middle (Z = 0) and any value of Z is given in the tables.
Assume Z = 1.62.
The area from a Z of 0 to a Z of 1.62 is 0.4474. (Read from the tables.)
The standard normal probability tables are at the back of this topic.
Example: The heights of adult males are normally distributed with mean 170 cm and
standard deviation 10 cm.
Q1. Find the probability that a male is between 180 and 190 cm tall.
Method:
a) Draw a diagram and shade the area required.
b) Use the formula Z = (X - mean)/standard deviation to calculate the Z values of the
boundaries of the shaded area:
Z1 = (180 - 170)/10 = 1
Z2 = (190 - 170)/10 = 2
Note: Strictly speaking, a random variable is usually denoted by an uppercase X,
whereas a lowercase x represents one of its values. You can show x instead of X if you
wish.
c) Find the areas in the tables:
Z    Area
1    0.3413
2    0.4772
d) Add or subtract table areas to get the probability required.
P(180 ≤ X ≤ 190) = P(1 ≤ Z ≤ 2)
= 0.4772 - 0.3413
= 0.1359
Q2. Taller than 190 cm
Z = (190 - 170)/10 = 2
Tables: 0.4772
P(X > 190) = 0.5 - 0.4772
= 0.0228
Q3. Shorter than 180 cm
Z = (180 - 170)/10 = 1
Tables: 0.3413
P(X < 180) = 0.5 + 0.3413
= 0.8413
Q4. Shorter than 165 cm
Z = (165 - 170)/10 = -0.5
Tables: 0.1915
P(X < 165) = 0.5 - 0.1915
= 0.3085
Chapter Review Questions
1. List 5 characteristics of the normal curve.
2. With the aid of clearly labeled normal curves, find the area under the normal curve
i) Between Z = -2 and Z = 2.6
ii) To the right of Z = 2.4
iii) To the left of Z = 2.0
iv) Between Z = 1.3 and Z = -1.2
3. The marks obtained in a Business Statistics CAT are normally distributed with mean
23 and standard deviation 4.2. Find the probability that a randomly selected student
scores
i) More than 25 marks
ii) Between 20 and 25 marks
iii) Less than the mean mark
4. The probability of any dry cell being defective is 15%. If 10 dry cells are selected
at random, what is the probability that
a) Fewer than 3 will be defective?
b) None will be defective?
c) Exactly 5 will be defective?
d) Between 6 and 8 will be defective?
e) At least 10 will not be defective?
5. The average number of cars passing through a certain junction per minute is 10.
What is the probability that in one minute,
i) No fewer than 5 cars pass through the junction?
ii) Exactly 12 cars pass through the junction?
iii) Between 5 and 7 cars pass through the junction?
Reference
i. Anderson, Sweeney and Williams, Essentials of Statistics for Business and
Economics, pp. 158-181, 191-212.
ii. Donald Waters, Quantitative Methods for Business, pp. 382-393.