Prob Dist Updated-wps

Uploaded by

edward asiedu

DEPARTMENT OF PHARMACEUTICAL SCIENCES

D. PHARM PROGRAMME

PHSC: 256 ( STATISTICAL METHODS)

TOPIC : RANDOM VARIABLE PROBABILITY DISTRIBUTIONS


LECTURER : REV. JOSEPH ODURO-YEBOAH
CIVIL ENGINEERING DEPARTMENT

CENG: 336 ( STATISTICAL METHODS)

TOPIC : RANDOM VARIABLE PROBABILITY DISTRIBUTIONS


LECTURER : REV. JOSEPH ODURO-YEBOAH
Probability Distribution
• The probability distribution for a random variable describes how the
probabilities are distributed over the values of the random variable.
For a discrete random variable, x, the probability distribution is
defined by a probability mass function, denoted by f(x).
• This function provides the probability for each value of the random
variable.
• In the development of the probability function for a discrete
random variable, two conditions must be satisfied:
(1) f(x) = P(X = x) must be nonnegative for each value of the random variable,
i.e. 0 ≤ f(x) ≤ 1, and
(2) the sum of the probabilities over all values of the random variable
must equal one, i.e. ∑f(x) = ∑P(x) = 1.
Probability Density Function
• In the continuous case, the counterpart of the probability mass
function is the probability density function, also denoted by f(x).
• The value of f(x) at a single point is not itself a probability.
Instead, the area under the graph of f(x) over some interval,
obtained by computing the integral of f(x) over that interval,
gives the probability that the variable will take on a value
within that interval.
• A probability density function must satisfy two requirements:
(1) f(x) must be nonnegative for each value of the random variable,
and
(2) the integral over all values of the random variable must be equal
to one.
Random Variables and Probability Distributions

• A random variable is a variable whose value is unknown, or a function
that assigns numerical values to each of an experiment's outcomes.
• A random variable can be either discrete (having specific values) or
continuous (any value in a continuous range).
• The use of random variables is most common in probability and statistics,
where they are used to quantify outcomes of random occurrences.
• Risk analysts use random variables to estimate the probability of an
adverse event occurring.
• For instance, a random variable representing the number of automobiles
sold at a particular dealership in a day would be discrete, while a random
variable representing the weight of a person in kilograms would be
continuous.
Probability Distribution of a Random Variable
A random variable can be described as a variable that can take on the possible
values of an outcome of an experiment. There can be two types of random
variables, namely, discrete and continuous random variables. Given below are
the various formulas for the probability distribution of a random variable.

Probability Distribution of a Discrete Random Variable


A discrete random variable can be defined as a variable that can take a
countable number of distinct values, such as 0, 1, 2, 3, ... The formulas for
the probability distribution function and the probability mass function for a
discrete random variable are given below:
Probability Distribution Function: F(x) = P (X ≤ x)
Probability Mass Function: p(x) = P(X = x)
Discrete and Continuous Random Variables:

• A variable is a quantity whose value changes.


• A discrete variable is a variable whose value is obtained by counting.
• Examples: number of hostels on campus, number of red marbles in
a jar, number of heads when flipping three coins, students’ grade
level.
• A continuous variable is a variable whose value is obtained by
measuring.
• Examples: height of students in class
• weight of students in class
• time it takes to get to school
• distance travelled between classes
Important Notes on Probability Distribution

• A probability distribution is used to describe all the possible values of a random
variable and their corresponding occurrence probabilities.
• There can be two types of probability distributions. These are the continuous
probability distribution (e.g., Normal distribution) and the discrete probability
distribution (e.g., Bernoulli distribution).
• A probability distribution function and a probability density function (pdf) can be
used to describe the characteristics of a continuous distribution.
• A discrete distribution can be defined by a probability mass function (pmf) and a
probability distribution function.
Statistical Measurements for the random variable x
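
The worked solutions that follow rely on the standard definitions of the mean, variance and standard deviation of a discrete random variable x with probability function p(x). As a reference sketch (the symbols µ and σ² are assumed to match the notation used in the solutions):

```latex
\mu = E(x) = \sum x\,p(x), \qquad
\sigma^{2} = \mathrm{Var}(x) = \sum x^{2}\,p(x) - \mu^{2}, \qquad
\sigma = \sqrt{\sigma^{2}}
```

The interval one standard deviation about the mean is then (µ − σ, µ + σ).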


QUESTION

2. a) Differentiate between a discrete random variable and a continuous random variable.
b) What are the properties of a discrete probability distribution?
c) Let the random variable X represent the number of automobiles sold by an automobile dealer in a day.
The sales data over the past year are given by:
x      0      1      2      3      4
ƒ(x)   0.20   0.25   0.30   0.15   0.10
(i) What is the probability that at least 2 cars will be sold in a day?
(ii) Calculate the expected value of x.
(iii) Calculate the standard deviation of x and determine the interval one standard deviation about the mean.
d) A committee was formed to study the problem of parking spaces at Central University, Miotso Campus, from
Monday to Friday. Research was done on the number of days students attend lectures within a week. The information
was obtained for each day.
x      1      2      3      4      5
ƒ(x)   0.10   0.15   0.35   0.20   0.20

i) What is the probability that a student attends lectures at least 3 days in a week?
ii) Calculate the expected value of x.
iii) Calculate the standard deviation of x and determine the interval one standard deviation about the mean.
Solutions
2) (a) Discrete random variables are counted, whilst continuous random variables are measured.
(b) (i) 0 ≤ p(x) ≤ 1 for every value x
(ii) ∑p(x) = 1
(c)
x       p(x)    x·p(x)   x²      x²·p(x)
0       0.20    0.00     0.00    0.00
1       0.25    0.25     1.00    0.25
2       0.30    0.60     4.00    1.20
3       0.15    0.45     9.00    1.35
4       0.10    0.40     16.00   1.60
Total   1.00    1.70             4.40
(i) P(at least 2 cars sold) = P(2) + P(3) + P(4) = 0.30 + 0.15 + 0.10 = 0.55
(ii) µ = ∑x·p(x) = 1.70
(iii) Var(x) = σ² = ∑x²·p(x) − µ² = 4.40 − 2.89 = 1.51
σ = √1.51 = 1.22882 ≈ 1.229
Interval one standard deviation about the mean: (µ − σ, µ + σ) = (0.47, 2.93)
(d)
x       p(x)    x·p(x)   x²      x²·p(x)
1       0.10    0.10     1.00    0.10
2       0.15    0.30     4.00    0.60
3       0.35    1.05     9.00    3.15
4       0.20    0.80     16.00   3.20
5       0.20    1.00     25.00   5.00
Total   1.00    3.25             12.05
(i) P(at least 3 days in a week) = P(3) + P(4) + P(5) = 0.35 + 0.20 + 0.20 = 0.75
(ii) µ = ∑x·p(x) = 3.25
(iii) Var(x) = σ² = 12.05 − 10.5625 = 1.4875 ≈ 1.49
σ = √1.4875 = 1.21963 ≈ 1.22
Interval one standard deviation about the mean: (µ − σ, µ + σ) = (2.03, 4.47)
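
The tabular method used in solutions (c) and (d) can be checked with a short script. This is a minimal sketch, not part of the original notes; the function name `summarize` is hypothetical, and the probabilities are the ones given in the two questions.

```python
from math import sqrt

def summarize(values, probs):
    """Return (mean, variance, standard deviation) of a discrete distribution."""
    # A valid probability function has 0 <= p(x) <= 1 and probabilities summing to 1.
    assert all(0 <= p <= 1 for p in probs)
    assert abs(sum(probs) - 1.0) < 1e-9
    mean = sum(x * p for x, p in zip(values, probs))               # sum of x*p(x)
    second_moment = sum(x * x * p for x, p in zip(values, probs))  # sum of x^2*p(x)
    variance = second_moment - mean ** 2
    return mean, variance, sqrt(variance)

# Question (c): cars sold per day; question (d): days of lecture attendance.
cars = summarize([0, 1, 2, 3, 4], [0.20, 0.25, 0.30, 0.15, 0.10])
days = summarize([1, 2, 3, 4, 5], [0.10, 0.15, 0.35, 0.20, 0.20])

for name, (mean, var, sd) in (("cars", cars), ("days", days)):
    print(f"{name}: mean={mean:.2f} var={var:.4f} sd={sd:.4f} "
          f"interval=({mean - sd:.2f}, {mean + sd:.2f})")
```

Computing the variance from the unrounded value 1.4875 in (d) gives a standard deviation of about 1.22, which agrees with the interval (2.03, 4.47).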
QUESTION ON BUILDING PROBABILITY FREQ.
DIST. TABLE
• A newly married couple plans to have three children. They are curious
about whether their children will be boys or girls.
a) Illustrate the expectations of the couple on a tree diagram and, from
the expected outcomes, build a probability frequency distribution table for
the random variable X, the number of children that are boys.
(b) Calculate the mean and standard deviation of the distribution.
(c) Determine the interval two standard deviations about the mean and
illustrate the interval on a probability frequency distribution graph.
d) What is the probability that the couple will have two boys?
e) What is the probability that the couple will have two boys and a girl?
f) What is the probability that the couple will have two girls?
Tree diagrams
Each branch (B or G) has probability 1/2, so each of the eight outcomes has probability 1/8.
1st child   2nd child   3rd child   Experimental outcome   Probability
B           B           B           BBB                    1/8
B           B           G           BBG                    1/8
B           G           B           BGB                    1/8
B           G           G           BGG                    1/8
G           B           B           GBB                    1/8
G           B           G           GBG                    1/8
G           G           B           GGB                    1/8
G           G           G           GGG                    1/8
SOLUTION

X (number of boys)   P(x)        x·P(x)                  x²·P(x)
0                    1/8         0                       0
1                    3/8         3/8                     3/8
2                    3/8         6/8                     12/8
3                    1/8         3/8                     9/8
TOTAL                ∑P(x) = 1   ∑x·P(x) = 12/8 = 1.5    ∑x²·P(x) = 24/8 = 3

• P(2 boys) = 3/8
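
The table above can be rebuilt by enumerating the eight outcomes of the tree diagram. The sketch below is illustrative only; it assumes, as the tree does, that each child is a boy or a girl with probability 1/2 independently of the others, and the variable names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import sqrt

# All 2^3 equally likely outcomes: BBB, BBG, ..., GGG.
outcomes = ["".join(o) for o in product("BG", repeat=3)]

# X = number of boys in each outcome; count how often each value occurs.
counts = Counter(o.count("B") for o in outcomes)
pmf = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}

mean = sum(x * p for x, p in pmf.items())                     # sum of x*P(x)
variance = sum(x * x * p for x, p in pmf.items()) - mean**2   # sum of x^2*P(x) - mean^2
sd = sqrt(variance)

print(pmf)
print("mean =", mean, "variance =", variance, "sd =", round(sd, 4))
print("interval two sd about the mean:", (float(mean) - 2 * sd, float(mean) + 2 * sd))
```

Using exact fractions keeps the probabilities in the same form (eighths) as the hand-built table.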
Special probability distributions
The Binomial Distribution
Two of the most widely used discrete probability distributions are the binomial and Poisson.
The binomial probability mass function

$$b(x; n, p) = \binom{n}{x} p^{x} q^{n-x}, \qquad \binom{n}{x} = \frac{n!}{(n-x)!\,x!}, \qquad q = 1 - p \tag{6}$$

(equation 6) provides the probability that x successes will occur in n trials of a binomial
experiment.

A binomial experiment has four properties:
(1) it consists of a sequence of n identical trials;
(2) two outcomes, success or failure, are possible on each trial;
(3) the probability of success on any trial, denoted p, does not change from
trial to trial; and
(4) the trials are independent.
For the binomial distribution, mean = np and standard deviation = √(npq).
Question

3) A Bernoulli trial can result in a success with probability p and a failure with probability q = 1 − p. Then the
probability distribution of the binomial random variable X, the number of successes in n
independent trials, is $b(x; n, p) = \binom{n}{x} p^{x} q^{n-x}$, x = 0, 1, 2, …, n.

a) State the properties of the Bernoulli process.
b) The Institute of Environmental Sciences looks at the performance of instruments designed to
measure, report and record apparent sizes of particles on a surface. They report that in n measurements
taken by a device, the number of times a particle is detected has a binomial distribution. Suppose the
probability of a device detecting a particle is p = 0.6.

i) If 6 readings are taken, find the probability that the particle is detected more than 3 times.
ii) If 6 readings are taken, find the probability that the particle is detected less than 3 times.
iii) If 6 readings are taken, what is the expected number of particles detected?
iv) If 6 readings are taken, what is the standard deviation of the number of particles detected?
Answer
3 a) i) There are n repeated trials.
ii) The trials are independent.
iii) Each trial has two possible outcomes, i.e. success and failure.
iv) The probability of success remains constant from trial to trial.
b) n = 6, p = 0.6, q = 0.4
i) P(x > 3) = P(x = 4) + P(x = 5) + P(x = 6)
P(x = 4) = C(6,4) (0.6)^4 (0.4)^2 = 15 × 0.1296 × 0.16 = 0.3110
P(x = 5) = C(6,5) (0.6)^5 (0.4)^1 = 6 × 0.07776 × 0.4 = 0.1866
P(x = 6) = C(6,6) (0.6)^6 (0.4)^0 = 1 × 0.046656 × 1 = 0.0467
P(x > 3) = 0.3110 + 0.1866 + 0.0467 = 0.5443
ii) Less than 3 (i.e. at most 2): P(x = 0) + P(x = 1) + P(x = 2) = 0.0041 + 0.0369 + 0.1382 = 0.1792
iii) Expected number of detections = mean = np = 6 × 0.6 = 3.6
iv) Standard deviation = √(npq) = √(6 × 0.6 × 0.4) = √1.44 = 1.20
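
The binomial calculations above can be reproduced with the standard library. This is a minimal sketch rather than the lecturer's own method; `binom_pmf` is a hypothetical helper name, and `math.comb` supplies the number of combinations.

```python
from math import comb, sqrt

def binom_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 6, 0.6  # six readings, detection probability 0.6

more_than_3 = sum(binom_pmf(x, n, p) for x in range(4, n + 1))  # x = 4, 5, 6
fewer_than_3 = sum(binom_pmf(x, n, p) for x in range(0, 3))     # x = 0, 1, 2
mean = n * p
sd = sqrt(n * p * (1 - p))

print(round(more_than_3, 4), round(fewer_than_3, 4), round(mean, 2), round(sd, 2))
```

The same helper applies to any other binomial question by changing n, p and the range of x that is summed.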
EXERCISE

a) Suppose it is known that 10 percent of the owners of two-year-old
automobiles have had problems with their automobile’s electrical system.
To compute the probability of finding exactly 2 owners who have had electrical system
problems in a group of 10 owners, use the binomial probability mass function with
n = 10, x = 2, p = 0.1 and q = 0.9. For this case,
the probability is 0.1937.
b) A recent study by the MTTU of the Ghana Police Service revealed that about 40%
of Ghanaian drivers use seat belts. A sample of 10 drivers on Ghanaian roads is
selected:
• What is the probability that exactly 3 drivers were wearing seat belts?
• What is the probability that less than 3 drivers were wearing seat belts?
• What is the probability that at most 3 drivers were wearing seat belts?
The Poisson Distribution

• The Poisson probability distribution is often used as a model of the number of arrivals at a
facility within a given period of time. The Poisson probability mass function is

$$P(x) = \frac{\mu^{x} e^{-\mu}}{x!}, \qquad x = 0, 1, 2, \ldots \tag{7}$$

where µ = λt is the mean number of occurrences in the interval considered and e = 2.71828.
If X is a Poisson random variable, the mean (µ) and the variance (σ²) of X are equal: E(X) = V(X) = µ
• E(X) is the mean or expected value of X,
• V(X) is the variance of X, so the standard deviation is always equal to the square root of the mean: σ = √µ,
• λ is the mean rate of successes per period,
• t is the length of time (or space) considered.
PROPERTIES OF THE POISSON PROCESS: The Poisson process has the following properties:
• 1. The numbers of successes in different intervals are independent. A Poisson process has no memory.
• 2. The probability that a success will occur in an interval is the same for all intervals of equal size and
is proportional to the size of the interval. The mean process rate λ must remain constant for the entire
time span or space considered.
Here are some random variables that might
follow a Poisson distribution:
1. The number of orders a firm receives in a day.
2. The number of people who apply for a job in a day or month to
a human resources division.
3. The number of defects in a finished product.
4. The number of calls a firm receives in a week for help
concerning an “easy-to-assemble” toy.
Examples of probability for Poisson distributions

• On a particular river, overflow floods occur once every 100 years on average. Calculate the
probability of x = 0, 1, 2, 3, 4, 5, or 6 overflow floods in a 100-year interval, assuming the
Poisson model is appropriate.
• Because the average event rate is one overflow flood per 100 years, μ = 1
• P(x = 0 overflow floods in 100 years) = 0.368
• P(x = 1 overflow flood in 100 years) = 0.368
• P(x = 2 overflow floods in 100 years) = 0.184
x    P(x overflow floods in 100 years)
0    0.368
1    0.368
2    0.184
3    0.061
4    0.015
5    0.003
6    0.0005
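
The flood table can be regenerated from the Poisson probability function P(x) = µ^x e^(−µ) / x!. The following is a small sketch under that standard formula; `poisson_pmf` is a hypothetical helper name.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) for a Poisson random variable with mean mu."""
    return mu**x * exp(-mu) / factorial(x)

# Overflow floods: on average one per 100 years, so mu = 1 for a 100-year interval.
for x in range(7):
    print(x, round(poisson_pmf(x, 1.0), 4))
```

Changing the second argument gives other cases, for example `poisson_pmf(5, 10)` for 5 calls when the mean is 10.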
EXERCISE
• a) Suppose that the mean number of calls arriving in a 15-
minute period is 10. To compute the probability that 5 calls
come in within the next 15 minutes , μ = 10 and x = 5 are
substituted in equation 7, giving a probability of 0.0378.
• A study of the queues at the checkout registers of A&C Supermarket
revealed that during a certain period at the rush hour, the average
number of customers waiting is four. What is the probability that
during that period:
• No customers were waiting.
• Four customers were waiting.
• Four or fewer were waiting.
• Four or more were waiting.
EXERCISE
• Example 3: The marketing manager of a company has noted that
she usually receives 15 complaint calls from customers during a
week (consisting of 5 working days) and that the calls occur at
random.
• Find the probability of her receiving exactly 5 calls in a single day.
• Find the probability of her receiving at most 2 calls in a single
day.
• Example 4: A sports journal reports that the average number
of goals in a World Cup soccer match is approximately 2.5
and the Poisson model is appropriate.[5] Because the
average event rate is 2.5 goals per match, µ = 2.5.
• Find P(x = 0,1,2,3,4,5,6,7)
x    P(x goals in a World Cup soccer match)
0    0.082
1    0.205
2    0.257
3    0.214
4    0.134
5    0.067
6    0.028
7    0.010
The Normal Distribution

• The most widely used continuous probability distribution
in statistics is the normal probability distribution.
• Properties of a normal distribution
• The mean, mode and median are all equal.
• The curve is symmetric about the centre (i.e. around the mean, μ).
• Exactly half of the values are to the left of the centre and exactly
half of the values are to the right.
• The total area under the curve is 1.
• The graph of every normal distribution is a bell-shaped
curve.
When the area of the standard normal curve is divided
into sections by standard deviations above and below
the mean, the area in each section is a known quantity.
As explained earlier, the area in each section is the
same as the probability of randomly drawing a value in
that range.
Figure 1. The normal curve and the area under
the curve between σ units.
Figure 2.The normal curve and the area under the
curve between σ units.
To convert a value to a standard score ("z-score"),
first subtract the mean, then divide by the standard
deviation. Doing this is called "standardizing".
In order to use the area of the normal curve to determine the
probability of occurrence of a given value, the value must first
be standardized, or converted to a z‐score. To convert a
value to a z‐score is to express it in terms of how many
standard deviations it is above or below the mean. After the
z‐score is obtained, you can look up its corresponding
probability in a table. The formula to compute a z‐score is

$$z = \frac{x - \mu}{\sigma}$$

• where x is the value to be converted, μ is the population
mean, and σ is the population standard deviation.
Probabilities for the normal probability distribution can be
computed using statistical tables for the standard normal
probability distribution, which is a normal probability distribution
with a mean of zero and a standard deviation of one.
A simple mathematical formula is used to convert any value
from a normal probability distribution with mean μ and a
standard deviation σ into a corresponding value for a standard
normal distribution.
The tables for the standard normal distribution are then used to compute
the appropriate probabilities.
Normal Distribution Table (area under the standard normal curve between 0 and z)

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621

1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830

1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015

1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177

1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319

1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441

1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545

1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633

1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706

1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817

2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857

2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890

2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916

2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936

2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964

2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974

2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981

2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986

3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
Example: Percent of Population Between 0 and 0.45
Start at the row for 0.4, and read under 0.05 to get 0.45: there is the
value 0.1736
And 0.1736 is 17.36%
So 17.36% of the population are between 0 and 0.45 Standard
Deviations from the Mean.
Example: Percent of Population Z Between -1 and 2
From −1 to 0 is the same as from 0 to +1:
At the row for 1.0, under the first column (0.00), there is the value 0.3413
From 0 to +2 is:
At the row for 2.0, under the first column (0.00), there is the value 0.4772
Add the two to get the total between -1 and 2:
0.3413 + 0.4772 = 0.8185
And 0.8185 is 81.85%
So 81.85% of the population are between -1 and +2 Standard Deviations from the Mean.
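
The two table look-ups above can be cross-checked in software. The sketch below uses `statistics.NormalDist` from the Python standard library, whose `cdf` method gives the area below z; the helper names `area_0_to` and `area_between` are hypothetical. Results can differ from the table in the fourth decimal place because the table entries are rounded.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_0_to(z):
    """Area between the mean (0) and z, which is what the table above lists."""
    return Z.cdf(abs(z)) - 0.5

def area_between(a, b):
    """Area under the standard normal curve between a and b."""
    return Z.cdf(b) - Z.cdf(a)

print(round(area_0_to(0.45), 4))        # table entry for row 0.4, column 0.05
print(round(area_between(-1, 2), 4))    # compare with 0.3413 + 0.4772
```

Because the curve is symmetric, `area_0_to(-1)` and `area_0_to(1)` return the same value, which is why the table only needs positive z.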
EXAMPLES
• Use the Standard Normal Distribution table to find P(-2.31 < Z ≤ 0.82) = 0.4896 +
0.2939 = 0.7835
• Use the Standard Normal Distribution table to find P(Z ≤ 2) = 0.5 + 0.4772 = 0.9772
• Use the Standard Normal Distribution table to find P(-1.65 < Z ≤ 1.93)
• Use the Standard Normal Distribution table to find P(0.85 < Z ≤ 2.23)
• Use the Standard Normal Distribution table to find P(Z > 1.75) = 0.5 − 0.4599 = 0.0401
• Use the Standard Normal Distribution table to find P(Z ≤ -0.69)
= 0.5 − 0.2549 = 0.2451
Example 1: A normal distribution of retail‐store purchases has a mean of ghc 14.31
and a standard deviation of 6.40. What percentage of purchases were under ghc
10.00?
First, compute the z‐score:

$$z = \frac{x - \mu}{\sigma} = \frac{10.00 - 14.31}{6.40} = -0.67$$

From the Normal Distribution Table above (area between 0 and z), the reading for 0.67 is 0.2486, so
the required area = 0.5 − 0.2486 = 0.2514
Purchases above ghc 10.00 = 0.2486 + 0.5 = 0.7486
The next step is to look up the z‐score in the table of standard normal probabilities
in "Statistics Tables". The standard normal table lists the probabilities (curve areas)
associated with given z‐scores. Table 2 in "Statistics Tables" gives the area of the
curve below z, in other words, the probability of obtaining a value of z or lower. Not
all standard normal tables use the same format, however. Some list only positive
z‐scores and give the area of the curve between the mean and z.
To use Table 2 (the table of standard normal probabilities) in "Statistics Tables,"
first look up the z‐score in the left column, which lists z to the first decimal place.
Then look along the top row for the second decimal place. The intersection of the
row and column is the probability. In the example, you first find –0.6 in the left
column and then 0.07 in the top row. Their intersection is 0.2514. The answer,
then, is that about 25 percent of the purchases were under ghc 10 (see Figure 3).
Figure 3. Finding a probability using a z‐score on the normal curve.
What if you had wanted to know the percentage of purchases above a certain
amount? Because Table 2 gives the area of the curve below a given z, to obtain the
area of the curve above z, simply subtract the tabled probability from 1. The area of
the curve above a z of –0.67 is 1 − 0.2514 = 0.7486. Approximately 75 percent of
the purchases were above ghc 10.
Just as Table 2 can be used to obtain probabilities from z‐scores, it can be used to
do the reverse.
Example 2 (HOW TO ESTIMATE X)
A normal distribution of retail‐store purchases has a mean of ghc 14.31 and a
standard deviation of 6.40. What purchase amount marks the lower 10 percent
of the distribution?
Locate in Table 2 the probability of 0.1000, or as close as you can find, and
read off the corresponding z‐score. The figure that you seek lies between
the tabled probabilities of 0.0985 and 0.1003, but closer to 0.1003, which
corresponds to a z‐score of –1.28. (In a table of areas between 0 and z, the
same z is found by looking up an area of 0.5 − 0.1 = 0.4.) Now, use the z formula,
this time solving for x:

$$x = \mu + z\sigma = 14.31 + (-1.28)(6.40) = 6.12$$

• Approximately 10 percent of the purchases were below ghc 6.12.
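
Both retail-store examples can be checked without a table. This is an illustrative sketch using `statistics.NormalDist`; its `cdf` method returns the area below a value and `inv_cdf` does the reverse look-up. The variable names are hypothetical, and the results differ slightly from the hand calculation because the z‐score is not rounded to two decimal places.

```python
from statistics import NormalDist

# Purchases: mean ghc 14.31, standard deviation 6.40 (values from the examples).
purchases = NormalDist(mu=14.31, sigma=6.40)

below_10 = purchases.cdf(10.00)            # Example 1: share of purchases under ghc 10
above_10 = 1 - below_10                    # share of purchases above ghc 10
lower_decile = purchases.inv_cdf(0.10)     # Example 2: amount marking the lower 10 percent

print(f"below 10: {below_10:.4f}, above 10: {above_10:.4f}")
print(f"lower 10 percent cut-off: ghc {lower_decile:.2f}")
```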
EXERCISE:
The ages of the population of a town are normally distributed with mean 43
and standard deviation 14. What percentage of the population would you expect
to be aged between 22 and 57?
SOLUTION
x1 = 22, x2 = 57, µ = 43, σ = 14
P(22 ≤ X ≤ 57)
z1 = (x1 − µ)/σ = (22 − 43)/14 = −1.5
Reading: 0.4332
z2 = (x2 − µ)/σ = (57 − 43)/14 = 1.0
Reading: 0.3413
Required area = 0.3413 + 0.4332 = 0.7745
So the percentage of the population aged between 22 and 57 is about 77.45%.
P(X > 57) = 0.5 − 0.3413 = 0.1587
STANDARD NORMAL AND SAMPLING DISTRIBUTIONS QUESTION

• (a)(i) State the properties of the Standard Normal Distribution.
• (ii) State the properties of the Sampling Distribution of Means of all samples of
size n.
• (b) The annual precipitation amounts for Ghana appear to be normally
distributed with a mean of 32.473 inches and a standard deviation of 5.601 inches
(based on data from the Ghana Meteorological Department).
• (i) If 1 year is randomly selected, what is the chance that the annual precipitation
is less than 29.000 inches?
• (ii) If a decade of ten years is randomly selected, what is the chance that the
annual precipitation amounts have a mean less than 29.000 inches?
• (iii) Given that part (ii) involves a sample size smaller than 30,
why can the central limit theorem be used?
What is a Sampling Distribution?

• Sampling distribution:
The sampling distribution of a statistic is a probability distribution for all possible
values of the statistic computed from a sample of size n.
Sampling distribution of the sample mean:
The sampling distribution of the sample mean is the probability distribution of all
possible values of the random variable x̄ computed from a sample of size n from a
population with mean µ and standard deviation σ.
• How Does it Work?
• Step 1: obtain a simple random sample of size n.
• Step 2: compute the sample mean x̄.
• Step 3: assuming that we are sampling from a finite population, repeat steps 1 and 2 until all
distinct simple random samples of n have been obtained.
• Note: once a particular sample is obtained, it cannot be obtained a second time.
• Plot the frequency distribution of each sample statistic that you developed from the step
above. The resulting graph will be the sampling distribution.
• Types of Sampling Distribution
• 1. Sampling distribution of mean
• As shown from the example above, you can calculate the mean of every sample group chosen from the
population and plot out all the data points. The graph will show a normal distribution, and the center will be
the mean of the sampling distribution, which is the mean of the entire population.
• 2. Sampling distribution of proportion
• It gives you information about proportions in a population. You would select samples from the population and
get the sample proportion. The mean of all the sample proportions that you calculate from each sample group
would become the proportion of the entire population.
• 3. T-distribution
• T-distribution is used when the sample size is very small or not much is known about the population. It is used
The Central Limit Theorem

For samples of size 30 or more, the sample mean is approximately
normally distributed, with
1) mean μX = μ and
2) standard deviation σX = σ/√n, where n is the
sample size.
The larger the sample size, the better the
approximation.
Estimation of a Population Mean and Proportions

• The most fundamental point and interval estimation process involves the estimation of a
population mean. Suppose it is of interest to estimate the population mean, μ, for a
quantitative variable. Data collected from a simple random sample can be used to
compute the sample mean, x̄, where the value of x̄ provides a point estimate of μ.
• When the sample mean is used as a point estimate of the population mean, some error
can be expected owing to the fact that a sample, or subset of the population, is used to
compute the point estimate. The absolute value of the difference between the sample
mean, x̄, and the population mean, μ, written |x̄ − μ|, is called the sampling error.
• Interval estimation incorporates a probability statement about the magnitude of the
sampling error. The sampling distribution of x̄ provides the basis for such a statement.
• Statisticians have shown that the mean of the sampling distribution of x̄ is equal to the
population mean, μ, and that the standard deviation is given by σ/√n, where σ is the
population standard deviation. The standard deviation of a sampling distribution is called
the standard error.
• For large sample sizes, the central limit theorem indicates that the sampling distribution
of x̄ can be approximated by a normal probability distribution. As a matter of practice,
statisticians usually consider samples of size 30 or more to be large.
Example

Let X be the mean of a random sample of size 50 drawn from a population with mean 112 and
standard deviation 40.
1) Find the mean and standard deviation of X.
2) Find the probability that X assumes a value between 110 and 114.
3) Find the probability that X assumes a value greater than 113.
Solution:
1) By the properties of the sampling distribution of means,
μX = μ = 112 and σX = σ/√n = 40/√50 = 5.65685
2) P(110 < X < 114) = P((110 − μX)/σX < Z < (114 − μX)/σX)
= P((110 − 112)/5.65685 < Z < (114 − 112)/5.65685)
= P(−0.35 < Z < 0.35) = 0.1368 + 0.1368 = 0.2736
or, reading from a table of cumulative areas (area below z), 0.6368 − 0.3632 = 0.2736.

3) P(X > 113) = P(Z > (113 − μX)/σX) = P(Z > (113 − 112)/5.65685)
= P(Z > 0.18) = 0.5 − 0.0714 = 0.4286
or, from a cumulative table, 1 − P(Z < 0.18) = 1 − 0.5714 = 0.4286
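
The sample-mean example can be verified numerically. This is a sketch that assumes, as the central limit theorem suggests for n = 50, that the sample mean is approximately normal with mean μ and standard deviation σ/√n; the variable names are hypothetical. The answers differ slightly from the table-based ones because z is not rounded to two decimal places.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 112, 40, 50           # population mean, population sd, sample size
standard_error = sigma / sqrt(n)     # standard deviation of the sample mean

xbar = NormalDist(mu, standard_error)

between_110_114 = xbar.cdf(114) - xbar.cdf(110)
above_113 = 1 - xbar.cdf(113)

print(round(standard_error, 5))
print(round(between_110_114, 4), round(above_113, 4))
```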
Exercise
• The numerical population of grade point averages at a college has mean 2.61
and standard deviation 0.5. If a random sample of size 100 is taken from the
population, what is the probability that the sample mean will be between 2.51
and 2.71?
• Answer: 0.9544 (σX = 0.5/√100 = 0.05, so the limits are z = ±2 and the area is 2 × 0.4772)
Applications:

1) Suppose the mean number of days to germination of a variety of seed is 22, with standard deviation
2.3 days. Find the probability that the mean germination time of a sample of 160 seeds will be within
0.5 day of the population mean.
2) Suppose the mean length of time that a caller is placed on hold when telephoning a customer service
center is 23.8 seconds, with standard deviation 4.6 seconds. Find the probability that the mean length of
time on hold in a sample of 1,200 calls will be within 0.5 second of the population mean.

3) Suppose the mean amount of cholesterol in eggs labeled “large” is 186 milligrams, with standard
deviation 7 milligrams. Find the probability that the mean amount of cholesterol in a sample of 144 eggs
will be within 2 milligrams of the population mean.

4) Suppose that in one region of the country the mean amount of credit card debt per household in
households having credit card debt is $15,250, with standard deviation $7,125. Find the probability that
the mean amount of credit card debt in a sample of 1,600 such households will be within $300 of the
population mean.

5) Suppose speeds of vehicles on a particular stretch of roadway are normally distributed with mean 36.6
mph and standard deviation 1.7 mph.
a) Find the probability that the speed X of a randomly selected vehicle is between 35 and 40 mph.
b) Find the probability that the mean speed of 20 randomly selected vehicles is between 35 and 40 mph.
The Sample Proportion

Often sampling is done in order to estimate the proportion of a population that has a
specific characteristic, such as the proportion of all items coming off an assembly line
that are defective or the proportion of all people entering a retail store who make a
purchase before leaving. The population proportion is denoted p and the sample
proportion is denoted pˆ. Thus if in reality 43% of people entering a store make a purchase
before leaving, p = 0.43; if in a sample of 200 people entering the store, 78 make a
purchase, pˆ=78/200=0.39.

The sample proportion is a random variable: it varies from sample to sample in a way
that cannot be predicted with certainty. Viewed as a random variable it will be written Pˆ.
It has a mean, μPˆ and a standard deviation, σPˆ. Here are formulas for their values.
Suppose random samples of size n are drawn from a population in which the proportion
with a characteristic of interest is p. The mean, μPˆ, and standard deviation, σPˆ, of the
sample proportion Pˆ
satisfy: μPˆ = p and σPˆ = √(pq/n), where
q = 1 − p.
Example
Suppose that in a population of voters in a certain region 38% are in favor of particular bond issue.
Nine hundred randomly selected voters are asked if they favor the bond issue.
1) Verify that the sample proportion Pˆ computed from samples of size 900 meets the condition that its sampling
distribution be approximately normal.
2) Find the probability that the sample proportion computed from a sample of size 900 will be within 5 percentage
points of the true population proportion.

Solution
1) The information given is that p = 0.38, hence q = 1 − p = 0.62.
First we use the formulas to compute the mean and standard deviation of P̂:
μP̂ = p = 0.38 and σP̂ = √(pq/n) = √((0.38)(0.62)/900) = 0.01618
Then 3σP̂ = 3(0.01618) = 0.04854 ≈ 0.05
So [p − 3σP̂, p + 3σP̂] = [0.38 − 0.05, 0.38 + 0.05] = [0.33, 0.43],
which lies wholly within the interval [0,1], so it is safe to assume that P̂ is approximately normally distributed.
2) To be within 5 percentage points of the true population proportion 0.38 means to be between
0.38 − 0.05 = 0.33 and 0.38 + 0.05 = 0.43.
Thus,
P(0.33 < P̂ < 0.43) = P((0.33 − μP̂)/σP̂ < Z < (0.43 − μP̂)/σP̂)
= P((0.33 − 0.38)/0.01618 < Z < (0.43 − 0.38)/0.01618)
= P(−3.09 < Z < 3.09) = 0.4990 + 0.4990 = 0.9980
Or, using cumulative probabilities, P(Z < 3.09) − P(Z < −3.09) = 0.9990 − 0.0010 = 0.9980
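The arithmetic in this example can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original notes: the helper normal_cdf is an assumed name, and it computes the standard normal cumulative probability from math.erf instead of reading it from a Z table.

```python
import math


def normal_cdf(z):
    """Cumulative probability P(Z < z) for a standard normal variable."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


p, n = 0.38, 900
q = 1 - p

# Standard deviation of the sample proportion: sqrt(pq/n).
sigma = math.sqrt(p * q / n)

# Part 1: the interval [p - 3*sigma, p + 3*sigma] must lie inside [0, 1].
lower3, upper3 = p - 3 * sigma, p + 3 * sigma
is_approx_normal = 0 <= lower3 and upper3 <= 1

# Part 2: probability of landing within 5 percentage points of p.
margin = 0.05
z = margin / sigma
prob_within = normal_cdf(z) - normal_cdf(-z)

print(f"sigma = {sigma:.5f}")
print(f"3-sigma interval = [{lower3:.3f}, {upper3:.3f}], normal OK: {is_approx_normal}")
print(f"z = {z:.2f}, P(within 0.05) = {prob_within:.4f}")
```

Because the code does not round z to 3.09 before looking up the probability, the result can differ from the table-based answer in the last decimal place.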
Estimation and Confidence Intervals

Suppose we want to generate a 95% confidence interval estimate for an
unknown population mean. This means that there is a 95% probability that
the confidence interval will contain the true population mean.
Thus, P([sample mean] − margin of error < μ < [sample mean] + margin of
error) = 0.95.
The Central Limit Theorem introduced in the module on Probability stated
that, for large samples, the distribution of the sample means is
approximately normal with
1) a mean: μx̄ = μ
2) a standard deviation (also called the standard error): σx̄ = σ/√n

• In the large-sample case, a 95% confidence interval estimate for the population mean
is given by x̄ ± 1.96σ/√n. When the population standard deviation, σ, is unknown,
the sample standard deviation is used to estimate σ in the confidence interval
formula. The quantity 1.96σ/√n is often called the margin of error for the estimate.
• The quantity σ/√n is the standard error, and 1.96 is the number of standard errors
from the mean necessary to include 95% of the values in a normal distribution.
• The interpretation of a 95% confidence interval is that 95% of the intervals
constructed in this manner will contain the population mean. Thus, any interval
computed in this manner has a 95% confidence of containing the population mean.
• By changing the constant from 1.96 to 1.645, a 90% confidence interval can be
obtained. It should be noted from the formula for an interval estimate that a 90%
confidence interval is narrower than a 95% confidence interval and, as such, carries
slightly less confidence of including the population mean.
• Lower levels of confidence lead to even narrower intervals. In practice, a 95%
confidence interval is the most widely used.
• For the standard normal distribution, P(−1.96 < Z < 1.96) = 0.95, i.e.,
there is a 95% probability that a standard normal variable, Z, will fall
between −1.96 and 1.96. The Central Limit Theorem states that for large
samples:
Z = (x̄ − μ)/(σ/√n)
• By substituting the expression on the right side of the equation:
P(−1.96 < (x̄ − μ)/(σ/√n) < 1.96) = 0.95
Using algebra, we can rework this inequality such that the mean (μ) is the
middle term, as shown below.
P(x̄ − 1.96σ/√n < μ < x̄ + 1.96σ/√n) = 0.95
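The statement that 95% of intervals constructed in this manner contain the population mean can be illustrated by simulation. The sketch below rests on assumptions that are not in the notes: a hypothetical normal population with μ = 100 and σ = 15, samples of size n = 50, 2000 repeated samples, and a function name chosen for the demonstration.

```python
import math
import random


def coverage_rate(mu, sigma, n, num_intervals, z=1.96, seed=1):
    """Fraction of intervals x_bar +/- z*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    margin = z * sigma / math.sqrt(n)  # margin of error with sigma known
    hits = 0
    for _ in range(num_intervals):
        # Draw one sample of size n and compute its mean.
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        # Count the interval if it captures the true mean.
        if sample_mean - margin < mu < sample_mean + margin:
            hits += 1
    return hits / num_intervals


# Hypothetical population values, used only for illustration.
rate = coverage_rate(mu=100.0, sigma=15.0, n=50, num_intervals=2000)
print(f"Share of intervals containing the true mean: {rate:.3f}")
```

The share should come out near 0.95. Each individual interval either contains μ or does not; the 95% figure describes the long-run behaviour of the procedure.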
Table - Z-Scores for Commonly Used Confidence Intervals

Desired Confidence Interval    Z Score
90%                            1.645
95%                            1.96
99%                            2.576
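The Z scores in this table can be recovered from the standard normal distribution. The sketch below uses NormalDist from the Python standard library (Python 3.8 or later); the function name z_for_confidence is an assumption made for illustration.

```python
from statistics import NormalDist


def z_for_confidence(level):
    """Two-sided Z score for a confidence level such as 0.95."""
    alpha = 1 - level  # total probability left in the two tails
    # Leave alpha/2 in each tail, so look up the 1 - alpha/2 quantile.
    return NormalDist().inv_cdf(1 - alpha / 2)


z_table = {level: round(z_for_confidence(level), 3) for level in (0.90, 0.95, 0.99)}

for level, z in z_table.items():
    print(f"{level:.0%} confidence -> Z = {z}")
```

The same function gives the Z score for any other confidence level, for example 0.98.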
Example: Descriptive statistics on variables measured in a sample of n=3,539
participants attending the 7th examination of the offspring in the Framingham
Heart Study are shown below:

Characteristic              n       Sample Mean   Standard Deviation (s)
Systolic Blood Pressure     3,534   127.3         19.0
Diastolic Blood Pressure    3,532   74.0          9.9
Total Serum Cholesterol     3,310   200.3         36.8
Weight                      3,506   174.4         38.7
Height                      3,326   65.957        3.749
Body Mass Index             3,326   28.15         5.32

Because the sample is large, we can generate a 95% confidence interval for
systolic blood pressure using the following formula:
x̄ ± Z·s/√n
SOLUTION:
Substituting the sample statistics and the Z value for 95% confidence, we have
127.3 ± 1.96(19.0/√3,534) = 127.3 ± 0.63
So the confidence interval is (126.7, 127.9).
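The substitution can be checked in Python. This is a minimal sketch using the systolic blood pressure figures from the table; the function name z_interval is an assumption, and the sample standard deviation s stands in for σ as described earlier for large samples.

```python
import math


def z_interval(mean, sd, n, z=1.96):
    """Large-sample confidence interval: mean +/- z * sd / sqrt(n)."""
    margin = z * sd / math.sqrt(n)  # margin of error
    return mean - margin, mean + margin


# Systolic blood pressure: n = 3,534, sample mean 127.3, s = 19.0.
low, high = z_interval(mean=127.3, sd=19.0, n=3534)
print(f"95% CI: ({low:.1f}, {high:.1f})")
```

Passing a different z, such as 1.645, gives the interval for another confidence level.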


• A point estimate for the true mean systolic blood pressure in the population is
127.3, and we are 95% confident that the true mean is between 126.7 and
127.9. The margin of error is very small here because of the large sample size.
• EXERCISE:
1) What is the 90% confidence interval for BMI? (Note that Z=1.645 to reflect the
90% confidence level.)
The table below shows data on a subsample of n=10 participants
in the 7th examination of the Framingham Offspring Study.
Characteristic              n    Sample Mean   Standard Deviation (s)
Systolic Blood Pressure     10   121.2         11.1
Diastolic Blood Pressure    10   71.3          7.2
Total Serum Cholesterol     10   202.3         37.7
Weight                      10   176.0         33.0
Height                      10   67.175        4.205
Body Mass Index             10   27.26         3.10

Suppose we compute a 95% confidence interval for the true mean systolic blood pressure using data in the
subsample. Because the sample size is small, we must now use the confidence interval formula that involves t
rather than Z: x̄ ± t·s/√n.
Because the sample size is n=10, the degrees of freedom are df = n−1 = 9. The t value for 95% confidence
with df = 9 is t = 2.262.
T DISTRIBUTION TABLE
SOLUTION:
Substituting the sample statistics and the t value for 95% confidence, we have
the following expression:
121.2 ± 2.262(11.1/√10) = 121.2 ± 7.94
Interpretation:
Based on this sample of size n=10, our best estimate of the true mean systolic
blood pressure in the population is 121.2.
Based on this sample, we are 95% confident that the true mean systolic blood
pressure in the population is between 113.3 and 129.1. Note that the margin of
error is larger here primarily due to the small sample size.
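The small-sample interval can be reproduced the same way. In this sketch the t value 2.262 (df = 9) is taken from the notes and typed in by hand, not computed, so that only the standard library is needed; the function name t_interval is an assumption.

```python
import math


def t_interval(mean, sd, n, t_value):
    """Small-sample confidence interval: mean +/- t * sd / sqrt(n)."""
    margin = t_value * sd / math.sqrt(n)  # margin of error
    return mean - margin, mean + margin


# Subsample: n = 10, sample mean 121.2, s = 11.1, t = 2.262 for df = 9.
low, high = t_interval(mean=121.2, sd=11.1, n=10, t_value=2.262)
margin = (high - low) / 2
print(f"95% CI: ({low:.1f}, {high:.1f}), margin of error = {margin:.2f}")
```

The margin of error here is about 7.9, compared with about 0.63 for the full sample, which reflects both the smaller n and the larger critical value.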

Confidence Interval for the Population Proportion

• For example, if we wish to estimate the proportion of people with diabetes in a
population, we consider a diagnosis of diabetes as a "success" (i.e., an individual
who has the outcome of interest), and we consider lack of diagnosis of diabetes
as a "failure." In this example, X represents the number of people with a diagnosis
of diabetes in the sample. The sample proportion is p̂ (called "p-hat"), and it is
computed by taking the ratio of the number of successes in the sample to the
sample size, that is: p̂ = x/n
If there are more than 5 successes and more than 5 failures, then the confidence
interval can be computed with this formula:
p̂ ± Z√(p̂(1 − p̂)/n)
The point estimate for the population proportion is the sample proportion, and the
margin of error is the product of the Z value for the desired confidence level (e.g.,
Z=1.96 for 95% confidence) and the standard error of the point estimate.
In other words, the standard error of the point estimate is:
SE(p̂) = √(p̂(1 − p̂)/n)
and the margin of error is Z × SE(p̂).
Example: During the 7th examination of the Offspring
cohort in the Framingham Heart Study there were 1,219
participants being treated for hypertension and 2,313 who
were not on treatment. If we call treatment a "success",
then x=1,219 and n=3,532.
• The sample proportion is:
p̂ = x/n = 1,219/3,532 = 0.345
• This is the point estimate, i.e., our best estimate of the proportion of the
population on treatment for hypertension is 34.5%. The sample is large,
so the confidence interval can be computed using the formula:
p̂ ± Z√(p̂(1 − p̂)/n)
• Substituting our values we get
0.345 ± 1.96√(0.345(1 − 0.345)/3,532) = 0.345 ± 0.016
• So, the 95% confidence interval is (0.329, 0.361).


• Thus we are 95% confident that the true proportion of persons on
antihypertensive medication is between 32.9% and 36.1%.
• Specific applications of estimation for a single population with a
dichotomous outcome involve estimating prevalence, cumulative
incidence, and incidence rates.
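The hypertension example can be reproduced with a short function. This is a sketch under the conditions stated in these notes (more than 5 successes and more than 5 failures); the function name proportion_interval and the choice to raise an error when the condition fails are assumptions made for illustration.

```python
import math


def proportion_interval(successes, n, z=1.96):
    """Return (p_hat, low, high) for p_hat +/- z * sqrt(p_hat*(1 - p_hat)/n)."""
    failures = n - successes
    # The formula is only used when both counts exceed 5.
    if successes <= 5 or failures <= 5:
        raise ValueError("need more than 5 successes and more than 5 failures")
    p_hat = successes / n  # point estimate
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of p_hat
    margin = z * se  # margin of error
    return p_hat, p_hat - margin, p_hat + margin


# 1,219 of 3,532 participants were on treatment for hypertension.
p_hat, low, high = proportion_interval(successes=1219, n=3532)
print(f"p_hat = {p_hat:.3f}, 95% CI: ({low:.3f}, {high:.3f})")
```

The code keeps p̂ unrounded in the intermediate steps, so it agrees with the hand calculation to three decimal places without depending on the rounded value 0.345.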
t-distribution table (critical values tα).
df   α = 0.1   0.05    0.025    0.01     0.005    0.001     0.0005
∞    1.282     1.645   1.960    2.326    2.576    3.091     3.291
1    3.078     6.314   12.706   31.821   63.656   318.289   636.578
2    1.886     2.920   4.303    6.965    9.925    22.328    31.600
3    1.638     2.353   3.182    4.541    5.841    10.214    12.924
4    1.533     2.132   2.776    3.747    4.604    7.173     8.610
5    1.476     2.015   2.571    3.365    4.032    5.894     6.869
6    1.440     1.943   2.447    3.143    3.707    5.208     5.959
7    1.415     1.895   2.365    2.998    3.499    4.785     5.408
8    1.397     1.860   2.306    2.896    3.355    4.501     5.041
