
Chapter 6 and 7

The document discusses various probability distributions, including the Binomial and Poisson distributions, detailing their characteristics, formulas, and examples. It also covers continuous probability distributions, particularly the normal distribution, and explains sampling methods and techniques. Key concepts include the importance of sample size, representative sampling, and the differences between probability and non-probability sampling methods.

Uploaded by

kertinabekele

The Binomial probability distribution

• Suppose a random experiment has the following characteristics:
  • There are n identical and independent trials of a common procedure.
  • There are exactly two possible outcomes for each trial, one termed “success” and the other “failure.”
  • The probability of success on any one trial is the same number p.
• Then the discrete random variable X that counts the number of successes in the n trials is the binomial random variable with parameters n and p. We also say that X has a binomial distribution with parameters n and p.

• If X is binomial with parameters n and p, then

$$P(X = x) = \binom{n}{x} p^{x} (1-p)^{n-x}, \quad x = 0, 1, \dots, n,$$

with mean $E(X) = np$ and variance $\mathrm{Var}(X) = np(1-p)$.

• Example: A corporation has advertised heavily to try to ensure that over half the adult population recognizes the brand name of its products. In a random sample of 20 adults, 14 recognized its brand name.
a) What is the probability that 14 or more people in such a sample would recognize its brand name if the actual proportion p of all adults who recognize the brand name were only 0.50?
b) What is the mean number of adults who recognize the brand name?
c) What is the variance of the number of adults who recognize the brand name?

Solution: Let X be the number of adults who recognize the brand name. With p = 0.5 and n = 20, X ~ Binomial(20, 0.5).
a) P(X ≥ 14) = 0.0577
b) Mean = E(X) = np = 10
c) Var(X) = np(1 − p) = 5
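The three answers for the brand-name example can be checked numerically. The sketch below assumes only the Python standard library; the helper name `binom_pmf` is chosen here for illustration and does not come from the slides.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 20, 0.5
# a) upper tail: add the probabilities of 14, 15, ..., 20 successes
tail = sum(binom_pmf(x, n, p) for x in range(14, n + 1))
# b) and c) follow from the closed-form mean and variance
mean = n * p
variance = n * p * (1 - p)
print(round(tail, 4), mean, variance)  # 0.0577 10.0 5.0
```

Summing the probability function over the tail is the direct route for a small n such as 20; for large n a table or a normal approximation would normally be used instead.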

The Poisson probability distribution

• The Poisson probability distribution provides a good model for the probability distribution of the number of “rare events” that occur randomly in time, distance, or space.
• Assume that an interval is divided into a very large number of subintervals so that the probability of the occurrence of an event in any subinterval is very small.
• Assumptions of a Poisson probability distribution:
  • The probability of an occurrence of an event is constant for all subintervals, and occurrences are independent of one another;
  • You are counting the number of times a particular event occurs in a unit; and
  • As the unit gets smaller, the probability that two or more events will occur in that unit approaches zero.

• The random variable X is said to follow the Poisson probability distribution if it has the probability function

$$P(x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \quad x = 0, 1, 2, \dots$$

where
• P(x) = the probability of x successes over a given period of time or space, given λ;
• λ = the expected number of successes per time or space unit, λ > 0;
• e = 2.71828 (the base for natural logarithms).

• The mean and variance of the Poisson probability distribution are

$$\mu_x = E(X) = \lambda \quad \text{and} \quad \sigma_x^2 = E[(X - \mu)^2] = \lambda$$

• Example: If calls to your cell phone are a Poisson process with a constant rate λ = 2 calls per hour, what is the probability that, if you forget to turn your phone off in a 1.5-hour movie, your phone rings during that time?

The number of calls X in 1.5 hours is Poisson with mean λt = 2(1.5) = 3. The phone rings if X ≥ 1, so

$$P(X \ge 1) = 1 - P(X = 0) = 1 - \frac{(2 \cdot 1.5)^{0} e^{-2(1.5)}}{0!} = 1 - \frac{3^{0} e^{-3}}{0!} = 1 - e^{-3} = 1 - 0.05 = 0.95$$

• How many phone calls do you expect to get during the movie?

E(X) = λt = 2(1.5) = 3
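For the cell-phone example, the Poisson probability function can be evaluated directly. This is a minimal sketch using the standard library; the names `poisson_pmf`, `rate` and `hours` are illustrative and not taken from the slides.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**x / factorial(x)

rate, hours = 2.0, 1.5
lam = rate * hours               # expected number of calls during the movie
p_silent = poisson_pmf(0, lam)   # no call at all
p_rings = 1 - p_silent           # at least one call
print(round(p_silent, 4), round(p_rings, 4))  # 0.0498 0.9502
```

The event "the phone rings" is the complement of "zero calls", which is why a single evaluation of the probability function at x = 0 is enough.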

• Example: A life insurance company insures the lives of 5,000 men of age 42. Actuarial studies show the probability of any 42-year-old man dying in a given year to be 0.001.
a) What is the probability that the company will pay exactly 4 claims in a year?
b) What is the mean number of claims per year the company will pay?
c) What is the probability that the company will pay at least 1 claim in a year?

Solution: The number of claims X is binomial with n = 5000 and p = 0.001. Since n is large and p is small, X is approximately Poisson with λ = np = 5, i.e. X ~ Poisson(5).

a) Probability that the company will pay 4 claims in a year:

$$P(X = 4) = \frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{e^{-5}\,5^{4}}{4!} = 0.17547$$

b) Mean number of claims per year: Mean = λ = 5

c) Probability that the company will pay at least 1 claim in a year:

$$P(X \ge 1) = P(X = 1) + P(X = 2) + \cdots = 1 - P(X = 0) = 1 - \frac{e^{-5}\,5^{0}}{0!} = 0.993262$$
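The insurance example replaces a Binomial(5000, 0.001) count by a Poisson count with mean 5. The sketch below, which assumes only the standard library and uses helper names invented for illustration, evaluates both so the quality of the approximation can be seen.

```python
from math import comb, exp, factorial

n, p = 5000, 0.001
lam = n * p  # Poisson mean: 5 claims per year

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**x / factorial(x)

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p), the exact model in this example."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

p4_poisson = poisson_pmf(4, lam)          # part (a), Poisson approximation
p4_binom = binom_pmf(4, n, p)             # part (a), exact binomial value
p_at_least_one = 1 - poisson_pmf(0, lam)  # part (c)
print(round(p4_poisson, 5), round(p4_binom, 5), round(p_at_least_one, 6))
```

The two values for part (a) agree to about three decimal places, which is what one would hope for when n is large and p is small.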
Continuous Probability Distribution

• A continuous random variable is a variable that can assume any value in an interval, for example:
  • thickness of an item
  • time required to complete a task
  • temperature of a solution
  • height, in meters
• A continuous random variable has an infinite number of possible values that can be represented by an interval on the number line. For example, the number of hours spent studying in a day can be any number between 0 and 24.
• The probability distribution of a continuous random variable is called a continuous probability distribution.

Normal Probability Distribution

• The most important probability distribution in statistics is the normal distribution. Its properties:
  • Bell shaped
  • Symmetrical
  • Mean, median and mode are equal
  • Location is determined by the mean, μ
  • Spread is determined by the standard deviation, σ
  • The random variable has an infinite theoretical range: −∞ to +∞
  • The total area under the curve is equal to one.

• The normal distribution closely approximates the probability distributions of a wide range of random variables.
• Distributions of sample means approach a normal distribution given a “large” sample size.
• Computations of probabilities are direct and elegant.
• The normal probability distribution has led to good business decisions for a number of applications.
• The formula for the normal probability density function is

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^{2}/(2\sigma^{2})}, \quad -\infty < x < \infty$$

• For a normal random variable X with mean μ and variance σ², i.e., X ~ N(μ, σ²), the cumulative distribution function is

$$F(x_0) = P(X \le x_0)$$

[Figure: normal density f(x); the area under the curve to the left of x0 is P(X ≤ x0)]
• There may be thousands of normal distribution curves, each with a different mean and a different standard deviation.
• Since the shapes are different, the areas under the curves between any two points are also different.
• To make life easier, all normal distributions can be converted to a standard normal distribution.
• A standard normal distribution has a mean of 0 and a standard deviation of 1.

Standard Normal Probability Distribution

• The letter z is used to designate the standard normal random variable, which has mean 0 and standard deviation σ = 1.
• Converting to the standard normal distribution requires the use of this formula:

$$z = \frac{\text{Value} - \text{Mean}}{\text{Standard deviation}} = \frac{x - \mu}{\sigma}$$

• If X is distributed normally with a mean of 100 and a standard deviation of 50, the z value for X = 200 is (200 − 100)/50 = 2.0.

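$$P(a < X < b) = P\!\left(\frac{a-\mu}{\sigma} < Z < \frac{b-\mu}{\sigma}\right) = F_Z\!\left(\frac{b-\mu}{\sigma}\right) - F_Z\!\left(\frac{a-\mu}{\sigma}\right)$$

where $F_Z$ is the cumulative distribution function of the standard normal variable Z.

[Figure: the area under f(x) between a and b equals the area under the standard normal curve between (a − μ)/σ and (b − μ)/σ]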
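The standardization rule for P(a < X < b) can be sketched in a few lines. The example assumes Python's `statistics.NormalDist` (standard library, Python 3.8 or later) in place of a printed z-table, and reuses the mean of 100 and standard deviation of 50 from the z-value illustration; the function name is hypothetical.

```python
from statistics import NormalDist

def normal_interval_prob(a, b, mu, sigma):
    """P(a < X < b) for X ~ N(mu, sigma^2), computed via z-scores."""
    z = NormalDist()          # standard normal: mean 0, standard deviation 1
    z_a = (a - mu) / sigma    # lower limit converted to a z-score
    z_b = (b - mu) / sigma    # upper limit converted to a z-score
    return z.cdf(z_b) - z.cdf(z_a)

prob = normal_interval_prob(100, 200, mu=100, sigma=50)
print(round(prob, 4))
```

Here the limits 100 and 200 become z = 0 and z = 2, so the result is the area under the standard normal curve between 0 and 2.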

• Properties of the Standard Normal Distribution
1. The cumulative area is close to 0 for z-scores close to z = −3.49.
2. The cumulative area increases as the z-scores increase.
3. The cumulative area for z = 0 is 0.5000.
4. The cumulative area is close to 1 for z-scores close to z = 3.49.

• Example: Find the area that corresponds to a z-score between 0 and 2.71.
  • Find the area by locating 2.7 in the left-hand column of the standard normal table, and then moving across the row to the column under 0.01.
  • The area between z = 0 and z = 2.71 is 0.4966.


• Example: A personal computer is used for office work at home, research, communication, personal finances, education, entertainment, social networking, and a myriad of other things. Suppose that the average number of hours a household personal computer is used for entertainment is two hours per day. Assume the times for entertainment are normally distributed and the standard deviation for the times is half an hour.
a. Find the probability that a household personal computer is used for entertainment between 1.8 and 2.75 hours per day.
b. Find the maximum number of hours per day that the bottom quartile of households uses a personal computer for entertainment.
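The slides do not show a solution for the household-computer example. One possible way to work it, assuming `statistics.NormalDist` from the Python standard library instead of a z-table, is sketched below; the variable names are illustrative.

```python
from statistics import NormalDist

usage = NormalDist(mu=2.0, sigma=0.5)  # entertainment time in hours per day

# a. P(1.8 < X < 2.75): difference of two cumulative areas
p_between = usage.cdf(2.75) - usage.cdf(1.8)

# b. 25th percentile: the largest usage time within the bottom quartile
q1 = usage.inv_cdf(0.25)

print(round(p_between, 4), round(q1, 2))
```

Part (a) corresponds to z-scores of −0.4 and 1.5, and part (b) to the z-score that leaves an area of 0.25 to its left (about −0.67).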

• Example: The service life of a certain brand of automobile battery is normally distributed with a mean of 1000 days and a standard deviation of 100 days. The manufacturer of the battery wants to offer a guarantee, but has not yet decided on the length of the warranty. It does not want to replace more than 10 percent of the batteries sold. What should be the length of the warranty?
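The battery example is also left unsolved in the slides. Under the stated normal model, the warranty length is the service life below which 10 percent of batteries fall. A hedged sketch using `statistics.NormalDist` (standard library) follows; the variable names are illustrative.

```python
from statistics import NormalDist

life = NormalDist(mu=1000, sigma=100)  # service life in days
# 10th percentile: only 10% of batteries are expected to fail before this day
warranty_days = life.inv_cdf(0.10)
print(round(warranty_days, 1))
```

By hand this is 1000 + z(100) with z ≈ −1.28 from the table, which gives roughly 872 days.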

Chapter 7
Sampling and sampling distribution

Sampling
Most researchers reach the conclusions of their study by
studying a small sample from a large population or
universe.
To draw conclusions about a population from a sample, there
are two major requirements for the sample:
1 the sample size should be adequately large.
2 the sample has to be selected appropriately so that it will
be representative of the population.
Sampling technique is concerned with the selection of a
representative sample, especially for the purposes of
statistical inference.
Some definitions:
Target population (reference population): the
population about which an investigator wishes to draw a
conclusion.
Sampled population (population sampled): the population
from which the actual sample was drawn and about which a
conclusion can be made.
Sampling unit: the ultimate unit to be sampled, i.e. the
elements of the population to be sampled.
Sampling frame: the list of all elements in a population.
Sampling errors: errors arising from drawing
inferences about the population on the basis of a few
observations. Thus, sampling error is the discrepancy between the
population value and the sample value.
It is involved in the collection, processing and analysis of
data.
It may arise due to inappropriate sampling techniques.

Non-sampling errors: errors that arise at the stages
of observation, compilation and analysis of data.
They can happen in sample surveys as well as in complete
population enumeration surveys. Thus, a sample survey
is subject to both sampling errors and
non-sampling errors.
Non-sampling errors occur at every stage of planning and
execution because of faulty planning, errors in response by
the respondents, compilation errors, etc.
Reasons for Sampling:
1 Reduced cost; Greater speed; Greater accuracy
2 Greater scope
3 Avoids destructive testing
Sometimes taking a census makes more sense than
using a sample. Some of the reasons include:
Universality; Qualitativeness
Detailedness; Non-representativeness
Methods of sampling/Sampling techniques
Sampling can be classified into two categories, namely,
probability sampling and non-probability sampling.
Probability sampling: a method of sampling in which
all elements in the population have a pre-assigned non-zero
probability of being included in the sample. That is,
sampling units are selected on the basis of chance.
Non-probability sampling: a sampling technique in
which the choice of individuals for a sample depends on
convenience, personal choice or interest.
The most common examples of probability sampling
include simple random sampling, stratified random
sampling, cluster sampling, systematic sampling and
multistage sampling. Judgment sampling,
convenience sampling and quota sampling are some
examples of non-probability sampling.
Probability Sampling
Simple random sampling (SRS)
In simple random sampling, each unit in the population has
an equal chance or probability of being selected in the sample.
There are two types: SRS with replacement and SRS
without replacement.
In SRS with replacement, the selected unit is put back
into the population and again has a chance of being
selected. In SRS without replacement, which is the
usual method in medical research, the selected unit is not
put back in the population and hence the population size
reduces by one at each selection.
Random samples can be drawn by the lottery method or by
using random number tables (Reading Assignment).
It is applied when the population is homogeneous.
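As an alternative to the lottery method or random number tables, simple random samples can be drawn with a pseudo-random number generator. The sketch below assumes Python's standard `random` module and a hypothetical population of 20 labelled units; the seed is there only to make the illustration repeatable.

```python
import random

population = list(range(1, 21))  # hypothetical units labelled 1..20
rng = random.Random(42)          # seeded only so the sketch is repeatable

# SRS without replacement: a selected unit is not put back, so no repeats
srs_without = rng.sample(population, k=5)

# SRS with replacement: each draw is made from the full population
srs_with = rng.choices(population, k=5)

print(srs_without, srs_with)
```

`sample` never returns the same unit twice, whereas `choices` may, which mirrors the difference between the two types of SRS described above.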

Stratified random sampling
It is preferred when the population is heterogeneous with
respect to the characteristic under study.
In this method, the complete population is divided into
homogeneous subgroups called “strata”, and then a stratified
sample is obtained by independently selecting a separate
simple random sample from each population stratum.
Some of the criteria for dividing a population into strata
are: Sex (male, female); Age (under 18, 18 to 28, 29 to 39);
Occupation (blue-collar, professional, other).
Random samples taken within a stratum will have much
less variability than a random sample taken across all
strata. This is true because sample units within each
stratum tend to have characteristics that are similar.
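The stratified procedure can be sketched as follows. The population, the stratum labels and the proportional allocation rule (the same sampling fraction in every stratum) are assumptions made for this illustration; the slides do not prescribe how many units to take from each stratum.

```python
import random

# Hypothetical population of 30 units: (unit id, stratum label)
population = [(i, "male" if i % 3 else "female") for i in range(1, 31)]
rng = random.Random(0)  # seeded only so the sketch is repeatable

def stratified_sample(units, stratum_of, fraction, rng):
    """Draw an independent simple random sample from each stratum."""
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        k = max(1, round(fraction * len(members)))  # proportional allocation
        sample.extend(rng.sample(members, k))       # SRS without replacement
    return sample

sample = stratified_sample(population, lambda u: u[1], 0.2, rng)
print(sample)
```

With 20 units in one stratum and 10 in the other, a fraction of 0.2 yields 4 and 2 units respectively, so every stratum is represented in the sample.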

Systematic Random Sampling
Systematic sampling is a commonly employed technique
when a complete and up-to-date list of sampling units is
available.
A systematic random sample is obtained by selecting one
unit on a random basis and then choosing additional units
at evenly spaced intervals until the desired
sample size is obtained.
Let N = population size, n = sample size and k = N/n the
sampling interval.
Then randomly choose a number between 1 and k. Suppose
the randomly chosen number is j (1 ≤ j ≤ k).
The j-th unit is selected first, and then the (j + k)-th,
(j + 2k)-th, (j + 3k)-th, etc. units, until the required sample size is
reached.
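The selection rule j, j + k, j + 2k, ... translates almost directly into code. This sketch assumes a hypothetical frame of N = 100 units and takes k as the integer part of N/n, which is an assumption for the case where N is not a multiple of n.

```python
import random

def systematic_sample(units, n, rng):
    """Select every k-th unit after a random start j, with k = N // n."""
    k = len(units) // n    # sampling interval
    j = rng.randint(1, k)  # random start, 1 <= j <= k
    # Units at 1-based positions j, j + k, j + 2k, ...
    return [units[j - 1 + i * k] for i in range(n)]

frame = list(range(1, 101))  # hypothetical sampling frame of N = 100 units
sample = systematic_sample(frame, n=10, rng=random.Random(1))
print(sample)
```

Only the starting unit is random; once j is chosen, the rest of the sample is fully determined by the interval k.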

Cluster sampling
A cluster sample is obtained by selecting clusters from the population
by simple random sampling; every unit in
the selected clusters is then included in the sample.
Clusters are formed by grouping units on the basis of their
geographical locations. Thus, elements within a cluster are
heterogeneous.
The advantage of cluster sampling is that a sampling frame is
not required; in practice, where complete lists are rarely
available, cluster sampling is suitable.
Multistage Sampling
In this method, the whole population is divided into first-stage
sampling units from which a random sample is selected.
The selected first-stage units are then subdivided into second-stage
units from which another sample is selected. Third- and
fourth-stage sampling is done in the same manner if
necessary. For example, in an urban survey in a state, a
sample of towns may be taken first and then in each of the
selected towns, a second-stage sample of households may be
taken.
Sampling distribution
The distribution of all possible values that can be assumed
by some statistic, computed from samples of the same size
randomly drawn from the same population, is called the
sampling distribution of that statistic.
Sampling distributions may be constructed empirically
when sampling from a discrete, finite population.
To construct a sampling distribution we proceed as follows:
1 From a finite population of size N, randomly draw all
possible samples of size n.
2 Compute the statistic of interest for each sample.
3 List in one column the distinct observed values of
the statistic, and in another column list the corresponding
frequency/probability of occurrence of each distinct
observed value of the statistic.

There are commonly three properties of interest for a given
sampling distribution:
Its mean
Its variance
Its functional form.
Example: Suppose we have a population of size N = 5,
consisting of the ages of five children: 6, 8, 10, 12, and 14.

$$\text{Population mean} = \mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{6 + 8 + 10 + 12 + 14}{5} = 10$$

$$\text{Population variance} = \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} = 8$$

Take samples of size 2 with replacement and construct the
sampling distribution of the sample mean.

Solution: Since the sampling is with replacement, there are
N^n = 5^2 = 25 possible samples of size 2. The possible samples
and the corresponding sample means are presented in parentheses
and square brackets, respectively, as follows.

     6             8             10            12            14
6    (6,6)[6]      (6,8)[7]      (6,10)[8]     (6,12)[9]     (6,14)[10]
8    (8,6)[7]      (8,8)[8]      (8,10)[9]     (8,12)[10]    (8,14)[11]
10   (10,6)[8]     (10,8)[9]     (10,10)[10]   (10,12)[11]   (10,14)[12]
12   (12,6)[9]     (12,8)[10]    (12,10)[11]   (12,12)[12]   (12,14)[13]
14   (14,6)[10]    (14,8)[11]    (14,10)[12]   (14,12)[13]   (14,14)[14]

Therefore, the sampling distribution of the mean is
constructed by listing the distinct values in one row and
their probability/frequency of occurrence in another, as follows.

Sample mean (X̄)   6     7     8     9     10    11    12    13    14
P(X̄ = x̄)          1/25  2/25  3/25  4/25  5/25  4/25  3/25  2/25  1/25
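The enumeration for the five children's ages (samples of size 2 drawn with replacement) can be reproduced by listing all ordered samples programmatically. The sketch below assumes only the standard library and uses exact fractions so that the probabilities appear as ratios rather than rounded decimals.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

ages = [6, 8, 10, 12, 14]
n = 2

# All N**n ordered samples drawn with replacement
samples = list(product(ages, repeat=n))
means = [Fraction(sum(s), n) for s in samples]

# Sampling distribution: P(sample mean = value)
dist = {m: Fraction(c, len(samples)) for m, c in sorted(Counter(means).items())}

mean_of_means = sum(m * p for m, p in dist.items())
var_of_means = sum(p * (m - mean_of_means) ** 2 for m, p in dist.items())
print(len(samples), mean_of_means, var_of_means)  # 25 10 4
```

The mean of the sample means equals the population mean of 10, and their variance, 4, is the population variance 8 divided by the sample size 2.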

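We are usually interested in the functional form of a
sampling distribution, its mean, and its variance. To
illustrate these characteristics, let us again consider the
sampling distribution of the sample mean (X̄).
Mean:

$$E(\bar{X}) = \sum_i \bar{x}_i \, P(\bar{X} = \bar{x}_i) = \tfrac{1}{25}(6) + \tfrac{2}{25}(7) + \dots + \tfrac{1}{25}(14) = 10$$

$$\Rightarrow \mu_{\bar{X}} = E(\bar{X}) = \mu$$

Variance:

$$\mathrm{Var}(\bar{X}) = E(\bar{X} - \mu)^2 = \sum_i (\bar{x}_i - \mu)^2 \, P(\bar{X} = \bar{x}_i) = \tfrac{1}{25}(6-10)^2 + \tfrac{2}{25}(7-10)^2 + \dots + \tfrac{1}{25}(14-10)^2 = 4 = \tfrac{8}{2}$$

$$\Rightarrow \mathrm{Var}(\bar{X}) = \sigma^2 / n$$

Functional form: the distribution of the sample mean
plotted as a histogram, along with the distribution of the
population, both of which are shown as follows.

[Figure: histograms of the population distribution and of the sampling distribution of the sample mean]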
From the plots we can observe that the parent population is
uniformly distributed, while the sampling distribution of the
mean gradually rises to a peak and then drops off with perfect
symmetry.

Remarks:
In either case (i.e., sampling with or without replacement),
the sample mean is an unbiased estimator of the population
mean:
=> E(X̄) = µ
For sampling with replacement: $\mathrm{Var}(\bar{X}) = \sigma^2 / n$
For sampling without replacement: $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}$
Note:
The square root of the variance of the sampling distribution
is called the standard error of the mean or, simply, the
standard error.
When sampling is from an infinite population, the standard
errors under sampling with and without replacement
will be close to each other.
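The variance formula for sampling without replacement can be checked on the same population of five ages (6, 8, 10, 12, 14) with n = 2. This sketch assumes the standard library only and enumerates every unordered sample, each of which is equally likely.

```python
from fractions import Fraction
from itertools import combinations

ages = [6, 8, 10, 12, 14]
N, n = len(ages), 2

mu = Fraction(sum(ages), N)
sigma2 = sum((x - mu) ** 2 for x in ages) / N  # population variance, 8

# Every unordered sample of size n drawn without replacement
means = [Fraction(sum(s), n) for s in combinations(ages, n)]
mean_of_means = sum(means) / len(means)
var_of_means = sum((m - mean_of_means) ** 2 for m in means) / len(means)

# (sigma^2 / n) * (N - n) / (N - 1)
fpc_formula = sigma2 / n * Fraction(N - n, N - 1)
print(mean_of_means, var_of_means, fpc_formula)  # 10 3 3
```

The sample mean is still unbiased, but its variance drops from 4 to 3 because of the factor (N − n)/(N − 1) = 3/4.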

When sampling is from a normally distributed population,
the distribution of the sample mean will also be normal
with mean µ and variance σ²/n. That means,

$$X \sim N(\mu, \sigma^2) \Rightarrow \bar{X} \sim N(\mu, \sigma^2/n) \Rightarrow Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$

Example: If the uric acid values in normal adult males are
approximately normally distributed with mean 5.7 mg and
standard deviation 1 mg, find the probability that a sample
of size 9 will yield a mean: (1) greater than 6; (2) between 5
and 6; (3) less than 5.2.
Solution: Let X be the amount of uric acid in normal adult
males. Thus, X ∼ N(µ, σ²) = N(5.7, 1).

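1 $$P(\bar{X} > 6) = P\!\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} > \frac{6 - 5.7}{1/\sqrt{9}}\right) = P(Z > 0.9) = 0.1841$$

2 $$P(5 < \bar{X} < 6) = P\!\left(\frac{5 - 5.7}{1/\sqrt{9}} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{6 - 5.7}{1/\sqrt{9}}\right) = P(-2.1 < Z < 0.9) = 0.7981$$

3 $$P(\bar{X} < 5.2) = P\!\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{5.2 - 5.7}{1/\sqrt{9}}\right) = P(Z < -1.5) = 0.0668$$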
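The three uric acid probabilities can be verified by working with the sampling distribution of the mean directly. The sketch assumes `statistics.NormalDist` from the Python standard library in place of a z-table; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 5.7, 1.0, 9
xbar = NormalDist(mu, sigma / sqrt(n))  # sampling distribution of the mean

p_greater_6 = 1 - xbar.cdf(6)              # (1) mean greater than 6
p_between_5_6 = xbar.cdf(6) - xbar.cdf(5)  # (2) mean between 5 and 6
p_less_5_2 = xbar.cdf(5.2)                 # (3) mean less than 5.2
print(round(p_greater_6, 4), round(p_between_5_6, 4), round(p_less_5_2, 4))
```

The standard deviation passed to `NormalDist` is the standard error σ/√n = 1/3, not the population standard deviation.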
Central Limit Theorem
Given a population of any non-normal functional form with
mean µ and finite variance σ², the sampling distribution of X̄
computed from samples of size n from this population will
have mean µ and variance σ²/n and will be approximately
normally distributed when the sample size is large, i.e.,

If n is large, then X̄ is approximately N(µ, σ²/n)
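A small simulation can make the central limit theorem concrete. The sketch below is an illustration under assumptions not found in the slides: the population is taken to be exponential with mean 1 and variance 1 (clearly non-normal), the sample size is 50, and 5000 samples are drawn with Python's standard `random` module.

```python
import random
from statistics import fmean, pvariance

rng = random.Random(0)  # seeded so the sketch is repeatable
n, reps = 50, 5000

# Exponential(1) population: mean 1, variance 1, strongly skewed
sample_means = [
    fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)
]

# Should be close to mu = 1 and sigma^2 / n = 1 / 50 = 0.02
print(round(fmean(sample_means), 3), round(pvariance(sample_means), 4))
```

A histogram of `sample_means` would look roughly bell shaped even though the individual observations are far from normal.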


Example: If the mean and standard deviation of serum
iron values for healthy men are 120 and 15 micrograms per
100 ml, respectively, what is the probability that a random
sample of 50 normal men will yield a mean between 115
and 125 micrograms per 100 ml?
Solution: The functional form of the population of serum
iron values is not specified, but since we have a sample size
greater than 30, we make use of the central limit theorem.
Thus,

$$P(115 < \bar{X} < 125) = P\!\left(\frac{115 - 120}{15/\sqrt{50}} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{125 - 120}{15/\sqrt{50}}\right) = P(-2.36 < Z < 2.36) = 0.9818$$
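The serum iron calculation uses the standard error σ/√n = 15/√50 ≈ 2.12. A hedged check with `statistics.NormalDist` (standard library) is sketched below; it relies on the central limit theorem approximation, and because it does not round z to 2.36 the result can differ from the table-based answer in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 120, 15, 50
standard_error = sigma / sqrt(n)        # about 2.12 micrograms per 100 ml
xbar = NormalDist(mu, standard_error)   # approximate, by the central limit theorem
prob = xbar.cdf(125) - xbar.cdf(115)
print(round(standard_error, 2), round(prob, 3))
```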
