Topic 5 Concept

The document discusses continuous sample spaces and probability distributions, emphasizing the use of density curves to model these distributions. It explains how probabilities are calculated as areas under these curves, particularly in the context of uniform and normal distributions. Additionally, it covers concepts such as the sampling distribution of statistics and the central limit theorem, highlighting the importance of sample size in achieving normality in distributions.
Probability distributions
Continuous random variables

Density curves

Continuous sample spaces contain an infinite number of outcomes over an interval of values. Events are defined over intervals of values.

We use mathematical functions called density curves to model continuous probability distributions.

Using density curves

The total area under a density curve represents the whole population (sample space) and always equals 1 (100%).

Probabilities are computed as areas under the corresponding portion of the density curve.

Copyright Dr. Brigitte Baldi ©

Uniform distributions are flat density curves defined over a bounded interval.

Ex: Uniform distributions

Software generates at random, with uniform probability, a value x between 0 and 1.

total width = 1
total area = total width * total height = 1
total height = 1

Probability of some event = rectangular area
= event width * total height

What is the probability of getting a value x between 0.1 and 0.5?

P(0.1 ≤ x ≤ 0.5) = event width * total height
= (0.5 – 0.1) * 1 = 0.4

By the rule of the complement, there is probability 0.6 of getting a value that is NOT between 0.1 and 0.5.

We found P(0.1 ≤ x ≤ 0.5) = event width * total height = (0.5 – 0.1) * 1 = 0.4

The probability P(0.1 < x < 0.5) is

A. smaller than the probability P(0.1 ≤ x ≤ 0.5).
B. equal to the probability P(0.1 ≤ x ≤ 0.5).
C. greater than the probability P(0.1 ≤ x ≤ 0.5).
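The rectangle-area rule for a uniform density can be written as a short Python sketch. The function name `uniform_interval_prob` and its parameters are illustrative choices, not part of the original slides.

```python
def uniform_interval_prob(a, b, low=0.0, high=1.0):
    """P(a <= x <= b) for x uniform on [low, high], computed as a rectangle area."""
    total_width = high - low
    total_height = 1.0 / total_width  # chosen so that total area = 1
    # Clip the event to the sample space before measuring its width.
    event_width = max(0.0, min(b, high) - max(a, low))
    return event_width * total_height

p_event = uniform_interval_prob(0.1, 0.5)   # event width * total height
p_complement = 1.0 - p_event                # rule of the complement
print(round(p_event, 10), round(p_complement, 10))  # 0.4 0.6
```

The same function covers any bounded interval: for a uniform density on 0 to 4, the total height becomes 1/4 so that the total area still equals 1.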
Outcomes as intervals

Ask, what is P(x = 0.1)?

total width = 1
total height = 1

P(x = 0.1) = event width * total height
= (0.1 – 0.1) * 1 = 0

And x = 0.1 is one out of an infinite number of equally likely values in the sample space (0 to 1).

P(x = 0.1) = 1 / ∞ = 0

If x is continuous, then

P(0.1 < x < 0.5) = P(0.1 ≤ x ≤ 0.5)

When a random variable has a continuous probability distribution, any singular value is essentially meaningless (one out of an infinite number of nearly indistinguishable values).

Events must be defined as intervals, that is, ranges of values, to be meaningful.

We use a math approach to calculate probabilities as areas under density curves.
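As a rough illustration of why a single value carries probability 0, the sketch below shrinks an interval around 0.1 for the uniform density on 0 to 1. The helper `uniform01_prob` is a hypothetical name introduced only for this example.

```python
def uniform01_prob(a, b):
    """P(a <= x <= b) for x uniform on [0, 1]: event width * total height (1)."""
    return max(0.0, min(b, 1.0) - max(a, 0.0)) * 1.0

# A single value is an interval of width zero.
p_point = uniform01_prob(0.1, 0.1)

# Narrower intervals around 0.1 have smaller probabilities, heading to 0.
for half_width in (0.05, 0.005, 0.0005):
    print(half_width, uniform01_prob(0.1 - half_width, 0.1 + half_width))

# Dropping the two endpoints removes probability 0 + 0, so the open and
# closed intervals have the same probability.
p_closed = uniform01_prob(0.1, 0.5)
p_open = p_closed - p_point - uniform01_prob(0.5, 0.5)
```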


Probability distributions
Normal distributions

Normal distributions (Gaussian)

A family of density curves:

$$N(\mu, \sigma):\quad f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$

Math properties

• defined from –∞ to +∞
• total area = 1 (mostly near the peak)
• symmetric around their mean μ (mu)
• scaled by their standard deviation σ (sigma)
• notation: N(μ, σ)
• inflection points at μ ± σ

[Figure: Normal curves with μ = 15 and σ = 2, 4, and 6; Normal curves with μ = 10, 15, and 20 and σ = 3; x-axis from 0 to 30, with the inflection points marked at μ – σ and μ + σ]

Models for real data

A Normal distribution is a reasonable model of US women's heights.

[Figure: histogram of US women's heights; y-axis: Percent; x-axis: Height (inches), from under 56 to 72 or more]

A Normal distribution would be a terrible model of US household incomes.
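The Normal density formula and two of its listed properties (total area of 1, symmetry around μ) can be checked numerically. The sketch below assumes Python's standard library only; `normal_pdf` and `area_under` are illustrative helper names, and the midpoint-rule integration is just one simple way to approximate an area.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x, written directly from the formula."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_under(f, lo, hi, steps=20_000):
    """Midpoint-rule approximation of the area under f between lo and hi."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

mu, sigma = 15, 2
# The curve extends to infinity, but almost all the area is near the peak,
# so integrating over mu +/- 10 sigma is enough in practice.
total_area = area_under(lambda x: normal_pdf(x, mu, sigma),
                        mu - 10 * sigma, mu + 10 * sigma)
print(round(total_area, 6))

# Symmetry: equal heights at equal distances on either side of the mean.
print(normal_pdf(mu - 3, mu, sigma), normal_pdf(mu + 3, mu, sigma))

# Cross-check against the standard library's own Normal density.
print(normal_pdf(16, mu, sigma), NormalDist(mu, sigma).pdf(16))
```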

Scaling

Middle area: for any Normal curve N(µ,σ):

About 68% of all observations under N(µ,σ) are within µ ± σ.

About 95% of all observations under N(µ,σ) are within µ ± 2σ.

Almost all (99.7%) observations under N(µ,σ) are within µ ± 3σ.

[Figure: Normal curve; x-axis: Number of times σ from µ]
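The 68%, 95%, and 99.7% figures are rounded values of the middle areas within 1, 2, and 3 standard deviations of the mean. A minimal check, assuming Python's `statistics.NormalDist` (the helper `middle_area` is an illustrative name):

```python
from statistics import NormalDist

def middle_area(k, mu=0.0, sigma=1.0):
    """Area under N(mu, sigma) between mu - k*sigma and mu + k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(k, round(middle_area(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The result does not depend on the particular μ and σ: `middle_area(2, 64.5, 2.5)` gives the same area as `middle_area(2)`.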

The distribution of heights (in inches) of adult women in the US can be modeled with the N(64.5, 2.5) distribution.

The probability that a randomly selected US adult woman has a height less than 69.5 inches is approximately

A) 99.7%
B) 97.5%
C) 95%
D) 84%
E) 68%

A height of 69.5 inches is two standard deviations above the mean (64.5 + 2 × 2.5). About 95% of heights lie within µ ± 2σ, which leaves an area of ~0.025 in each tail, so the area below 69.5 inches is about 0.975.

The standard Normal curve

The standard Normal distribution is N(0,1).

We can standardize the x variable by computing

$$z = \frac{x - \mu}{\sigma}$$

If x has the N(μ,σ) distribution, then z has the N(0,1) distribution.

[Figure: the N(64.5, 2.5) curve with x-axis x, height in inches, next to the N(0,1) curve with x-axis z, standardized height (no units)]

If x = 62 inches, then z = (62 – 64.5)/2.5 = –1

In the United States, a woman with a height of 62 inches is one standard deviation below the mean in the distribution of women's heights.

Probability distributions
Computations with Normal distributions

Computing a probability

Precise probability calculations with Normal distributions require technology.

TI 83/84
=> 2nd DISTR
normalcdf(lower, upper, μ, σ)

Use –1E99 and 1E99 for –∞ and +∞. (To get "E", type "2nd" then "EE".)
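For readers without a TI calculator, the sketch below mimics the `normalcdf(lower, upper, μ, σ)` call with Python's `statistics.NormalDist`. It is an analog written for illustration, not the calculator's implementation. It also shows that standardizing does not change the probability: the area below 62 inches under N(64.5, 2.5) equals the area below z = –1 under N(0,1).

```python
from statistics import NormalDist

def standardize(x, mu, sigma):
    """z-score: number of standard deviations between x and the mean."""
    return (x - mu) / sigma

def normalcdf(lower, upper, mu, sigma):
    """Area under N(mu, sigma) between lower and upper (Python analog of the TI call)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

z = standardize(62, 64.5, 2.5)                 # -1.0
p_height = normalcdf(-1e99, 62, 64.5, 2.5)     # -1e99 stands in for minus infinity
p_standard = normalcdf(-1e99, z, 0, 1)
print(z, round(p_height, 4), round(p_standard, 4))
```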

Normal probability computations: CrunchIt

The distribution of heights (in inches) of adult women in the US can be modeled with the N(64.5, 2.5) distribution.

What is the probability that a randomly selected US adult woman has a height less than 69.5 inches?

The probability that a randomly selected US adult woman has a height less than 69.5 inches is approximately 0.9772.

http://crunchit3.bfwpub.com/psls4e
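The 0.9772 value can be reproduced without CrunchIt. The following is a minimal sketch using Python's `statistics.NormalDist` as a stand-in for the software; it also compares the precise value with the rough answer suggested by the 95% rule.

```python
from statistics import NormalDist

heights = NormalDist(mu=64.5, sigma=2.5)   # model for US adult women's heights

p_below = heights.cdf(69.5)     # P(height < 69.5), precise value
p_above = 1 - p_below           # complement: P(height >= 69.5)
rule_of_thumb = 0.95 + 0.025    # 95% within mu +/- 2 sigma, plus the lower tail

print(round(p_below, 4), round(p_above, 4), rule_of_thumb)
```

The small gap between 0.9772 and 0.975 arises because "about 95%" is itself a rounded figure for the area within two standard deviations.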

Finding a percentile

Inversely, you can use technology to find a percentile value: the value of x such that a given percent of the distribution lies below it.

TI 83/84
2nd DISTR
x = invNorm(p, μ, σ)

invNorm finds the value of x such that there is area p to the left of x, where p is the cumulative area. This x value is the (100 × p)th percentile.

Inverse Normal computations: CrunchIt

The distribution of heights (in inches) of adult women in the US can be modeled with the N(64.5, 2.5) distribution.

What is the 90th percentile of heights among US adult women?

http://crunchit3.bfwpub.com/psls4e
The distribution of heights (in inches) of adult women in the US can be modeled with the N(64.5, 2.5) distribution.

The 90th percentile of heights among US adult women is 67.7 inches.

90% of US adult women are shorter than 67.7 inches.
10% of US adult women are taller than 67.7 inches.

a and P(x ≤ a) are linked. You may need to find either one.

P(x ≤ a) is the blue area, the area under the density curve for values less than or equal to a.

a is the threshold value splitting the density curve into the two complementary areas, P(x ≤ a) in blue and P(x > a).

P(x ≤ 69.5 inches) = 0.9772
90th percentile = 67.70 inches
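The link between a threshold a and the area P(x ≤ a) runs in both directions, which a short Python sketch can make concrete. It assumes `statistics.NormalDist`, whose `inv_cdf` method plays the role of `invNorm` here.

```python
from statistics import NormalDist

heights = NormalDist(mu=64.5, sigma=2.5)

# From an area to a threshold: the 90th percentile (area 0.90 to the left).
a = heights.inv_cdf(0.90)
print(round(a, 2))            # 67.7

# From a threshold to an area: P(x <= 69.5).
p = heights.cdf(69.5)
print(round(p, 4))            # 0.9772

# The two operations undo each other.
round_trip = heights.cdf(a)
print(round(round_trip, 4))   # 0.9
```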

Probability distributions
Behavior of random samples: sampling distributions

Statistics as random variables

Target population: imagine a huge batch of M&M candies. You plan to pick n=10 at random and carefully study them.

Every random sample is different, with a different summary statistic value.

When the 'individuals' studied are random samples of size n, then the sample statistic is a random variable.

A statistic computed from a random sample is a random variable.

Distribution of the statistic

Probability distribution (density curve) of the values of a statistic summarizing all random samples of n observations from this population.

The probability distribution of a statistic for all random samples of n observations from a given population is called the sampling distribution of the statistic.
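A small simulation can make the idea of a statistic as a random variable concrete. Everything numeric below is an assumption made up for illustration: the population is a hypothetical batch of 10,000 candy weights (in grams) drawn from N(0.9, 0.05), and the statistic is the mean weight of n = 10 candies.

```python
import random
from statistics import mean

rng = random.Random(5)  # fixed seed so the run is repeatable

# Hypothetical population: weights in grams of 10,000 candies (made-up values).
population = [rng.gauss(0.9, 0.05) for _ in range(10_000)]

def sample_mean(n):
    """Mean weight of one random sample of n candies from the population."""
    return mean(rng.sample(population, n))

# Each random sample of n = 10 gives a different value of the statistic.
print([round(sample_mean(10), 3) for _ in range(5)])

# Many samples together approximate the sampling distribution of the mean.
sample_means = [sample_mean(10) for _ in range(1_000)]
print(round(min(sample_means), 3), round(max(sample_means), 3))
```

Plotting a histogram of `sample_means` would give an approximate picture of the density curve described above.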

Why it is important

CORE IDEA OF STATISTICS: When the sample is random and representative, the value of the sample statistic should be pretty close to the value of the population parameter.

Seems logical, but:
1. Can we prove that it's true, mathematically?
2. How close is "pretty close"? Can we quantify that?

The sampling distribution of the statistic addresses these questions.

Critical distinctions

Population distribution of the quantitative variable x: refers to the probability distribution of a random variable (the distribution of the value of the variable for all the individuals in the population). MATH MODEL

Sample distribution, or distribution of the sample: describes a histogram or a dotplot of one random sample of n data points. ACTUAL DATA

Sampling distribution of the sample mean x̄: refers to the theoretical probability distribution of the mean x̄ of all possible samples of size n from a given population. MATH MODEL

[There are other statistics than x̄ and they have their own sampling distribution, which we won't study.]
Probability distributions
Sampling distribution of the mean

Unbiased estimate

Consider a population (the variable x) with mean μ and standard deviation σ.

The sampling distribution of the sample mean (the variable x̄) has mean:

$$\mu_{\bar{x}} = \mu$$

x̄ is an unbiased estimate of the population mean μ (x̄ correctly estimates μ, on average).

[Figure: density curve of the sampling distribution of the statistic x̄]
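For a very small population, the sampling distribution of x̄ can be listed exhaustively, which lets us verify the unbiasedness claim exactly. The population below (the faces of a die) and the sample size n = 2 are assumptions chosen only to keep the enumeration short; samples are drawn with replacement so that every ordered pair is equally likely.

```python
from itertools import product
from statistics import mean

population = [1, 2, 3, 4, 5, 6]   # hypothetical tiny population
n = 2

# All 36 equally likely ordered samples of size 2, drawn with replacement.
all_samples = list(product(population, repeat=n))
sample_means = [mean(sample) for sample in all_samples]

mu = mean(population)          # population mean
mu_xbar = mean(sample_means)   # mean of the sampling distribution of x-bar
print(mu, mu_xbar)

# Individual sample means miss mu, some too low and some too high,
# but they are correct on average.
print(min(sample_means), max(sample_means))
```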

Variability of x̄

Consider a population (the variable x) with mean μ and standard deviation σ.

The sampling distribution of the sample mean (the variable x̄) has standard deviation:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

Sample means vary much less than individual observations.
Larger samples tend to give closer estimates of μ.

Central limit theorem

The central limit theorem states that, for large enough sample sizes n (depending on the shape of the population), the sampling distribution of x̄ is approximately Normal, with

$$\mu_{\bar{x}} = \mu, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

Very rough guidelines:

A sample size of 10+: good enough for most symmetric data distributions
A sample size of 25+: good enough for many situations (no extreme skew)
A sample size of 30-40+: usually good enough even for extreme skews
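Both results can be explored with a simulation sketch. The population here is an assumption picked for illustration: a strongly right-skewed exponential distribution with mean 1 and standard deviation 1. The skewness helper (average cubed z-score) is one simple way to quantify how far a shape is from symmetric; it is not taken from the slides.

```python
import math
import random
from statistics import mean, pstdev

rng = random.Random(2024)  # fixed seed so the run is repeatable

SIGMA = 1.0  # an exponential population with rate 1 has mean 1 and SD 1

def simulate_sample_means(n, reps=2_000):
    """Means of `reps` random samples of size n from the skewed population."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

def skewness(values):
    """Average cubed z-score: near 0 for a symmetric, Normal-looking shape."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

results = {}
for n in (1, 10, 40):
    xbars = simulate_sample_means(n)
    results[n] = {
        "sd_observed": pstdev(xbars),
        "sd_theory": SIGMA / math.sqrt(n),   # sigma / sqrt(n)
        "skewness": skewness(xbars),
    }
    print(n, {k: round(v, 3) for k, v in results[n].items()})
```

In runs like this, the observed standard deviation of the sample means tracks σ/√n, and the skewness shrinks toward 0 as n grows, which is the pattern the central limit theorem predicts.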

[Figure: 3 different populations and the corresponding sampling distributions for various sample sizes. In each case $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$; the shape of the sampling distribution of x̄ depends on the central limit theorem.]