Chapters 5 and 6. Continuous Random Variables and The Normal Distribution Lecture Calculator Key
Learning Outcomes:
We are going to shift away from discrete random variables and towards continuous
random variables. In the last chapter, we focused on tables and the special case of the
binomial distribution. We will turn our focus to the Uniform Distribution, the Standard
Normal Distribution, and the Normal Distribution. There are many other continuous
distributions out there in the real world. These are the three distributions we will study
in this course.
The probability that x falls in any particular interval is the area under the density
curve and above the interval.
• The two most common density curves are the uniform distribution and the
normal distribution.
The Uniform Distribution
A uniform distribution, sometimes also known as a rectangular distribution, is a
distribution whose density curve has a constant height across its interval.
Ex1 Define a random variable by X = amount of time (in minutes) taken by a clerk to
process a certain type of application form. Suppose X is uniformly distributed between
4 and 6 minutes.
When you’re dealing with the Uniform Distribution, you’ll want to find the base of your
rectangle. We do this by finding the range. Recall from Chapter 1 that the range is the
difference of your maximum data value and your minimum data value. In a Uniform
Distribution, they are referred to as a and b. In our problem a = 4 and b = 6.
When we’re dealing with the area of a rectangle, we know the formula from back in our
Geometry days: Area = b ⋅ h.
If you’re keeping track, we know the Area is 1, and we know the base is 2. We can plug
those numbers into the equation and solve for the height.
Area = b ⋅ h
1 = 2 ⋅ h
⇒ h = 1/2 = 0.5
[Figure: PDF P (X), a rectangle of height 1/2 = 0.5 from x = 4 to x = 6; x-axis: Processing Time (minutes)]
As stated above, the height is always the reciprocal of the base. Or we could say
h = 1/(b − a).
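(a) Calculate P (4.5 < x < 5.5).
P (4.5 < x < 5.5) = base ⋅ height = (5.5 − 4.5) ⋅ 1/2 = 0.50
...
= 1/2 ⋅ 1/2
= 0.25
...
P (5 ≤ x < 7) = base ⋅ height = (6 − 5) ⋅ 1/2
∴ P (5 ≤ x < 7) = 0.50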
Note: It’s completely OK that we “ran out of rectangle” after 6 minutes. Yes, the number
in the probability statement asks us to go out to 7 minutes. But this clerk never takes
that long. The probability would have been the same regardless of how large the upper
bound was (provided it was larger than 6), i.e. P (5 ≤ x < 700) = 0.50.
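The “ran out of rectangle” idea can be sketched in a few lines of Python. This is only an illustration, not part of the calculator workflow; the function name `uniform_prob` is made up for this sketch. It clips the requested interval to [a, b] before multiplying base by height.

```python
def uniform_prob(lo, hi, a, b):
    """P(lo < X < hi) for X ~ U(a, b), computed as base * height.

    The requested interval is clipped to [a, b], because there is
    no rectangle (no area) outside that range.
    """
    height = 1 / (b - a)           # height is the reciprocal of the base
    left = max(lo, a)              # clip the lower bound
    right = min(hi, b)             # clip the upper bound
    base = max(0.0, right - left)  # an empty overlap has base 0
    return base * height


# Ex1: X ~ U(4, 6). Any upper bound beyond 6 gives the same area.
print(uniform_prob(5, 7, 4, 6))
print(uniform_prob(5, 700, 4, 6))
```

Both calls use the same clipped base of 6 − 5 = 1, so they return the same probability.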
Ex2 The following graph shows the uniform distribution of wait times, in minutes, for the
Catbus at the bus stop in front of Sikes Hall. Find the area of the shaded region.
base = b − a = 14 − 1 = 13 minutes
Next, find the height of your rectangle by taking the reciprocal of the base.
h = 1/(b − a) = 1/(14 − 1) = 1/13
Here is our PDF:
[Figure: PDF P (x), a rectangle of height 1/13 = 0.077]
First, calculate the base of your shaded rectangle, i.e. the blue rectangle.
base = 6 − 3 = 3 minutes
Second, use Area = b ⋅ h to calculate the probability.
P (3 ≤ x ≤ 6) = b ⋅ h
= 3 ⋅ 1/13
= 0.2308
Note: If you use the decimal approximation of 0.077, your answer will differ slightly.
P (3 ≤ x ≤ 6) = b ⋅ h
= 3 ⋅ 0.077
= 0.231
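To see where the small difference comes from, here is a short Python sketch (an illustration only) that keeps the height as the exact fraction 1/13 and compares it with the rounded height 0.077.

```python
from fractions import Fraction

base = 6 - 3                         # shaded base, in minutes
height_exact = Fraction(1, 14 - 1)   # exact height 1/13
height_rounded = 0.077               # height rounded to 3 decimal places

exact = base * height_exact          # exact area, 3/13
rounded = base * height_rounded      # area using the rounded height

print(exact, float(exact))
print(rounded)
```

Carrying the fraction through and rounding only at the end avoids the discrepancy.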
Properties of the Uniform Distribution
X ∼ U (a, b), where a = the lowest value of x and b = the highest value of x.
μ = (a + b)/2, σ = √((b − a)²/12)
Ex3 The amount of time, in minutes, that a person must wait for a bus is uniformly
distributed between 0 and 15 minutes, inclusive.
base = b − a = 15 − 0 = 15 minutes ⇒ h = 1/(b − a) = 1/(15 − 0) = 1/15
[Figure: PDF P (X), a rectangle of height 1/15 = 0.067 from x = 0 to x = 15; x-axis: Wait Time (minutes)]
(c) μ = (a + b)/2 = (0 + 15)/2 = 7.5 minutes
σ = √((b − a)²/12) = √((15 − 0)²/12) = 4.33 minutes
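The two formulas from the properties of the uniform distribution translate directly into code. A minimal Python sketch, with a made-up helper name `uniform_mean_sd`, applied to the bus example with a = 0 and b = 15:

```python
import math


def uniform_mean_sd(a, b):
    """Return (mean, standard deviation) of X ~ U(a, b)."""
    mean = (a + b) / 2
    sd = math.sqrt((b - a) ** 2 / 12)
    return mean, sd


mean, sd = uniform_mean_sd(0, 15)
print(mean)          # mean wait time in minutes
print(round(sd, 2))  # standard deviation in minutes, rounded
```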
(f) What is the probability that a person waits at least 7.3 minutes?
P (x > 7.3) = base ⋅ height
= (15 − 7.3) ⋅ 1/15
= 7.7 ⋅ 1/15
= 0.513
[Figure: PDF P (X) of height 1/15 = 0.067 with the region from 7.3 to 15 shaded; x-axis: Wait Time (minutes)]
Note: When we read the phrase “at least”, we can swap that out with the symbol ≥.
Continuing that train of thought, the phrase “at most” is synonymous with the symbol ≤.
Some find this a little counterintuitive because the phrase “at least” matches up with
the “greater than” symbol, and the phrase “at most” matches up with the “less than”
symbol.
(g) What is the probability that a person waits longer than 10 minutes given they have
waited longer than 5 minutes?
This is a conditional probability problem. We will use the conditional probability formula
from Chapter 3: P (A ∣ B) = P (A AND B) / P (B).
Here A is x > 10 and B is x > 5. The overlap, i.e. “AND”, is x > 10.
P (x > 10 ∣ x > 5) = P (x > 10 AND x > 5) / P (x > 5)
= ((15 − 10) ⋅ 1/15) / ((15 − 5) ⋅ 1/15)
= (5 ⋅ 1/15) / (10 ⋅ 1/15)
= 5/10
= 0.5
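The same conditional probability can be checked with a short Python sketch. The helper names are invented for this illustration, and it assumes the bus example’s distribution, X ∼ U (0, 15).

```python
def uniform_tail(x, a, b):
    """P(X > x) for X ~ U(a, b): remaining base times height."""
    x = min(max(x, a), b)  # clip x into [a, b]
    return (b - x) / (b - a)


def conditional_tail(x, given, a, b):
    """P(X > x | X > given) for X ~ U(a, b).

    The overlap of "X > x" and "X > given" is "X > max(x, given)".
    """
    overlap = uniform_tail(max(x, given), a, b)  # P(A AND B)
    return overlap / uniform_tail(given, a, b)   # divide by P(B)


print(conditional_tail(10, 5, 0, 15))
```

Notice that the height 1/15 cancels, which is why the answer reduces to a ratio of bases, 5/10.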
(h) Ninety percent of the time, the time a person must wait falls below what value?
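Area = base ⋅ height
0.90 = base ⋅ 1/15 ⇒ base = 13.5
Ninety percent of the time, a person waits less than 13.5 minutes.
[Figure: PDF of height 1/15 = 0.067 with 90% shaded from 0 to 13.5 (base = 13.5); x-axis: Wait Time (minutes)]
[Figure: the same PDF with 75% shaded from 0 to 11.25 (base = 11.25); x-axis: Wait Time (minutes)]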
Note: Don’t forget your units when solving for a value of the variable, i.e. a percentile
problem.
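A percentile of a uniform distribution is found by solving area = base ⋅ height for the base. A small Python sketch of that step, using a hypothetical helper `uniform_percentile` and the bus example’s a = 0 and b = 15:

```python
def uniform_percentile(p, a, b):
    """Value below which a proportion p of X ~ U(a, b) falls.

    Solves p = base * 1/(b - a) for the base, then measures
    that base starting from the left edge a.
    """
    base = p * (b - a)
    return a + base


print(uniform_percentile(0.90, 0, 15), "minutes")
print(uniform_percentile(0.75, 0, 15), "minutes")
```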
Review of Normal Distributions
Recall the normal distribution curve is bell-shaped, symmetric, and has an infinite base.
We will discuss the normal curve forever in statistics. It’s HUGE. This is just the
beginning.
The normal distribution has two parameters of interest: its mean, μ, and its standard
deviation, σ. When a continuous random variable can be approximated with a normal
density curve we write X ∼ N (μ, σ).
Ex4 The following graph summarizes the data collected on annual rainfall, in inches, in
two cities for the past 150 years. Which of the following conclusions can be made from
this graph?
City A:                 City B:
S - Skewed Right        S - ~N
O - ??                  O - ??
C - ~17 inches          C - ~30 inches
S - 5 to 40 inches      S - 20 to 40 inches
(a) The cities have different mean annual rainfalls, but the range of their annual rainfalls
is approximately the same.
(b) On average, City B gets more rain than city A, but has a smaller range of annual
rainfall.
(c) On average, City B gets more rain than city A, but has a larger range of annual
rainfall.
(d) On average, City A gets more rain than city B, but has a smaller range of annual
rainfall.
(e) On average, City A gets more rain than city B, but has a larger range of annual
rainfall.
z scores allow us to compare data values from different distributions by placing
them on the same distribution, i.e. the Standard Normal Distribution.
The z score corresponding to a particular value is
z score = (value − mean) / standard deviation
The z score tells us how many standard deviations our value is from the mean. It is
positive or negative according to whether the value lies above or below the mean.
Ex5 The average cost per ounce for glass cleaner is 7.7 cents with a standard deviation
of 2.5 cents. What is the z score of Windex with a cost of 10.1 cents per ounce?
(a) 0.96
(b) 1.31
(c) 1.94
(d) 4.04
z score = (value − mean) / standard deviation = (10.1 − 7.7 cents) / (2.5 cents) = 0.96
Ex6 A student took two national aptitude tests in the course of applying for admission to
colleges. The national average and standard deviation were 475 and 100 respectively,
for the first test and 30 and 8, respectively, for the second test. The student scored 625
on the first test and 45 on the second test. Use z scores to determine on which exam
the student performed better.
zT1 score = (625 − 475)/100 = 1.5
zT2 score = (45 − 30)/8 = 1.875
The student scored better on the 2nd exam because that z score is higher.
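The z score comparison in Ex5 and Ex6 is a one-line formula, sketched here in Python. The function name `z_score` is just for illustration.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd


# Ex5: Windex at 10.1 cents per ounce
print(round(z_score(10.1, 7.7, 2.5), 2))

# Ex6: compare two tests that are on different scales
z_test1 = z_score(625, 475, 100)
z_test2 = z_score(45, 30, 8)
better = "second" if z_test2 > z_test1 else "first"
print(z_test1, z_test2, better)
```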
z ∼ N (0, 1)
z = (x − μ)/σ
Calculations of Probabilities for the Standard Normal Distribution
Ex7 Calculate the following probabilities with your calculator. Draw a picture for each
probability.
(a) P (−2 ≤ z ≤ 2) = normalcdf (−2, 2, 0, 1) = 0.954
(b) P (z < −1.76) = normalcdf (−1E99, −1.76, 0, 1) = 0.039
(c) P (z > 1.18) = normalcdf (1.18, 1E99, 0, 1) = 0.119
(d) P (z < 1.18) = normalcdf (−1E99, 1.18, 0, 1) = 0.881
(e) P (z ≥ 1.96) = normalcdf (1.96, 1E99, 0, 1) = 0.025
(f) P (z = 1.23) = normalcdf (1.23, 1.23, 0, 1) = 0
[Sketch for each part: standard normal curve P (z) with the stated region shaded]
Calculator: 2nd → Vars → 2 → lower bound → , → upper bound → , → 0 → , → 1
Display: normalcdf(lower bound, upper bound, 0, 1)
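If you want to check these answers away from the calculator, the same areas can be reproduced in Python. This is a sketch, not the TI implementation: the function below merely borrows the name `normalcdf` and relies on `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
from statistics import NormalDist


def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under the N(mean, sd) curve between lower and upper."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)


# Same trick as the calculator: a huge number stands in for infinity.
print(round(normalcdf(-2, 2), 3))         # part (a)
print(round(normalcdf(-1e99, -1.76), 3))  # part (b)
print(round(normalcdf(1.18, 1e99), 3))    # part (c)
print(normalcdf(1.23, 1.23))              # part (f): zero width, zero area
```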
Note #2: As we saw in part (f), the probability that z is equal to any one number is zero,
as there is no “area” under the curve. There is no ‘width’ along the z-axis. From this,
we get the following properties: P (z < a) = P (z ≤ a) and P (z > a) = P (z ≥ a).
This is only true for continuous numerical variables. Back in discrete land (Chapter 4),
the symbols < and > can have different probabilities than ≤ and ≥.
The Normal Distribution aka The Bell Curve
Note: The Normal Distribution is a special bell curve but by no means the only bell
curve. It’s just the most popular bell curve out there.
X ∼ N (μ, σ)
[Figure: normal curve with the x-axis marked at μ − 3σ, μ − 2σ, μ − σ, μ, μ + σ, μ + 2σ, μ + 3σ]
(a) Sketch the pdf. Label and scale your axes.
[Figure: normal PDF P (X) with the x-axis marked at −13, −7, −1, 5, 11, 17, 23; x = 1 sits at z = −2/3 and x = 17 sits at z = 2]
(b) Calculate the z-score for an x-value of 17.
z score = (value − mean) / standard deviation
= (17 − 5)/6
= 2
(c) What does a positive z-score mean?
The data value is above the mean.
(d) Calculate the z-score for x = 1.
z score = (value − mean) / standard deviation
= (1 − 5)/6
= −2/3
= −0.67
The Empirical Rule (or the 68 – 95 – 99.7 Rule)
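If the histogram of values in a data set can be reasonably well approximated by a
normal curve, then...
• roughly 68% of the observations fall within 1 standard deviation of the mean,
• roughly 95% of the observations fall within 2 standard deviations of the mean, and
• roughly 99.7% of the observations fall within 3 standard deviations of the mean.
Another graphical view of this rule looks at the percentages on each side of the mean ...
[Figure: normal curve split at each standard deviation into 0.15%, 2.35%, 13.5%, 34%, 34%, 13.5%, 2.35%, 0.15%, with matching x and z axes]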
You can see the 34% on each side of the mean. Within one standard deviation of the
mean, the area under the curve totals 68%. If you travel two standard deviations
away from the mean, the total area under the curve is 95%; three standard deviations in
either direction totals 99.7% of the area under the curve.
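The 68, 95, and 99.7 figures are rounded. As a quick check, the Python sketch below (an illustration that assumes the standard library’s `statistics.NormalDist`) computes the area within k standard deviations of the mean for k = 1, 2, 3.

```python
from statistics import NormalDist

standard_normal = NormalDist(0, 1)


def area_within(k):
    """Area under the standard normal curve between z = -k and z = k."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(100 * area_within(k), 2), "%")
```

The computed areas land close to, but not exactly on, the Empirical Rule’s percentages.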
You can also see the equivalent z scores across the bottom of the curve.
Let’s take a look at this breakdown of the normal distribution from a different light. The
Empirical Rule deals with “middles” - as in the middle 68% of observations, the middle
95% of observations, and the middle 99.7% of observations.
The folks over at Texas Instruments, the same folks that produce your calculator, don’t
work in “middles”. They only work in percentiles. We mentioned percentiles back in
Chapter 2. They’re synonymous with cumulative relative frequencies and you can look
for them on graphs in a “from here on down” or “from here to the left” light.
Let’s practice taking our “middles” and converting them to “on downs”, i.e. percentiles.
The easiest percentile to see is the median, which as we learned from Chapter 2 is the
50th percentile. The 50th percentile always falls exactly on the mean. If you add the
percentages from the mean “on down”, they will total 50%, i.e.
34% + 13.5% + 2.35% + 0.15% = 50%.
[Figure: normal curve with the lower 50% shaded; the mean marks the 50th %ile]
Let’s look at the percentile when a data value is exactly one standard deviation above
the mean. If you add the percentages from the z = 1 “on down”, they will total 84%, i.e.
34% + 34% + 13.5% + 2.35% + 0.15% = 84%.
[Figure: normal curve with the lower 84% shaded; z = 1 marks the 84th %ile]
We can continue with this process and find percentiles for any of the z-scores.
[Figure: normal curve with the 16th %ile, 50th %ile, and 84th %ile marked along the x and z axes]
Keep in mind that your calculator will only deal with percentiles. If you are given a
“middle” percentage, you will need to convert it to a percentile, i.e. an “on down”
percentage.
There will be times when you are asked to find the “top x%”, for example the “top 20%”
of all scores on a test. You will need to convert the “top 20%” to a percentile. The “top
20%” is cut off from the “bottom 80%”, i.e. the 80th percentile. We are using the
complement rule here.
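The conversions from “middles” and “tops” to percentiles are simple arithmetic. A short Python sketch with made-up helper names, working in proportions rather than percents:

```python
def middle_to_percentiles(middle):
    """Lower and upper percentiles that cut off a symmetric middle area.

    The area outside the middle is split evenly between the two tails.
    """
    tail = (1 - middle) / 2
    return tail, 1 - tail


def top_to_percentile(top):
    """Percentile that cuts off the top proportion (complement rule)."""
    return 1 - top


low, high = middle_to_percentiles(0.68)
print(round(low, 2), round(high, 2))    # middle 68%
print(round(top_to_percentile(0.20), 2))  # top 20%
```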
Ex9 In a study investigating the effect of car speed on accident severity, 5000 reports of
fatal automobile accidents were examined, and the vehicle speed at impact was
recorded for each one. It was determined that the average speed was 42 mph and that
the standard deviation was 15 mph. In addition, a histogram revealed that vehicle
speed at impact could be described by a normal curve.
(a) Roughly what proportion of vehicle speeds were between 27 and 57mph?
Sketch a graph of this situation. Shade the area of interest on the PDF.
X ∼ N (42, 15)
x̄ ± 1s = 42 ± 15
⇒ 42 − 15 = 27
42 + 15 = 57
∴ (27, 57)
[Figure: normal PDF P (x) with 34% shaded on each side of the mean 42, from 27 to 57; x-axis: Speed (mph)]
From the Empirical Rule we know that roughly 68% of observations fall within 1
standard deviation of the mean, 34% on either side of the mean.
(b) Roughly what proportion of vehicle speeds exceed 57mph? Sketch a graph of this
situation. Shade the area of interest on the PDF.
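From part (a), we saw 68% of observations were within a standard deviation of the
mean. Using the complement rule, we know that 32% of observations fall outside that
interval. By symmetry, half of those, roughly 16%, exceed 57mph.
[Figure: normal PDF P (x) with 34% on each side of the mean and the 16% above 57 shaded; x-axis: Speed (mph)]
Note: You could also recognize that 57mph is the 84th percentile, as the data value is
1 standard deviation above the mean. That leaves 100% − 84% = 16% above it.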
Ex10 In a certain southwestern city the air pollution index averages 62.5 during the
year with a standard deviation of 18.0. Assuming the empirical rule is appropriate, the
index falls within what interval 95% of the time?
From the Empirical Rule we know that roughly 95% of observations fall within 2
standard deviations of the mean.
X ∼ N (62.5, 18)
x̄ ± 2s = 62.5 ± 2 ⋅ 18
= 62.5 ± 36
⇒ 62.5 − 36 = 26.5
62.5 + 36 = 98.5
∴ (26.5, 98.5)
Ex11 Consider a normal distribution of frog weights with μ = 500 grams and σ = 65
grams. A sample of size 2,000 is drawn from this population. Approximately how many
of the cases would you expect to find between 435 and 565?
(a) 1,000
(b) 1,360
(c) 1,500
(d) 1,900
[Figure: normal PDF P (x) with the region from 435 to 565 shaded around the mean 500; x-axis: Frog Weights (g)]
From the Empirical Rule we know that roughly 68% of observations fall within 1
standard deviation of the mean.
X ∼ N (500, 65)
x̄ ± s = 500 ± 65
⇒ 500 − 65 = 435
500 + 65 = 565
∴ (435, 565)
If 68% of the 2000 observations fall within 435 and 565 grams, we have
0.68 ⋅ 2000 = 1360 frogs.
Calculations of Probabilities for any Normal Distribution
Ex12 The growth of children can be an important indicator of general levels of nutrition
and health. Data suggest that a reasonable model for the probability distribution of the
continuous numerical variable x = height of a randomly selected 5-year old child is a
normal distribution with mean µ = 100 cm and standard deviation σ = 6 cm. Sketch
a picture and shade the appropriate region.
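X ∼ N (100, 6)
What proportion of the heights is between 94 and 112 cm?
We can use the Empirical Rule here because each of the endpoints is a whole number of
standard deviations from the mean: 94 = 100 − 6 and 112 = 100 + 2 ⋅ 6.
34% + 34% + 13.5% = 81.5%
[Figure: normal PDF P (x) with the region from 94 to 112 shaded around the mean 100; x-axis: Height (cm)]
normalcdf (94, 112, 100, 6) = 0.8186
Note: This answer is actually more accurate than when using the Empirical Rule. The
Empirical Rule is an estimation.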
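What is the probability that a randomly chosen child will be at least 110 cm tall?
z score110 = (value − mean) / standard deviation
= (110 − 100)/6
= 1.67
A z score of 1.67 is not one of the Empirical Rule’s cutoffs ¯\_(ツ)_/¯, so we use the
calculator.
[Figure: normal PDF P (x) with the region above 110 shaded, mean at 100; x-axis: Height (cm)]
P (x ≥ 110) = normalcdf (110, 1E99, 100, 6) = 0.048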
______________________________________________________________________
Calculator: 2nd → Vars → 2 → lower bound → , → upper bound →, → mean →, → s.d.
Display: normalcdf(lower bound, upper bound, mean, s.d.)
Note: −∞ = −1E99, ∞ = 1E99
______________________________________________________________________
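Passing the mean and standard deviation to normalcdf gives the same area as converting to a z score first and using the Standard Normal Distribution. The Python sketch below illustrates that equivalence for the 110 cm question; it is an assumption-laden illustration that uses `statistics.NormalDist` from the standard library rather than the calculator.

```python
from statistics import NormalDist

mean, sd = 100, 6  # height model from Ex12, in cm
value = 110

# Route 1: work directly on X ~ N(100, 6)
direct = 1 - NormalDist(mean, sd).cdf(value)

# Route 2: standardize first, then use Z ~ N(0, 1)
z = (value - mean) / sd
via_z = 1 - NormalDist(0, 1).cdf(z)

print(round(z, 2))
print(round(direct, 4), round(via_z, 4))
```

The two routes agree because the z score simply relabels the x-axis in units of standard deviations.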
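Find the 84th percentile.
The 84th percentile sits 1 standard deviation above the mean, so z = 1.
1 = (value − mean) / standard deviation
1 = (x − 100)/6
6 = x − 100
⇒ x = 106 cm
[Figure: normal PDF P (x) with the area below 106 shaded, mean at 100; x-axis: Height (cm)]
invNorm (0.84, 100, 6) = 105.97 cm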
Note: This answer is actually more accurate than when using the Empirical Rule. The
Empirical Rule is an estimation.
Find the 40th percentile.
invNorm (0.40, 100, 6) = 98.48 cm
Note: We should expect the 84th percentile to have a value larger than the mean, as
the mean is always the 50th percentile. We saw that happen in this problem as the 84th
percentile was ~106 cm, which is larger than the mean of 100 cm. Along those same
lines, we can expect the 40th percentile to have a value smaller than the mean. Again,
we saw that happen in this problem as the 40th percentile was ~98.5 cm, which is
smaller than the mean.
______________________________________________________________________
Calculator: 2nd → Vars → 3 → percentile → , → mean →, → s.d.
Display: invNorm(percentile, mean, s.d.)
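For readers checking percentile answers without the calculator, here is a Python sketch of the same idea. It is not the TI implementation: the function only borrows the name `invNorm` and relies on `NormalDist.inv_cdf` from the Python standard library (Python 3.8 or later).

```python
from statistics import NormalDist


def invNorm(percentile, mean=0.0, sd=1.0):
    """Value with the given proportion of the N(mean, sd) area below it."""
    return NormalDist(mean, sd).inv_cdf(percentile)


# Ex12 heights: X ~ N(100, 6), in cm
print(round(invNorm(0.84, 100, 6), 2))  # above the mean
print(round(invNorm(0.40, 100, 6), 2))  # below the mean
print(round(invNorm(0.50, 100, 6), 2))  # the mean itself
```

The percentile must be entered as a proportion strictly between 0 and 1, e.g. 0.84 for the 84th percentile.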
https://www.youtube.com/watch?v=HKzZwX7oeDM
https://www.youtube.com/watch?v=UuKxBnIGyJQ