Chapters 5 and 6. Continuous Random Variables and The Normal Distribution Lecture Calculator Key

The document discusses key concepts related to continuous random variables and the normal distribution. It introduces probability density functions and continuous random variables. It then covers the uniform distribution, including defining it, finding probabilities, and its properties. It also introduces the standard normal distribution and normal distribution.

Uploaded by

Ran Man

Ms Abrao

Math 43 Introduction to Probability and Statistics


Chapters 5 and 6. Continuous Random Variables and The Normal Distribution Key
HW pp. 353 - 355 #75 (ignore part (c)), 76 (ignore part (d)), 79 - 81, 85
pp. 389 - 391 #60, 61, 65, 67 - 75, 77, 79, 80
Note: Do not attempt pp. 355 - 357 #86 - 101. Do not attempt pp. 389 - 393 #62, 85, and 87. The
material these homework problems are based on is not covered in the scope of this class.

Learning Outcomes:

• Recognize and understand continuous probability density functions in general.


• Recognize the uniform probability distribution and apply it appropriately.
• Recognize the standard normal probability distribution and apply it appropriately.
• Recognize the normal probability distribution and apply it appropriately.

We are going to shift away from discrete random variables and towards continuous
random variables. In the last chapter, we focused on tables and the special case of the
binomial distribution. We will turn our focus to the Uniform Distribution, the Standard
Normal Distribution, and the Normal Distribution. There are many other continuous
distributions out there in the real world. These are the three distributions we will study
in this course.

Continuous Probability Functions


• A probability distribution for a continuous random variable x is specified by a
mathematical function called the probability density function (pdf). The
following requirements must be met:

1. The curve cannot dip below the horizontal axis.


2. The total area under the density curve is equal to 1.

The probability that x falls in any particular interval is the area under the density
curve and above the interval.

• PROBABILITY = AREA UNDER THE DENSITY CURVE

• The two most common density curves are the uniform distribution and the
normal distribution.
The Uniform Distribution
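A uniform distribution, sometimes also known as a rectangular distribution, is a
distribution that has constant probability density, so its density curve is a flat line
(the top of a rectangle) between its lowest and highest values.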

Ex1 Define a random variable by X = amount of time (in minutes) taken by a clerk to
process a certain type of application form. Suppose X is uniformly distributed between
4 and 6 minutes.

Is X continuous or discrete? Continuous

When you’re dealing with the Uniform Distribution, you’ll want to find the base of your
rectangle. We do this by finding the range. Recall from Chapter 1 that the range is the
difference of your maximum data value and your minimum data value. In a Uniform
Distribution, the minimum is referred to as a and the maximum as b. In our problem
a = 4 and b = 6.

In our example, we have: base = b − a = 6 − 4 = 2 minutes.


Once you’ve found your base, the next task is to find the height of the rectangle, i.e. the
value of the density curve. The height will always be the reciprocal of the base. Let’s
take a look at why this is true.

Look at the second property of a continuous pdf:

2. The total area under the density curve is equal to 1.

When we’re dealing with the area of a rectangle, we know the formula from back in our
Geometry days: Area = b ⋅ h.

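If you’re keeping track, we know the Area is 1, and we know the base is 2. We can plug
those numbers into the equation and solve for the height.

Area = b ⋅ h
1 = 2 ⋅ h
⇒ h = 1/2 = 0.5

[Figure: the pdf of X, a rectangle of height 1/2 = 0.5 over 4 ≤ x ≤ 6. Horizontal axis: Processing Time (minutes); vertical axis: P (X).]

As stated above, the height is always the reciprocal of the base. Or we could say
h = 1/(b − a), where a and b are the lowest and highest values of x.
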
(a) Calculate P (4.5 < x < 5.5).

First, calculate the base of your shaded rectangle, i.e. the pink-ish rectangle.

base = 5.5 − 4.5 = 1 minute

Second, use Area = b ⋅ h to calculate the probability.

P (4.5 < x < 5.5) = b ⋅ h
= 1 ⋅ 1/2
= 0.50

[Figure: the pdf with the region from 4.5 to 5.5 shaded; height 1/2 = 0.5. Horizontal axis: Processing Time (minutes).]

∴ P (4.5 < x < 5.5) = 0.50

(b) Calculate P (x > 5.5).

First, calculate the base of your shaded rectangle, i.e. the pink-ish rectangle.

base = 6 − 5.5 = 0.5 minute

Second, use Area = b ⋅ h to calculate the probability.

P (x > 5.5) = b ⋅ h
= 1/2 ⋅ 1/2
= 0.25

[Figure: the pdf with the region from 5.5 to 6 shaded; height 1/2 = 0.5. Horizontal axis: Processing Time (minutes).]

∴ P (x > 5.5) = 0.25


(c) Calculate P (5 ≤ x < 7).

First, calculate the base of your shaded rectangle, i.e. the pink-ish rectangle.

base = 6 − 5 = 1 minute

Second, use Area = b ⋅ h to calculate the probability.

P (5 ≤ x < 7) = b ⋅ h
= 1 ⋅ 1/2
= 0.50

[Figure: the pdf with the region from 5 to 6 shaded; height 1/2 = 0.5. Horizontal axis: Processing Time (minutes).]

∴ P (5 ≤ x < 7) = 0.50

Note: It’s completely OK that we “ran out of rectangle” after 6 minutes. Yes, the number
in the probability statement asks us to go out to 7 minutes. But, this clerk never takes
that long. The probability would have been the same regardless of how large the upper
bound was (provided it was at least 6), i.e. P (5 ≤ x < 700) = 0.50.
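
If you would like to check the three areas from Ex1 away from the calculator, here is a minimal Python sketch of the base ⋅ height idea. The function name uniform_prob is made up for this illustration and is not part of the lecture. It clips the requested interval to the rectangle first, which is exactly the "ran out of rectangle" point in the note above.

```python
def uniform_prob(lower, upper, a, b):
    """P(lower < X < upper) for X ~ U(a, b), computed as base * height.

    Hypothetical helper for illustration only.
    """
    if b <= a:
        raise ValueError("b must be greater than a")
    height = 1 / (b - a)             # height is the reciprocal of the full base
    left = max(lower, a)             # clip the interval to the rectangle
    right = min(upper, b)
    base = max(right - left, 0.0)    # no overlap with [a, b] means zero area
    return base * height


# Ex1: X ~ U(4, 6)
print(uniform_prob(4.5, 5.5, 4, 6))           # part (a)
print(uniform_prob(5.5, float("inf"), 4, 6))  # part (b)
print(uniform_prob(5, 7, 4, 6))               # part (c)
print(uniform_prob(5, 700, 4, 6))             # same area as part (c)
```

Because the interval is clipped, an upper bound of 7, 700, or infinity all give the same answer.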
Ex2 The following graph shows the uniform distribution of wait times, in minutes, for the
Catbus at the bus stop in front of Sikes Hall. Find the area of the shaded region.

(a) -0.23
(b) -0.21
(c) 0.21
(d) 0.23
(e) 0.30

Note: (a) & (b) are not possible. All probabilities are numbers between 0 & 1. They cannot
be negative.

First, calculate the base of your rectangle.

base = b − a = 14 − 1 = 13 minutes

Next, find the height of your rectangle by taking the reciprocal of the base.

h = 1/(b − a) = 1/(14 − 1) = 1/13

Here is our PDF: a rectangle of height 1/13 = 0.077.

First, calculate the base of your shaded rectangle, i.e. the blue rectangle.

base = 6 − 3 = 3 minutes

Second, use Area = b ⋅ h to calculate the probability.

P (3 ≤ x ≤ 6) = b ⋅ h
= 3 ⋅ 1/13
= 0.2308

This matches choice (d).
Note: If you use the decimal approximation of 0.077, your answer will differ slightly.

P (3 ≤ x ≤ 6) = b ⋅ h
= 3 ⋅ 0.077
= 0.231
Properties of the Uniform Distribution
X ∼ U (a, b), where a = the lowest value of x and b = the highest value of x.

μ = (a + b)/2, σ = √((b − a)²/12)

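The two formulas above can be checked with a few lines of Python. This is only a sketch; the function names uniform_mean and uniform_sd are invented for this example, and the numbers reuse the clerk from Ex1, X ∼ U (4, 6).

```python
import math


def uniform_mean(a, b):
    """Mean of U(a, b): the midpoint of the rectangle's base."""
    return (a + b) / 2


def uniform_sd(a, b):
    """Standard deviation of U(a, b): sqrt((b - a)^2 / 12)."""
    if b <= a:
        raise ValueError("b must be greater than a")
    return math.sqrt((b - a) ** 2 / 12)


# Ex1 clerk: X ~ U(4, 6)
print(uniform_mean(4, 6))
print(round(uniform_sd(4, 6), 3))
```

The mean lands in the middle of the rectangle, which is what symmetry suggests it should do.
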
Ex3 The amount of time, in minutes, that a person must wait for a bus is uniformly
distributed between 0 and 15 minutes, inclusive.

(a) X ∼ U (0, 15)

(b) Graph the probability distribution function.

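base = b − a = 15 − 0 = 15 minutes ⇒ h = 1/(b − a) = 1/(15 − 0) = 1/15

[Figure: the pdf of X, a rectangle of height 1/15 = 0.067 over 0 ≤ x ≤ 15. Horizontal axis: Wait Time (minutes); vertical axis: P (X).]

(c) μ = (a + b)/2 = (0 + 15)/2 = 7.5 minutes

(d) σ = √((b − a)²/12) = √((15 − 0)²/12) = 4.330 minutes
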
(e) What is the probability that a person waits fewer than 12.5 minutes? Shade the area
of interest on the PDF.

P (x < 12.5) = base ⋅ height
= (12.5 − 0) ⋅ 1/15
= 12.5 ⋅ 1/15
= 0.833

[Figure: the pdf with the region from 0 to 12.5 shaded; height 1/15 = 0.067. Horizontal axis: Wait Time (minutes).]

(f) What is the probability that a person waits at least 7.3 minutes?

P (x ≥ 7.3) = base ⋅ height
= (15 − 7.3) ⋅ 1/15
= 7.7 ⋅ 1/15
= 0.513

[Figure: the pdf with the region from 7.3 to 15 shaded; height 1/15 = 0.067. Horizontal axis: Wait Time (minutes).]

Note: When we read the phrase “at least”, we can swap that out with the symbol ≥.
Continuing that train of thought, the phrase “at most” is synonymous with the symbol ≤.
Some find this a little counterintuitive because the phrase “at least” matches up with
the “greater than” symbol, and the phrase “at most” matches up with the “less than”
symbol.
(g) What is the probability that a person waits longer than 10 minutes given they have
waited longer than 5 minutes?

This is a conditional probability problem. We will use the conditional probability formula
from Chapter 3: P (A ∣ B) = P (A AND B) / P (B).

P (x > 10 ∣ x > 5) = P (x > 10 AND x > 5) / P (x > 5)
= P (x > 10) / P (x > 5)
= (base ⋅ height) / (base ⋅ height)
= ((15 − 10) ⋅ 1/15) / ((15 − 5) ⋅ 1/15)
= (5 ⋅ 1/15) / (10 ⋅ 1/15)
= 5/10
= 0.5

You might be wondering how I got from the 1st to the 2nd line. We are looking for the
overlap between numbers greater than 10 AND numbers greater than 5. Any number greater
than 10 is automatically greater than 5. So the overlap is all of the numbers greater
than 10. We can also look at this graphically on number lines.

[Figure: two number lines, one for x > 10 and one for x > 5. The overlap, i.e. “AND”, is x > 10.]
(h) Ninety percent of the time, the time a person must wait falls below what value?

base ⋅ height = 0.90
base ⋅ 1/15 = 0.90
⇒ base = 0.90 ⋅ 15 = 13.5

Our a value is 0. So our answer is 13.5 + 0 = 13.5. The 90th percentile is 13.5
minutes.

[Figure: the pdf with the region from 0 to 13.5 shaded (90%, base = 13.5); height 1/15 = 0.067. Horizontal axis: Wait Time (minutes).]

(i) Find the minimum for the upper quartile.

base ⋅ height = 0.75
base ⋅ 1/15 = 0.75
⇒ base = 0.75 ⋅ 15 = 11.25

Our a value is 0. So our answer is 11.25 + 0 = 11.25. The 75th percentile is 11.25
minutes.

[Figure: the pdf with the region from 0 to 11.25 shaded (75%, base = 11.25); height 1/15 = 0.067. Horizontal axis: Wait Time (minutes).]

Note: Don’t forget your units when solving for a value of the variable, i.e. a percentile
problem.
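
Parts (g), (h), and (i) of Ex3 can be sketched in Python as well. The helper names below (uniform_percentile, uniform_tail, uniform_conditional_tail) are hypothetical, chosen just for this example. The percentile function solves base ⋅ height = p for the base and then adds a; the conditional function assumes the first cutoff is at least as large as the "given" cutoff, so the overlap is simply the larger tail.

```python
def uniform_percentile(p, a, b):
    """Value k with P(X < k) = p for X ~ U(a, b)."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    base = p * (b - a)        # from base * 1/(b - a) = p
    return a + base           # shift by the lowest value a


def uniform_tail(k, a, b):
    """P(X > k) for X ~ U(a, b)."""
    k = min(max(k, a), b)     # clip k to the rectangle
    return (b - k) / (b - a)


def uniform_conditional_tail(k, given, a, b):
    """P(X > k | X > given), assuming k >= given so the overlap is X > k."""
    if k < given:
        raise ValueError("this sketch assumes k >= given")
    denominator = uniform_tail(given, a, b)
    if denominator == 0:
        raise ValueError("cannot condition on an event with probability 0")
    return uniform_tail(k, a, b) / denominator


# Ex3: X ~ U(0, 15)
print(uniform_conditional_tail(10, 5, 0, 15))  # part (g)
print(uniform_percentile(0.90, 0, 15))         # part (h), in minutes
print(uniform_percentile(0.75, 0, 15))         # part (i), in minutes
```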
Review of Normal Distributions
Recall the normal distribution curve is bell-shaped, symmetric, and has an infinite base.

We will discuss the normal curve forever in statistics. It’s HUGE. This is just the
beginning.

The normal distribution has two parameters of interest - its mean, µ, and its standard
deviation, σ. When a continuous random variable can be approximated with a normal
density curve we write X ∼ N (μ, σ).

Ex4 The following graph summarizes the data collected on annual rainfall, in inches, in
two cities for the past 150 years. Which of the following conclusions can be made from
this graph?

      City A             City B
S -   Skewed Right       ~N
O -   ??                 ??
C -   ~17 inches         ~30 inches
S -   5 to 40 inches     20 to 40 inches

(a) The cities have different mean annual rainfalls, but the range of their annual rainfalls
is approximately the same.
(b) On average, City B gets more rain than city A, but has a smaller range of annual
rainfall.
(c) On average, City B gets more rain than city A, but has a larger range of annual
rainfall.
(d) On average, City A gets more rain than city B, but has a smaller range of annual
rainfall.
(e) On average, City A gets more rain than city B, but has a larger range of annual
rainfall.
The z score corresponding to a particular value is

z score = (value − mean) / (standard deviation)

z scores allow us to compare data values from different distributions by placing
them on the same distribution, i.e. the Standard Normal Distribution.

The z score tells us how many standard deviations our value is from the mean. It is
positive or negative according to whether the value lies above or below the mean.

Ex5 The average cost per ounce for glass cleaner is 7.7 cents with a standard deviation
of 2.5 cents. What is the z score of Windex with a cost of 10.1 cents per ounce?

(a) 0.96
(b) 1.31
(c) 1.94
(d) 4.04

z score = (value − mean) / (standard deviation) = (10.1 − 7.7 cents) / (2.5 cents) = 0.96

Ex6 A student took two national aptitude tests in the course of applying for admission to
colleges. The national average and standard deviation were 475 and 100 respectively,
for the first test and 30 and 8, respectively, for the second test. The student scored 625
on the first test and 45 on the second test. Use z scores to determine on which exam
the student performed better.

zT1 score = (625 − 475)/100 = 1.5
zT2 score = (45 − 30)/8 = 1.875

The student scored better on the 2nd exam because that z score is higher.
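
The z score arithmetic from Ex5 and Ex6 is easy to reproduce. Here is a small Python sketch; the function name z_score is just a label picked for this example.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / sd


# Ex5: Windex at 10.1 cents per ounce
print(round(z_score(10.1, 7.7, 2.5), 2))

# Ex6: the same student on two tests with different scales
z_test1 = z_score(625, 475, 100)
z_test2 = z_score(45, 30, 8)
print(z_test1, z_test2)
print("better on test 2" if z_test2 > z_test1 else "better on test 1")
```

Putting both scores on the z scale is what makes a 625 and a 45 comparable in the first place.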

The Standard Normal Distribution


The standard normal distribution is a normal distribution of standardized values
called z-scores. It is centered at zero with a standard deviation of 1.

z ∼ N (0, 1)
z = (x − μ)/σ

Calculations of Probabilities for the Standard Normal Distribution
Ex7 Calculate the following probabilities with your calculator. Draw a picture for each
probability.

(a) P (−2 ≤ z ≤ 2) = normalcdf (−2, 2, 0, 1) = 0.954
(b) P (z < −1.76) = normalcdf (−1E99, −1.76, 0, 1) = 0.039
(c) P (z > 1.18) = normalcdf (1.18, 1E99, 0, 1) = 0.119
(d) P (z < 1.18) = normalcdf (−1E99, 1.18, 0, 1) = 0.881
(e) P (z ≥ 1.96) = normalcdf (1.96, 1E99, 0, 1) = 0.025
(f) P (z = 1.23) = normalcdf (1.23, 1.23, 0, 1) = 0

[Figures: one standard normal curve per part, with the z values from the probability statement marked on the z-axis.]

Calculator: 2nd → Vars → 2 → lower bound → , → upper bound → , → 0 → , → 1
Display: normalcdf(lower bound, upper bound, 0, 1)

Note #1: Use −1E99 in place of −∞ and 1E99 in place of ∞. Always use this notation on
your TI-84. You might run into a case where your variable would never go below zero
or perhaps never go above 1000. It doesn’t matter. Always use −1E99 for −∞ or 1E99
for ∞ when you have a < or > probability to calculate. EE is the correct button for
scientific notation on the TI-83/84 calculators.

Note #2: As we saw in part (f), the probability that z is equal to any one number is zero,
as there is no “area” under the curve. There is no ‘width’ along the z-axis. From this,
we get the following properties.

For any two numbers a and b with a < b,


P (a ≤ x ≤ b) = P (a < x ≤ b) = P (a ≤ x < b) = P (a < x < b) when x is a
continuous random variable.

This is only true for continuous numerical variables. Back in discrete land (Chapter 4),
the symbols < and > can have different probabilities than ≤ and ≥.
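
If you don't have your TI-84 handy, the Ex7 answers can be reproduced with Python's standard library. The sketch below wraps statistics.NormalDist (available in Python 3.8 and later) in a function that I have named normalcdf only to mirror the calculator's argument order; it is not the calculator's own code.

```python
from statistics import NormalDist


def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under the N(mean, sd) curve between lower and upper."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)


BIG = 1e99  # stands in for infinity, as on the calculator

print(round(normalcdf(-2, 2), 3))        # part (a)
print(round(normalcdf(-BIG, -1.76), 3))  # part (b)
print(round(normalcdf(1.18, BIG), 3))    # part (c)
print(round(normalcdf(-BIG, 1.18), 3))   # part (d)
print(round(normalcdf(1.96, BIG), 3))    # part (e)
print(normalcdf(1.23, 1.23))             # part (f): no width, no area
```

Part (f) shows Note #2 in action: with the same lower and upper bound there is no width, so the area is zero.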
The Normal Distribution aka The Bell Curve
Note: The Normal Distribution is a special bell curve but by no means the only bell
curve. It’s just the most popular bell curve out there.

X ∼ N (μ, σ)

μ − 3σ μ − 2σ μ − σ μ μ+σ μ + 2σ μ + 3σ

Ex8 Suppose X ∼ N (5, 6).

(a) Sketch the pdf. Label and scale your axes.

[Figure: normal curve centered at 5, with the x-axis scaled −13, −7, −1, 5, 11, 17, 23. The value x = 17 is marked at z = 2 and x = 1 at z = −2/3.]

(b) Calculate the z-score for an x-value of 17.

z score = (value − mean) / (standard deviation)
= (17 − 5)/6
= 2

(c) What does a positive z-score mean?

The data value is above the mean.

(d) Calculate the z-score for x = 1.

z score = (value − mean) / (standard deviation)
= (1 − 5)/6
= −2/3
= −0.67
The Empirical Rule (or the 68 – 95 – 99.7 Rule)
If the histogram of values in a data set can be reasonably well approximated by a
normal curve, then...

• Approximately 68% of the observations are within 1 standard deviation of the mean.

• Approximately 95% of the observations are within 2 standard deviations of the mean.

• Approximately 99.7% of the observations are within 3 standard deviations of the mean.

Another graphical view of this rule looks at the percentages on each side of the mean ...

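[Figure: normal curve split at each standard deviation from the mean, with 34%, 13.5%, 2.35%, and 0.15% of the area in the successive bands on each side. The horizontal axis shows x values and the matching z scores.]

You can see 34% on each side of the mean, one standard deviation away from the mean,
so the area under the curve within one standard deviation totals 68%. If you travel two
standard deviations away from the mean in either direction, the total area under the
curve is 95%. Three standard deviations in either direction totals 99.7% of the area
under the curve.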

You can also see the equivalent z scores across the bottom of the curve.
Let’s take a look at this breakdown of the normal distribution from a different light. The
Empirical Rule deals with “middles” - as in the middle 68% of observations, the middle
95% of observations, and the middle 99.7% of observations.

The folks over at Texas Instruments, the same folks that produce your calculator, don’t
work in “middles”. They only work in percentiles. We mentioned percentiles back in
Chapter 2. They’re synonymous with cumulative relative frequencies and you can look
for them on graphs in a “from here on down” or “from here to the left” light.

Let’s practice taking our “middles” and converting them to “on downs”, i.e. percentiles.

The easiest percentile to see is the median, which as we learned from Chapter 2 is the
50th percentile. The 50th percentile always falls exactly on the mean. If you add the
percentages from the mean “on down”, they will total 50%, i.e.
34% + 13.5% + 2.35% + 0.15 % = 50 % .

[Figure: normal curve with the 50% of the area from the mean “on down” shaded. The mean is marked as the 50th %ile.]

Let’s look at the percentile when a data value is exactly one standard deviation above
the mean. If you add the percentages from the z = 1 “on down”, they will total 84%, i.e.
34% + 34% + 13.5% + 2.35% + 0.15 % = 84 % .

[Figure: normal curve with the 84% of the area from z = 1 “on down” shaded. z = 1 is marked as the 84th %ile.]

We can continue with this process and find percentiles for any of the z-scores.

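[Figure: normal curve with percentiles marked at the z-scores: z = −2 is the 2.5th %ile, z = −1 is the 16th %ile, z = 0 is the 50th %ile, z = 1 is the 84th %ile, and z = 2 is the 97.5th %ile.]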

Keep in mind that your calculator will only deal with percentiles. If you are given a
“middle” percentage, you will need to convert it to a percentile, i.e. an “on down”
percentage.

There will be times when you are asked to find the “top x%”, for example the “top 20%”
of all scores on a test. You will need to convert the “top 20%” to a percentile. The “top
20%” is cut off from the “bottom 80%”, i.e. the 80th percentile. We are using the
complement rule here.
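
The “middle” to “on down” conversion is just the complement rule plus symmetry, so it fits in a few lines of Python. The function names here are invented for this sketch. A middle percentage leaves (1 − middle) outside, split evenly between the two tails, and a “top” percentage is cut off at the complementary percentile.

```python
def middle_to_percentiles(middle):
    """Lower and upper percentiles (as proportions) that bound a middle area."""
    if not 0 <= middle <= 1:
        raise ValueError("middle must be between 0 and 1")
    tail = (1 - middle) / 2      # complement rule, then symmetry
    return tail, 1 - tail


def top_to_percentile(top):
    """Percentile (as a proportion) that cuts off the top share."""
    if not 0 <= top <= 1:
        raise ValueError("top must be between 0 and 1")
    return 1 - top               # complement rule


low, high = middle_to_percentiles(0.68)
print(round(low, 3), round(high, 3))    # the middle 68%

low, high = middle_to_percentiles(0.95)
print(round(low, 3), round(high, 3))    # the middle 95%

print(round(top_to_percentile(0.20), 3))  # the top 20%
```

The outputs line up with the 16th/84th and 2.5th/97.5th percentiles shown on the curve above.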
Ex9 In a study investigating the effect of car speed on accident severity, 5000 reports of
fatal automobile accidents were examined, and the vehicle speed at impact was
recorded for each one. It was determined that the average speed was 42 mph and that
the standard deviation was 15 mph. In addition, a histogram revealed that vehicle
speed at impact could be described by a normal curve.

(a) Roughly what proportion of vehicle speeds were between 27 and 57mph?
Sketch a graph of this situation. Shade the area of interest on the PDF.

X ∼ N (42, 15)

x̄ ± 1s = 42 ± 15
⇒ 42 − 15 = 27
42 + 15 = 57
∴ (27, 57)

[Figure: normal curve centered at 42 with the region from 27 to 57 shaded, 34% on each side of the mean. Horizontal axis: Speed (mph).]

From the Empirical Rule we know that roughly 68% of observations fall within 1
standard deviation of the mean, 34% on either side of the mean.

(b) Roughly what proportion of vehicle speeds exceed 57mph? Sketch a graph of this
situation. Shade the area of interest on the PDF.

From part (a), we saw 68% of observations were within a standard deviation of the
mean. Using the complement rule, we know that 32% of observations must fall outside
that range. Through symmetry, we know that must be 16% per side.

Complement Rule: 100% − 68% = 32%
Symmetry: 16% per side

[Figure: normal curve with 34% on each side of the mean between 27 and 57, and 16% in each tail. Horizontal axis: Speed (mph).]

The proportion of vehicle speeds exceeding 57mph is 16%.

Note: You could also recognize that 57mph is the 84th percentile, as the data value is
1 standard deviation above the mean and its z-score is 1. The 84th percentile is the
cutoff for the “top 16%”.

[Figure: normal curve with the 16% of the area above 57 shaded. Horizontal axis: Speed (mph).]

Ex10 In a certain southwestern city the air pollution index averages 62.5 during the
year with a standard deviation of 18.0. Assuming the empirical rule is appropriate, the
index falls within what interval 95% of the time?

(a) (8.5, 116.5)
(b) (26.5, 98.5)
(c) (44.5, 80.5)
(d) (45.4, 79.6)

[Figure: normal curve centered at 62.5 with the region from 26.5 to 98.5 shaded. Horizontal axis: Air Pollution Index.]

From the Empirical Rule we know that roughly 95% of observations fall within 2
standard deviations of the mean.

X ∼ N (62.5, 18)
x̄ ± 2s = 62.5 ± 2 ⋅ 18
= 62.5 ± 36
⇒ 62.5 − 36 = 26.5
62.5 + 36 = 98.5
∴ (26.5, 98.5)


Ex11 Consider a normal distribution of frog weights with μ = 500 grams and σ = 65
grams. A sample of size 2,000 is drawn from this population. Approximately how many
of the cases would you expect to find between 435 and 565?

(a) 1,000
(b) 1,360
(c) 1,500
(d) 1,900

[Figure: normal curve centered at 500 with the region from 435 to 565 shaded. Horizontal axis: Frog Weights (g).]

From the Empirical Rule we know that roughly 68% of observations fall within 1
standard deviation of the mean.

X ∼ N (500, 65)

x̄ ± s = 500 ± 65
⇒ 500 − 65 = 435
500 + 65 = 565
∴ (435, 565)

If 68% of the 2000 observations fall within 435 and 565 grams, we have
0.68 ⋅ 2000 = 1360 frogs.
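
The Empirical Rule work in Ex10 and Ex11 follows one pattern: mean ± k standard deviations, paired with 68%, 95%, or 99.7%. Here is a minimal Python sketch of that pattern. The names EMPIRICAL, empirical_interval, and expected_count are made up for this illustration, and the percentages are the rule's approximations, not exact areas.

```python
# Approximate share of observations within k standard deviations of the mean
EMPIRICAL = {1: 0.68, 2: 0.95, 3: 0.997}


def empirical_interval(mean, sd, k):
    """Interval mean ± k*sd covered by the Empirical Rule for k = 1, 2, or 3."""
    if k not in EMPIRICAL:
        raise ValueError("k must be 1, 2, or 3")
    return mean - k * sd, mean + k * sd


def expected_count(n, k):
    """Approximate number of n observations within k standard deviations."""
    if k not in EMPIRICAL:
        raise ValueError("k must be 1, 2, or 3")
    return EMPIRICAL[k] * n


print(empirical_interval(62.5, 18.0, 2))   # Ex10: the middle 95%
print(empirical_interval(500, 65, 1))      # Ex11: the middle 68%
print(round(expected_count(2000, 1)))      # Ex11: number of frogs
```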
Calculations of Probabilities for any Normal Distribution
Ex12 The growth of children can be an important indicator of general levels of nutrition
and health. Data suggest that a reasonable model for the probability distribution of the
continuous numerical variable x = height of a randomly selected 5-year old child is a
normal distribution with mean µ = 100 cm and standard deviation σ = 6 cm. Sketch
a picture and shade the appropriate region.
X ∼ N (100, 6)

What proportion of the heights is between 94 and 112 cm?

We can use the Empirical Rule here because each of the given values falls on an
integer value for its z-score.

z score94 = (value − mean) / (standard deviation) = (94 − 100)/6 = −1
z score112 = (value − mean) / (standard deviation) = (112 − 100)/6 = 2

P (94 < x < 112) = 34% + 34% + 13.5%
= 81.5% = 0.815

[Figure: normal curve centered at 100 with the region from 94 to 112 shaded in bands of 34%, 34%, and 13.5%. Horizontal axis: Height (cm).]

We can also use normalcdf, as we did when calculating probabilities on the Standard
Normal distribution. Instead of the last two entries being 0 & 1 (as they always are on
the Standard Normal distribution), we will use the mean and standard deviation given to
us in this problem.

P (94 < x < 112) = normalcdf (94, 112, 100, 6)
= 0.819

[Figure: normal curve centered at 100 with the region from 94 to 112 shaded. Horizontal axis: Height (cm).]

Note: This answer is actually more accurate than when using the Empirical Rule. The
Empirical Rule is an estimation.
What is the probability that a randomly chosen child will be at least 110 cm tall?

For this problem we have to use technology to find the probability as the z-score for
the given value is not an integer.

z score110 = (value − mean) / (standard deviation)
= (110 − 100)/6
= 1.67

P (x > 110) = normalcdf (110, 1E99, 100, 6) = 0.048

[Figure: normal curve centered at 100 with the region above 110 shaded. Horizontal axis: Height (cm).]

______________________________________________________________________
Calculator: 2nd → Vars → 2 → lower bound → , → upper bound →, → mean →, → s.d.
Display: normalcdf(lower bound, upper bound, mean, s.d.)
Note: use −1E99 for −∞ and 1E99 for ∞
______________________________________________________________________
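
Here is a short Python sketch of the Ex12 height calculations, assuming Python 3.8 or later for statistics.NormalDist. It also standardizes the bounds first, to show that working on X ∼ N (100, 6) directly and working with z scores on the Standard Normal give the same area. The variable names are my own.

```python
from statistics import NormalDist

mean, sd = 100, 6
heights = NormalDist(mean, sd)    # X ~ N(100, 6)
standard = NormalDist(0, 1)       # z ~ N(0, 1)

# P(94 < x < 112) directly on the height distribution
exact = heights.cdf(112) - heights.cdf(94)

# Same area after converting both bounds to z scores
z_low = (94 - mean) / sd          # -1
z_high = (112 - mean) / sd        # 2
via_z = standard.cdf(z_high) - standard.cdf(z_low)

# Empirical Rule estimate from the lecture: 34% + 34% + 13.5%
estimate = 0.34 + 0.34 + 0.135

# P(x > 110): everything to the right of 110
at_least_110 = 1 - heights.cdf(110)

print(round(exact, 3), round(via_z, 3), round(estimate, 3))
print(round(at_least_110, 3))
```

The small gap between the estimate and the other two values is the rounding built into the Empirical Rule.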
Find the 84th percentile.

We can use the Empirical Rule and find the height value that has a z-score of 1.

1 = (value − mean) / (standard deviation)
1 = (x − 100)/6
6 = x − 100
⇒ x = 106 cm

[Figure: normal curve centered at 100 with the 84% of the area below 106 shaded. Horizontal axis: Height (cm).]

Or we can use the invNorm function on the calculator.

x = invNorm (0.84, 100, 6) = 105.97 cm

Note: This answer is actually more accurate than when using the Empirical Rule. The
Empirical Rule is an estimation.

Find the 40th percentile.

For this problem we have to use technology to find the percentile, as the 40th
percentile does not have a z-score that is an integer value.

x = invNorm (0.40, 100, 6) = 98.48 cm

[Figure: normal curve centered at 100 with the 40% of the area below 98.48 shaded. Horizontal axis: Height (cm).]

Note: We should expect the 84th percentile to have a value larger than the mean, as
the mean is always the 50th percentile. We saw that happen in this problem as the 84th
percentile was ~106 cm, which is larger than the mean of 100 cm. Along those same
lines, we can expect the 40th percentile to have a value smaller than the mean. Again,
we saw that happen in this problem as the 40th percentile was ~98.5 cm, which is
smaller than the mean.
______________________________________________________________________
Calculator: 2nd → Vars → 3 → percentile → , → mean →, → s.d.
Display: invNorm(percentile, mean, s.d.)
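
For readers working without the calculator, the two percentile problems can be reproduced with inv_cdf from Python's statistics.NormalDist (Python 3.8+). The wrapper name inv_norm is my own, chosen to echo the calculator's argument order; it is a sketch, not the TI implementation.

```python
from statistics import NormalDist


def inv_norm(percentile, mean=0.0, sd=1.0):
    """Value with the given area (a proportion strictly between 0 and 1) to its left."""
    return NormalDist(mean, sd).inv_cdf(percentile)


p84 = inv_norm(0.84, 100, 6)
p40 = inv_norm(0.40, 100, 6)

print(round(p84, 2))   # 84th percentile of heights, in cm
print(round(p40, 2))   # 40th percentile of heights, in cm

# Sanity check from the note above: the mean is the 50th percentile,
# so the 84th sits above it and the 40th sits below it.
print(p40 < 100 < p84)
```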

https://www.youtube.com/watch?v=HKzZwX7oeDM
https://www.youtube.com/watch?v=UuKxBnIGyJQ
