Statistic 6.4 Lesson and Assignment


*All solutions and teacher notes in blue*

AP Statistics Handout Key: Lesson 6.4 (Day 2)


Topics: µ vs. 𝑥̅ , sampling distribution for a mean, central limit theorem and conditions

Lesson 6.4 (Day 2) Guided Notes

In 2014, the VA (“Veterans Affairs”) health system came under fire for long
wait times. Many veterans couldn’t get appointments for months - especially
mental health appointments. These delays can be high stakes: according to
the VA’s latest report, the United States now averages 17 veteran suicides
per day. Many of the suicides are linked to untreated PTSD and depression.
After a leadership change, the VA vowed to speed up wait times. In
particular, they vowed to actually meet their reported 3-day wait time
average, with a standard deviation of 1 day, for mental health
appointments.

Today’s Key Analysis: Has the VA met its wait time goal?

µ vs. 𝑥̅
µ = population mean
• Parameter
𝑥̅ = sample mean
• Statistic
• Estimator of µ

In an earlier lesson, you got a random sample of wait times from 4 VA clinics. This time, to get a more
precise estimate, we randomly sampled 30 clinics.¹ The sample mean wait time (for returning patients)
was 5.33 days.

1. Fill in the following for this situation:

Population: All VA clinics
Parameter: µ – true mean wait time among all VA clinics (claim: µ = 3 days)

Sample: 30 randomly selected VA clinics
Statistic: 𝑥̅ – sample mean wait time among sampled VA clinics (𝑥̅ = 5.33 days)

¹ Simple random sample conducted in April 2022. The sample was drawn from a list of all VA healthcare centers
offering mental health appointments. Sampled data can be found at: www.skewthescript.org/6-4

Lesson created by Skew The Script (skewthescript.org) and made possible through
support from The University of Texas San Antonio School of Data Science

Sampling Distribution for a Mean

In a world where the VA has achieved its goal of a true mean wait time of 3 days (with a
standard deviation of 1 day), what sample mean wait times should we expect to see?
For that, we’ll need…

[Diagram: Population data (wait times at all VA clinics, µ = 3 days) → Sample data (wait times at
n = 30 sampled clinics) → Sampling distribution (means from a bunch of samples)]

Sampling Distribution for a Mean (under certain conditions):

$$\bar{x} \sim \text{Norm}\left(\mu_{\bar{x}} = \mu,\ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\right)$$

Notation:
$\mu_{\bar{x}}$ → center of sampling dist. of $\bar{x}$
$\mu$ → mean of population data
$\sigma_{\bar{x}}$ → standard deviation of the sampling dist.
$\sigma$ → standard deviation of population data

2. Calculate and interpret the standard deviation of the sampling distribution ($\sigma_{\bar{x}}$).

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1}{\sqrt{30}} \approx 0.183$$

Sample mean wait times typically vary from the true mean wait time (µ = 3) by about 0.183 days.
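The same arithmetic can be checked in a few lines of Python. This is only a sketch; the variable names are ours, and the values (σ = 1 day, n = 30 clinics) come from the VA's claim and the sample described above.

```python
import math

sigma = 1.0  # population standard deviation of wait times (days), per the VA's claim
n = 30       # number of clinics in the sample

# Standard deviation of the sampling distribution of the sample mean
sigma_xbar = sigma / math.sqrt(n)
print(round(sigma_xbar, 3))  # 0.183
```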

3. In this world (where the true average wait time is 3 days), would the sample mean we found earlier (𝑥̅
= 5.33 days) be surprising? Do you doubt the VA has met its goal? Mathematically support your answer.

Setup: n = 30, µ = 3, σ = 1, so

$$\bar{x} \sim \text{Norm}\left(\mu_{\bar{x}} = \mu,\ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\right) = \text{Norm}\left(\mu_{\bar{x}} = 3,\ \sigma_{\bar{x}} = \frac{1}{\sqrt{30}}\right) = \text{Norm}\left(\mu_{\bar{x}} = 3,\ \sigma_{\bar{x}} = 0.183\right)$$

In this world, we’d expect sample means to be close to the true mean (3 days), typically varying by
about $\sigma_{\bar{x}} = 0.183$ days. So, to get a sample mean of 5.33 is quite unusual: it’s about
$z = \frac{5.33 - 3}{0.183} \approx 12.7$ standard deviations away from the true mean!
The probability of getting a sample mean this high (or higher) is incredibly low:
P(Z > 12.7) ≈ 0%. This makes me doubt that the VA has achieved its goal.
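For teachers who want to verify the z-score and tail probability numerically, here is a minimal Python sketch. It assumes the claimed values µ = 3 and σ = 1, and it keeps the standard deviation unrounded, so the z-score comes out slightly larger than the 12.7 obtained with the rounded 0.183. The upper tail is computed with `math.erfc`, since `1 - cdf` would round to exactly 0 this far out.

```python
import math

mu, sigma, n = 3.0, 1.0, 30  # claimed population mean and SD, sample size
xbar = 5.33                  # observed sample mean wait time (days)

sigma_xbar = sigma / math.sqrt(n)
z = (xbar - mu) / sigma_xbar

# Upper-tail probability P(Z > z) for a standard normal variable
p_upper = 0.5 * math.erfc(z / math.sqrt(2))

print(f"z = {z:.2f}")
print(f"P(Z > z) = {p_upper:.3g}")
```

The printed probability is positive but vanishingly small, which matches the "≈ 0%" conclusion.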

Central Limit Theorem and Conditions


$$\bar{x} \sim \text{Norm}\left(\mu_{\bar{x}} = \mu,\ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\right)$$

1) Random condition → unbiased center
2) 10% condition → calculable spread
3) Normal/Large Sample → approx. normal shape (by CLT)


Condition 1: Random Condition – we must obtain a random sample.

Instead of sampling VA clinics randomly, imagine we sent a survey to each one asking for their wait times.

[Diagram: Population (wait times at all VA clinics, µ = 3 days). The sampling distribution of means from
random samples is centered at $\mu_{\bar{x}} = \mu$. The sampling distribution of means from voluntary
response samples (only clinics who responded) is centered at $\mu_{\bar{x}} < \mu$.]

4. Describe why our new sampling method is problematic and why it’s necessary to obtain a random sample
instead.
Clinics with long wait times won’t want to share that information and, therefore, won’t respond (voluntary
response bias). So, we’ll tend to underestimate the true mean wait time among all clinics. A random sample
among all clinics will ensure unbiased estimates ($\mu_{\bar{x}} = \mu$).
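The effect of voluntary response bias can be illustrated with a small simulation. Everything in this sketch is hypothetical: the population of 1,000 wait times is made up (it is not the VA data), and the response model, in which clinics with longer waits are less likely to reply, is an assumption chosen only to mimic the reasoning above.

```python
import random
import statistics

random.seed(0)

# Hypothetical population of 1,000 clinic wait times in days (NOT real VA data)
population = [max(0.0, random.gauss(3, 1)) for _ in range(1000)]
mu = statistics.mean(population)

def random_sample_mean(n=30):
    """Mean of a simple random sample of n clinics."""
    return statistics.mean(random.sample(population, n))

def voluntary_sample_mean(n=30):
    """Mean of n clinics drawn only from those that chose to respond."""
    # Assumed response model: the longer the wait, the less likely a reply
    responders = [w for w in population if random.random() < 1 / (1 + w)]
    return statistics.mean(random.sample(responders, n))

random_means = [random_sample_mean() for _ in range(1000)]
voluntary_means = [voluntary_sample_mean() for _ in range(1000)]

print("true mean:", round(mu, 2))
print("center of random-sample means:", round(statistics.mean(random_means), 2))
print("center of voluntary-response means:", round(statistics.mean(voluntary_means), 2))
```

Under these assumptions, the random-sample means center on the true mean, while the voluntary-response means center below it.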
Condition 2: 10% Condition - The sample size (n) must be less than 10% of the population size (N).
Formula: n < 0.10(N)
• It ensures this is true: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ (see explanation of similar condition in lesson 6.3)
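One standard way to see why the 10% condition matters (not necessarily the explanation used in lesson 6.3) is the finite population correction. When sampling without replacement, the exact standard deviation of the sample mean is $\frac{\sigma}{\sqrt{n}} \cdot \sqrt{\frac{N-n}{N-1}}$. The sketch below computes that correction factor for n = 30 and a few hypothetical population sizes N. The helper name `fpc` is ours.

```python
import math

def fpc(n, N):
    """Finite population correction factor for sampling n of N without replacement."""
    return math.sqrt((N - n) / (N - 1))

n = 30
for N in (100, 300, 3000):  # hypothetical population sizes
    share = n / N
    print(f"N = {N}: sample is {share:.0%} of population, correction factor = {fpc(n, N):.3f}")
```

When the sample is 10% of the population or less, the factor stays above about 0.95, so $\frac{\sigma}{\sqrt{n}}$ is a close approximation. When the sample is a large share of the population (N = 100 here), the simple formula noticeably overstates the spread.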

Condition 3: Normal/Large Sample - Either the population distribution is normal or the sample size is
30 or more (n ≥ 30).

Normal Population: Sampling distribution is approximately normal (for all sample sizes)

Non-Normal Population: Sampling distribution is approximately normal only for large sample sizes


Central limit theorem (CLT): when n is large, the sampling distribution for 𝑥̅ is approximately normal.

5. In your own words, explain why the central limit theorem is so useful.
In most situations, we don’t know the shape of the population (that’s why we’re sampling in the first
place). Because of CLT, that doesn’t matter! We know our sampling distribution will be roughly normal,
regardless of the population’s shape.
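The central limit theorem can be seen in a quick simulation. This sketch assumes a hypothetical, strongly right-skewed population (exponential with mean 3) and measures the skewness of the simulated sample means for n = 2 and n = 30. A skewness near 0 indicates a roughly symmetric, normal-like shape. The helper names are ours.

```python
import random
import statistics

random.seed(1)

def sample_means(n, reps=5000):
    """Simulate `reps` sample means, each from n draws of a skewed population."""
    # Hypothetical population: exponential with mean 3 (strongly right-skewed)
    return [
        statistics.mean([random.expovariate(1 / 3) for _ in range(n)])
        for _ in range(reps)
    ]

def skewness(values):
    """Average cubed z-score; close to 0 for a symmetric distribution."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return statistics.mean([((v - m) / s) ** 3 for v in values])

skew_n2 = skewness(sample_means(2))
skew_n30 = skewness(sample_means(30))

print("skewness of sample means, n = 2: ", round(skew_n2, 2))
print("skewness of sample means, n = 30:", round(skew_n30, 2))
```

With n = 2 the sample means are still clearly skewed. With n = 30 the skewness is much closer to 0, even though the population itself is far from normal.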

6. In Question #2, you found the sampling distribution without checking the conditions. Check the
conditions now.
Random condition: The sample of 30 VA clinics was selected randomly.

10% condition: Reasonable to assume 30 is less than 10% of the total number of VA clinics in the country.
30 < 0.10 (All VA clinics)

Normal/Large Sample: The sample size is large. By CLT, shape of sampling distribution is approximately normal.
n ≥ 30
30 ≥ 30
Lesson 6.4 (Day 2) Discussion
Recommended discussion norms: skewthescript.org/discussion-norms

Discussion Question: In the simulations, you may have noticed that the sampling distribution gets more
narrow (more precise) as the sample size (n) increases. Why does this occur? Explain using both…
a) The simulation (skew-the-script.github.io/Healthcare)
b) The formula for the sampling distribution: $\bar{x} \sim \text{Norm}\left(\mu_{\bar{x}} = \mu,\ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\right)$

a) If we run the simulation with a sample size of only n = 2, we see that the sample means vary widely.
Here’s why: Imagine we sample 2 wait times and one of them is unusually high. Since there are only two
data points in our sample, the mean wait time will also be fairly high. So, our estimates will vary widely
depending on whether our sample, by chance, includes a high wait time clinic. However, if we run the
simulation with n = 30 wait times per sample, we can see that any high outlier wait times will tend to be
balanced by a majority of lower wait times. So, the sample mean estimates won’t vary as much.
b) The standard deviation of our sampling distribution is given by: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$. Since we divide by the
square root of the sample size ($\sqrt{n}$), as the sample size increases (as n increases), the overall
standard deviation ($\frac{\sigma}{\sqrt{n}}$) will decrease. To put it simply: quantities get smaller when we divide them by
big numbers.
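To make the formula concrete, this short sketch tabulates $\frac{\sigma}{\sqrt{n}}$ for a few sample sizes, using σ = 1 day from the VA scenario. The sample sizes are just illustrative choices.

```python
import math

sigma = 1.0  # population standard deviation (days), from the VA scenario

# Standard deviation of the sampling distribution for several sample sizes
se = {n: sigma / math.sqrt(n) for n in (2, 30, 120)}

for n, value in se.items():
    print(f"n = {n:>3}: sigma_xbar = {value:.3f}")
```

Because the divisor is $\sqrt{n}$ rather than n, quadrupling the sample size (30 to 120) only cuts the standard deviation in half.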

Lesson 6.4 (Day 2) Practice


1) Identify the population, sample, parameter, and statistic in the following scenario: To investigate the
average number of fire extinguishers per household among all homes in a large town, the local fire chief
selects a random sample of 45 homes in the town and finds out how many fire extinguishers each home
has. Among the 45 homes, the average number of fire extinguishers was 1.32.
Population: All homes in the town. Sample: the 45 randomly selected homes. Parameter: 𝜇 = true mean
number of fire extinguishers among all homes in the town (unknown). Statistic: 𝑥̅ = mean number of
fire extinguishers among the 45 homes selected (1.32).


2) Identify the population, sample, parameter, and statistic in the following scenario: For the past 10
years during the month of June, the average temperature in Aruba was 87℉. A random sample of 20
June days from the past 10 years is selected, and the mean temperature for these 20 days is 88.5℉.
Population: All days in June for the past 10 years. Sample: 20 June days from the past 10 years.
Parameter: 𝜇 = true mean temperature of all June days over past 10 years (87℉). Statistic: 𝑥̅ = mean
temperature of the 20 selected days (88.5℉).

3) Suppose the mean number of fire extinguishers among all homes in a large town is 𝜇 = 1.25 with 𝜎 =
0.55. A random sample of 45 homes is selected and the sample mean 𝑥̅ is calculated. Calculate and
interpret the mean and standard deviation of the sampling distribution for 𝑥̅ . Assume all conditions are
met.
$\mu_{\bar{x}} = \mu = 1.25$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.55}{\sqrt{45}} \approx 0.082$

If we gathered many samples of n = 45 homes, the average of the sample means among all
those samples will be 1.25 extinguishers. The sample means will typically vary from 1.25 by
about 0.082 extinguishers.

4) A local high school has 760 students in grades 9-12 who have a part-time job, and according to a
census taken at the beginning of the year, the mean hourly wage made by these students is $15.50/hr
with a standard deviation of $2.33/hr. At the end of the year a guidance counselor surveys a random
sample of 55 students with part-time jobs on what their hourly wage is and records the sample mean 𝑥̅ .

a) What is the mean of the sampling distribution of 𝑥̅ ? What condition should be checked when
finding this value and why?
𝜇 𝑥̅ = 𝜇 = $15.50

We check the random condition to make sure we have an unbiased sampling process. If the
sampling process is unbiased, our sampling distribution will be centered at the true mean. In this
case, the 55 students were randomly sampled from the high school students with part-time jobs,
so the condition is met.

b) What is the standard deviation of the sampling distribution of 𝑥̅ ? What condition should be
checked when finding this value and why?

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2.33}{\sqrt{55}} \approx 0.314$$

When sampling without replacement, we check the 10% condition. This ensures that
observations are sampled almost independently. When observations are independent, the above
formula is accurate. In this case, 55 < 0.1(760) so the 10% condition is met.


c) What is the shape of the sampling distribution of 𝑥̅ ? Justify your answer by verifying the
appropriate condition.

When we do not know the shape of the population distribution, we can use the central limit
theorem to show that the sampling distribution of 𝑥̅ is approximately normal.

We do not know the shape of the population distribution of hourly wages, but since 𝑛 = 55 ≥
30, we can say the shape of the sampling distribution of 𝑥̅ is approximately normal by CLT.

5) Is the sampling distribution for 𝑥̅ approximately normal in each scenario? Justify your answer.
a) The population distribution for the variable is strongly skewed left and a sample of size 15 is
randomly selected.
No, the sampling distribution of 𝑥̅ is not approximately normal because the sample size is small
(15 < 30) and the population distribution is non-normal.

b) The population distribution for the variable is approximately normal and a sample of size 15 is
randomly selected.
Yes, the sampling distribution of 𝑥̅ is approximately normal because the population distribution is
approximately normal.

c) The population distribution for the variable is uniform and a sample of size 80 is randomly
selected.
Yes, the sampling distribution of 𝑥̅ is approximately normal because the sample size is large
(80 ≥ 30) and the central limit theorem applies.
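The decision rule used in all three parts can be written as a one-line check. This is only a sketch of the course's rule of thumb (normal population, or n ≥ 30). The function name is ours.

```python
def sampling_dist_approx_normal(population_is_normal: bool, n: int) -> bool:
    """Normal/Large Sample condition: normal population OR n of at least 30."""
    return population_is_normal or n >= 30

print(sampling_dist_approx_normal(False, 15))  # part a: skewed population, n = 15
print(sampling_dist_approx_normal(True, 15))   # part b: normal population, n = 15
print(sampling_dist_approx_normal(False, 80))  # part c: uniform population, n = 80
```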

6) Suppose the average age of a young adult in the U.S. when they first moved out of their parents’
house is 24.6 years with a standard deviation of 4.2 years. A random sample of 180 young adults from
the U.S. who live on their own is selected and surveyed about their age when they moved out. Let 𝑥̅ =
the mean move-out age of the sample. Describe the sampling distribution of 𝑥̅ and check the conditions.
• Random condition: The sample of 180 young adults was randomly selected.
• 10% condition: It is clear that 180 < 10% of all U.S. young adults who have moved out.
• Normal/Large sample: The sample size 𝑛 = 180 ≥ 30. By CLT, the sampling distribution is
approximately normal in shape.

The sampling distribution of 𝑥̅ is approximately normal in shape with $\mu_{\bar{x}} = 24.6$ years and
$\sigma_{\bar{x}} = \frac{4.2}{\sqrt{180}} \approx 0.313$ years.


7) One poppy flower produces an average of 1,360 seeds with a standard deviation of 72 seeds. The
number of seeds produced by poppy flowers is normally distributed. Suppose we take
a random sample of 28 poppy flowers in a large field that contains thousands of poppy flowers.

a) Describe the sampling distribution of 𝑥̅ and check the conditions.


• Random condition: The sample of 28 poppy flowers was randomly selected.
• 10% condition: 28 < 10% of thousands of poppy flowers in the field.
• Normal/Large sample: The population distribution of poppy seeds is normally distributed.

The sampling distribution of 𝑥̅ is approximately normal in shape with $\mu_{\bar{x}} = 1{,}360$ seeds and
$\sigma_{\bar{x}} = \frac{72}{\sqrt{28}} \approx 13.61$ seeds.

b) The sample of 28 poppy flowers yielded a mean of 1,379 seeds per flower. What is the
probability of observing this mean number of seeds or more?

$$\bar{x} = 1{,}379 \rightarrow z = \frac{1379 - 1360}{13.61} \approx 1.40$$

P(z ≥ 1.40) = normalcdf(lower: 1.40, upper: ∞, µ: 0, σ: 1) = 0.0808

Assuming the true mean number of seeds is 1,360, there is an 8.08% chance that a sample of 28 poppy
flowers would yield a sample mean of 1,379 seeds or more.
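For classes working in Python rather than on a calculator, `statistics.NormalDist` from the standard library can stand in for `normalcdf`. This sketch uses the values from the problem (µ = 1,360, σ = 72, n = 28) and works directly with the sampling distribution, without rounding the z-score to 1.40 first. The result therefore lands near 0.081 rather than exactly at the table-based 0.0808.

```python
import math
from statistics import NormalDist

mu, sigma, n = 1360, 72, 28  # population mean, population SD, sample size
xbar = 1379                  # observed sample mean (seeds per flower)

# Sampling distribution of the sample mean: Norm(mu, sigma / sqrt(n))
sampling_dist = NormalDist(mu=mu, sigma=sigma / math.sqrt(n))

# P(sample mean >= 1379) is the upper tail beyond the observed value
p = 1 - sampling_dist.cdf(xbar)
print(round(p, 3))
```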

Further Practice
Teachers: If you’d like to give students additional practice problems…
• Check out our CED and Textbook Alignment Guide (skewthescript.org/ap-stats-alignment) to
find additional exercises in your AP Stats textbook or in AP Classroom that are aligned to the
content covered in this lesson.
• The following Free Response Questions (FRQs) from past AP Exams are also aligned to this
lesson: 2010 Q2, 2014 Q3 (part b), 2015 Q6 (parts d-f)
