Analysing Data of Stats
So far, this unit has focused on distributions of discrete and continuous random variables.
In this activity, we’ll investigate sampling distributions — distributions of statistics.
Scenario: We want to know the average age of all cloned sheep that exist right now.
We don’t know how many cloned sheep exist, but we are able to get samples of sheep delivered to us.
Unbeknownst to us, the entire population consists of 5 cloned sheep with ages 10, 11, 12, 13, 14 months.
1. Using R, I input the ages of the sheep with the code: sheep <- c(10, 11, 12, 13, 14).
I then calculated population parameters that we do not know in this scenario: μ = 12 and σ = 1.414.
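The two population parameters can be checked in R. This is a minimal sketch, assuming the `sheep` vector from step 1. The helper name `pop_sd` is introduced here for illustration and is not part of the original activity. R's built-in `sd()` divides by N - 1, so the population standard deviation is computed by hand.

```r
# Ages (in months) of the five cloned sheep in the population
sheep <- c(10, 11, 12, 13, 14)

# Population mean
mu <- mean(sheep)

# Population standard deviation: divide by N, not N - 1
# (pop_sd is a hypothetical helper name)
pop_sd <- function(x) sqrt(mean((x - mean(x))^2))
sigma <- pop_sd(sheep)

mu               # 12
round(sigma, 3)  # 1.414
sd(sheep)        # sample SD (divides by N - 1), so it is larger than sigma
```

The gap between `sigma` and `sd(sheep)` matters here because the five sheep are the whole population, not a sample from it.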
To estimate μ, the unknown average age of all cloned sheep, we decide to do the following:
• Take a sample of n sheep from the population
• Calculate the average from each sample
• Repeat this process many times and sketch a distribution of the averages we calculate from our samples
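The three steps above can be sketched as a short simulation. This is only an illustration under stated assumptions: sheep within a sample are drawn with replacement, "many times" is taken to be the 25,000 repetitions used in the questions that follow, and `simulate_means` is a hypothetical helper name.

```r
sheep <- c(10, 11, 12, 13, 14)

# Draw `reps` samples of size `n` and return the mean of each sample.
# Assumption: sampling is with replacement (replace = TRUE).
simulate_means <- function(population, n, reps = 25000) {
  replicate(reps, mean(sample(population, size = n, replace = TRUE)))
}

set.seed(1)  # arbitrary seed so the run is reproducible
means_n1 <- simulate_means(sheep, n = 1)
means_n2 <- simulate_means(sheep, n = 2)

# Relative frequency of each sample mean that appeared
round(prop.table(table(means_n1)), 3)
round(prop.table(table(means_n2)), 3)
```

Comparing the two tables gives a numerical version of the dotplots sketched by hand.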
Suppose we go through this process with a sample size of n=1 sheep. We first sample one sheep and then
calculate its “average” age. We then take another sample (possibly getting the same sheep) and calculate an
average.
a) What possible averages could we get? Sketch a dotplot of those averages for n=1 on an axis marked 10, 11, 12, 13, 14.
b) Suppose we sample n=1 sheep 25,000 times. What would the distribution of all those sample means look like?
e) Sketch a dotplot of all possible averages we could get from n=2 sheep on an axis marked 10, 11, 12, 13, 14.
f) Suppose we sample n=2 sheep 25,000 times. What would the distribution of all those sample means look like?
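For n=2 the possible averages can also be listed exactly rather than simulated. The sketch below assumes the two sheep are drawn with replacement, so there are 25 equally likely ordered pairs. The variable names are illustrative only.

```r
sheep <- c(10, 11, 12, 13, 14)

# All 25 ordered pairs (first draw, second draw), sampling with replacement
pairs <- expand.grid(first = sheep, second = sheep)
pair_means <- rowMeans(pairs)

# How many of the 25 equally likely pairs give each possible average
table(pair_means)

# Exact mean and standard deviation of the n = 2 sampling distribution
mean_of_means <- mean(pair_means)
sd_of_means <- sqrt(mean((pair_means - mean_of_means)^2))
mean_of_means  # 12
sd_of_means    # 1
```

The counts rise from the extremes to a peak at 12, which is the shape a long-run simulation of 25,000 samples should approach.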
3. From these simulations, let’s generalize. If we repeatedly take samples of size n from a population and calculate
the mean of each sample:
a) The expected value of our sample means (i.e., the mean of our means) = ________________________________
b) The standard deviation of our sample means is called the standard error.
If we take a larger sample, the size of our standard error (circle one): DECREASES / INCREASES
c) If we take a larger sample, the probability of an unusual sample mean (circle one): DECREASES / INCREASES
d) If we take a larger sample, the probability of a usual sample mean (circle one): DECREASES / INCREASES
If we repeatedly take samples of size n from a population with an unknown distribution and
calculate the mean of each sample,
• The mean of the sample means will equal the population mean
• The standard deviation of the sample means (the standard error) shrinks
as the sample size increases
• We still don’t know what shape the distribution of sample means will have (although, in this
example, it looks like the distribution becomes unimodal and symmetric)
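The shrinking standard error can be checked against the standard result that, for samples drawn with replacement, the standard error equals σ/√n. The sketch below lists every possible sample from the sheep population for a few small sample sizes. `exact_se` is a hypothetical helper, and sampling with replacement is an assumption.

```r
sheep <- c(10, 11, 12, 13, 14)
mu <- mean(sheep)
sigma <- sqrt(mean((sheep - mu)^2))  # population SD, about 1.414

# Exact SD of the sample mean for samples of size n, found by listing every
# possible ordered sample drawn with replacement (5^n equally likely samples).
exact_se <- function(population, n) {
  samples <- expand.grid(rep(list(population), n))
  m <- rowMeans(samples)
  sqrt(mean((m - mean(m))^2))
}

sizes <- c(1, 2, 3, 4)
se_table <- data.frame(
  n = sizes,
  exact_se = sapply(sizes, function(n) exact_se(sheep, n)),
  sigma_over_root_n = sigma / sqrt(sizes)
)
se_table
```

The two right-hand columns should agree, and both fall as n grows.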
4. Suppose body temperatures for a population of interest follow a normal distribution with μ = 98.5 and σ = 0.75.
a) P ( X < 98 ) = ____________________
b) Suppose we randomly select a sample of n=100 individuals from this population. Circle the correct symbol.
c) Sketch the distribution of sample averages we’d get if we repeatedly sampled n=100 individuals.
d) P ( X̄ < 98 ) = ____________________
e) To calculate those probabilities, we assumed the sampling distribution had what kind of shape? ______________
Applet: http://lock5stat.com/statkey/theoretical_distribution/theoretical_distribution.html#normal
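As an alternative to the applet, the same kind of probability can be computed with R's `pnorm()`. This sketch assumes the two probabilities in question 4 are for a single individual and for the mean of n=100 individuals, and that the sampling distribution is normal with standard error σ/√n.

```r
mu <- 98.5     # population mean body temperature
sigma <- 0.75  # population standard deviation
n <- 100       # sample size for the sample mean

# Probability that one randomly chosen individual is below 98
p_individual <- pnorm(98, mean = mu, sd = sigma)

# Probability that the mean of n = 100 individuals is below 98,
# assuming a normal sampling distribution with SE = sigma / sqrt(n)
se <- sigma / sqrt(n)
p_sample_mean <- pnorm(98, mean = mu, sd = se)

p_individual   # roughly 0.25
p_sample_mean  # very close to 0
```

The large difference between the two values comes entirely from the smaller spread of the sampling distribution.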
To calculate the previous 2 probabilities, we needed to assume the sampling distribution was approximately normal.
Is there a way we can know the shape of the distribution of sample means?
5. Below, I’ve pasted results from my computer simulations. Fill in the blanks to see if these simulated sampling
distributions agree with the theory we’ve derived. Explain why the simulated results do not match the theory
perfectly.
Sample size | Sampling distribution | Mean of sample means | Standard deviation of sample means | Theoretical mean | Theoretical standard error
It looks like these sampling distributions are approximately normal, but that might be because the population
distribution was approximately normal. What happens if we start with a population that is not normally distributed?
Scenario: The high school GPAs of 556 St. Ambrose freshmen in 2012
are displayed to the right. These GPAs are obviously not
normally distributed (they have a negative skew).
6. Fill in the blanks. Do our theoretical results hold for populations that are not normally distributed?
Sample size | Sampling distribution | Mean of sample means | Standard deviation of sample means | Theoretical mean | Theoretical standard error
7. Under what conditions does it appear as though the distribution of the sample mean will be approximately
normal?
8. Use the following applet to predict the distribution of various sample statistics under various conditions:
http://www.onlinestatbook.com/stat_sim/sampling_dist/index.html
Central Limit Theorem:
If we repeatedly take samples of size n from a population with an unknown distribution and
calculate the mean of each sample,
• The mean of the sample means will equal the population mean
• The sampling distribution of sample means will approximate a normal distribution if:
a) The population follows a normal distribution, or
b) We repeatedly take large sample sizes (how large?)
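The effect of sample size on shape can be explored with a simulation from a skewed population. The population below is a hypothetical stand-in generated with `rbeta()`. It is not the St. Ambrose GPA data, and the sample sizes, repetition count, and helper name `skewness` are all assumptions chosen for illustration.

```r
set.seed(2012)  # arbitrary seed for reproducibility

# Hypothetical negatively skewed population of 556 values on a 0-4 scale.
# These are NOT the real GPAs, only a stand-in with a similar shape.
population <- 4 * rbeta(556, shape1 = 5, shape2 = 1.5)
mu <- mean(population)
sigma <- sqrt(mean((population - mu)^2))

# Simple moment-based skewness: near 0 for a symmetric distribution
skewness <- function(x) mean((x - mean(x))^3) / mean((x - mean(x))^2)^1.5

# Simulate 10,000 sample means for each sample size (with replacement)
results <- do.call(rbind, lapply(c(1, 5, 30), function(n) {
  means <- replicate(10000, mean(sample(population, size = n, replace = TRUE)))
  data.frame(n = n,
             mean_of_means = mean(means),
             sd_of_means = sd(means),
             theoretical_se = sigma / sqrt(n),
             skewness = skewness(means))
}))
results
```

In runs like this, the skewness of the sample means moves toward 0 as n grows, while the mean of the means stays near the population mean.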
10. Suppose we repeatedly sample 25 days and calculate the average time lecturing. What average represents the
10th percentile of this distribution?
12. What sample size would we need in order for 0.95 = P ( 4.5 ≤ X̄ ≤ 5.5 )?
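A sample-size question of this kind can be approached by solving z·σ/√n = margin for n. The sketch below is illustrative only. The population standard deviation is not given in this section, so `sigma = 2` is a placeholder, and the interval from 4.5 to 5.5 is assumed to be centred on the population mean, with a normal sampling distribution.

```r
# Placeholder value: sigma is NOT given in this section and is assumed here
sigma <- 2
margin <- 0.5      # half the width of the interval 4.5 to 5.5
z <- qnorm(0.975)  # leaves 2.5% in each tail, about 1.96

# Smallest n for which z * sigma / sqrt(n) is no larger than the margin
n_needed <- ceiling((z * sigma / margin)^2)
n_needed

# Check: with this n, the probability of landing within the margin
se <- sigma / sqrt(n_needed)
coverage <- pnorm(margin / se) - pnorm(-margin / se)
coverage
```

Replacing the placeholder with the actual standard deviation for the scenario gives the required sample size.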