Business Analytics Module 2
That's a very expensive process, because you actually have to close your
operation during the time that the warehouse is closed. And you also don't
have the benefit of knowing whether you're perfect in your inventory
throughout the rest of the year. You basically have one sample. It's a complete
sample, but it's one sample, once a year, and then you hope that your
processes are good enough the rest of the year. What we've learned to do is
to sample our inventory continuously, sample the accuracy of our inventory
continuously, to make sure that we have as accurate an inventory as we can
afford to have. The idea behind sampling is that it might not be possible for you to
learn the true value of a statistic of interest in the population. We have many
warehouses that house that inventory. Going through all that would be very,
very time consuming. And the idea behind sampling in that situation would be
you would pick a subset of the items in inventory at random, and check whether
they had those defects. So it's a lower cost way to learn the rate at which the
statistic of interest occurs in the population.
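As a rough illustration of the idea, here is a minimal Python sketch that estimates a defect rate from a random subset of items. The inventory is simulated: the 100,000 items, the 3% defect rate, the sample size of 2,000, and the function name estimate_defect_rate are all assumptions made up for this example, not figures from the lecture.

```python
import random


def estimate_defect_rate(inventory, sample_size, seed=0):
    """Estimate the share of defective items from a simple random sample."""
    rng = random.Random(seed)
    # random.sample draws without replacement, so no item is checked twice
    sample = rng.sample(inventory, sample_size)
    return sum(sample) / sample_size


# Hypothetical inventory: True marks a defective item (about 3% of items)
rng = random.Random(42)
inventory = [rng.random() < 0.03 for _ in range(100_000)]

true_rate = sum(inventory) / len(inventory)
estimate = estimate_defect_rate(inventory, sample_size=2_000, seed=1)

print(f"true defect rate:          {true_rate:.4f}")
print(f"estimate from 2,000 items: {estimate:.4f}")
```

Checking 2,000 items instead of 100,000 gives an estimate that is usually close to the true rate, at a small fraction of the counting effort. In practice the true rate is unknown; it is computed here only because the data are simulated.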
Sample vs. Population
What happens to the sample mean and standard deviation as you take
new samples of equal size?
This question is a bit tricky. This sample still may not be representative of all
classes because there is a bias in the approach. When you sample
students leaving each of the buildings, you will, on average, select more
people from full classes, simply because there were more people in those
classes. Imagine that of the 6 classes that took place that morning, 4 were
full (each having 100 students) and 2 had only 40 students each. In this
case, most of the students, 400 of the total 480, were in full classes. Your
sample would include more students from the full classes and therefore is
not representative of all classes that took place that morning.
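The arithmetic behind this bias can be checked with a short Python sketch. It uses only the class sizes from the example above (4 classes of 100 and 2 classes of 40); the variable names are just for illustration.

```python
# Class sizes from the example: 4 full classes of 100, 2 classes of 40
class_sizes = [100, 100, 100, 100, 40, 40]

total_students = sum(class_sizes)  # 480

# Share of classes that are full vs. share of students sitting in a full class
share_of_classes_full = sum(1 for s in class_sizes if s == 100) / len(class_sizes)
share_of_students_in_full = sum(s for s in class_sizes if s == 100) / total_students

# Average class size if every class counts once
mean_size_per_class = total_students / len(class_sizes)

# Average class size if every student counts once (each class is weighted
# by its own size, which is what sampling students at the door does)
mean_size_per_student = sum(s * s for s in class_sizes) / total_students

print(f"full classes: {share_of_classes_full:.3f} of classes, "
      f"{share_of_students_in_full:.3f} of students")
print(f"average class size: {mean_size_per_class:.1f} per class, "
      f"{mean_size_per_student:.1f} per student sampled")
```

Only two thirds of the classes are full, but about 83% of the students are in a full class. If you asked randomly chosen students how big their class was, the answers would average 90, while the true average across the 6 classes is 80.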
Sample Size
Suppose you are an aspiring politician thinking about running for local
office. You decide to conduct a survey to get a sense of whether you
actually have a chance of winning. Which method would you use?
In-person
Mail
Phone
E-mail
Text
Social Media
Avoiding Bias
The Central Limit Theorem is one of the most subtle statistical concepts,
and is worth understanding because it gives us much deeper insight into
how sampling actually works. The Central Limit Theorem says that if we
take many random samples from a population and plot the means of
each sample, then assuming the samples we take are sufficiently large,
the resulting plot of the sample means will look approximately normally distributed.
Furthermore, if we take enough of these samples, the average of the
sample means will be very close to the true mean of the population. Thus, we
show this graph, called the distribution of sample means, as a normal curve
centered at the true population mean.
Central Limit Theorem
How does the distribution of sample means differ from the distribution of the
population? The most important difference is that the two distributions have
different standard deviations. The width of the distribution of sample means
depends on the sample size: its standard deviation is the population standard
deviation divided by the square root of the sample size, so larger samples result
in a narrower distribution of sample means. This should reinforce our intuition, because we
know that the larger the sample size, the more accurately the sample mean
approximates the population mean. Thus, for larger samples, the resulting
distribution of sample means will be more closely clustered around the
population mean. One of the most amazing findings about the Central Limit
Theorem is that no matter what type of distribution the population has, uniform,
skewed, bimodal, or completely bizarre, if we take enough sufficiently large
samples, then the means of those samples will form an approximately normal distribution
centered at the true population mean. Let's walk through this step by step.
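To see how sample size affects the width of the distribution of sample means, here is a small Python simulation. It is a sketch under stated assumptions: the population is a made-up skewed (exponential) one with a mean of about 10, samples are drawn independently with replacement, and the "predicted" column uses the standard result that the standard deviation of sample means equals the population standard deviation divided by the square root of the sample size.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical skewed population: exponential with a mean of about 10
population = [rng.expovariate(1 / 10) for _ in range(50_000)]
pop_sd = statistics.pstdev(population)


def sd_of_sample_means(sample_size, num_samples=2_000):
    """Draw many samples and return the standard deviation of their means."""
    means = [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(num_samples)
    ]
    return statistics.stdev(means)


results = {}
for n in (10, 40, 160):
    observed = sd_of_sample_means(n)
    predicted = pop_sd / n ** 0.5
    results[n] = (observed, predicted)
    print(f"n={n:>3}: observed width {observed:.3f}, predicted {predicted:.3f}")
```

Each time the sample size is multiplied by 4, the width of the distribution of sample means is roughly cut in half, which is why larger samples cluster more tightly around the population mean.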
Central Limit Theorem
If we have a population, any population, we can take a random sample from that
population. That sample has a mean. We can plot that mean on a graph. Then we can
take another sample. That sample also has a mean, which we also plot on the graph.
Now, if we plot a lot of sample means in this way, they will start to form a normal
distribution around the population mean. The more samples we take, the more the graph
of the sample means will look like a normal distribution. Eventually we would form the
distribution of sample means. Now, no one would actually take a lot of samples,
calculate all the sample means, and then construct a normal distribution with them. In the
real world, we take a single sample and squeeze it for all the information it's worth. The
Central Limit Theorem is a powerful tool for sampling and estimation, because it allows us
to ignore the underlying distribution of the population we want to learn about. We now
know that the mean of a sample is part of a normal distribution. Specifically, we know
that the sample mean falls somewhere in a normal distribution that is centered at the true
population mean. Because of this, we can completely disregard the underlying
distribution of the population and focus only on the sample.
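The walk-through above can be tried out with a short Python simulation. This is only a sketch: the bimodal population (two clusters around 20 and 80), the sample size of 50, and the 5,000 repetitions are all invented for illustration. To judge whether the sample means look normal, it uses the well-known property that about 68% of a normal distribution lies within one standard deviation of its center and about 95% within two.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical bimodal population: half the values near 20, half near 80
population = [
    rng.gauss(20, 5) if rng.random() < 0.5 else rng.gauss(80, 5)
    for _ in range(50_000)
]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

sample_size = 50
num_samples = 5_000

# Take a sample, record its mean, and repeat many times
sample_means = [
    statistics.fmean(rng.choices(population, k=sample_size))
    for _ in range(num_samples)
]

# Standard deviation of the distribution of sample means
std_error = pop_sd / sample_size ** 0.5

within_1 = sum(abs(m - pop_mean) <= std_error for m in sample_means) / num_samples
within_2 = sum(abs(m - pop_mean) <= 2 * std_error for m in sample_means) / num_samples

print(f"population mean:         {pop_mean:.2f}")
print(f"average of sample means: {statistics.fmean(sample_means):.2f}")
print(f"share within 1 standard deviation:  {within_1:.3f}  (normal curve: about 0.68)")
print(f"share within 2 standard deviations: {within_2:.3f}  (normal curve: about 0.95)")
```

Even though the population itself has two humps and looks nothing like a bell curve, the sample means pile up around the population mean in roughly the proportions a normal curve predicts. That is what lets us reason about a single sample mean without knowing the shape of the population.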