Difference Between Descriptive and Inferential Statistics
By Jim Frost
Descriptive and inferential statistics are two broad categories in the field of statistics. In
this blog post, I show you how both types of statistics are important for different
purposes. Interestingly, some of the statistical measures are similar, but the goals and
methodologies are very different.
Descriptive Statistics
Descriptive statistics describe a sample. That’s pretty straightforward. You simply take a
group that you’re interested in, record data about the group members, and then use
summary statistics and graphs to present the group properties. With descriptive
statistics, there is no uncertainty because you are describing only the people or items
that you actually measure. You’re not trying to infer properties about a larger population.
The process involves taking a potentially large number of data points in the sample and
reducing them to a few meaningful summary values and graphs. This procedure gives us
more insight into the data, and a clearer picture of it, than simply poring over row
upon row of raw numbers!
Descriptive statistics frequently use the following measures to describe a group:
Central tendency: Where is the center of the data? You can use the mean or the median
to locate it.
Dispersion: How far out from the center do the data extend? You can use
the range or standard deviation to measure the dispersion. A low dispersion indicates
that the values cluster more tightly around the center. Higher dispersion signifies that
data points fall further away from the center. We can also graph the frequency
distribution.
Skewness: This measure tells you whether the distribution of values is symmetric or
skewed. See: Skewed Distributions
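To make these measures concrete, here is a minimal sketch using Python's standard `statistics` module. The function name `describe` is hypothetical, and the skewness formula is one common choice (the adjusted Fisher-Pearson coefficient); other software may use a slightly different skewness formula.

```python
import statistics

def describe(values):
    """Summarize a group with common descriptive statistics (needs n >= 3)."""
    n = len(values)
    mean = statistics.mean(values)
    # Central moments used for the skewness coefficient.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    g1 = m3 / m2 ** 1.5
    # Adjusted Fisher-Pearson skewness: 0 for symmetric data,
    # positive for a long right tail, negative for a long left tail.
    skewness = g1 * (n * (n - 1)) ** 0.5 / (n - 2)
    return {
        "mean": mean,                        # central tendency
        "median": statistics.median(values), # central tendency
        "range": max(values) - min(values),  # dispersion
        "stdev": statistics.stdev(values),   # dispersion (n - 1 denominator)
        "skewness": skewness,
    }

print(describe([1, 2, 3, 4, 5]))
print(describe([1, 1, 1, 2, 10]))  # one large value creates a right skew
```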
You can present this summary information using both numbers and graphs. These are
the standard descriptive statistics, but there are other descriptive analyses you can
perform, such as assessing the relationships of paired data
using correlation and scatterplots.
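Example of descriptive statistics
Suppose we want to describe the test scores for a specific class. We record the score of
every student in the class and then calculate the summary statistics:
Mean: 79.18
Range: 66.21 to 96.53
Proportion of scores >= 70: 86.7%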
These results indicate that the mean score of this class is 79.18. The scores range from
66.21 to 96.53, and the distribution is symmetrically centered around the mean. A score
of at least 70 on the test is acceptable. The data show that 86.7% of the students have
acceptable scores.
Collectively, this information gives us a pretty good picture of this specific class. There
is no uncertainty surrounding these statistics because we gathered the scores for
everyone in the class. However, we can’t take these results and extrapolate to a larger
population of students.
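A summary like the one for this class takes only a few lines of code. The sketch below uses a short list of made-up scores (not the class data from the example) and a hypothetical helper, `summarize_class`, that reports the mean, the range, and the proportion of acceptable scores. Because every student is measured, these are exact descriptions of the group, with no confidence intervals attached.

```python
import statistics

def summarize_class(scores, passing=70):
    """Describe one complete group of test scores."""
    acceptable = sum(1 for s in scores if s >= passing)
    return {
        "mean": statistics.mean(scores),
        "min": min(scores),
        "max": max(scores),
        "prop_acceptable": acceptable / len(scores),  # share scoring >= passing
    }

# Hypothetical scores for illustration only.
scores = [66.2, 72.5, 80.1, 85.0, 91.7, 69.9, 78.4, 96.5]
summary = summarize_class(scores)
print(summary)
```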
A good exploratory tool for descriptive statistics is the five-number summary (the
minimum, first quartile, median, third quartile, and maximum), which presents a set of
distributional properties for your sample.
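As a rough sketch, the five-number summary can be computed with `statistics.quantiles` (Python 3.8+). Quartile conventions vary between statistical packages, so this example assumes the "inclusive" method; other software may report slightly different quartiles for the same data.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    # method="inclusive" treats the data as the whole group being described.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```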
Inferential Statistics
Inferential statistics takes data from a sample and makes inferences about the larger
population from which the sample was drawn. Because the goal of inferential statistics
is to draw conclusions from a sample and generalize them to a population, we need to
have confidence that our sample accurately reflects the population. This requirement
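affects our process. At a broad level, we must do the following:
o Define the population we are studying.
o Draw a representative sample from that population.
o Use analyses that incorporate the sampling error.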
We don’t get to pick a convenient group. Instead, random sampling allows us to have
confidence that the sample represents the population. This process is a primary method
for obtaining samples that mirror the population on average. Random sampling
produces statistics, such as the mean, that do not tend to be systematically too high or
too low. Using a random sample, we can generalize from the sample to the broader population.
Unfortunately, gathering a truly random sample can be a complicated process. Learn
more about Making Statistical Inferences.
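In code, drawing a simple random sample from a list of population members is straightforward. This sketch uses `random.Random.sample`, which draws without replacement so that every member has the same chance of selection. The roster and the helper name `simple_random_sample` are hypothetical; a fixed seed is used only to make the draw reproducible.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every member is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)

roster = [f"student_{i}" for i in range(1000)]  # hypothetical population list
sample = simple_random_sample(roster, 100, seed=42)
print(sample[:5])
```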
Besides simple random sampling, other methods that can produce representative samples include:
o Stratified sampling
o Cluster sampling
o Systematic sampling
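The three methods listed above can be sketched as follows. These are simplified illustrations under stated assumptions, not production sampling plans: the function names are hypothetical, the stratified version takes the same number from each stratum (real studies often allocate in proportion to stratum size), and the cluster version keeps every member of each selected cluster (for example, every student in a randomly chosen school).

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Simple random sample of a fixed size from each stratum (dict of lists)."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep all of their members."""
    rng = random.Random(seed)
    chosen = rng.sample(list(clusters), n_clusters)
    return [member for c in chosen for member in clusters[c]]

def systematic_sample(population, n, seed=None):
    """Pick a random starting point, then take every k-th member."""
    k = len(population) // n  # sampling interval; assumes n <= len(population)
    start = random.Random(seed).randrange(k)
    return population[start::k][:n]

strata = {"urban": list(range(50)), "rural": list(range(50, 100))}
schools = {"A": [1, 2, 3], "B": [4, 5], "C": [6, 7, 8, 9]}
print(stratified_sample(strata, 5, seed=1))
print(cluster_sample(schools, 2, seed=1))
print(systematic_sample(list(range(1000)), 100, seed=1)[:5])
```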
In contrast, convenience sampling doesn’t tend to obtain representative samples. These
samples are easier to collect, but the results are minimally useful.
You gain tremendous benefits by working with a random sample drawn from a
population. In most cases, it is simply impossible to measure the entire population to
understand its properties. The alternative is to gather a random sample and then use
the methodologies of inferential statistics to analyze the sample data.
While samples are much more practical and less expensive to work with, there are
tradeoffs. Typically, we learn about the population by drawing a relatively small sample
from it. We are a very long way off from measuring all people or objects in that
population. Consequently, when you estimate the properties of a population from a
sample, the sample statistics are unlikely to equal the actual population value exactly.
For instance, your sample mean is unlikely to equal the population mean exactly. The
difference between the sample statistic and the population value is the sampling error.
Inferential statistics incorporate estimates of this error into the statistical results.
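A small simulation makes sampling error tangible. The sketch below invents a population of 10,000 normally distributed values (something you could never fully measure in practice), draws one random sample of 50, and computes the difference between the sample mean and the population mean. The population parameters and sample size are arbitrary assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(1)
# Hypothetical population; in a real study this would be unknown.
population = [rng.gauss(100, 15) for _ in range(10_000)]
population_mean = statistics.mean(population)

sample = rng.sample(population, 50)
sample_mean = statistics.mean(sample)
sampling_error = sample_mean - population_mean  # almost never exactly zero
print(f"population mean {population_mean:.2f}, sample mean {sample_mean:.2f}")
```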
Related post: Sample Statistics Are Always Wrong (to Some Extent)!
Hypothesis tests
Hypothesis tests use sample data to answer questions like the following:
o Is the population mean greater than or less than a particular value?
o Are the means of two or more populations different from each other?
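As one example, the first question can be addressed with a one-sample t-test. This minimal sketch computes only the t statistic, the distance between the sample mean and a hypothesized population mean measured in standard errors. Turning it into a p-value requires the t distribution with n - 1 degrees of freedom, which the Python standard library does not provide (SciPy's `scipy.stats.ttest_1samp` is one option). The helper name `one_sample_t` is hypothetical.

```python
import math
import statistics

def one_sample_t(sample, hypothesized_mean):
    """t statistic for H0: the population mean equals hypothesized_mean."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    return (statistics.mean(sample) - hypothesized_mean) / se

print(one_sample_t([1, 2, 3, 4, 5], 2))
```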
Confidence intervals
Suppose we define our population as all high school basketball players. Then, we draw
a random sample from this population and calculate a mean height of 181 cm. This
sample estimate of 181 cm is our best estimate of the mean height of the population.
However, it’s virtually guaranteed that our estimate of the population parameter is not
exactly correct.
Confidence intervals incorporate the uncertainty and sampling error to create a range of
values within which the actual population value is likely to fall. For example, a confidence
interval of [176, 186] indicates that we can be confident that the real population mean
falls within this range.
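Here is a minimal sketch of a confidence interval for a population mean. It assumes a normal approximation (z critical value from `statistics.NormalDist`, Python 3.8+), which is reasonable for large samples; for small samples, statistical software uses a t critical value instead, which gives a somewhat wider interval. The function name is hypothetical.

```python
import math
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return mean - z * se, mean + z * se

print(mean_confidence_interval([1, 2, 3, 4, 5]))
```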
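Regression analysis
Regression analysis describes the relationship between a set of independent variables
and a dependent variable. As an inferential tool, it uses sample data to assess whether
those relationships exist in the population.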
Example of inferential statistics
For this example, suppose we conducted our study on test scores for a specific class as
I detailed in the descriptive statistics section. Now we want to perform an inferential
statistics study for that same test. Let’s assume it is a standardized statewide test. By
using the same test, but now with the goal of drawing inferences about a population, I
can show you how that changes the way we conduct the study and the results that we
present.
In descriptive statistics, we picked the specific class that we wanted to describe and
recorded all of the test scores for that class. Nice and simple. For inferential statistics,
we need to define the population and then draw a random sample from that population.
Let’s define our population as 8th-grade students in public schools in the State of
Pennsylvania in the United States. We need to devise a random sampling plan to help
ensure a representative sample. This process can actually be arduous. For the sake of
this example, assume that we have a list of names for the entire population, draw a
random sample of 100 students from it, and obtain their test scores. Note that
these students will not be in one class, but from many different classes in different
schools across the state.
For inferential statistics, we can calculate the point estimate for the mean, standard
deviation, and proportion for our random sample. However, it is staggeringly improbable
that any of these point estimates are exactly correct, and there is no way to know for
sure anyway. Because we can’t measure all subjects in this population, there is a
margin of error around these statistics. Consequently, I’ll report the confidence intervals
for the mean, standard deviation, and the proportion of satisfactory scores (>=70). Here
is the CSV data file: Inferential_statistics.
Given the uncertainty associated with these estimates, we can be 95% confident that
the population mean is between 77.4 and 80.9. The population standard deviation (a
measure of dispersion) is likely to fall between 7.7 and 10.1. And, the population
proportion of satisfactory scores is likely to fall between 77% and 92%.
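For the proportion of satisfactory scores, a simple approach is the Wald (normal-approximation) interval sketched below. This is an assumption for illustration: it is not necessarily the method behind the interval reported above, and exact or Wilson intervals, which many packages prefer, can give somewhat different limits. The counts in the usage line are hypothetical.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence=0.95):
    """Wald interval for a population proportion (large-sample approximation)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical: 85 of 100 sampled students scored 70 or higher.
print(proportion_confidence_interval(85, 100))
```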
Another key inferential statistic is the standard error of the mean. To learn more about it,
read my post The Standard Error of the Mean.
Differences between Descriptive and Inferential Statistics
As you can see, the difference between descriptive and inferential statistics lies in the
process as much as it does the statistics that you report.
For descriptive statistics, we choose a group that we want to describe and then
measure all subjects in that group. The statistical summary describes this group with
complete certainty (outside of measurement error).
For inferential statistics, we need to define the population and then devise a sampling
plan that produces a representative sample. The statistical results incorporate the
uncertainty that is inherent in using a sample to understand an entire population. The
sample size becomes a vital characteristic. The law of large numbers states that as the
sample size grows, sample statistics (e.g., the sample mean) converge on the
population value.
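A quick simulation illustrates the law of large numbers. This sketch assumes a hypothetical population that is uniformly distributed between 0 and 1, so the true mean is exactly 0.5; as the sample size grows, the sample means tend to settle closer to that value. `statistics.fmean` requires Python 3.8+.

```python
import random
import statistics

def sample_mean_uniform(n, rng):
    """Mean of n random draws from a uniform [0, 1) population (true mean 0.5)."""
    return statistics.fmean(rng.random() for _ in range(n))

rng = random.Random(0)
for n in (10, 100, 1_000, 100_000):
    print(f"n={n:>9,}: sample mean = {sample_mean_uniform(n, rng):.4f}")
```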
A study using descriptive statistics is simpler to perform. However, if you need evidence
that an effect or relationship between variables exists in an entire population rather than
only your sample, you need to use inferential statistics.